Measure the quality of your RAG pipeline
Retrieval quality directly impacts the quality of output from your AI application. While testing prompts, Maxim lets you connect your RAG pipeline via a simple API endpoint and evaluates the retrieved context for every run. Context-specific evaluators for precision, recall, and relevance make it easy to see where retrieval quality is low.
Fetch retrieved context while running prompts
To reproduce the output your users would see when sending a query, you need to account for the context that is retrieved and fed to the LLM. In Maxim’s playground, you can attach a Context Source and fetch the relevant chunks. Follow the steps below to use context in the prompt playground.
Create a Context Source
In the Library, create a new Context Source of type API.
Configure your RAG endpoint
Set up the API endpoint of your RAG pipeline that returns the final retrieved chunks for any given input.
Add context variable to your prompt
Reference a variable {{context}} in your prompt to provide instructions on using this dynamic data.
Link the Context Source
Connect the Context Source as the dynamic value of the context variable in the variables table.
Test with real-time retrieval
Run your prompt to see the context retrieved for that input.
Test different inputs iteratively and make improvements to your RAG pipeline’s performance.
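The steps above can be sketched in Python as a toy retrieval handler plus a template fill. This is a minimal sketch, not Maxim's documented contract: the request shape (`{"query": ...}`), the response shape (`{"chunks": [...]}`), the in-memory `CORPUS`, and the function names `retrieve_chunks`, `handle_request`, and `render_prompt` are all assumptions made for illustration. Check the Context Source configuration for the schema your endpoint actually needs to return.

```python
import json
import re

# Hypothetical in-memory corpus standing in for a real vector store.
CORPUS = [
    "Refunds are processed within 5 business days.",
    "Shipping is free for orders over 50 dollars.",
    "Support is available by email around the clock.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_chunks(query, corpus=CORPUS, top_k=2):
    """Rank chunks by word overlap with the query (a toy stand-in for vector search)."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(chunk)), chunk) for chunk in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop chunks with no overlap, then keep the top_k best matches.
    return [chunk for score, chunk in ranked if score > 0][:top_k]


def handle_request(raw_body):
    """Assumed contract: JSON body {"query": ...} in, JSON {"chunks": [...]} out."""
    query = json.loads(raw_body)["query"]
    return json.dumps({"chunks": retrieve_chunks(query)})


def render_prompt(template, variables):
    """Substitute {{name}} placeholders, mimicking how {{context}} gets a dynamic value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)


question = "How long do refunds take?"
response = json.loads(handle_request(json.dumps({"query": question})))
prompt = render_prompt(
    "Answer using only this context:\n{{context}}\n\nQuestion: {{question}}",
    {"context": "\n".join(response["chunks"]), "question": question},
)
print(prompt)
```

In a real setup, `handle_request` would sit behind an HTTP route and `retrieve_chunks` would query your vector store. Keeping retrieval behind one function makes it easy to swap strategies and rerun the same inputs.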
Evaluate retrieval at scale
While the playground experience allows you to experiment and debug when retrieval is not working well, it is important to do this at scale across multiple inputs and with a set of defined metrics. Follow the steps given below to run a test and evaluate context retrieval.
Initiate prompt testing
Click on test for a prompt that has a Context Source attached (as explained in the previous section).
Select your test dataset
Select the dataset that contains the required inputs.
Choose context evaluation source
For the context to evaluate, select the dynamic Context Source.
Add retrieval quality evaluators
Select context-specific evaluators (e.g. context recall, context precision, or context relevance) and trigger the test.
Review retrieved context results
Once the run is complete, the retrieved context column will be filled for all inputs.
Examine detailed chunk information
View complete details of retrieved chunks by clicking on any entry.
Analyze evaluator feedback
Evaluator scores and reasoning for every entry can be checked under the evaluation tab. Use this to debug retrieval issues.
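To build intuition for what the precision and recall scores mean, here is a simplified, label-based sketch in Python. It assumes you already know which chunks are relevant for an input (the `relevant` set below is hypothetical). It is not a description of how Maxim's evaluators are implemented, and they may score context differently.

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are relevant (penalizes noisy retrieval)."""
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk in retrieved if chunk in relevant)
    return hits / len(retrieved)


def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were retrieved (penalizes missing context)."""
    if not relevant:
        return 0.0
    hits = sum(1 for chunk in relevant if chunk in retrieved)
    return hits / len(relevant)


# Hypothetical chunk IDs for a single test input.
retrieved = ["chunk_a", "chunk_b", "chunk_c", "chunk_d"]
relevant = {"chunk_a", "chunk_c", "chunk_e"}

print(context_precision(retrieved, relevant))  # 2 of 4 retrieved chunks are relevant
print(context_recall(retrieved, relevant))     # 2 of 3 relevant chunks were retrieved
```

Low precision with high recall usually points to retrieving too many chunks. High precision with low recall suggests relevant chunks are being missed, for example because `top_k` is too small or the chunking splits key passages.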
By running experiments iteratively as you make changes to your AI application, you can check for regressions in the retrieval pipeline and keep testing new cases.
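One way to make the regression check concrete is to compare per-entry evaluator scores between two runs. The Python sketch below assumes you have the scores for each run as a dictionary keyed by dataset entry ID. The function name `find_regressions`, the entry IDs, and the `tolerance` threshold are hypothetical and not part of Maxim.

```python
def find_regressions(baseline, candidate, tolerance=0.05):
    """Return entries whose score dropped by more than `tolerance` between runs."""
    regressions = {}
    for entry_id, old_score in baseline.items():
        new_score = candidate.get(entry_id)
        # Entries missing from the candidate run are skipped, not flagged.
        if new_score is not None and old_score - new_score > tolerance:
            regressions[entry_id] = (old_score, new_score)
    return regressions


# Hypothetical context-recall scores from two test runs.
baseline_run = {"q1": 0.9, "q2": 0.8, "q3": 0.7}
candidate_run = {"q1": 0.9, "q2": 0.5, "q3": 0.68, "q4": 1.0}

for entry_id, (old, new) in find_regressions(baseline_run, candidate_run).items():
    print(f"{entry_id}: {old} -> {new}")
```

The tolerance keeps small score fluctuations from being reported as regressions. New entries such as `q4` appear only in the candidate run and are ignored by the comparison.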