Learn how to evaluate the quality and reliability of AI healthcare assistants using Maxim’s evaluation suite, helping ensure patient safety and clinical accuracy.
| Evaluator | Type | Purpose |
| --- | --- | --- |
| Output Relevance | LLM-as-a-judge | Validates that the generated output is relevant to the input |
| Clarity | LLM-as-a-judge | Validates that the generated output is clear and easily understandable |
| Vertex Question Answering Helpfulness | LLM-as-a-judge (3rd-party) | Assesses how helpful the answer is in addressing the question |
| Vertex Question Answering Relevance | LLM-as-a-judge (3rd-party) | Determines how relevant the answer is to the posed question |
| Correctness | Human eval | Collects human annotation on the correctness of the information |
| Semantic Similarity | Statistical | Validates that the generated output is semantically similar to the expected output |
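To illustrate the idea behind a statistical evaluator like Semantic Similarity, here is a minimal sketch of a similarity check. This is not Maxim's actual implementation (which would typically use embeddings); it uses a simple token-level cosine similarity, and the function names and the 0.7 pass threshold are assumptions for illustration only.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Token-level cosine similarity between two strings, in [0.0, 1.0]."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def semantic_similarity_eval(output: str, expected: str,
                             threshold: float = 0.7) -> bool:
    """Pass if the generated output is similar enough to the expected output."""
    return cosine_similarity(output, expected) >= threshold
```

For example, comparing a generated answer against a reference answer from your dataset would look like `semantic_similarity_eval(model_output, expected_output)`, returning a simple pass/fail that can be aggregated across a test suite.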