- Balance auto-evals with last-mile human review: LLM judges and programmatic evals provide scale, while human evaluations capture nuanced quality signals that auto-evals might miss.
- Curate golden datasets: Human-annotated datasets are key to defining what “good” means for your specific use case, forming the foundation for effective offline evaluation.
- Align LLM judges: LLM judges must be continuously realigned with human preferences so that they stay tuned to the outcomes that matter for your agent.
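
One way to check how well an LLM judge tracks human preferences is to score the same examples with both and measure their agreement. The sketch below is a minimal illustration under stated assumptions, not a platform API: the `cohen_kappa` helper and the sample labels are hypothetical, and Cohen's kappa is only one of several agreement metrics you might choose.

```python
from collections import Counter


def cohen_kappa(human, judge):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(human)

    # Fraction of examples where the judge matches the human label.
    observed = sum(h == j for h, j in zip(human, judge)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    human_counts = Counter(human)
    judge_counts = Counter(judge)
    expected = sum(
        human_counts[label] * judge_counts[label] for label in human_counts
    ) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: human annotations vs. LLM-judge verdicts on six outputs.
human_labels = ["pass", "pass", "fail", "pass", "fail", "fail"]
judge_labels = ["pass", "fail", "fail", "pass", "fail", "pass"]

kappa = cohen_kappa(human_labels, judge_labels)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 suggests the judge closely mirrors human reviewers, while a value near 0 means agreement is no better than chance. Recomputing this on freshly annotated samples is one way to notice when a judge has drifted and needs realignment.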