  • Balance auto-evals with the last mile of human review: LLM judges and programmatic evals provide scale, while human evaluations capture nuanced quality signals that auto-evals might miss.
  • Curate golden datasets: Human-annotated datasets are key to defining what “good” means for your specific use case, forming the foundation for effective offline evaluation.
  • Align LLM judges: Continuously align LLM judges with human preferences so that their scores stay tuned to the outcomes that matter for your specific agent.
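
One way to check whether an LLM judge is aligned with human reviewers is to score the same samples with both and measure how often they agree beyond chance. The sketch below is an illustration only and is not part of any platform API: the function name `cohen_kappa` and the sample labels are assumptions made up for this example, and Cohen's kappa is just one possible agreement metric.

```python
from collections import Counter


def cohen_kappa(human, judge):
    """Chance-corrected agreement between two equal-length label lists."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(human)
    # Fraction of samples where the judge matches the human label.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    human_counts = Counter(human)
    judge_counts = Counter(judge)
    expected = sum(human_counts[c] * judge_counts[c] for c in human_counts) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight samples from a human-annotated golden dataset.
human_labels = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "fail"]
judge_labels = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "fail"]

kappa = cohen_kappa(human_labels, judge_labels)
print(f"judge/human agreement (Cohen's kappa): {kappa:.2f}")
```

Tracking a score like this over time, and revisiting the judge prompt when it drops, is one way to make the alignment continuous rather than a one-off check.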