A
Agent Observability
Agent observability is the practice of monitoring, tracing, and analyzing the internal states, decision-making processes, and outputs of AI agents in real time, including any interactions the agent has with large language models and external tools.
Agentic workflow
An agentic workflow is an AI-driven process in which autonomous AI agents make decisions, call external tools, execute tasks, and coordinate actions with minimal human intervention to achieve specific goals. These workflows employ detailed instructions and operational sequences to solve complex tasks across diverse domains.
AI Gateway
An AI Gateway is a specialized middleware layer designed explicitly for managing and securing interactions with Large Language Models and other AI-powered services. It sits between applications and AI models (like ChatGPT, Gemini, Claude), handling essential functions such as request routing, authentication, authorization, rate limiting, and traffic monitoring.
Key Capabilities:
- Unified API Abstraction: Provides a single, consistent interface to access multiple AI providers, preventing vendor lock-in and simplifying integration.
- Security and Compliance: Enforces managed identities and PII (Personally Identifiable Information) redaction to prevent unauthorized data exposure.
- Traffic & Policy Management: Controls costs and performance through rate limiting, smart request routing, and automated model fallbacks.
- Observability & Cost Governance: Centralizes the tracking of token usage, quotas, and error rates to provide clear oversight of AI spend.
- Model Orchestration: Centralizes management of multiple AI models including access controls, monitoring, and deployment.
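The smart-routing-with-fallback capability above can be sketched in a few lines of Python. The provider names and callables here are hypothetical stand-ins for illustration, not a real gateway API:

```python
# Minimal sketch of an AI gateway's fallback routing: try providers in
# priority order and fall back when one fails. Providers are hypothetical.
def call_with_fallback(prompt, providers):
    """Try each (name, callable) provider in order; return the first success."""
    errors = {}
    for name, provider in providers:
        try:
            return name, provider(prompt)
        except Exception as exc:  # a real gateway would distinguish error types
            errors[name] = str(exc)
    raise RuntimeError(f"All providers failed: {errors}")

def flaky_provider(prompt):
    raise TimeoutError("upstream timeout")

def backup_provider(prompt):
    return f"echo: {prompt}"

used, reply = call_with_fallback("hello", [("primary", flaky_provider),
                                           ("backup", backup_provider)])
# used == "backup": the gateway transparently fell back to the second model
```

A production gateway layers retries, per-provider rate limits, and cost-aware routing on top of this basic loop.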
Audit Logs
Audit logs are chronological records that document “who, what, and when” regarding activities within a system. These logs capture events, timestamps, responsible users or services, and the specific entities affected, creating an audit trail for transparency and accountability.
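A minimal audit-log entry capturing “who, what, and when” might look like the following. The field names are illustrative, not a standard schema:

```python
import datetime
import json

# One audit-log entry: who did what, to which entity, and when.
def audit_entry(actor, action, entity):
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "actor": actor,    # who: the responsible user or service
        "action": action,  # what: the activity performed
        "entity": entity,  # the specific resource affected
    }

entry = audit_entry("service:billing", "UPDATE", "invoice/1042")
line = json.dumps(entry)  # append-only JSON lines form a simple audit trail
```

Appending each entry to an immutable, append-only store is what turns individual records into a trustworthy trail.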
C
CI/CD
CI/CD is the combined practices of continuous integration (CI) and continuous delivery (CD). It uses automation to bridge the gaps between development and deployment, allowing teams to release high-quality code faster and more reliably.
Cosine Similarity
Cosine similarity is a mathematical metric used to measure how similar two vectors are, regardless of their magnitude. It calculates the cosine of the angle between them, where a value of 1 means the vectors point in the same direction, 0 means they are orthogonal (unrelated), and -1 means they point in opposite directions.
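The calculation is the dot product of the two vectors divided by the product of their magnitudes, as in this small sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cosine_similarity([1, 0], [2, 0])  # → 1.0 (same direction, despite different lengths)
cosine_similarity([1, 0], [0, 1])  # → 0.0 (orthogonal)
```

Because the norms are divided out, only direction matters, which is why the metric is insensitive to vector size.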
D
Distributed Tracing
Distributed tracing is a method for tracking a single request as it moves through a distributed system, recording every step between the initial request and its final response. It captures timing, relationships, errors, and events across services, helping debug, monitor, and evaluate complex workflows. Each trace contains spans (logical units of work) that reveal the detailed path of a request.
Core components:
- Traces: The end-to-end record of a single request’s journey through the system.
- Spans: The fundamental unit of a trace representing a specific operation (e.g., an API call, a database query, or an LLM generation).
- Events: Significant points within a span or trace that record instantaneous occurrences, providing additional context for understanding system behavior.
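The components above can be sketched as a toy data model. Real systems would use an SDK such as OpenTelemetry, but the trace/span/event structure is the same idea, and the names here are illustrative:

```python
import time
import uuid

# Toy trace model: spans share a trace_id and link to a parent span,
# forming the tree that describes one request's path through the system.
class Span:
    def __init__(self, name, trace_id, parent_id=None):
        self.span_id = uuid.uuid4().hex
        self.trace_id = trace_id    # ties all spans of one request together
        self.parent_id = parent_id  # None for the root span
        self.name = name
        self.start = time.time()
        self.events = []            # instantaneous occurrences within the span

    def add_event(self, message):
        self.events.append((time.time(), message))

trace_id = uuid.uuid4().hex
root = Span("handle_request", trace_id)                  # the incoming request
child = Span("llm_generation", trace_id, root.span_id)   # a nested operation
child.add_event("prompt truncated to fit context window")
```

Walking the parent links from any span reconstructs the full request path, which is what tracing backends visualize as a waterfall.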
Drift
In LLMs, drift refers to changes in the text distribution compared to the model’s initial training data, causing the training data to become less representative of real-world usage over time. This phenomenon results in degraded model performance, producing less coherent, accurate, or contextually relevant outputs.
Mitigation:
- Performance Monitoring: Comparing model predictions against “ground truth” (actual outcomes) to identify deviations.
- Quality Metrics: To detect concept drift, monitor model quality using metrics such as accuracy or mean error.
- Observability Platforms: Utilizing tools like Maxim to track model metrics in real-time and automate the re-evaluation of production data.
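The performance-monitoring idea can be sketched as a window comparison: score a recent window of production data against ground truth and flag large drops relative to a baseline. The numbers and threshold below are invented for illustration:

```python
# Sketch of drift detection via performance monitoring: compare accuracy
# on a recent window against a baseline window of production data.
def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def drift_alert(baseline_acc, recent_acc, tolerance=0.05):
    """Flag drift when recent accuracy drops more than `tolerance` below baseline."""
    return (baseline_acc - recent_acc) > tolerance

base = accuracy([1, 0, 1, 1], [1, 0, 1, 1])    # 1.0 on the baseline window
recent = accuracy([1, 0, 0, 0], [1, 0, 1, 1])  # 0.5 on the recent window
drift_alert(base, recent)  # → True: quality has degraded beyond tolerance
```

In practice the windows are large samples of labeled production traffic, and an alert triggers re-evaluation or retraining.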
F
F1 score
The F1 score is the harmonic mean of precision and recall, providing a single metric that evaluates a model’s accuracy more holistically than simple accuracy scores. It is especially useful for understanding model performance on imbalanced datasets where one class is much more frequent than another.
How is it calculated:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
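A small sketch computing precision, recall, and F1 from confusion-matrix counts (true positives, false positives, false negatives):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision_recall_f1(tp=8, fp=2, fn=8)  # → (0.8, 0.5, ~0.615)
```

Because the harmonic mean penalizes imbalance, F1 (≈0.615 here) sits well below the arithmetic mean of precision and recall (0.65) whenever the two diverge.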
G
Guardrails
Guardrails are safety mechanisms designed to ensure artificial intelligence applications, particularly large language models (LLMs), deliver trustworthy outputs while protecting against vulnerabilities such as harmful content, sensitive data exposure, and malicious prompt engineering practices like jailbreaking or injection attacks.
Types of Guardrails:
- Appropriateness Guardrails: Check whether content generated by an AI agent is toxic, harmful, biased, or based on stereotypes, and filter out any such inappropriate content before it reaches customers.
- Hallucination Guardrails: Ensure that AI-generated content doesn’t contain information that is factually wrong or misleading
- Regulatory Compliance Guardrails: Validate that generated content meets regulatory requirements, whether those requirements are general or specific to the industry or use case.
- Security Guardrails: Protect the application against malicious inputs such as prompt injection and jailbreaking attempts, and prevent exposure of sensitive or personal data.
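A minimal appropriateness guardrail can be sketched as an output filter. Real guardrails use trained classifiers or LLM judges rather than keyword lists; this denylist filter, with placeholder terms, is only an illustration of where the check sits in the pipeline:

```python
# Illustrative keyword-based guardrail: block outputs containing denylisted
# terms before they reach the user. DENYLIST terms are placeholders.
DENYLIST = {"slur_example", "threat_example"}

def passes_guardrail(text):
    """Return True if the text contains no denylisted terms."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return words.isdisjoint(DENYLIST)

passes_guardrail("Here is a helpful answer.")   # → True: safe to return
passes_guardrail("this contains slur_example")  # → False: filtered out
```

In production the same hook point would call a toxicity classifier, a hallucination checker, or a compliance validator, and blocked outputs would be regenerated or replaced with a refusal.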
H
Hallucination
AI hallucinations occur when a large language model (LLM) perceives patterns or objects that are nonexistent, creating nonsensical or inaccurate outputs. Large language models are prone to hallucinations, generating plausible yet nonfactual content, which raises significant concerns over the reliability of LLMs in real-world applications.
Detection and mitigation:
- Automated Evaluations: Utilizing scalable frameworks such as “LLM-as-a-Judge,” statistical metrics, and reference-based scoring to identify non-factual content.
- Human-in-the-loop evaluations: Leveraging domain experts to validate outputs for contextual relevance and subjective accuracy that automated tools might miss.
- Real-Time Monitoring and Observability: Implementing continuous monitoring of production logs and agent traces.
L
Latency
Latency is a measurement of delay in a system. Network latency is the amount of time it takes for data to travel from one point to another across a network.
A network with high latency will have slower response times, while a low-latency network will have faster response times.
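Latency for a single call can be measured by timestamping around it, as in this sketch. The `slow_echo` helper is invented to simulate a slow remote service:

```python
import time

def measure_latency(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def slow_echo(msg):
    time.sleep(0.05)  # simulate a ~50 ms network round trip
    return msg

result, seconds = measure_latency(slow_echo, "ping")
# seconds will be at least 0.05
```

Averages hide outliers, so production monitoring typically reports latency percentiles (p50, p95, p99) rather than a single mean.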
M
MCP server
An MCP server is an external service that provides context, data, or capabilities to an LLM using a standardized protocol (the Model Context Protocol). It acts as a bridge between the AI and external systems such as databases, local files, or web APIs, translating complex data into a format the LLM can immediately process and use.
N
Noisy data
Noise is random, irrelevant, or corrupted information within a dataset that obscures the underlying patterns or signals an AI model is trying to learn. In machine learning, noisy data introduces statistical noise, leading to unpredictable model behavior and reduced accuracy.
Non Determinism
In computer science, non-determinism describes a system or algorithm that can produce different outputs or exhibit different behaviors across multiple runs, even when provided with the same input.
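A minimal illustration uses random sampling, which is also a common source of non-determinism in LLM decoding: the same input can yield different outputs across runs, while fixing the seed restores determinism. The `sample_reply` helper and its canned replies are invented for the example:

```python
import random

def sample_reply(prompt, seed=None):
    """Pick one of several plausible replies; unseeded sampling is non-deterministic."""
    rng = random.Random(seed)
    replies = ["Sure!", "Of course.", "Happy to help."]
    return rng.choice(replies)

sample_reply("hi")                  # may differ between runs: non-deterministic
sample_reply("hi", seed=42) == sample_reply("hi", seed=42)  # → True: seeded, deterministic
```

Non-determinism in LLM systems also comes from sources that seeding cannot fix, such as concurrent request batching and floating-point reduction order on GPUs.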
P
PII
Personal data, also known as personal information or personally identifiable information (PII), is any information related to an identifiable person. In the context of AI, protecting PII is a critical requirement for regulatory compliance (such as GDPR, HIPAA, and CCPA) and for maintaining user trust.
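One common protection is redacting PII before text reaches a model or a log. The sketch below uses deliberately simple regular expressions for emails and US-style phone numbers; production systems use dedicated PII detectors with far broader coverage:

```python
import re

# Illustrative rule-based PII redaction; the patterns are intentionally simple.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def redact_pii(text):
    """Replace detected emails and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

redact_pii("Contact jane.doe@example.com or 555-867-5309.")
# → "Contact [EMAIL] or [PHONE]."
```

An AI gateway typically applies this kind of redaction on both requests and responses so that PII never leaves the trust boundary.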
Precision
Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances. Mathematically, precision is defined as the number of true positives (Tp) over the number of true positives plus the number of false positives (Fp). Perfect precision, indicated by a value of 1, means that every object identified as positive was classified correctly and no false positives exist.
How is it calculated:
Precision = Tp / (Tp + Fp)
Prompt Engineering
Prompt engineering is the practice of writing clear, effective instructions that guide AI models to produce accurate, consistent, and useful outputs.
Additional Resources:
- See how you can do prompt experimentation with Maxim.
Prompt Injection
Prompt injection is a cybersecurity exploit in which adversaries craft inputs that appear legitimate but are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs).
R
R Squared
In statistics, the coefficient of determination (R squared) is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
How is it calculated:
R² = 1 − (SS_res / SS_tot), where SS_res is the residual sum of squares and SS_tot is the total sum of squares.
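A small sketch of the calculation from observed and predicted values:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)          # total variation
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    return 1 - ss_res / ss_tot

r_squared([1, 2, 3, 4], [1, 2, 3, 4])  # → 1.0 (perfect fit)
```

A value of 1 means all variation is explained; predictions no better than the mean give 0, and worse-than-mean predictions can push R² negative.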
Recall
Recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Perfect recall, indicated by a value of 1, means that every relevant observation was identified as such and no relevant items were missed (no false negatives).
How is it calculated:
Recall = Tp / (Tp + Fn)
S
Semantic Caching
A semantic cache lets you reuse prior user prompts and LLM completions to answer similar new prompts using vector similarity search. A semantic cache can reduce latency and save costs in your GenAI applications, since LLM calls are often the most expensive and highest-latency operations in such applications.
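The lookup can be sketched as follows. The toy `embed()` function is a hypothetical stand-in for a real embedding model, and the similarity threshold is an invented example value:

```python
import math

# Sketch of a semantic cache: store (embedding, completion) pairs and
# reuse a completion when a new prompt's embedding is close enough.
def embed(text):
    # Hypothetical 3-dim "embedding" built from crude character statistics;
    # a real system would call an embedding model here.
    return [len(text), text.count(" "), sum(map(ord, text)) % 97]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, threshold=0.99):
        self.entries = []          # list of (embedding, completion) pairs
        self.threshold = threshold

    def get(self, prompt):
        query = embed(prompt)
        for emb, completion in self.entries:
            if cosine(query, emb) >= self.threshold:
                return completion  # cache hit: skip the LLM call entirely
        return None                # cache miss: call the LLM, then put()

    def put(self, prompt, completion):
        self.entries.append((embed(prompt), completion))

cache = SemanticCache()
cache.put("What is the capital of France?", "Paris.")
cache.get("What is the capital of France?")  # → "Paris." (similarity 1.0)
```

Production caches replace the linear scan with an approximate nearest-neighbor index in a vector database, and tuning the threshold trades hit rate against the risk of returning a stale or mismatched answer.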
Stress-testing
Stress testing is a form of deliberately intense or thorough testing, used to determine the stability of a given system, critical infrastructure or entity. It involves testing beyond normal operational capacity, often to a breaking point, in order to observe the results.
T
Throughput
Throughput refers to the rate of message delivery over a communication channel in a communication network, such as TCP/IP. Throughput is usually measured in bits per second (bit/s, sometimes abbreviated bps), and sometimes in packets per second (p/s or pps) or data packets per time slot.
V
Vector Embedding
Vector embeddings are numerical representations of data points that translate complex information, including non-mathematical data such as words, images, and audio, into arrays of numbers. This allows machine learning models to process and understand the semantic relationships between different pieces of data in a high-dimensional space.