Latest

Post-Training Doesn't Create Your Model's Character. It Inherits One

Introduction Every team building on top of LLMs has a version of the same mental model: pretraining teaches the model what it knows, and post-training teaches it how to behave. Don't want it to say harmful things? Train that out. Want it to be more helpful? Push that

The Attention Arms Race: How Modern Open-Source LLMs Are Reinventing the Transformer's Core

Introduction If you follow the LLM space, you've probably heard a lot about parameter counts, context windows, and benchmark scores. What gets discussed far less often is the mechanism that makes all of it possible: attention. Every major language model (GPT, Llama, Gemini, Qwen, DeepSeek) is built on

Your Base Model Is Smarter Than You Think: And Here's How to Prove It

There's a quiet assumption baked into most of the recent excitement around reasoning models: that the impressive gains you see from systems like DeepSeek-R1 or similar RL-trained models come from something genuinely new: novel capabilities that the base model simply didn't have before training. A new

PersonaPlex: Full-Duplex Voice Without the Fixed Persona

Introduction Voice AI hit a genuine inflection point when full-duplex models arrived. Systems like Moshi finally cracked the core problem with conversational speech: the awkward cascade of listen, then transcribe, then think, then speak. Full-duplex models [models that listen and speak simultaneously over a continuous audio stream, the same way

Can LLMs Actually Judge Web Development Quality? Spoiler: Not Really

I recently came across a fascinating paper at ICLR’26 that tackles a question many of us AI developers have been wrestling with: can we trust LLMs to evaluate complex, interactive tasks? The authors focus on the domain of web development, and the short answer: we've got a

Beyond Autoregression: LLaDA2.1 and the Case for Self-Editing Language Models

Introduction Every mainstream large language model today generates text the same way: one token at a time, left to right, no looking back. It works remarkably well, but it has a structural flaw that's easy to overlook until you care about speed at scale. The model can never

Building the Future of Music Education: Yousician’s Journey with Maxim AI

About Yousician Yousician is the world's leading music education platform, helping over 20 million people learn to play instruments. The company has built one of the world’s largest interactive music learning ecosystems, combining structured lessons, real-time feedback, and practice-driven progression. As Yousician looks ahead, the team is

xMemory: Why Top-k Retrieval Breaks for Agent Memory

Introduction LLM agents no longer begin and end in a single context window. We’re now in the era of cross-session, long-running agents. Products like Claude Code, OpenClaw, and other agentic workflows are built to carry context across days of work, not minutes. The bottleneck is not context length anymore.

Beyond the Benchmark: Why TruthTensor Might Be the Eval Framework We've Been Missing

When was the last time you confidently trusted a benchmark to tell you how an LLM would actually perform in production? The gap between benchmark performance and real-world reliability is significant, and it's a problem that deserves more attention. I recently read through this paper by Inference Labs,

The Skills vs MCP Debate: Understanding Two Layers of the Same Stack

How coding agents reshaped the tool integration landscape and what actually survived We're in an interesting moment for AI's application layer. Agents can now write code (better than most programmers), call APIs, query databases, and orchestrate complex workflows. But the infrastructure underneath - how agents actually

Making Voice Assistants Faster Without Losing Accuracy

Have you ever noticed how some voice assistants seem to understand you instantly, while others leave you waiting? That delay isn't random. It's the result of a fundamental trade-off that has plagued speech recognition for years. Systems could either be fast or accurate, but rarely both.

Semantic Highlighting: Making RAG Cheaper Without Compromises

Recent research from the Zilliz team tackles a problem that shows up constantly in production RAG systems: how do you actually show users why a document is relevant to their query? Consider a typical scenario. A user asks: "How can I speed up my Python code?" The vector

Voice Simulation: Testing Voice Agents the Way Users Experience Them

Introduction Voice is rapidly becoming the next frontier of AI interaction (along with physical AI). As more companies deploy voice agents for customer support, sales, and service operations, the stakes have never been higher. A poorly tested voice agent doesn't just frustrate users - it can damage your

December 2025 - Updates

Logging and observability overhaul, MCP gateway, Evals on file attachments, and more

🎙️ Feature spotlight 🔀 Collaborative conflict resolution for Prompt changes To help teams collaborate on prompts without accidentally overwriting each other’s work, we’ve introduced session conflict resolution in the prompt playground. Here’s what’s new: * You’ll now land on your last active session instead of the prompt’s

The Discipline Layer: Harnesses as the Missing Piece in Autonomous Coding

Introduction If you've been working with AI agents on longer tasks, you've probably developed your own tricks for dealing with context window limits. Maybe you hit /summarize in Cursor when things get bloated or you ask the agent to write a summary.md file at the

Breaking the Context Window: How Recursive Language Models Handle Infinite Input

Long-context understanding has been a persistent challenge in language model research. Despite architectural innovations (ALiBi, YaRN, RoPE variants) and massive context window expansions (Claude 3.5 at 200k tokens, GPT-5 at 256k+), models still exhibit performance degradation on long inputs, a phenomenon known as "context rot." The community

Scaling Personalized Sleep Coaching: Rise Science's Journey with Maxim AI

About Rise Science Rise Science is a sleep management platform that helps people understand and address the root causes of their energy issues. The platform focuses on two key principles: sleep debt (how much sleep a person owes their body) and circadian rhythm (their body’s natural energy schedule). This

Beyond the SDK: Why AI Teams Love HTTP Endpoint-Based Evals

Since the beginning, HTTP Endpoint-Based Offline Evals have been a core feature of the Maxim platform and a favorite among our users. While our SDKs allow engineers to integrate evaluations directly into their codebase, a purely code-based approach introduces friction, often limiting who can run them and how they are

November 2025 Updates - Maxim AI

✨ Flexible data curation, Cost charts, Reasoning column, and more

🎙️ Feature spotlight 🧩 Fully flexible data curation flows While curating and refining test datasets from logs and test runs, you can now reference and modify any data point from a trace or test run entry, without being limited to predefined fields like input or output. Use Maxim’s DSL in the

Streaming Speech Synthesis Without the Trade-offs: Meet StreamFlow

The last few years of neural speech synthesis have been wild. Flow matching models, diffusion transformers, and insanely natural TTS systems keep raising the bar. The catch? Most of these models expect to see the whole audio sequence at once. This is great for offline generation, but if you’

Building a Customer Support AI Agent with AWS Bedrock and Testing It at Scale

Introduction Customer support is one of the most impactful use cases for AI agents. A well-designed support agent can handle thousands of inquiries simultaneously, provide instant responses, and maintain context across complex conversations. But how do you ensure your agent actually works before unleashing it on real customers? In this

What are Offline Evaluations and How to Set Them Up for Your AI System Using Maxim AI

Introduction Before deploying your AI system to production, you need confidence that it performs well across various scenarios, maintains quality standards, and produces consistent results. This is where offline evaluations become essential. Offline evaluations use curated datasets, scenario simulations, and evaluators to benchmark prompts, workflows, and agents before deployment. They

Basics of AI Observability: Sessions, Traces, and Spans

Observability in AI applications differs fundamentally from traditional application monitoring. While conventional systems deal with deterministic request-response cycles, AI applications involve multi-turn conversations, complex reasoning chains, multiple model invocations, and retrieval operations - all of which need visibility for debugging, optimization, and understanding system behavior. Maxim's observability platform

VITA-Audio: Making AI Voice Assistants Actually Feel Instant

Have you ever noticed that frustrating pause when you ask your voice assistant a question? You speak, there's a beat of silence, and then it finally starts responding. That delay might seem minor, but it breaks the natural flow of conversation and reminds you that you're

Kimi K2 Thinking: Engineering Deep Reasoning at Scale

Introduction Moonshot AI recently open-sourced Kimi K2 and its reasoning-optimized variant, K2 Thinking. As someone who works with large language models, I wanted to break down what makes this release interesting and where it pushes forward the state of open-source AI. K2 Thinking is a 1-trillion parameter model that can

AgentFold: What If AI Agents Managed Memory Like Humans Do?

Introduction If you've spent time working with LLM agents for web research, coding assistance in Cursor, or even extended conversations in ChatGPT, you've probably noticed something: as tasks or multi-turn conversations grow longer and more complex, the quality of responses deteriorates - essentially because of

FastLongSpeech: 30x Compression That Doesn't Murder Your Context

Speech models are having a moment, and it seems like they’re here to stay. They can transcribe your rambling, understand your questions, and even tell when you're being sarcastic. But ask them to process anything longer than a TikTok video and they straight-up collapse. The problem? Speech can

What are Online Evaluations and How to Set Them Up for Your AI System Using Maxim AI

Introduction Building an LLM-powered application is one thing; ensuring it performs optimally in production is another challenge entirely. In this blog we will go deeper into Online Evaluations & setting them up for production use cases. Let's start by understanding the difference between Online & Offline Evals. Online vs.