Building the Future of Music Education: Yousician’s Journey with Maxim AI

About Yousician

Yousician is the world's leading music education platform, helping over 20 million people learn to play instruments. The company has built one of the world’s largest interactive music learning ecosystems, combining structured lessons, real-time feedback, and practice-driven progression.

As Yousician looks ahead, the team is actively reimagining the future of music education by moving beyond fixed syllabi and one-size-fits-all learning paths toward more adaptive, personalized experiences.

Building the AI Guitar Teacher

To push music education forward, Yousician is developing an AI-powered Guitar Teacher - a conversational, adaptive music tutor that learners can interact with naturally.

At the core of this experience is a system designed to:

  • Understand a learner’s musical goals and current skill level
  • Adapt guidance dynamically instead of following a rigid syllabus
  • Provide contextual, personalized instruction during practice
  • Pull up music-specific tools like the guitar tuner and song exploration when needed

The AI Guitar Teacher is already live in the App Store and is being rolled out incrementally, allowing the team to learn from real-world usage while continuing to refine the experience.

Challenge

As the team began scaling the AI Guitar Teacher, they faced significant hurdles in maintaining quality across complex, non-deterministic conversations:

  • Prompt quality was hard to measure across models and scenarios. The team needed a way to evaluate whether prompt changes actually improved behavior consistently, and to compare how different model versions performed with the same prompt.
  • Small changes caused unexpected regressions. Improvements for one use case often introduced failures in many others, making prompt iteration risky without systematic evaluation.
  • Prompt iteration was bottlenecked on engineering. Testing any change required a backend deployment, which slowed experimentation and cut iteration velocity.
  • Existing tools didn’t match the problem. Most platforms focused on training or evaluating custom models rather than assessing the combined behavior of prompts, models, and tools.
  • Limited visibility into production behavior. Without structured logging and debugging, it was difficult to understand how the AI performed with real users and why it sometimes fell short of the intended wow factor.

This made it clear that the team needed a systematic approach to evaluating prompts, models, and real-world AI behavior together.

Yousician x Maxim: From Intuition to Science

To move from a "feelings-based" workflow to a reliable engineering process, Yousician adopted Maxim as the quality and observability layer for their AI workflows. The team can now rigorously test prompt logic against real-world musical scenarios before shipping to production, replacing intuition with a scientific, data-driven practice.

Dataset-Driven Evaluation

Yousician maintains an evolving "golden dataset" in Maxim containing realistic user queries that the AI Guitar Teacher needs to handle. This includes edge cases like users requesting chords for songs that don't have any, asking technique questions, or wanting to tune their guitar mid-conversation.

Using Maxim's Datasets and Evaluators, the team runs regression tests on every prompt update. If a change improves one scenario but breaks others, the evaluations catch it before deployment.

"Maxim turned our workflow from a feelings-based approach to a scientific one. It's not just about thinking a prompt is better; it's about knowing it works in 99% of cases before we deploy.”

– Adam Kapos, Lead Architect, Yousician
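
As a rough sketch of what such a regression run involves, the snippet below walks a golden dataset and flags any scenario where a candidate prompt scores worse than the current baseline. This is a generic Python illustration, not Maxim's API: run_prompt and score_response are hypothetical placeholders for the prompt-execution and evaluator hooks that Maxim's Datasets and Evaluators provide inside the platform.

```python
# Sketch of a golden-dataset regression check. run_prompt and
# score_response are hypothetical placeholders, not Maxim's API.

GOLDEN_DATASET = [
    {"query": "Can you give me the chords for this song?", "check": "handles_no_chords"},
    {"query": "My guitar sounds out of tune",              "check": "opens_tuner_tool"},
    {"query": "How do I mute string noise?",               "check": "technique_answer"},
]

def run_prompt(prompt: str, query: str) -> str:
    """Placeholder: send the prompt and query to the model, return its reply."""
    return f"[model reply to: {query}]"

def score_response(response: str, check: str) -> float:
    """Placeholder evaluator: score how well the reply satisfies the check (0-1)."""
    return 1.0

def regression_passes(candidate_prompt: str, baseline: dict[str, float]) -> bool:
    """True only if the candidate matches or beats the baseline on every case."""
    ok = True
    for case in GOLDEN_DATASET:
        reply = run_prompt(candidate_prompt, case["query"])
        score = score_response(reply, case["check"])
        if score < baseline.get(case["check"], 0.0):
            print(f"regression on {case['check']}: {score:.2f}")
            ok = False
    return ok
```

The key property is the all-cases gate: a prompt that improves one scenario but drops below baseline anywhere else fails the run and never ships.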

Prompt CMS for Cross-Functional Ownership

Previously, improving prompts required engineering involvement for every change. Using Maxim's Prompt CMS, Yousician separated prompt logic from code. Non-technical team members can now iterate on the AI's persona, instructions, and tool descriptions without touching code.

"We have non-technical people who can now improve our prompts or try out new descriptions for tools. It allows us to simulate conversations and iterate without deploying new code."

The team organizes multiple prompts in Maxim, each optimized for a specific purpose. The main prompt handles conversational flow and user interactions, while a separate prompt generates song play feedback. By treating each prompt as an independent component, the team can refine one without affecting the others.

Rapid Deployment Without Code Changes

When a prompt passes evaluation, the team deploys directly from Maxim, making changes live to users in seconds. This eliminated the traditional cycle of modifying code, deploying to backend infrastructure, and testing in a separate environment.
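
The mechanics behind this are simple: the application resolves its system prompt at runtime instead of baking the text into the codebase, so publishing a new version takes effect immediately with no backend deploy. The sketch below illustrates the general pattern with a hypothetical endpoint and prompt ID; it shows the runtime-resolution idea, not Maxim's actual API.

```python
import json
import urllib.request

# Hypothetical CMS endpoint and prompt ID, for illustration only.
PROMPT_CMS_URL = "https://prompts.example.com/v1/prompts"
MAIN_PROMPT_ID = "guitar-teacher-main"

def fetch_live_prompt(prompt_id: str) -> str:
    """Fetch whichever prompt version is currently published as 'live'.
    Because the text lives outside the codebase, publishing a new
    version changes behavior immediately, with no redeploy."""
    with urllib.request.urlopen(f"{PROMPT_CMS_URL}/{prompt_id}/live") as resp:
        return json.load(resp)["text"]

system_prompt = fetch_live_prompt(MAIN_PROMPT_ID)
```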

Production Logs for Qualitative Insights

Maxim's production logs have been a critical source of improvement opportunities for Yousician. The team regularly reviews real conversations to identify where users didn't get what they expected.

Since the AI Guitar Teacher relies heavily on tool calls rather than just text responses, the team needs visibility into these interactions. When a user says, "I want to tune my guitar," the AI calls the guitar tuner tool. Seeing these tool calls in production logs alongside conversational elements gives the team complete context for understanding user behavior.
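
To make "the AI calls the guitar tuner tool" concrete, the snippet below sketches what such a tool might look like in the widely used OpenAI-style function-calling schema. The tool name and parameters here are hypothetical illustrations, not Yousician's actual spec.

```python
# Hypothetical tool definition in the common OpenAI-style
# function-calling schema; names and parameters are illustrative.
guitar_tuner_tool = {
    "type": "function",
    "function": {
        "name": "open_guitar_tuner",
        "description": "Open the in-app tuner when the user wants to tune their guitar.",
        "parameters": {
            "type": "object",
            "properties": {
                "tuning": {
                    "type": "string",
                    "description": "Target tuning, e.g. 'standard' or 'drop D'.",
                }
            },
            "required": [],
        },
    },
}

# In a production trace, "I want to tune my guitar" then appears not as a
# text reply but as a structured tool call, roughly:
#   {"name": "open_guitar_tuner", "arguments": {"tuning": "standard"}}
```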

When they spot a gap, they anonymize the conversation, add it to their evaluation dataset, and update the system prompt to handle similar cases. For example, when a user requested chords for a song that doesn't have any, the team taught the AI to acknowledge this and offer to reconstruct the melody instead.

This creates a continuous improvement loop: production issue → dataset addition → prompt update → evaluation → deployment.

Observability for Performance Monitoring

Beyond functionality, Yousician uses Maxim's observability suite to monitor costs and latency. The team was initially concerned about expenses as they scaled, but real-time cost tracking showed their implementation was surprisingly efficient.

Latency monitoring proved especially valuable. When response times spiked, the team could immediately see whether the issue was their system or their AI provider. This visibility helps them communicate accurately with stakeholders and understand when problems are likely to resolve themselves.
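
The timing split involved is simple to sketch: record the model provider's latency separately from total request time, and a spike becomes attributable. Below is a minimal Python illustration; log_metric is a placeholder for whatever metrics sink is in use, and in Yousician's case Maxim's observability suite handles this monitoring directly.

```python
import time

def log_metric(name: str, value: float) -> None:
    """Placeholder metrics sink; swap in a real backend."""
    print(f"{name}={value:.0f}")

def timed_provider_call(call_fn, *args, **kwargs):
    """Wrap a model-provider call and record its wall-clock latency.
    Tracking provider time apart from total request time is what makes
    it possible to tell whose slowdown a spike is."""
    start = time.monotonic()
    result = call_fn(*args, **kwargs)
    log_metric("llm.provider_latency_ms", (time.monotonic() - start) * 1000)
    return result
```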

Impact

Adopting Maxim has significantly changed how Yousician builds and ships AI features. In the past month alone, the team deployed 60 new versions of their primary system prompt, averaging nearly two prompt updates per day - including during holiday periods. This level of iteration would not have been feasible with traditional deploy-test cycles.

The shift to prompt management in Maxim removed engineering bottlenecks and distributed ownership across the team. Product managers, designers, and growth team members can all improve the AI's behavior as long as evaluations show improvement, creating a culture of continuous optimization.

“Everyone on the team uses Maxim - engineers, designers, product, and even marketing. If someone sees a weird conversation, they can just share the link instead of screenshots or copy-pasting logs.”

The observability features provide confidence in production performance. Real-time monitoring of latency and costs helps the team understand system behavior, diagnose issues quickly, and make informed decisions about their infrastructure.

"Being able to monitor latency helped us quickly realize when slowdowns were coming from the model provider rather than our own system."

Beyond velocity and visibility, Maxim's evaluation framework gives the team confidence that their AI improvements are real. They can validate that changes work across diverse use cases before deployment, preventing the quality regressions that would erode user trust.

Conclusion

Yousician’s experience highlights the importance of treating prompt engineering as an iterative, measurable discipline rather than an intuition-driven craft.

For teams building AI products, Adam offers clear guidance: leverage existing specialized tools rather than building evaluation infrastructure from scratch. The ecosystem has matured significantly in the past year, making it faster than ever to build AI products. His second piece of advice: bring scientific rigor to prompt engineering. Test the techniques shared online against your full dataset rather than assuming they'll work for your use case.

As Yousician continues to scale personalized music education, Maxim provides the infrastructure to iterate quickly while maintaining quality. By investing in proper evaluation and observability from the start, Yousician has positioned itself to innovate rapidly in a fast-moving space.

At Maxim, we're committed to helping AI teams ship products that users trust and love. If you're building conversational AI or personalized learning experiences and need to move faster without sacrificing quality, let's connect.

Learn more about Maxim here: https://www.getmaxim.ai/
Learn more about Yousician here: https://www.yousician.com/