Elevating Conversational Banking: Clinc's Path to AI Confidence with Maxim

About Clinc
Clinc provides a sophisticated conversational AI platform, primarily focused on the banking industry, enabling financial institutions to build advanced virtual assistants. Recognizing that state-of-the-art AI results are increasingly obtained by orchestrating multiple components into Compound AI Systems, rather than relying on single large models, Clinc is specifically designed as a framework to facilitate the creation and orchestration of these complex systems. At the heart of this approach is the platform's proprietary “state graph”, which serves as the visual architecture for designing the flow and interaction of various AI elements, allowing for the creation of virtual assistants that support natural, context-aware conversations using everyday language.
The state graph allows users to engineer sophisticated conversational experiences that combine the capabilities of traditional NLP models, modern generative AI models, and other external tools or services. These capabilities empower financial institutions to create virtual assistants that can effectively manage account inquiries, execute transactions, and interact with relevant documents. By providing a platform for building these systems, Clinc helps banks quickly deploy production-ready conversational AI. This results in significant benefits for major financial institutions worldwide, including enhanced customer experiences, reduced operational costs, and measurable ROI.
Challenge
As the Clinc team continued to advance their platform, they began exploring new capabilities to make their conversational AI even more powerful and adaptable. This included experimenting with Retrieval-Augmented Generation (RAG) for document-based Q&A and expanding their natural language understanding (NLU) features to better interpret and extract details from user conversations.
As they did this, they ran into challenges common to fast-moving AI teams:
- Their initial RAG implementation relied heavily on public benchmarks, which didn't fully align with their specific production data needs.
- Initial RAG evaluation datasets required refinement to better represent real-world conversational scenarios.
- The rapid emergence of new AI models and architectures created a need to systematically evaluate individual AI capabilities and to determine the optimal combinations of those capabilities for different pipelines.
- With the expansion of both RAG and NLU capabilities, Clinc needed a more comprehensive framework to evaluate approaches such as classification, entity extraction and mapping, context retention, retrieval, and response generation.
- As they analyzed metrics and results, they discovered gaps in their testing methodology that required continuous refinement and improvement of their benchmark datasets.
The Clinc team needed a more robust evaluation framework as they worked to provide their banking clients with the most effective conversational AI solutions possible.
Clinc’s Approach to AI Quality with Maxim
Integrating Maxim into Clinc’s research workflow has transformed how the team develops, benchmarks, and refines their conversational AI platform—ensuring every component is robust before reaching production.
Modular Evaluation
- Clinc runs experiments in dedicated Maxim workspaces, focusing on each part of the conversational pipeline. This modular approach enables targeted evaluation, making it easy to set up focused runs, use real banking data, and compare model predictions against expected outputs.
- The team attaches targeted metrics to each run, such as recall for retrieval or answer faithfulness for generation, ensuring evaluations are precise and actionable.
- They iterate on one component at a time, locking in the best-performing version before moving to the next stage.
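To make a metric like retrieval recall concrete, here is a minimal, generic sketch (not Maxim's actual API) of computing recall@k for a retrieval component against expected document IDs; the document IDs and examples are hypothetical.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

# Hypothetical evaluation examples: retrieved doc IDs vs. expected (relevant) IDs.
examples = [
    {"retrieved": ["d3", "d1", "d7", "d2", "d9"], "relevant": ["d1", "d2"]},
    {"retrieved": ["d4", "d8", "d5", "d6", "d0"], "relevant": ["d5", "d9"]},
]

scores = [recall_at_k(ex["retrieved"], ex["relevant"]) for ex in examples]
print(f"mean recall@5: {sum(scores) / len(scores):.2f}")  # 1.0 and 0.5 average to 0.75
```

Attaching a metric like this to each run gives a single, comparable number per component, which is what makes locking in a best-performing version possible.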
Enhanced Dataset Management
- The team builds and maintains sophisticated evaluation datasets that accurately reflect the complexity of real banking conversations, including negative cases and conflicting information.
- Using Maxim, Clinc systematically identifies dataset gaps and iterates on them—adding new examples and challenging edge cases as they arise.
- All datasets are versioned, and evaluations are re-run across model updates to ensure ongoing robustness and relevance.
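The versioning-and-re-run workflow above can be sketched in a few lines; the snippet below is an illustrative toy (the dataset names, examples, and model are all hypothetical, and a real setup would use Maxim's managed datasets rather than in-memory dicts). Each dataset version is an immutable snapshot, so evaluations can be repeated and compared across model updates.

```python
# Each version is a frozen snapshot; v2 adds a negative case found during gap analysis.
datasets = {
    "banking-qa-v1": [
        {"question": "What is my balance?", "expected": "balance_inquiry"},
    ],
    "banking-qa-v2": [
        {"question": "What is my balance?", "expected": "balance_inquiry"},
        {"question": "Tell me a joke", "expected": "out_of_scope"},
    ],
}

def evaluate(model, version):
    """Re-run a model against a specific dataset version and return accuracy."""
    data = datasets[version]
    hits = sum(model(ex["question"]) == ex["expected"] for ex in data)
    return hits / len(data)

def toy_model(question):
    # Stand-in model; always predicts the most common intent.
    return "balance_inquiry"

for version in sorted(datasets):
    print(f"{version}: accuracy = {evaluate(toy_model, version):.2f}")
```

Re-running every model update against every dataset version makes regressions visible immediately: a model that scores well on v1 but poorly on v2 has failed on the newly added edge cases.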
Rapid Model and Prompt Experimentation
- With Maxim’s prompt-based testing and Ollama integration, the team quickly experiments with prompt structures and configurations on custom models, optimizing performance for specialized banking use cases.
- The Clinc team benchmarks both classical machine learning models (such as SVMs and linear regression) and LLM-based models (including zero-shot and few-shot approaches) within the same evaluation framework, enabling side-by-side comparisons and fast iteration as new models emerge.
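Benchmarking heterogeneous models side by side boils down to treating each model as a callable evaluated on the same dataset. The sketch below is a generic illustration (not Clinc's or Maxim's code): the two "models" are trivial stand-ins for a classical classifier and an LLM call, and the labels and utterances are invented.

```python
def keyword_classifier(utterance):
    # Stand-in for a classical intent model (e.g. an SVM).
    return "balance_inquiry" if "balance" in utterance.lower() else "other"

def mock_llm_classifier(utterance):
    # Stand-in for a zero-shot LLM call; a real system would invoke a model API.
    labels = {"balance": "balance_inquiry", "transfer": "transfer_funds"}
    for keyword, label in labels.items():
        if keyword in utterance.lower():
            return label
    return "other"

dataset = [
    ("What's my checking balance?", "balance_inquiry"),
    ("Transfer $50 to savings", "transfer_funds"),
    ("Where is the nearest branch?", "other"),
]

def accuracy(model, data):
    hits = sum(model(text) == expected for text, expected in data)
    return hits / len(data)

for name, model in [("keyword", keyword_classifier), ("mock-llm", mock_llm_classifier)]:
    print(f"{name}: accuracy = {accuracy(model, dataset):.2f}")
```

Because every model, classical or LLM-based, exposes the same interface, swapping in a newly released model means writing one new callable and rerunning the same evaluation.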
Insights and Iterative Improvement Loop
As experimentation scales, Maxim’s organizational features become critical:
- Insights from Maxim evaluations feed directly into Clinc’s development process. Datasets are refined as new gaps or edge cases are discovered, and models and prompts are updated and re-evaluated based on observed performance.
- Dashboards provide instant visual comparisons of metrics across different models, runs, and configurations. The team relies on these visualizations to track progress, share findings internally, and make data-driven decisions about model selection and pipeline updates.
This loop repeats, with each iteration producing more robust, reliable, and context-aware conversational AI components—enabling Clinc to deliver the highest quality solutions to banking clients, with faster development cycles and greater confidence in system performance.
Impact
Clinc’s AI research and development workflow has become much faster and more organized with Maxim. The modular approach helps the team focus on improving one part of their conversational AI at a time without getting lost in the complexity of full pipeline testing. What used to require approximately 40 hours of manual work to create polished, shareable reports and charts now takes less than 5 minutes with Maxim's dashboard functionality. This shift has streamlined internal reporting and made progress easier to track. The platform's comprehensive dataset and results management—which would have required an estimated 80 hours to develop internally—comes built-in, saving approximately 4 hours per month previously spent on file organization and preventing errors from outdated information.
“Maxim AI has significantly accelerated our testing cycles for evaluating RAG pipelines and benchmarking new LLMs, enabling faster iteration in our development process. The ability to compare LLM performances using their dashboards has proven very helpful for our internal reporting and decision-making.”
- Rajan Mehta, Senior Software Engineer, AI
The Evaluation Toolkit and prompt-based testing have enabled rapid model swaps and quick reruns, helping Clinc keep pace with new AI releases and refine their datasets on the fly. Beyond speed, the shift to a GUI-driven evaluation environment has eliminated tedious scripting and reduced debugging time—removing the need to manage Python scripts or spreadsheets for every experiment. With fewer errors and a cleaner interface, Clinc's team can focus on high-impact work like model improvement and gap analysis. The platform has also enhanced team collaboration through shared workspaces, provided faster onboarding for new team members, and enabled more efficient knowledge sharing across the organization—qualitative benefits that, while harder to measure in hours saved, have substantially improved the team's workflow.
“The turnaround from the Maxim team has also been excellent; whenever we’ve requested new features or shared feedback, they’ve been quick to respond and address our needs. That responsiveness has made for a really positive experience.”
- Rajan Mehta, Senior Software Engineer, AI
Overall, these improvements have allowed Clinc to experiment, learn, and refine their conversational AI more efficiently, giving them greater confidence as they deliver advanced solutions for the banking industry.
Conclusion
Clinc’s journey highlights the value of building a strong, modular foundation for conversational AI—one that enables rapid iteration, targeted evaluation, and continuous improvement. By integrating Maxim into their workflow, Clinc has moved from spreadsheet-heavy, end-to-end testing to a streamlined, workspace-driven process that empowers the team to experiment, benchmark, and refine each component of their platform with clarity and speed.
This approach has not only accelerated their development cycles but has also given the team greater confidence in the quality and reliability of their solutions as they scale to meet the evolving needs of the banking industry. As Clinc continues to innovate in conversational AI, robust evaluation and flexible experimentation remain central to their strategy.
At Maxim, we’re proud to support teams like Clinc, who are shaping the future of AI-powered customer experiences. If you’re looking to set a strong foundation, level up your AI evaluation process, and accelerate your path to production, let’s connect.
Learn more about Maxim here: getmaxim.ai
Learn more about Clinc here: clinc.com