
Today, the Copenhagen-based healthcare AI company Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models designed specifically for real-time dictation, conversation transcription, and batch audio processing. The company says its accuracy is the highest recorded for this use case to date.
"We are focused on ensuring that doctors, medical professionals and patients can trust our AI scribes… the entire healthcare system," said Andreas Cleve, co-founder and CEO of Corti, in an exclusive video call interview with VentureBeat.
The performance data the company is presenting paints a stark picture of the current state of enterprise AI: in specialized and highly regulated industries, domain-specific models can outperform foundation model vendors.
In a recently published research paper, Corti revealed that its new clinical-grade speech models reduced word error rates (WER) by up to 93% compared to leading generalist speech models and APIs on medical terminology.
On English medical terminology, Symphony for Speech-to-Text achieved a remarkably low WER of 1.4%. In comparison, OpenAI’s speech model recorded a WER of 17.7%, ElevenLabs reached 18.1%, Whisper recorded 17.4%, and Parakeet recorded 18.9%.
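For readers unfamiliar with the metric, the sketch below shows one common way to compute WER (word-level edit distance divided by the number of reference words) and how a relative reduction follows from two WER figures. It is an illustration only: the function names and the sample sentences are hypothetical, and Corti's actual evaluation pipeline, including any text normalization it applies, is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline


# Hypothetical example: one misheard dose in a seven-word sentence.
wer = word_error_rate(
    "patient takes 50 mg of metoprolol daily",
    "patient takes 15 mg of metoprolol daily",
)

# Reported figures from the article: 18.9% (Parakeet) versus 1.4% (Symphony).
reduction = relative_reduction(18.9, 1.4)
```

With the reported figures, the reduction against the 18.9% baseline works out to roughly 92.6%, which is consistent with the "up to 93%" headline number.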
Corti’s announcement serves as a critical turning point for healthcare advocates. While general-purpose APIs like OpenAI Whisper are sufficient for broad-domain transcription, they frequently stumble over medical acronyms, complex medication dosing, shorthand, and noisy emergency room environments. Symphony for Speech-to-Text aims to solve this by providing developers with a highly specialized, production-grade API designed from the ground up for clinical workflows.
The agentic era demands impeccable data inputs
The launch of Symphony for Speech-to-Text highlights a fundamental shift in the way healthcare uses voice technology. For decades, medical speech recognition primarily consisted of generating a static text document for human doctors to review—a digital replacement for a notepad.
But as the healthcare industry hurtles toward what technologists call the "agentic era," where autonomous AI agents actively assist in clinical decision making, EHR navigation, and real-time support, transcription is no longer the end product. It is the fundamental data layer.
“Speech has always been one of the most important contributions to healthcare,” Cleve said in a statement provided to VentureBeat. “What’s changing is what happens after the words are captured. In the era of agency, speech recognition requires more than simply producing a transcript: we need to give AI systems accurate clinical facts to reason with. If a model mishears a medication, a dose, or a symptom, each subsequent step becomes less reliable. Symphony for Speech-to-Text gives healthcare creators a speech layer accurate enough to thrive in clinical reality.”
This is where the compounding danger of high word error rates comes into play. If a general-purpose AI model makes an error in a transcript, turning "hyperthyroidism" into "hypothyroidism" or misinterpreting a critical drug dose, every subsequent AI agent that relies on that transcript will operate with corrupted data. Corti’s architecture mitigates this risk by producing structured, clinically usable results directly from the API, helping downstream AI applications reason about clear facts rather than messy, plain text.
Nowhere is this more evident than in Corti’s entity recall benchmarks. Symphony for Speech-to-Text reached an astonishing 98.3% recall rate on formatted clinical entities such as doses, measurements and dates. In contrast, Corti reported that the strongest general-purpose reference model peaked at just 44.3% recall for the same entities.
For developers building ambient AI documentation tools, that 54-percentage-point gap is the difference between a tool that saves a clinician’s time and a tool that constitutes a medical liability.
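To make the entity metric concrete, here is a minimal sketch of how recall on formatted entities could be measured: the share of reference entities (doses, measurements, dates) that appear in the transcript exactly as expected. The function name, the exact-substring matching rule, and the sample data are all assumptions for illustration; the article does not specify how Corti scores entities.

```python
def entity_recall(reference_entities: list[str], transcript: str) -> float:
    """Share of reference entities found verbatim (case-insensitive) in the transcript.

    Assumption: an entity counts as recalled only if its formatted form
    appears as a substring. Real benchmarks may normalize or align differently.
    """
    if not reference_entities:
        raise ValueError("need at least one reference entity")
    text = transcript.lower()
    found = sum(1 for entity in reference_entities if entity.lower() in text)
    return found / len(reference_entities)


# Hypothetical example: the blood pressure reading is transcribed in spoken
# form ("120 over 80") instead of the expected formatted form.
expected = ["5 mg", "120/80 mmHg", "12 March 2024"]
transcript = "Start amlodipine 5 mg, blood pressure 120 over 80, follow up 12 March 2024"
recall = entity_recall(expected, transcript)

# Gap between the two reported recall figures, in percentage points.
gap_points = round(98.3 - 44.3, 1)
```

In this toy case two of the three entities are recovered. Note that the difference between the reported 98.3% and 44.3% figures is 54 percentage points, not a 54% relative change.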
Dethroning the industry darlings
While Corti’s benchmarks against modern LLM creators like OpenAI and ElevenLabs are striking, the company is also taking aim at the legacy giants of medical transcription.
For years, the gold standard for dedicated medical dictation has been Dragon Medical One. However, these legacy systems have historically been optimized strictly for intentional physician dictation, not as underlying infrastructure for ambient AI, complex multi-party conversations, or real-time clinical support tools.
In real-world English medical dictation evaluations, Corti achieved a WER of 4.6%, besting Dragon’s 5.7% (a 19% relative improvement).
Additionally, Corti demonstrated greater recall of medical terms than Dragon (93.5% vs. 92.9%).
By providing this level of precision through an API endpoint, Corti enables third-party developers, EHR vendors, and virtual care platforms to create their own custom ambient listening and dictation tools that surpass the industry's legacy systems.
"We want people to create applications on our models," Cleve said. "The goal is to disseminate the technology as widely as necessary so that it can be as useful as possible to patients, their doctors and professionals."
For Cleve and his co-founders, the mission is personal: Cleve’s own mother was a healthcare professional who was attacked by a patient and spent years fighting to recover. He sought to improve healthcare processes as a way to honor her sacrifice.
Solving the health care model puzzle
The demands of healthcare extend far beyond English-speaking hospitals, and global healthcare systems have historically been underserved by clinical NLP models. Early adopters are already taking advantage of Corti’s new models in linguistically demanding environments, demonstrating the viability of the technology in complex international markets.
Switzerland, for example, requires the provision of care in multiple languages, often simultaneously within a single medical institution. It serves as one of the world’s most stringent testing grounds for multilingual medical speech models. Corti’s Symphony models demonstrated huge performance improvements in these non-English tests, achieving a WER of 2.4% in German (compared to 13.0% for the next best system) and a WER of 3.9% in French (compared to 10.6%).
“In a clinical conversation, every word matters: an omitted medication name, a misheard dose, or a poorly transcribed symptom can change the meaning of an encounter,” said Pierre Corboz, Director of Solutions and Business Development at Voicepoint, a Swiss healthcare technology provider, in a statement provided to VentureBeat. “Symphony’s precision in clinical terminology gives us the foundation to incorporate more reliable AI capabilities into clinical workflows with our Voicepoint Xenon platform. When Corti enhances the speech layer, the workflows we build together become cleaner, safer and more useful for doctors in Switzerland.”
Verticalization and specialization of AI are paying off
Today’s announcement of Symphony for Speech-to-Text is not an isolated event; it is the culmination of a strategic narrative that Corti has been aggressively pushing over the past few weeks.
The broader Symphony platform, which powers clinical and administrative applications for a global network of EHR vendors and life sciences organizations, has consistently demonstrated the defensibility of vertical AI labs against horizontal technology giants.
This is the third major benchmark Corti has released in just six weeks, touching on different layers of AI performance in healthcare.
In April, the company revealed that its Symphony for Medical Coding system outperformed general-purpose models by more than 25% in clinical accuracy benchmarks, addressing one of the most notoriously complex workflows in healthcare.
And just last week, Corti announced that its flagship clinical-grade model beat OpenAI on HealthBench Professional, OpenAI’s own healthcare benchmark.
Together, these three data points (medical coding, clinical reasoning, and speech-to-text accuracy) illustrate a growing consensus in the enterprise technology sector: general-purpose models are reaching a ceiling in regulated industries.
Models deployed in hospitals must inherently understand complex acronyms, sudden interruptions, medical shorthand, specialty-specific language, and strict compliance restrictions. By training specifically on these unique edge cases, vertical AI labs like Corti are building a formidable moat that companies relying solely on API calls to large, generalized language models cannot easily cross.
Availability and product line
Developers are clearly realizing the performance gap. According to momentum data provided to VentureBeat, Corti is seeing 30% growth in new signups for its platform in quarter-to-date comparisons, indicating that healthcare developers and creators are actively gravitating toward clinical-grade vertical models rather than generalist APIs.
Corti, which already serves more than 100 million patients annually across major healthcare systems, including the UK’s National Health Service (NHS), is positioning Symphony for Speech-to-Text as the default engine for the next generation of healthcare software.
It’s important to note that Corti will not be launching the overall Symphony platform today; rather, Symphony for Speech-to-Text operates as a new and distinct capability within that broader ecosystem, accessible through its own API endpoints.
Symphony for Speech-to-Text is generally available starting today. Developers and enterprise architects can access the models through the Corti API console, with complete technical documentation available to help integrate the clinical-grade voice layer into their existing applications.
In a move toward research transparency, Corti also published its full research paper detailing its methodology, along with a separate comparison tool designed to support transparent evaluation of medical voice recognition systems across the industry.
As the healthcare industry continues to rapidly adopt AI-powered automation, the foundational data layer has never been more critical. Corti’s latest release is a stark reminder that in the medical field, generic AI simply isn’t good enough. The future belongs to the specialists.





