Every language app you have in your pocket inherited a teaching method created for Latin. Understanding why this happened is a more useful design lesson than anything the apps themselves teach you.
In 1788, Prussia introduced the Abitur, a standardized national exam required for entry into universities and the civil service. To pass, students had to demonstrate measurable and gradeable knowledge. The system needed to teach languages in large classrooms, produce consistent results, and do so with one teacher and thirty students. The educators responsible for designing this system turned to the only teaching model they had, one that had been used in European schools for two centuries: the method developed for teaching Latin.
In 1788, Latin was a dead language. Nobody needed to speak it. The scholars who studied it read Cicero and Virgil; they did not hold conversations. The method built around it (memorizing grammatical rules, constructing translations, analyzing written texts) reflected exactly that reality. Oral skills were irrelevant. Understanding the written form was everything. The method was not designed to produce speakers. It was designed to produce readers of texts in a language that no one spoke.
When Prussia applied this model to French and German, living languages spoken by living people, the premise did not change. Johann Valentin Meidinger’s textbook Practical French Grammar, published in 1804, had gone through 37 editions across Europe by 1857 (1). Karl Plotz formalized the approach into what became the dominant model for modern language teaching throughout Europe and eventually in the United States, where it became known simply as the Prussian Method (2). Each institution that adopted it trained teachers in it, who trained students who became teachers in turn. The constraint that created the method, how to grade language at scale with limited resources, became invisible within the method itself. What remained was the assumption: language is a set of rules that must be consciously learned and measured. It was a design decision disguised, over time, as pedagogical truth.
The observation that should have put an end to it
There are people in the world who cannot read or write a language and yet speak it fluently. There are children who carry on entire conversations years before they can read a single word. There are immigrants who arrive in a country without knowing its language and leave, years later, speaking it naturally, not because they studied it, but because they lived within it. Literacy and fluency are separate things produced by completely separate mechanisms. The Grammar-Translation method, as it became known, assumed that they were the same. That assumption was inherited from a method designed for a language that no one needed to speak, and it was wrong the moment it was applied to a language that people actually used.
The evidence against it accumulated slowly. In the mid- to late 19th century, reformers such as François Gouin in France and Maximilian Berlitz in the United States independently argued that language should be taught the way it is actually acquired: through immersive exposure to real communication in the target language, not through analysis of its rules. Berlitz built an entire school network around this principle. The reformers were right. They were also largely ignored by conventional educational systems, because the Grammar-Translation method had a decisive advantage that direct immersion did not: it could be graded.
In 1982, linguist Stephen Krashen gave the argument its most formal articulation in what he called the Monitor Model of Second Language Acquisition. His distinction was precise: language acquisition, the unconscious process through which children absorb their native language and through which adults succeed in immersive environments, is categorically different from language learning, the conscious study of grammatical rules and vocabulary taught in classrooms (3). Acquisition produces fluency. Learning, at its best, produces the ability to pass a test. The evidence for this distinction, and for the observation that immersive exposure to real communication with native speakers is the mechanism that produces genuine fluency, has only grown since then.
I went to Brazil without a word of Portuguese and left speaking it. I studied French in a classroom for years, and today I can’t hold a conversation in French. This is not an unusual experience. It is the expected result, and it has been the expected result for as long as we have had formal language education.
The same decision, made again in a different environment
The designers of modern language apps faced the same question Prussian educators had: how to deliver language learning at scale, measure progress, and retain users over time. The answer they arrived at was structurally identical to the one reached in 1788. Duolingo gamified the grammar exercise into a streak. Anki formalized the translation exercise into a spaced repetition flashcard. Babbel organized grammar lessons into structured modules. The interfaces were new. The underlying assumption, that language is something to be studied rather than an environment to be inhabited, was not.
This was not a failure of design skill. The products that emerged from these decisions are, in many ways, genuinely well made. Duolingo’s retention mechanics are sophisticated. Anki’s spaced repetition is based on real cognitive science. They are excellent at what they actually do. The problem is what they actually do: produce measurable engagement with a proxy for language rather than the conditions that produce language itself. A streak is measurable. A vocabulary score is measurable. The moment a user leaves the app and has a real conversation in another language happens in the world, outside the product, and cannot be instrumented.
When the outcome a user needs is difficult to measure directly, the design process tends to look for something that can be measured. The proxy becomes the goal. The interface is optimized for it. The gap between what the product offers and what the user really needs grows. This pattern is not unique to language learning. It repeats across product categories whenever a design constraint (the need to measure, the need to scale, the need to produce quality) becomes so deeply embedded in a system that it stops being visible as a constraint and starts being mistaken for a truth about the problem itself.
What happens when the constraint changes?
The constraint that made the Grammar-Translation method necessary in 1788 was real and rational. One teacher. Thirty students. A standardized test. You can’t grade a conversation at scale. You can grade a translation exercise. The method was not chosen because it produced fluency. It was chosen because it produced a score.
That constraint no longer exists in the same way. Technology has made it possible to offer immersive, real-time conversational practice to anyone with a smartphone, at a cost that continues to fall. The design problem is no longer how to make language learning gradeable at scale. It is how to make the conditions for genuine language acquisition accessible to people who cannot move to another country or afford a native tutor.
The products that now come closest to solving the real problem are not the ones that invented a new pedagogy. They are the ones that removed the access barrier to an old one. Praktika creates AI conversation partners with distinct personalities, regional dialects, and cultural context, replicating the specificity of a real native speaker rather than a generic language learning voice. Langua clones the voices of native speakers to make the interaction feel like a real conversation instead of a lesson. Rosetta Stone’s core methodology, the association of images with the target language without translation, was based on the same idea that Berlitz arrived at in the 19th century: language is acquired through immersive exposure, not through analysis of its rules (4). A 2025 study found that students who used AI conversation practice tools showed a 75 percent improvement in conversation scores over eight weeks, a result that no flashcard optimization has consistently produced (5).
None of these products invented a new theory of language acquisition. They translated an existing one into something more people could reach.
The design question this leaves
The Grammar-Translation method persisted not because educators got the design wrong, but because a design decision made under a specific constraint became, over two centuries, indistinguishable from the thing itself. The constraint, how to grade language at scale, was forgotten. The method it produced was inherited as if it were a description of how language works, passed from Prussia to Europe, from America to the App Store, from grammar exercise to streak.
Every time a design team optimizes a metric because the actual result is difficult to measure, they are making a version of the same decision. It is often the right decision given the real constraints. The question worth asking is whether the constraint that made it necessary still exists or whether it has simply become invisible within the system it originally produced.
Before addressing what can be measured, it is worth asking what the user really needs to do and what prevented them from doing it before. Sometimes the answer is a new solution. Most often it is something old that has always been out of reach.




