Company: Various
Title: LLM Integration in EdTech: Lessons from Duolingo, Brainly, and SoloLearn
Industry: Education
Year: 2023

Summary: Leaders from three major EdTech companies share their experiences implementing LLMs in production for language learning, coding education, and homework help. They discuss challenges around cost-effective scaling, fact generation accuracy, and content personalization, while highlighting successful approaches like retrieval-augmented generation, pre-generation of options, and using LLMs to create simpler production rules. The companies focus on using AI not just for content generation but for improving the actual teaching and learning experience.
## Overview

This case study is drawn from a panel discussion featuring AI and product leaders from three prominent EdTech companies: Duolingo (Clinton Bicknell, Head of AI), Brainly (Bill Slawski, CTO), and SoloLearn (Yeva Hyusyan, Founder and CEO). Together, these platforms serve close to a billion learners globally. The discussion, framed around the theme of "LLMs in Production," provides a candid view of the challenges, lessons learned, and practical strategies these companies have developed while integrating large language models into their educational products.

The panelists share a healthy skepticism about LLM capabilities while also expressing cautious optimism about the transformative potential of these technologies in education. A recurring theme throughout the discussion is the gap between what LLMs can do out of the box and what is required to build genuinely useful, personalized, and pedagogically sound learning experiences.

## Company Contexts and Use Cases

### SoloLearn

SoloLearn is a mobile-first platform for learning to code, with over 30 million registered users. Its approach emphasizes bite-sized, practice-heavy content with strong community support. Yeva Hyusyan describes how the company has been experimenting with AI for approximately 18 months: some features are now in production (notably an AI assistant that explains coding concepts in plain language), while other experiments were discontinued when they didn't work.

A particularly interesting project currently in development involves training models not just to produce content, but to produce it in a format that actually teaches. This reflects a key insight: generating factual content is relatively easy, but generating content that follows sound pedagogical principles is a much harder problem that LLMs don't solve out of the box.

### Brainly

Brainly operates as a community learning platform (essentially the world's largest study group) with hundreds of millions of learners monthly. Its goal is to provide explanations that match what a learner's teacher would have given, accounting for vocabulary, concepts the learner has been exposed to, and personalization at the individual level.

Bill Slawski emphasizes that Brainly doesn't own the curriculum: it supports any school system and any curriculum. This creates unique challenges for personalization, as Brainly must infer what each learner is studying from behavioral patterns, cohort analysis, and content traversal patterns.

### Duolingo

Duolingo, primarily known for language learning but expanding into math and music, has been using AI for personalization since nearly its founding about a decade ago. Clinton Bicknell describes how the latest generation of LLMs has enabled new interactive features, such as holding conversations in the language being learned, something that wasn't technically feasible before. Key use cases include interactive conversation practice (e.g., ordering coffee at a restaurant), explanations for incorrect answers, and curriculum generation. However, Clinton is careful to note that generating content that teaches well is fundamentally different from just generating content.

## LLMOps Challenges and Lessons Learned

### The Fact Generation Problem

A fascinating point of discussion emerged around where LLMs should and shouldn't be trusted. Bill Slawski was clear that relying on LLMs as "fact generation engines" is where disappointment lies. Instead, Brainly finds success using LLMs to synthesize or reframe information that has already been validated: the model is put on "rails" through augmented prompts and provided with trusted information to work with.

Clinton Bicknell added some nuance, suggesting that both generating facts and presenting them pedagogically are hard problems, but that generating reliably accurate facts is often the harder one to solve completely. This led to an exchange in which both panelists acknowledged that work is needed on both ends: neither pure fact generation nor pure presentation is solved out of the box.
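To make the "rails" idea concrete, here is a minimal sketch of a grounding prompt that hands the model pre-validated content and restricts it to reformulation. Everything in it is illustrative: the `call_llm` helper, the template wording, and the function names are assumptions for the sketch, not Brainly's actual implementation.

```python
# Minimal sketch of the "rails" pattern: the model reformulates trusted,
# pre-validated content instead of generating facts of its own.

def call_llm(prompt: str, temperature: float = 0.3) -> str:
    """Hypothetical stand-in for whatever inference client is in use."""
    raise NotImplementedError

RAILS_TEMPLATE = """You are a tutor. Rewrite the trusted explanation below for
the learner described. Use ONLY the facts given; do not introduce new facts.
If the trusted explanation does not answer the question, say so.

Learner profile: {profile}
Question: {question}
Trusted explanation: {validated_answer}

Rewritten explanation:"""

def explain_on_rails(question: str, validated_answer: str, profile: str) -> str:
    # The trusted answer comes from an existing, human-validated source;
    # the LLM's job here is presentation, not fact generation.
    prompt = RAILS_TEMPLATE.format(
        profile=profile, question=question, validated_answer=validated_answer
    )
    return call_llm(prompt, temperature=0.3)  # low temperature: stay on rails
```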
### Retrieval-Augmented Generation (RAG)

Bill Slawski described how Brainly's journey with LLMs progressed from constrained use cases to more sophisticated RAG implementations. The team started with simple features like "simplify this explanation" or "explain this in more detail", cases where the facts were already established and trusted and the LLM was only reformulating the presentation. Only after gaining confidence with these constrained applications did they move into RAG use cases for generating more complete responses.

RAG at Brainly involves understanding learner behavior patterns, identifying which cohort of learners a user resembles, and using that understanding to augment prompts with relevant context. Bill compares this to recommendation systems at Netflix or YouTube, but acknowledges it is quite difficult to implement well.
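A rough sketch of how such a behavior-aware RAG flow could be wired together, under explicit assumptions: `nearest_cohort` and `retrieve_validated_content` are hypothetical stand-ins for the cohort-matching and retrieval machinery the panel alluded to without detailing.

```python
from dataclasses import dataclass

@dataclass
class LearnerContext:
    user_id: str
    recent_topics: list[str]  # inferred from content traversal patterns
    cohort: str               # e.g. "US grade 9 biology"

def nearest_cohort(user_id: str) -> LearnerContext:
    """Hypothetical: map a learner to the cohort they most resemble,
    e.g. via similarity over their question/answer history."""
    raise NotImplementedError

def retrieve_validated_content(question: str, ctx: LearnerContext) -> list[str]:
    """Hypothetical: fetch already-validated explanations relevant to the
    question, filtered to the vocabulary and curriculum of the cohort."""
    raise NotImplementedError

def build_rag_prompt(question: str, user_id: str) -> str:
    # Augment the prompt with cohort context plus trusted passages, so the
    # model personalizes presentation without inventing facts.
    ctx = nearest_cohort(user_id)
    passages = retrieve_validated_content(question, ctx)
    context_block = "\n".join(f"- {p}" for p in passages)
    return (
        f"Answer for a learner in cohort '{ctx.cohort}' who recently "
        f"studied {', '.join(ctx.recent_topics)}.\n"
        f"Use only these trusted passages:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```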
### Cost Management at Scale

A critical operational challenge that all three companies face is the cost of running LLM inference at scale. Clinton Bicknell was particularly emphatic that even cheaper models become cost-prohibitive when you are processing every user response for hundreds of millions of learners. Duolingo has developed several strategies to address this, the first two of which are sketched in code after the list below:

- **Pre-generation of options**: using LLMs to generate many possible responses or content pieces ahead of time, then selecting from these at runtime rather than generating on the fly
- **Rule generation**: using LLMs to generate cheap, simple rules or detectors that can run at scale without the models themselves, for example rules that detect specific grammatical errors and can then be applied in real time without LLM inference
- **Custom model training**: SoloLearn specifically mentioned training its own models to make certain features cost-effective after initial experiments with larger models proved too expensive at scale

Yeva Hyusyan recounted a specific lesson: SoloLearn got excited about initial results from a small cohort test, did the math on scaling costs, and realized the approach couldn't scale. This led to training custom models to achieve cost efficiency.
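The following compact sketch illustrates the first two strategies from the list above. The rule format (plain regexes produced offline by an LLM) and every name here are illustrative assumptions; the panel described the strategies themselves, not their implementation.

```python
import random
import re

# --- Strategy 1: pre-generation. Run the LLM offline to build a pool of
# candidate responses, then select from the pool at request time at no
# inference cost. (Hint texts here are invented examples.)
PREGENERATED_HINTS = {
    "ser_vs_estar": [
        "Use 'ser' for permanent traits and 'estar' for temporary states.",
        "'Estar' pairs with locations and moods; 'ser' with identity.",
    ],
}

def hint_for(error_type: str) -> str:
    return random.choice(PREGENERATED_HINTS[error_type])  # no LLM call

# --- Strategy 2: rule generation. An LLM proposes cheap detectors offline
# (here, regexes, an illustrative choice); only the compiled rules run online.
LLM_GENERATED_RULES = {
    # pattern -> error type; "soy en" is a common ser/estar mix-up
    r"\bsoy en\b": "ser_vs_estar",
}
COMPILED = [(re.compile(p, re.I), err) for p, err in LLM_GENERATED_RULES.items()]

def check(answer: str) -> str | None:
    for pattern, error_type in COMPILED:
        if pattern.search(answer):
            return hint_for(error_type)  # real-time feedback without inference
    return None

print(check("Yo soy en la casa."))  # -> a pre-generated ser/estar hint
```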
### Differentiation and Competitive Advantage

Bill Slawski made a strategic point about differentiation: relying on LLMs to do what everyone else can do with them puts you at the "lowest common denominator." True competitive advantage comes from proprietary data: what you know about your learners, their behavior patterns, the content they engage with, and how you augment LLM capabilities with this unique knowledge.

This perspective reframes the value proposition: LLMs are tools that amplify your existing assets rather than replacements for domain expertise. Companies that treat LLMs as black-box solutions without layering in their own insights will produce generic, undifferentiated experiences.

### The Teaching vs. Content Generation Problem

Perhaps the most pedagogically interesting insight came from the discussion of what it means to actually teach versus simply generating content. Yeva Hyusyan noted that the last decade of EdTech has focused heavily on content generation (creating video lessons on any topic), but replicating a great classroom experience online has proven much harder. The companies are now working on training models to produce content in formats that actually teach, incorporating feedback from real teachers into the training loop. This represents a shift from viewing LLMs as content factories to viewing them as apprentice tutors that need to learn pedagogical principles.

Clinton Bicknell echoed this, noting that LLMs can generate content easily, but generating content that teaches well requires something that, in his words, "large language models don't know how to do out of the box, except in kind of obvious common sense ways."

### Language Level Calibration

Duolingo faces a specific challenge in language learning: keeping generated content at the appropriate proficiency level for each learner. While you can add instructions to prompts like "write at a third-grade level," this only gets you so far. More sophisticated AI work is required to calibrate language difficulty properly; fine-tuning has proven helpful, but the problem remains challenging.

### Interactive Features vs. Chatbots

Clinton Bicknell raised an important product insight: it's easy to make a chatbot, but creating a compelling interactive product based on chat, one that users want to return to daily, is a fundamentally different challenge. Some of this is product work (figuring out which experiences resonate with users), but there are also AI problems embedded within, such as generating goal-directed conversations with character personalities while staying at the right language level for the learner.

## Production Considerations

The discussion touched on several operational realities of running LLMs in production at EdTech scale:

- **Feature constraints**: starting with highly constrained use cases and expanding carefully rather than launching ambitious AI features immediately
- **Iterative development**: building features, learning from real usage, and sometimes discontinuing experiments that don't work
- **Evaluation challenges**: multiple panelists acknowledged the difficulty of evaluating whether AI-generated content is truly pedagogically effective
- **Hybrid approaches**: combining LLM capabilities with traditional ML (for personalization), rule-based systems (for cost efficiency), and human feedback (for quality control)
- **User acceptance**: Bill Slawski noted that societal acceptance of AI in educational contexts has improved significantly since the release of ChatGPT, expanding the solution space for what can be deployed

## Looking Forward

The panelists expressed cautious optimism about the future while acknowledging that progress will take time and continued iteration. Key areas of excitement include:

- Interactive learning experiences that go beyond chatbots to create compelling, repeatable engagement
- Solving fundamental educational access problems (underfunding, understaffing, lack of one-on-one attention)
- Better personalization through understanding individual learner contexts
- Using AI to create the equivalent of great one-on-one tutoring at scale

The overall message is one of measured optimism tempered by hard-won lessons: LLMs offer genuine potential to transform education, but realizing that potential requires deep domain expertise, careful operational planning, significant investment in evaluation and iteration, and a clear-eyed view of where these models excel versus where they fall short.
