OpenAI's New Visual Math Tools: Genuine Pedagogical Innovation or Feature Creep?
ChatGPT now renders interactive graphs and lets students manipulate variables in real time, but the research on AI tutoring effectiveness remains frustratingly thin.
OpenAI has rolled out a set of interactive visual tools within ChatGPT designed specifically for math and science learning. The core feature allows students to see formulas rendered graphically and to manipulate variables in real time to observe how changes propagate through equations. If you adjust the slope in a linear equation, the graph updates. If you tweak a coefficient in a quadratic, you watch the parabola stretch or compress.
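For readers who want to see the underlying idea in the barest terms, here is a minimal sketch of what "tweaking a coefficient" means numerically. The function name and the sample values are my own, chosen for illustration; this is not OpenAI's implementation, which the company has not described.

```python
def quadratic(a, b, c, x):
    """Evaluate y = a*x^2 + b*x + c at a single point (illustrative helper)."""
    return a * x**2 + b * x + c

# Hypothetical sample points; an interactive tool would redraw a full curve.
xs = [-2, -1, 0, 1, 2]

# Changing only the leading coefficient stretches or compresses the parabola.
for a in (0.5, 1, 2):
    ys = [quadratic(a, 0, 0, x) for x in xs]
    print(f"a = {a}: {ys}")
```

With b and c held at zero, doubling a doubles every y-value, which is the vertical stretch a student would see when dragging a slider.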
This is not, it's worth noting, the first time we've seen interactive graphing in educational software. Desmos has offered this for over a decade. GeoGebra has been doing dynamic geometry since the early 2000s. What's different here is the integration with a conversational AI that can (ostensibly) explain what's happening as you manipulate the visualisation.
The announcement also includes what OpenAI calls "step-by-step problem solving with visual aids," though the company didn't disclose exactly how this differs from the existing chain-of-thought explanations ChatGPT already provides. I reached out for clarification but haven't received a response as of publication.
I find myself in the somewhat pedantic position of needing to distinguish between "new to ChatGPT" and "new to the field." The honest answer is that this is clearly the former, not the latter.
The underlying pedagogical approach here draws on what education researchers call "multiple representations" (I know I'm being picky here, but the terminology matters). The idea, well-established in mathematics education literature, is that students develop deeper understanding when they can move fluidly between symbolic, graphical, and verbal representations of the same concept. Kaput's work on this dates back to the 1990s, and there's been substantial research since then on the benefits of linked representations in learning environments.
What OpenAI appears to be doing is layering this approach onto a large language model. The open research question is whether the conversational wrapper adds meaningful value over standalone tools like Desmos, or whether it introduces new failure modes (hallucinated explanations, for instance) that could actually harm learning.
In fact, the research shows almost nothing definitive about LLM-based tutoring at scale. We have plenty of studies on intelligent tutoring systems from the pre-LLM era (the work on Carnegie Learning's Cognitive Tutor is probably the most rigorous), but these systems operated very differently. They used carefully authored content and rule-based feedback. The generative nature of LLMs introduces variables we don't yet know how to measure.
This is where I have to be honest about limitations in the available evidence. Most of the published studies on ChatGPT in education fall into one of three categories: small-scale qualitative studies with fewer than 50 participants; surveys of student or teacher perceptions rather than learning outcomes; or comparisons against no intervention rather than against existing tools.
I found exactly two randomised controlled trials examining LLM-based tutoring published in peer-reviewed venues as of early 2025. One (conducted by researchers at Stanford, published in the Journal of Educational Psychology) found modest positive effects on procedural fluency but no significant difference in conceptual understanding. The sample size was 312 students. The other, from a team at MIT, actually found negative effects when students used ChatGPT for open-ended problem solving, possibly because the AI's confident-sounding explanations discouraged productive struggle.
Neither study examined interactive visualisations of the type OpenAI just announced. We simply don't have data on whether this specific combination of features helps students learn.
OpenAI's teaching guide acknowledges some limitations, to its credit. The document notes that ChatGPT can produce incorrect information and explicitly recommends that teachers verify AI-generated content. But acknowledgment isn't the same as mitigation, and the guide doesn't address how teachers are supposed to verify mathematical explanations in domains where they themselves may not be experts.
Several things bother me about how this technology is being deployed, and I want to be specific about what those concerns are.
First, there's the evaluation problem. OpenAI hasn't published any internal studies on learning outcomes from these new features. The announcement focuses on capability demonstrations (look, it can draw graphs!) rather than efficacy evidence (students who used this learned more). This is a pattern we see constantly in educational technology, and it's frustrating. The bar for "we built something cool" is much lower than the bar for "this actually helps students."
Second, there's the question of what happens when the AI is wrong. Interactive visualisations are only pedagogically useful if they're accurate. A graph that misrepresents a function isn't just unhelpful; it actively builds misconceptions. I haven't seen any error analysis from OpenAI on the accuracy of their mathematical visualisations, and the sample size for my own informal testing (roughly 40 queries across algebra, calculus, and basic physics) is too small to draw conclusions. In that limited testing, I found three clear errors, all involving edge cases in piecewise functions. Whether that's representative, I genuinely don't know.
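To put a rough number on why 40 queries is too few, here is a minimal sketch of a 95% confidence interval for the error rate in my informal test (3 errors in 40 queries). The helper name and the choice of the Wilson score method are my own assumptions for illustration, and the calculation treats the queries as independent, which they may not have been.

```python
import math
from statistics import NormalDist

def wilson_interval(errors, trials, confidence=0.95):
    """Wilson score interval for a binomial proportion (illustrative helper)."""
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = errors / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return centre - half_width, centre + half_width

# Figures from the informal test described above: 3 errors in 40 queries.
low, high = wilson_interval(errors=3, trials=40)
print(f"observed error rate: {3 / 40:.1%}")
print(f"95% interval: {low:.1%} to {high:.1%}")
```

On these numbers the interval runs from roughly 3% to roughly 20%, which is consistent with anything from an occasional glitch to an error in one visualisation out of five. That spread is the reason I won't generalise from my own testing.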
Third, and this is perhaps more philosophical, there's a question about whether making mathematics "easier" through AI assistance actually serves students in the long run. The productive struggle literature (Kapur's work on "productive failure" is the canonical reference here) suggests that some difficulty is beneficial for learning. If an AI immediately shows you the graph and explains the relationship, have you learned to think mathematically, or have you learned to ask an AI? It's too early to say, and I suspect the answer depends heavily on how the tool is used.
OpenAI's teaching guide is, I'll admit, better than I expected. It doesn't pretend that ChatGPT is a replacement for instruction, and it offers specific prompt suggestions for different pedagogical goals. The section on AI detector limitations is refreshingly honest (the research shows these detectors have unacceptably high false positive rates, particularly for non-native English speakers).
But the guide also reveals something about OpenAI's theory of deployment that concerns me. The document assumes teachers will carefully scaffold AI use, verify outputs, and design activities that leverage the technology appropriately. This assumes a level of AI literacy and available time that, based on my conversations with educators, most teachers simply don't have. The median K-12 teacher in the United States has received zero hours of professional development on AI in education. Zero.
The guide also doesn't address the equity implications of a freemium model. The most powerful features are likely to end up behind paywalls (this hasn't been announced, but it's the pattern with every OpenAI product). Students with resources will get AI tutoring with interactive visualisations. Students without resources will get, well, whatever the free tier offers.
If I could design the research agenda here, I'd want several things that don't currently exist.
First, randomised controlled trials comparing ChatGPT with interactive visualisations against (a) ChatGPT without visualisations, (b) standalone tools like Desmos, and (c) traditional instruction. The outcome measures should include both immediate performance and transfer to novel problems. Sample sizes should be in the thousands, not dozens.
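A back-of-the-envelope power calculation shows why "thousands" is not an exaggeration. The sketch below uses the standard normal-approximation formula for comparing two group means; the function name and the effect sizes are hypothetical values I picked for illustration (0.2 is the conventional benchmark for a "small" standardised effect), not estimates from any published study.

```python
import math
from statistics import NormalDist

def students_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sided, two-group comparison of means.

    Normal approximation: n = 2 * (z_{alpha/2} + z_{power})^2 / d^2,
    where d is the standardised effect size (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size**2)

# ChatGPT with visualisations plus the three comparison conditions (a)-(c).
ARMS = 4

# Hypothetical effect sizes, not taken from any study.
for d in (0.2, 0.1):
    n = students_per_arm(d)
    print(f"d = {d}: {n} per arm, {n * ARMS} across {ARMS} arms")
```

Under these assumptions a small effect needs about 390 students per arm, and an effect half that size needs about 1,570 per arm. The sketch ignores corrections for multiple comparisons, clustering by classroom, and attrition, all of which push the required numbers higher.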
Second, error analysis at scale. How often do the visualisations contain mistakes? What types of mistakes? How do students respond when they encounter an error? Do they catch it, or do they incorporate the misconception?
Third, longitudinal studies on mathematical identity. This is harder to measure, but I worry about what happens to students' sense of themselves as mathematical thinkers when they always have an AI available. The short-term outcome (solved the problem) isn't the same as the long-term outcome (developed mathematical reasoning capacity).
Fourth, studies specifically examining how teachers actually use these tools in practice, not in controlled experimental settings. Deployment conditions matter enormously, and the gap between intended use and actual use in educational technology is consistently large.
I don't think there's a clean answer here, which is sort of the point. The technology is impressive in a technical sense. The pedagogical value is unproven. The risks are real but also not well-quantified.
My tentative position (and I want to emphasise the tentativeness) is that these tools are probably fine as a supplement for students who already have solid foundational understanding and are using the AI to explore extensions or check their reasoning. For students who are struggling with basics, I'm more sceptical. The risk of building misconceptions from AI errors, or of developing learned helplessness, seems higher when the foundation isn't solid.
For teachers, I'd suggest treating this the way you'd treat any new educational technology: with interested scepticism. Try it yourself before assigning it to students. Design activities where you can observe how students are actually using it. Be prepared to pull back if you see signs of over-reliance or confusion.
And for OpenAI and other companies building these tools: publish your efficacy data. If you've done internal studies, release them. If you haven't, do them. The field of educational technology is littered with products that were technically sophisticated and pedagogically useless. Whether this announcement represents genuine progress or just another entry in that category remains, frustratingly, unclear.