Two New GPU-Native Solvers Promise to Close the Gap Between MPC and Modern Robotics Pipelines
TurboMPC and jaxipm tackle the same bottleneck from different angles: getting constrained optimization off the CPU and onto the GPU where the rest of modern robotics already lives.
· 9 hours ago · 8 min read
58 times faster than the previous best GPU-based differentiable solver. That is the headline number from TurboMPC, a new model predictive control solver out of Toyota Research Institute, and it is the kind of figure that demands scrutiny before celebration.
This week saw two pre-prints land on arXiv that, taken together, describe a meaningful shift in how constrained optimization might integrate with GPU-accelerated robotics workflows. TurboMPC (arXiv:2606.24039) and jaxipm (arXiv:2606.26341) both address the same structural problem: mature, reliable solvers for nonlinear programs have historically been CPU-bound, single-problem tools, while the rest of modern robotics research has migrated almost entirely to GPU-batched pipelines. The mismatch has been an open sore in the field for years.
Neither paper claims to have solved robotics. But both make a credible case that the gap is narrower than it was last week.
TurboMPC reports speedups of up to 15 times over state-of-the-art CPU differentiable solvers and up to 58 times over GPU differentiable solvers in simulation benchmarks. These figures come from the authors' own benchmarks, which is standard for pre-prints but means independent replication has not happened yet. The comparison baseline matters enormously here, and the paper is specific: the GPU comparison is against existing GPU differentiable MPC solvers, not against IPOPT running on a well-tuned CPU cluster.
The solver itself combines sequential quadratic programming (SQP) with an ADMM inner solver, implicit differentiation for gradient computation, and a co-designed JAX-CUDA implementation. It supports state and control inequality constraints, implicit integrators, cross-time-coupled costs, and slack variables. That last set of features is not trivial. Many fast MPC solvers achieve their speed by restricting the problem class they will accept. TurboMPC appears to maintain expressiveness while recovering speed, which is the harder engineering problem.
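To make the "ADMM inner solver" idea concrete, here is a minimal sketch of ADMM applied to a box-constrained quadratic program, batched with `jax.vmap`. It is not TurboMPC's implementation: the function name `admm_box_qp`, the splitting used, the penalty `rho`, the fixed iteration count, and the toy problem data are all assumptions chosen for illustration.

```python
import jax
import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve


def admm_box_qp(P, q, lo, hi, rho=1.0, iters=200):
    """Approximately solve min 0.5 x^T P x + q^T x  s.t.  lo <= x <= hi.

    Illustrative ADMM splitting x = z, with z restricted to the box.
    P is assumed symmetric positive semidefinite.
    """
    n = q.shape[0]
    # The linear system is identical in every iteration, so factor it once.
    chol = cho_factor(P + rho * jnp.eye(n))

    def step(carry, _):
        z, u = carry
        x = cho_solve(chol, rho * (z - u) - q)  # unconstrained quadratic step
        z = jnp.clip(x + u, lo, hi)             # projection onto the box
        u = u + x - z                           # scaled dual update
        return (z, u), None

    init = (jnp.zeros(n), jnp.zeros(n))
    (z, _), _ = jax.lax.scan(step, init, None, length=iters)
    return z  # z satisfies the box constraints by construction


# Two toy problems solved in one batched call (hypothetical data).
P_batch = jnp.stack([jnp.eye(2), 2.0 * jnp.eye(2)])
q_batch = jnp.array([[-2.0, 0.5], [-1.0, -6.0]])
solve_batch = jax.vmap(admm_box_qp, in_axes=(0, 0, None, None))
x_batch = solve_batch(P_batch, q_batch, 0.0, 1.0)
```

The fixed iteration count keeps every problem in the batch on the same instruction stream, which is one reason ADMM-style methods map well to GPUs. A real solver would add convergence checks and handle general inequality constraints.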
The planning horizon numbers are also notable. The paper reports stable vehicle control at planning horizons exceeding 8,000 knot points. For context, most real-time MPC implementations in robotics use horizons of 20 to 100 steps. A horizon of 8,000 is rare, and the fact that the solver scales that far while running on a GPU suggests the architecture is genuinely different from prior work rather than an incremental tuning exercise.
On the physical hardware side, TurboMPC was deployed on a full-scale car for minimum-time racing. Bayesian optimization over batched, GPU-accelerated MPC parameter tuning produced faster lap times than a hand-tuned baseline. The paper does not specify how many trials the Bayesian optimization ran or what the hand-tuning process involved, which would be useful context for evaluating how large that performance gap actually is.
jaxipm takes a different approach. Rather than building a new solver architecture, the team redesigned IPOPT, the gold-standard NLP solver used throughout academic and industrial robotics, to run in a GPU-batched mode using JAX. The key algorithmic contributions are what they call heterogeneous iteration fusion, which eliminates control flow that would otherwise serialize execution on GPU hardware, and iteration-level batching, which minimizes idle time across the GPU threads solving different problems concurrently.
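The control-flow problem is easier to see in a toy. The sketch below is not jaxipm's heterogeneous iteration fusion; it shows only the generic trick of replacing per-problem branching with masked, fixed-shape updates so that a whole batch advances in lockstep. The Newton square-root iteration stands in for a real solver step, and `batched_newton_sqrt` is a hypothetical name.

```python
import jax
import jax.numpy as jnp


def batched_newton_sqrt(a, tol=1e-5, max_iters=50):
    """Solve x^2 = a for a batch of positive values without per-problem branches."""

    def cond(state):
        _, done, k = state
        # Keep looping while any problem is unfinished and the cap is not hit.
        return jnp.logical_and(k < max_iters, jnp.logical_not(jnp.all(done)))

    def body(state):
        x, done, k = state
        x_new = 0.5 * (x + a / x)                         # one Newton step for every lane
        converged = jnp.abs(x_new * x_new - a) <= tol * a
        x = jnp.where(done, x, x_new)                     # freeze lanes that already finished
        return x, jnp.logical_or(done, converged), k + 1

    x0 = jnp.maximum(a, 1.0)
    state = (x0, jnp.zeros(a.shape, dtype=bool), jnp.zeros((), dtype=jnp.int32))
    return jax.lax.while_loop(cond, body, state)


a = jnp.array([4.0, 9.0, 1.0e4, 1.0])
x, done, iterations = batched_newton_sqrt(a)
```

Note the cost of this simple approach: lanes that converge early sit idle until the slowest problem finishes. Reducing that kind of waste is what the paper's iteration-level batching is described as targeting.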
The throughput improvement reported is up to 32.85 times over standard IPOPT on quadrotor benchmarks, including reference tracking with obstacle avoidance, multi-quadrotor collision-free navigation, and navigation in cluttered environments. The framing here is throughput, meaning problems solved per second across a batch, rather than latency on a single problem. That distinction matters practically. If you are running a learning pipeline that needs thousands of trajectory optimizations per training step, throughput is the relevant metric. If you are running a single robot in real time, latency is what you care about, and the paper is less explicit about that regime.
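The throughput-versus-latency distinction is simple arithmetic, sketched below. The batch size and timings are made-up illustrative numbers, not figures from either paper, and `batch_metrics` is a hypothetical helper.

```python
def batch_metrics(batch_size, wall_time_s):
    """Return (problems per second, milliseconds until results are available)."""
    throughput = batch_size / wall_time_s
    latency_ms = wall_time_s * 1e3  # every problem in the batch waits for the whole batch
    return throughput, latency_ms


# Hypothetical: 1,024 problems finish together after half a second.
batched_throughput, batched_latency_ms = batch_metrics(1024, 0.5)

# Hypothetical: one problem solved alone in 8 ms.
single_throughput, single_latency_ms = batch_metrics(1, 0.008)
```

In this made-up case the batched run solves roughly sixteen times more problems per second, yet no individual answer arrives in less than 500 ms. A training pipeline would prefer the first configuration and a real-time controller the second.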
I want to be precise about the novelty claims, because this is a space where incremental work sometimes gets presented as a breakthrough.
The core problem both papers address is well-documented. CPU-bound NLP solvers like IPOPT and trajectory optimization solvers like Crocoddyl or ALTRO have been the standard tools for constrained trajectory optimization in robotics for years. They work well. They are mature. But they solve one problem at a time, and they do not fit natively into the GPU-batched simulation and learning frameworks that now dominate robotics research. Work such as NVIDIA's GPU-accelerated Isaac Gym, and the broader push toward massively parallel simulation, has made this mismatch increasingly painful.
Prior work on GPU-accelerated MPC exists. Solvers like MPPI (model predictive path integral) are sampling-based and GPU-native, but they sacrifice hard constraint satisfaction. Differentiable MPC work in JAX and PyTorch has appeared over the past few years, but with significant limitations in problem expressiveness or speed. The contribution of TurboMPC is, to be precise, not that it puts MPC on a GPU (that has been done), but that it does so while maintaining the kind of constraint support and differentiability that makes it useful for the full range of problems robotics researchers actually care about.
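For readers unfamiliar with why sampling-based methods give up hard constraints, here is a minimal MPPI-style update on a toy 1-D integrator. Everything in it is an assumption for illustration (the dynamics, the constants, the penalty weight, the function names). The point is that the control limit enters only as a cost penalty, so nothing prevents the returned plan from violating it.

```python
import jax
import jax.numpy as jnp

DT, HORIZON, TARGET, U_MAX = 0.1, 20, 1.0, 2.0  # toy constants (hypothetical)


def rollout_cost(u_seq):
    """Cost of driving x_{t+1} = x_t + DT * u_t from x = 0 toward TARGET."""
    xs = DT * jnp.cumsum(u_seq)
    tracking = jnp.sum((xs - TARGET) ** 2)
    # The limit |u| <= U_MAX is only penalized, never enforced.
    violation = jnp.sum(jnp.maximum(jnp.abs(u_seq) - U_MAX, 0.0) ** 2)
    return tracking + 10.0 * violation


def mppi_step(key, u_nominal, num_samples=512, sigma=1.0, temperature=1.0):
    """One MPPI-style update: sample, score, and average with softmin weights."""
    noise = sigma * jax.random.normal(key, (num_samples, HORIZON))
    candidates = u_nominal + noise
    costs = jax.vmap(rollout_cost)(candidates)      # all rollouts scored in parallel
    weights = jax.nn.softmax(-costs / temperature)  # low cost gets high weight
    return weights @ candidates


key = jax.random.PRNGKey(0)
u_plan = jnp.zeros(HORIZON)
initial_cost = rollout_cost(u_plan)
for _ in range(5):
    key, subkey = jax.random.split(key)
    u_plan = mppi_step(subkey, u_plan)
final_cost = rollout_cost(u_plan)
```

The whole update is a handful of dense array operations with no linear solves, which is why this family of methods was GPU-native early. The trade is that feasibility becomes a tuning question about penalty weights rather than a guarantee.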
jaxipm's novelty is arguably cleaner: it is, as the authors claim, the first GPU-batched NLP solver based on IPOPT. If that claim holds up under scrutiny, and I am not aware of a direct prior that contradicts it, that is a genuine first. The significance is that IPOPT comes with decades of algorithmic refinement, hard constraint satisfaction guarantees, and a user community that knows how to apply it. Porting its behavior to a GPU-batched setting without losing those properties is non-trivial work.
This work is genuinely new, not merely incremental over prior GPU MPC work, though it builds on a substantial foundation of JAX-based robotics infrastructure that has accumulated over the past few years.
The practical implication of both papers is the same: constrained optimization can now, in principle, participate in GPU-batched learning loops without becoming the bottleneck.
This has downstream consequences for several active research areas. Reinforcement learning for robotics has largely avoided hard constraints because the solvers that enforce them were too slow to run inside a training loop. Imitation learning from demonstrations that involve constrained motion (think manipulation tasks with joint limits or collision avoidance) faces similar friction. Model-based RL approaches that want to use MPC as a policy or as a planning component inside a learned system have had to make compromises on either the constraint side or the speed side.
TurboMPC explicitly validates on humanoid imitation learning and on reinforcement learning with neural network cost functions, which are exactly the kinds of tasks where this integration matters most. The humanoid result is worth watching. Humanoid robots have extremely complex constraint structures (joint limits, contact constraints, balance requirements), and getting MPC to run in real time on a humanoid while remaining differentiable for learning is an open problem that several major research groups are working on.
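"Remaining differentiable for learning" usually means a loss can be backpropagated through the solver's output. One common way to do that is implicit differentiation, which the TurboMPC paper is described as using. The sketch below shows the idea on a toy unconstrained quadratic problem and is not the paper's code: the matrix `P`, the gradient-descent inner loop, and the names `residual` and `solve` are assumptions.

```python
import jax
import jax.numpy as jnp

P = jnp.array([[2.0, 0.0], [0.0, 4.0]])  # toy problem data (hypothetical)


def residual(x, theta):
    """Optimality condition of min 0.5 x^T P x - theta^T x, zero at the solution."""
    return P @ x - theta


@jax.custom_vjp
def solve(theta):
    """Stand-in iterative solver: plain gradient descent on the toy objective."""
    def body(_, x):
        return x - 0.1 * residual(x, theta)
    return jax.lax.fori_loop(0, 500, body, jnp.zeros_like(theta))


def solve_fwd(theta):
    x = solve(theta)
    return x, (x, theta)


def solve_bwd(saved, g):
    x, theta = saved
    # Implicit function theorem: dx/dtheta = -(dF/dx)^{-1} dF/dtheta, so the
    # backward pass needs one linear solve instead of unrolling 500 iterations.
    dF_dx = jax.jacobian(residual, argnums=0)(x, theta)
    w = jnp.linalg.solve(dF_dx.T, g)
    _, vjp_theta = jax.vjp(lambda t: residual(x, t), theta)
    (grad_theta,) = vjp_theta(-w)
    return (grad_theta,)


solve.defvjp(solve_fwd, solve_bwd)

theta = jnp.array([1.0, 2.0])
x_star = solve(theta)
grad = jax.grad(lambda t: jnp.sum(solve(t)))(theta)
```

The backward pass depends only on the converged solution, not on how the solver got there, so its memory cost does not grow with the number of solver iterations. A constrained MPC problem would replace `residual` with its full optimality conditions.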
There is also a less obvious implication for sim-to-real transfer. If you can run batched constrained optimization at GPU speeds during training, you can potentially generate far more diverse, constraint-satisfying trajectories for training data, which has historically been a limiting factor in learning-based control.
Several things remain unclear from both pre-prints, and they matter for assessing how quickly this work translates into practice.
First, wall-clock latency for single-problem real-time control. Both papers emphasize throughput and batch performance. For deployment on a physical robot running at 100 Hz or higher, the relevant question is whether a single MPC solve completes in under 10 milliseconds, the full control period at 100 Hz. TurboMPC's car racing result suggests it can run fast enough for real-time use, but the paper does not give explicit single-solve latency numbers in a way that generalizes to other hardware.
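For anyone planning to check single-solve latency themselves, here is a minimal measurement sketch in JAX. The workload `toy_solve` is a hypothetical stand-in, not either solver, and the helper names are assumptions. Two details matter: the first call includes compilation and must be excluded, and JAX dispatches work asynchronously, so the timer has to wait on `block_until_ready()`.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def toy_solve(x0):
    """Hypothetical stand-in workload: 100 fixed-point iterations."""
    def body(_, x):
        return 0.5 * (x + jnp.cos(x))
    return jax.lax.fori_loop(0, 100, body, x0)


def measure_latency_ms(fn, arg, repeats=20):
    """Middle value of sorted per-call wall-clock times, in milliseconds."""
    fn(arg).block_until_ready()      # warm-up call triggers compilation
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg).block_until_ready()  # wait for the device to actually finish
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return samples[len(samples) // 2]


def control_budget_ms(rate_hz):
    """Time available per control cycle at the given loop rate."""
    return 1e3 / rate_hz


latency_ms = measure_latency_ms(toy_solve, jnp.ones(8))
budget_ms = control_budget_ms(100.0)
fits_budget = latency_ms < budget_ms
```

Worst-case or high-percentile latency matters more than the typical value for a real controller, so a serious benchmark would report the tail of `samples` as well.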
Second, robustness to initialization. NLP solvers are sensitive to warm-starting and initial guesses. GPU-batched settings may involve problems with diverse initializations, and it is too early to say how jaxipm handles pathological cases or how often it fails to converge compared to IPOPT running in its standard mode.
Third, the sample size concern is real for the physical experiments. TurboMPC's car racing results appear to come from a limited number of runs (the paper does not report statistics over many trials in the physical experiment section). Minimum-time racing is a high-variance task, and without variance estimates, the comparison to the hand-tuned baseline is suggestive rather than conclusive.
Both codebases are open-source. TurboMPC is available at the Toyota Research Institute GitHub, and jaxipm is at John Viljoen's repository. That is the right move, and it means the community can start stress-testing these claims relatively quickly. Independent benchmarking on standard robotics tasks would be the next meaningful signal.
What I would want to see next is a head-to-head comparison between TurboMPC and jaxipm on the same benchmark tasks, with explicit wall-clock latency numbers, convergence failure rates, and performance on tasks outside the ones each team designed their experiments around. Both papers are solving adjacent problems with different architectural choices, and understanding where each approach wins would be more useful than the current situation, in which each paper evaluates against older baselines.
The broader point is this: the infrastructure gap between constrained optimization and GPU-native robotics has been a real friction point, and two serious attempts to close it appearing in the same week is not a coincidence. It reflects where the field's pressure has been building. Whether these specific implementations become the standard tools or get superseded by something better in eighteen months is less important than the fact that the problem is now being attacked with the right tools.