Robots Are Finally Learning to Feel. Here's Why That's Harder Than It Sounds.
A cluster of new research is tackling one of robotics' most stubborn problems: getting robots to actually use touch. The sim-to-real gap is the villain of the story.
By
9 hours ago · 7 min read
What does it feel like to pick up an egg without breaking it?
You probably don't think about this. Your fingers just... know. There's this constant stream of pressure and texture information flowing from your fingertips to your brain, and your grip adjusts in real time without you consciously doing anything. It's so automatic it feels like nothing.
For robots, it's one of the hardest problems in the field.
I've been following tactile sensing research for a while now, and honestly, I'm not sure most people outside of robotics labs appreciate how deep this rabbit hole goes. It's not just "put sensors on the fingers." The real problem is getting robots to learn from touch, and that means simulation, and simulation of tactile sensors is, to put it mildly, a mess.
This past week, four new papers landed on arXiv that are all, in different ways, trying to fix this. They're worth looking at together.
Here's the thing about training robots in simulation. You build a virtual environment, run millions of trials, and then transfer the learned policy to a real robot. This works reasonably well for vision and for basic motion. It works much worse for touch.
Why? Because tactile sensors are weird and varied. A GelSight sensor works by pressing a gel against an object and imaging the deformation. A capacitive sensor measures electrical changes. A simulated sensor that reports penetration depth measures something with no clean real-world equivalent. When you train a robot policy on simulated touch data, the "texture" of that data looks completely different from what the real sensor produces. The policy falls apart.
This is the sim-to-real gap for tactile sensing, and it's nastier than the equivalent problem for vision.
The four papers I want to talk about each take a different angle on it.
TactSpace: Just stop trying to simulate the raw signal
The first paper (arXiv) proposes something I find genuinely clever. Instead of trying to make simulation produce realistic sensor readings (which is hard), TactSpace learns a shared embedding space where simulated and real tactile data end up in the same neighborhood, even if the raw signals look nothing alike.
The idea is to use modality-specific encoders that project things like simulated penetration depth and real-world capacitance into a common latent space. Train with contrastive alignment objectives so that the representations converge. Then, when you deploy on a real robot, the policy is working in that shared space, not in the raw signal space where sim and real diverge.
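To make the alignment idea concrete, here is a minimal sketch of a symmetric InfoNCE loss, one common form of contrastive objective. This is my own illustration, not TactSpace's code: the exact objective the authors use isn't covered here, and the function names, the temperature value, and the toy data are all assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce(sim_emb, real_emb, temperature=0.1):
    """Symmetric InfoNCE loss (assumed objective, for illustration only).

    Row i of sim_emb and row i of real_emb are embeddings of the same
    contact event, one from the simulated sensor and one from the real one.
    The loss is low when each embedding is closest to its own partner.
    """
    z_sim = l2_normalize(sim_emb)
    z_real = l2_normalize(real_emb)
    logits = z_sim @ z_real.T / temperature      # pairwise similarities
    idx = np.arange(logits.shape[0])
    loss_sim_to_real = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_real_to_sim = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_sim_to_real + loss_real_to_sim)

# Toy check: matched pairs should score a lower loss than mismatched pairs.
rng = np.random.default_rng(0)
contacts = rng.normal(size=(32, 8))              # hypothetical latent contact states
aligned_loss = info_nce(contacts, contacts)
misaligned_loss = info_nce(contacts, np.roll(contacts, 1, axis=0))
print(f"aligned: {aligned_loss:.3f}  misaligned: {misaligned_loss:.3f}")
```

In a real pipeline the two inputs would come from separate, trainable encoders (one per sensor modality), and this loss would be minimized with gradient descent so that matched sim and real readings land near each other.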
The results are pretty striking. The paper reports zero-shot sim-to-real transfer on force prediction and shape reconstruction tasks, with a 16.7% reduction in force prediction error and a 45.8% reduction in shape reconstruction error compared to baseline approaches. The authors also released a Warp-based tactile simulation implementation for Isaac Lab, which is a nice practical contribution.
I initially thought this was sort of sidestepping the problem rather than solving it, but after reading more carefully, I think the framing is actually right. You don't need perfect simulation if you can learn representations that are invariant to the simulation-reality gap. That's a cleaner target.
TaCauchy: What if simulation was actually physically correct?
The opposite philosophy shows up in TaCauchy, which takes the position that we should just do the physics properly.
TaCauchy is a Finite Element Method framework that plugs into Isaac Sim and computes Cauchy stress tensors from hyperelastic constitutive laws. If that sentence means nothing to you, the short version is: it's doing real continuum mechanics to figure out how a tactile sensor deforms under contact, rather than using approximations or empirical fits.
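If you want a feel for what "Cauchy stress from a hyperelastic law" means in practice, here is a small sketch using a compressible neo-Hookean material, a textbook hyperelastic model. I'm assuming that model purely for illustration; which constitutive laws TaCauchy actually implements isn't specified here, and the material constants below are made-up values for a soft gel.

```python
import numpy as np

def cauchy_stress_neo_hookean(F, mu, lam):
    """Cauchy stress for a compressible neo-Hookean solid.

    F   : 3x3 deformation gradient at one material point
    mu  : shear modulus (Pa)
    lam : Lame's first parameter (Pa)

    Uses the standard closed form
        sigma = (mu / J) * (F F^T - I) + (lam * ln(J) / J) * I,
    where J = det(F) is the local volume ratio.
    """
    J = np.linalg.det(F)
    if J <= 0:
        raise ValueError("det(F) must be positive (no material inversion)")
    identity = np.eye(3)
    left_cauchy_green = F @ F.T
    return (mu / J) * (left_cauchy_green - identity) + (lam * np.log(J) / J) * identity

# Hypothetical soft-gel constants (illustrative, not taken from any paper).
mu, lam = 34.5e3, 310e3

# Squash the gel by 10% along z, as if a fingertip pressed straight down.
F = np.diag([1.0, 1.0, 0.9])
sigma = cauchy_stress_neo_hookean(F, mu, lam)
print(sigma)  # negative diagonal entries indicate compression
```

An FEM solver evaluates something like this at every integration point of the gel mesh on every step, which is why the frame rates reported for this kind of simulation are worth paying attention to.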
The benchmarks are interesting. 33.40 FPS for a single environment, 555 FPS aggregate across 60 parallel environments. Stress extraction overhead under 1 millisecond. And in physical validation, simulated and real tactile responses agreed strongly across force ranges from 1.2556 N to 4.7332 N, with SSIM scores above 0.93.
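For readers who haven't met SSIM: it scores how similar two images are on a scale where 1.0 means identical. Below is a simplified single-window ("global") version of the standard formula. Typical implementations compute it over local windows and average the result, and I don't know which variant the authors used, so treat this as a sketch of the metric and not a reproduction of their evaluation. The synthetic "tactile images" are invented for the example.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two same-shaped images.

    Simplification: statistics are taken over the whole image instead of
    over sliding local windows, so values will differ from windowed SSIM.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2          # stabilizer for the luminance term
    c2 = (k2 * data_range) ** 2          # stabilizer for the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Synthetic stand-ins: a smooth contact "bump" and a noisy copy of it.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
simulated = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 8.0 ** 2))
measured = simulated + 0.05 * rng.normal(size=simulated.shape)

print(f"SSIM(simulated, simulated) = {global_ssim(simulated, simulated):.4f}")
print(f"SSIM(simulated, measured)  = {global_ssim(simulated, measured):.4f}")
```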
The framework supports GelSight Mini, DIGIT, and 9DTact sensors out of the box, with what they describe as minimal configuration to add new ones.
This approach and TactSpace's approach are almost philosophically opposed. One says "make simulation accurate enough that the gap disappears." The other says "accept the gap and learn around it." Both seem to be working, which is interesting. It remains unclear which will scale better as manipulation tasks get more complex.
PTLD: Skip simulation entirely for the tactile part
The PTLD paper (arXiv) takes yet another route. The insight here is that collecting real-world tactile data isn't as hard as collecting real-world demonstrations, so why not use privileged real-world sensors to build a tactile state estimator, and then distill that into your policy?
The setup is a bit involved. You train a manipulation policy in simulation using reinforcement learning, without any tactile input. Then in the real world, you collect data using privileged sensors (sensors you might not have at deployment time, or that are easier to instrument temporarily) to train a robust state estimator that runs on tactile input. That estimator gets distilled back into the policy.
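Here is a toy version of that split, just to show the data flow: fit a tactile state estimator against labels from a privileged sensor, then feed its estimate to a policy that was built elsewhere. Everything in it is a stand-in I made up. The "policy" is a fixed linear controller rather than an RL policy, the estimator is plain ridge regression, and composing the two is the simplest possible substitute for the distillation step the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_policy(object_state):
    """Stand-in for a policy trained in simulation (hypothetical linear controller)."""
    return -0.5 * object_state

# Real-world data collection with a privileged sensor (all synthetic here).
n_samples, state_dim, tactile_dim = 500, 3, 16
privileged_state = rng.normal(size=(n_samples, state_dim))   # labels from the privileged sensor
mixing = rng.normal(size=(state_dim, tactile_dim))           # unknown sensor response
tactile = privileged_state @ mixing + 0.05 * rng.normal(size=(n_samples, tactile_dim))

def fit_ridge(inputs, targets, reg=1e-3):
    """Closed-form ridge regression: solve (X^T X + reg*I) W = X^T Y."""
    gram = inputs.T @ inputs + reg * np.eye(inputs.shape[1])
    return np.linalg.solve(gram, inputs.T @ targets)

# Train the estimator on the first 400 samples, hold out the rest.
weights = fit_ridge(tactile[:400], privileged_state[:400])

def deployed_policy(tactile_reading):
    """At deployment there is no privileged sensor: estimate state from touch, then act."""
    estimated_state = tactile_reading @ weights
    return sim_policy(estimated_state)

held_out_rmse = np.sqrt(np.mean((tactile[400:] @ weights - privileged_state[400:]) ** 2))
print(f"held-out state RMSE: {held_out_rmse:.4f}")
print("action from touch alone:", deployed_policy(tactile[400:401]))
```

The point of the sketch is the dependency structure: the privileged sensor only appears in the training labels, never in `deployed_policy`.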
The numbers are hard to ignore. On the benchmark in-hand rotation task, PTLD achieves a 182% improvement over a proprioception-only policy. On tactile in-hand reorientation, a 57% improvement in goals reached. Those are large gains.
The thing I find most interesting about PTLD is that it doesn't require you to solve tactile simulation at all. You're using simulation for the policy structure and real-world data for the tactile component. It's a pragmatic split.
HT-Bench: Can we even measure progress properly?
This one's a bit different. The HT-Bench paper is less about a new technique and more about the infrastructure problem: we don't have a good shared benchmark for tactile representation learning, which makes it hard to know if the field is actually making progress.
HT-Bench is a large-scale dataset, 10 million RGB frames and 7.8 million tactile frames across 226 tasks, paired with four evaluation tasks: fine-grained tactile similarity retrieval, masked tactile inpainting, vision-to-tactile synthesis, and multimodal tactile frame prediction. The focus is on egocentric vision paired with full-hand tactile data, which is a specific and I think underexplored combination.
They also introduce HandTouch, a vector-quantized vision-tactile encoder trained progressively through spatial, cross-modal, and temporal objectives. The benchmark numbers improve substantially over prior baselines: Recall@5 on tactile similarity retrieval goes from 74.65% to 85.23%, RMSE on masked inpainting drops from 0.022 to 0.010.
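If Recall@5 is unfamiliar, here is the usual definition in code: for each query, retrieve the K most similar gallery items and count the query as a hit if at least one of them has the matching label. HT-Bench's exact retrieval protocol may differ, so this is a sketch of the common metric with hypothetical function and variable names.

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, query_labels, gallery_labels, k=5):
    """Fraction of queries with at least one correct match in the top-k results.

    Similarity is cosine similarity between embedding rows (an assumption;
    a benchmark may use a different distance).
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    similarities = q @ g.T
    top_k = np.argsort(-similarities, axis=1)[:, :k]      # indices of the k nearest gallery items
    hits = (gallery_labels[top_k] == query_labels[:, None]).any(axis=1)
    return hits.mean()

# Toy data: 20 texture classes, one gallery item and one noisy query per class.
rng = np.random.default_rng(0)
class_centers = rng.normal(size=(20, 32))
labels = np.arange(20)
gallery = class_centers
queries = class_centers + 0.5 * rng.normal(size=class_centers.shape)

for k in (1, 5):
    print(f"Recall@{k}: {recall_at_k(queries, gallery, labels, labels, k=k):.2f}")
```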
To be honest, the benchmark contribution might matter more long-term than the model. Without shared evaluation, every paper is measuring itself against its own baselines, and that makes it genuinely hard to track where the field is.
What this cluster of papers suggests
So here's what I take away from reading these four together:
The sim-to-real gap for tactile sensing is being attacked from multiple directions simultaneously, which is usually a sign that the community thinks it's solvable.
There's no consensus yet on the right approach. Physics-accurate simulation, representation alignment, real-world distillation, and benchmark standardization are all live bets.
The gains being reported are large enough that at least some of this is going to matter for real robots doing real manipulation tasks.
Full-hand tactile sensing (not just fingertip sensors) is getting more attention, which makes sense if you think about how humans actually use their hands.
We still don't have great shared evaluation infrastructure, which is a genuine problem for measuring field-wide progress.
You might be wondering why this matters for humanoids specifically. The answer is that dexterous manipulation is one of the hardest things to get right on a humanoid platform, and it's one of the things that most limits what these robots can actually do in homes and workplaces. A robot that can't reliably pick up a wine glass or button a shirt isn't very useful, no matter how well it walks.
The companies building humanoids right now are, as far as I can tell, mostly relying on vision and proprioception for manipulation. Some are starting to integrate tactile sensors, but the learning infrastructure to use that data well has lagged behind. These papers are part of what needs to exist before tactile sensing becomes a standard part of the stack.
How long that takes is genuinely hard to say. My read is based on a snapshot of four papers, and the gap between a promising arXiv result and a deployed capability on a production robot is, historically, wider than it looks. But the research momentum here feels real. I'll be watching this area closely.