Image credit: Lottie animation by Centre Robotics (LottieFiles Free, used with credit).
A robot arm in a cluttered warehouse needs to know three things: where am I, what's around me, and what can I collide with. For years, the answer to all three has been some variant of point clouds, voxel grids, or, increasingly, those trendy 3D Gaussian splats that took the computer vision world by storm. But two new papers suggest the field might be circling back to something older and, frankly, more practical.
The first, from researchers presenting Triangle Splatting SLAM, builds dense maps using plain old triangles. The second, from a team including contributors from the National University of Singapore, constructs hierarchical object representations that culminate in superquadrics, those mathematical primitives I haven't thought about since my days building collision systems at Fanuc. Both papers share a common thesis: the geometry that robots use internally should match the geometry that downstream systems actually expect.
Let's start with Triangle Splatting SLAM. The system performs dense RGB-D SLAM (simultaneous localization and mapping) using differentiable triangles as the map primitive. On the Replica and TUM-RGBD benchmarks, the authors claim it outperforms baselines on 3D geometry metrics while matching camera-tracking accuracy.
That's an ambitious claim, though the paper doesn't provide specific percentage improvements in the abstract. What's more interesting to me is the architectural choice. While 3D Gaussian Splatting has become the default for novel-view synthesis tasks, triangles remain the standard primitive for:
Traditional rendering hardware (GPUs are literally optimized for triangles)
Game engines (Unity, Unreal, everything)
Simulation environments
Collision detection systems
Any downstream task requiring explicit geometry
The system maintains what the authors call a "triangle soup," an unstructured collection of triangles that can be optimized via differentiable rendering. On the fly, this soup gets converted into a connected mesh through restricted Delaunay triangulation. This enables capabilities that Gaussian splats simply can't provide: mesh deformation and collision checking during operation, not as a post-processing step.
From my time building hardware control systems, I can tell you that the gap between "pretty visualization" and "usable for planning" is where most perception systems fall apart. A Gaussian splat might render beautifully, but try running a collision query against it in real time. You can't, not directly. You end up extracting a mesh anyway, which defeats the purpose.
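To make "collision query against explicit geometry" concrete, here is a minimal sketch of a segment-versus-triangle test using the standard Möller-Trumbore intersection algorithm. It is not taken from the Triangle Splatting SLAM paper; the function names and the brute-force loop over the soup are my own assumptions, and a real planner would put a bounding volume hierarchy in front of this.

```python
import numpy as np

def segment_hits_triangle(p0, p1, tri, eps=1e-9):
    """Moller-Trumbore test, restricted to the segment p0 -> p1.

    `tri` is a sequence of three 3D vertices. Returns True if the
    segment crosses the triangle. (Hypothetical helper, illustration only.)
    """
    p0, p1 = np.asarray(p0, dtype=float), np.asarray(p1, dtype=float)
    v0, v1, v2 = (np.asarray(v, dtype=float) for v in tri)
    d = p1 - p0                      # segment direction (not normalized)
    e1, e2 = v1 - v0, v2 - v0        # triangle edges from v0
    h = np.cross(d, e2)
    det = np.dot(e1, h)
    if abs(det) < eps:               # segment is parallel to the triangle plane
        return False
    inv = 1.0 / det
    s = p0 - v0
    u = inv * np.dot(s, h)           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = inv * np.dot(d, q)           # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    t = inv * np.dot(e2, q)          # position along the segment, 0 at p0, 1 at p1
    return bool(0.0 <= t <= 1.0)

def segment_hits_soup(p0, p1, triangles):
    """Brute-force check of one motion segment against every triangle."""
    return any(segment_hits_triangle(p0, p1, tri) for tri in triangles)

# Usage: a single triangle in the z = 0 plane.
soup = [((0, 0, 0), (1, 0, 0), (0, 1, 0))]
print(segment_hits_soup((0.2, 0.2, 1.0), (0.2, 0.2, -1.0), soup))  # passes through
print(segment_hits_soup((0.2, 0.2, 1.0), (0.2, 0.2, 0.5), soup))   # stops short
```

The point is that the query runs on the map primitive itself. With a Gaussian representation there is no equivalent direct test without first extracting a surface.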
The second paper, Hierarchical Object Representation, takes a different but complementary approach. The team introduces a four-layer representation that progressively abstracts from raw sensor data to dense 3D meshes to superquadrics.
Superquadrics are parametric surfaces that can represent a surprisingly wide range of shapes (boxes, cylinders, ellipsoids, and various rounded intermediates) with just a handful of parameters. They're analytically defined, which means a collision check reduces to evaluating a closed-form expression rather than testing against mesh geometry. That matters when you're running a motion planner at hundreds of hertz.
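For readers who haven't met them, the textbook superquadric "inside-outside" function shows why the check is cheap. The sketch below assumes points are already expressed in the superquadric's local frame; the function name and parameter layout are mine, not the paper's, and the paper's fitting procedure is not shown.

```python
import numpy as np

def superquadric_inside_outside(points, scale, eps):
    """Evaluate the standard superquadric inside-outside function F.

    points: (N, 3) array in the superquadric's local frame.
    scale:  (a1, a2, a3) semi-axis lengths.
    eps:    (e1, e2) shape exponents; (1, 1) is an ellipsoid,
            values near 0 approach a box.
    F < 1 means inside, F == 1 on the surface, F > 1 outside.
    """
    a1, a2, a3 = scale
    e1, e2 = eps
    # The shape is symmetric about each axis, so work with absolute values.
    p = np.abs(np.atleast_2d(np.asarray(points, dtype=float)))
    x, y, z = p[:, 0] / a1, p[:, 1] / a2, p[:, 2] / a3
    xy = (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1)
    return xy + z ** (2.0 / e1)

# Usage: the same corner point is outside a unit sphere
# but inside a unit-sized box-like superquadric.
corner = [[0.9, 0.9, 0.9]]
print(superquadric_inside_outside(corner, (1, 1, 1), (1.0, 1.0))[0] < 1.0)  # sphere
print(superquadric_inside_outside(corner, (1, 1, 1), (0.1, 0.1))[0] < 1.0)  # box-like
```

Five shape parameters plus a pose cover the whole family, and a point query is a few power operations with no mesh traversal.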
The pipeline processes RGB-D image streams and was validated on several datasets:
HOPE dataset
ReplicaCAD
Kimera-Multi
NUS Campus Dataset (collected using a Unitree B2 robot)
The authors claim their superquadric-based map alignment method outperforms ROMAN, described as the current state-of-the-art for object-based map alignment. The code is available at their GitHub repository, which at least suggests confidence in reproducibility.
Look, the robotics research community has a habit of optimizing for benchmark metrics that don't translate to deployment. I've seen enough spec sheets to know that "state-of-the-art on Dataset X" often means "unusable in a real warehouse."
What makes both of these papers interesting isn't raw performance. It's the explicit focus on downstream usability.
Triangle Splatting SLAM addresses a real pain point: the disconnect between perception and planning pipelines. Most modern SLAM systems produce representations that require conversion before a robot can actually use them for navigation or manipulation. That conversion step introduces latency, errors, and architectural complexity. By building the map in triangles from the start, you skip the translation layer entirely.
The hierarchical object representation paper tackles a different problem: scalability and efficiency in cluttered environments. 3D Scene Graphs have become popular for long-term autonomy because they encode metric, semantic, and topological information together. But the geometric representation of objects within these graphs has been, as the authors put it, "overlooked." Most methods use partial point clouds or 3D bounding boxes, which tend to be either too detailed for efficient planning or too coarse for accurate collision avoidance.
The four-layer hierarchy offers a practical compromise: a robot can query the appropriate level depending on the task. Rough path planning? Use superquadrics. Fine manipulation near a specific object? Drop down to the dense mesh. This kind of multi-resolution approach is common in graphics but underutilized in robotics perception.
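As a rough illustration of that kind of level selection, here is a small sketch. Everything in it is hypothetical: the task names, the 0.5 m threshold, and the `pick_layer` function are my own, and only the two layers named above are modeled rather than the paper's full hierarchy.

```python
from enum import Enum

class Layer(Enum):
    """Two of the representation layers discussed above (illustrative subset)."""
    SUPERQUADRIC = "superquadric"
    DENSE_MESH = "dense_mesh"

def pick_layer(task: str, distance_m: float, fine_radius_m: float = 0.5) -> Layer:
    """Choose which geometry to query for one object.

    task:          hypothetical task label, e.g. "path_planning" or "manipulation".
    distance_m:    distance from the robot's tool or body to the object.
    fine_radius_m: assumed radius inside which fine geometry is worth its cost.
    """
    if task == "manipulation" and distance_m <= fine_radius_m:
        return Layer.DENSE_MESH
    # Everything else gets the cheap analytic primitive.
    return Layer.SUPERQUADRIC

# Usage
print(pick_layer("path_planning", 3.0))   # Layer.SUPERQUADRIC
print(pick_layer("manipulation", 0.2))    # Layer.DENSE_MESH
```

The design choice here is simply that the expensive representation is only touched when the task and proximity both justify it.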
I should address the elephant in the room. 3D Gaussian Splatting has dominated recent perception research. It produces stunning reconstructions, trains quickly, and renders in real time. So why would anyone go back to triangles?
The answer is, basically, integration. Gaussian splats are great for visualization and novel-view synthesis. They're less great for everything else.
Consider what a robot actually needs to do with a 3D map:
Collision checking: Requires explicit geometry. Gaussian splats need conversion.
Physics simulation: Requires meshes or primitives. Gaussian splats need conversion.
Grasp planning: Requires surface normals and contact geometry. Gaussian splats need conversion.
Game engine integration: Requires triangles. Gaussian splats need conversion.
See the pattern? Every downstream task requires converting the Gaussian representation into something else. Triangle Splatting SLAM argues: why not just build the thing you actually need?
That said, it's too early to say whether triangle-based approaches will match Gaussian splats on raw reconstruction quality. The benchmarks cited are encouraging but limited. Replica and TUM-RGBD are useful but not exhaustive. Real-world deployment, with its lighting variations, sensor noise, and dynamic objects, remains the real test.
Both papers represent a broader trend I've been noticing: a pragmatic turn in robotics perception research. After years of chasing benchmark numbers with increasingly complex methods, some researchers are asking simpler questions. What representation does the robot actually need? What can existing tools consume without modification?
This doesn't mean Gaussian splats are going away. For applications where visualization is the end goal, or where you're not running real-time planning, they remain excellent. But for robots that need to interact with the physical world, not just render it, geometry-native approaches have obvious advantages.
The NUS team's decision to validate on a Unitree B2 in real outdoor environments is particularly notable. It's one thing to run experiments on curated indoor datasets. It's another to deploy on a quadruped navigating a campus. We don't know yet how robust either system is to the kinds of edge cases that matter in production (sensor failures, dynamic obstacles, long-term drift), but the direction is promising.
I'll be watching for follow-up work that addresses some open questions. For Triangle Splatting SLAM: how does performance scale with scene size? What's the memory footprint compared to Gaussian approaches? For the hierarchical representation: how accurate is the superquadric fitting for unusual object geometries? What's the computational cost of maintaining all four representation layers?
The code for the hierarchical approach is already available. The Triangle Splatting paper doesn't mention a public release in the abstract, which is unfortunate but not unusual for work at this stage.
For teams building production robotics systems, these papers are worth reading carefully. Not because they solve everything (they don't), but because they're asking the right questions about what perception systems should actually produce. Sometimes the most useful research isn't the most novel. Sometimes it's the work that remembers why we're building these systems in the first place.