Videos

Sparse-dense motion modelling and tracking for manipulation without prior object models

C. Rauch, R. Long, V. Ivan and S. Vijayakumar, “Sparse-dense motion modelling and tracking for manipulation without prior object models,” IEEE Robotics and Automation Letters (RA-L), 2022.

This work presents an approach for modelling and tracking previously unseen objects for robotic grasping tasks.

Using the motion of objects in a scene, our approach segments rigid entities from the scene and continuously tracks them to create a dense and sparse model of the object and the environment.

While the dense tracking enables interaction with these models, the sparse tracking makes this robust against fast movements and allows redetection of already modelled objects.

The evaluation on a dual-arm grasping task demonstrates that our approach 1) enables a robot to detect new objects online without a prior model and to grasp these objects using only a simple parameterisable geometric representation, and 2) is much more robust than state-of-the-art methods.

Panoptic multi-TSDFs: A flexible representation for online multi-resolution volumetric mapping and long-term dynamic scene consistency

L. M. Schmid, J. Delmerico, J. L. Schonberger, J. Nieto, M. Pollefeys, R. Siegwart and C. Cadena Lerma, “Panoptic multi-TSDFs: A flexible representation for online multi-resolution volumetric mapping and long-term dynamic scene consistency,” IEEE International Conference on Robotics and Automation (ICRA), 2022.

For robotic interaction in environments shared with other agents, access to volumetric and semantic maps of the scene is crucial.

However, such environments are inevitably subject to long-term changes, which the map needs to account for.

We thus propose panoptic multi-TSDFs as a novel representation for multi-resolution volumetric mapping in changing environments.

By leveraging high-level information for 3D reconstruction, our proposed system allocates high resolution only where needed.

Through reasoning on the object level, semantic consistency over time is achieved.

This enables our method to maintain up-to-date reconstructions with high accuracy while improving coverage by incorporating previous data.

We show in thorough experimental evaluation that our map can be efficiently constructed, maintained, and queried during online operation, and that the presented approach can operate robustly on real depth sensors using non-optimized panoptic segmentation as input.

Improving pedestrian prediction models with self-supervised continual learning

L. Knoedler, C. Salmi, H. Zhu, B. Brito and J. Alonso-Mora, “Improving pedestrian prediction models with self-supervised continual learning,” IEEE Robotics and Automation Letters (RA-L), 2022.

Autonomous mobile robots require accurate human motion predictions to safely and efficiently navigate among pedestrians, whose behavior may adapt to environmental changes.

This letter introduces a self-supervised continual learning framework to improve data-driven pedestrian prediction models online across various scenarios continuously.

In particular, we exploit online streams of pedestrian data, commonly available from the robot’s detection and tracking pipeline, to refine the prediction model and its performance in unseen scenarios.

To avoid the forgetting of previously learned concepts, a problem known as catastrophic forgetting, our framework includes a regularization loss to penalize changes of model parameters that are important for previous scenarios and retrains on a set of previous examples to retain past knowledge.

Experimental results on real and simulation data show that our approach can improve prediction performance in unseen scenarios while retaining knowledge from seen scenarios when compared to naively training the prediction model online.
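
As a rough illustration of the two ingredients named in this abstract, a regularization loss on important parameters and retraining on stored examples, the sketch below assumes a quadratic penalty weighted by a per-parameter importance value (in the spirit of elastic weight consolidation) and a simple replay mix. The abstract does not give the exact form of either, so this is not the authors' implementation; `regularized_loss`, `mixed_batch`, `importance`, `strength` and `replay_fraction` are hypothetical names.

```python
import random


def regularized_loss(task_loss, params, anchor_params, importance, strength=1.0):
    """Task loss plus a penalty on moving important parameters.

    Assumption: the penalty is quadratic and weighted per parameter;
    the abstract only says that changes to important parameters are penalized.
    """
    penalty = sum(
        w * (p - a) ** 2 for p, a, w in zip(params, anchor_params, importance)
    )
    return task_loss + 0.5 * strength * penalty


def mixed_batch(new_examples, replay_buffer, batch_size, replay_fraction=0.5, rng=random):
    """Build a training batch from fresh data and stored past examples.

    Assumption: a fixed fraction of each batch is drawn from the replay buffer.
    """
    n_replay = min(int(batch_size * replay_fraction), len(replay_buffer))
    n_new = min(batch_size - n_replay, len(new_examples))
    return rng.sample(list(new_examples), n_new) + rng.sample(list(replay_buffer), n_replay)
```

With this form, a parameter with zero importance can move freely, while a parameter with high importance is pulled back towards the value it had after the previous scenario.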

Metrics for 3D object pointing and manipulation in virtual reality

E. Triantafyllidis, W. Hu, C. McGreavy and Z. Li, “Metrics for 3D object pointing and manipulation in virtual reality,” IEEE Robotics & Automation Magazine (RAM), 2021.

Assessing the performance of human movements during teleoperation and virtual reality is a challenging problem, particularly in 3D space due to complex spatial settings.

Despite the multitude of existing metrics, a compelling standardized 3D metric is still missing, which hampers comparability between studies.

Hence, evaluating human performance in virtual environments is a long-standing research goal, and a performance metric that combines two or more metrics under one formulation remains largely unexplored, particularly in higher dimensions.

The absence of such a metric is primarily attributed to the discrepancies between pointing and manipulation, the complex spatial variables in 3D, and the combination of translational and rotational movements altogether.

In this work, four experiments were designed and conducted with progressively higher spatial complexity to study and compare existing metrics thoroughly.

The research goal was to quantify the difficulty of these 3D tasks and model human performance sufficiently in full 3D peripersonal space.

Consequently, a new model extension is proposed and its applicability is validated across all the experimental results, showing better modelling and representation of human performance in combined movements of 3D object pointing and manipulation tasks than existing work.

Lastly, the implications on 3D interaction, teleoperation and object task design in virtual reality are discussed.

AcousticFusion: Fusing sound source localization to visual SLAM in dynamic environments

T. Zhang, H. Zhang, X. Li, J. Chen, T. L. Lam and S. Vijayakumar, “AcousticFusion: Fusing sound source localization to visual SLAM in dynamic environments,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.

Dynamic objects in the environment, such as people and other agents, lead to challenges for existing simultaneous localization and mapping (SLAM) approaches.

To deal with dynamic environments, computer vision researchers usually apply learning-based object detectors to remove these dynamic objects.

However, these object detectors are too computationally expensive for on-board processing on mobile robots.

In practical applications, these objects output noisy sounds that can be effectively detected by on-board sound source localization.

The directional information of the sound source object can be efficiently obtained by direction of sound arrival (DoA) estimation, but the depth estimation is difficult.

Therefore, in this paper, we propose a novel audio-visual fusion approach that fuses sound source direction into the RGB-D image and thus removes the effect of dynamic obstacles on the multi-robot SLAM system.

Experimental results of multi-robot SLAM in different dynamic environments show that the proposed method uses very few computational resources to obtain very stable self-localization results.

PoseFusion2: Simultaneous background reconstruction and human shape recovery in real-time

H. Zhang, T. Zhang, T. L. Lam and S. Vijayakumar, “PoseFusion2: Simultaneous background reconstruction and human shape recovery in real-time,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.

Dynamic environments that include unstructured moving objects pose a hard problem for Simultaneous Localization and Mapping (SLAM) performance.

The motion of rigid objects can be typically tracked by exploiting their texture and geometric features.

However, humans moving in the scene are often one of the most important, interactive targets – they are very hard to track and reconstruct robustly due to non-rigid shapes.

In this work, we present a fast, learning-based human object detector to isolate the dynamic human objects and realise a real-time dense background reconstruction framework.

We go further by estimating and reconstructing the human pose and shape.

The final output environment maps not only provide the dense static backgrounds but also contain the dynamic human meshes and their trajectories.

Our Dynamic SLAM system runs at around 26 frames per second (fps) on GPUs; with accurate human pose estimation additionally enabled, it runs at up to 10 fps.

Maneuver-based trajectory prediction for self-driving cars using spatio-temporal convolutional networks

B. Mersch, T. Höllen, K. Zhao, C. Stachniss, and R. Roscher, “Maneuver-based trajectory prediction for self-driving cars using spatio-temporal convolutional networks,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.

The ability to predict the future movements of other vehicles is a subconscious and effortless skill for humans and key to safe autonomous driving.

Therefore, trajectory prediction for autonomous cars has gained a lot of attention in recent years.

It is, however, still a hard task to achieve human-level performance.

Interdependencies between vehicle behaviors and the multimodal nature of future intentions in a dynamic and complex driving environment render trajectory prediction a challenging problem.

In this work, we propose a new, data-driven approach for predicting the motion of vehicles in a road environment.

The model allows for inferring future intentions from the past interaction among vehicles in highway driving scenarios.

Using our neighborhood-based data representation, the proposed system jointly exploits correlations in the spatial and temporal domain using convolutional neural networks.

Our system considers multiple possible maneuver intentions and their corresponding motion and predicts the trajectory for five seconds into the future.

We implemented our approach and evaluated it on two highway datasets recorded in different countries, achieving competitive prediction performance.

Moving object segmentation in 3D LiDAR data: A learning-based approach exploiting sequential data

X. Chen, S. Li, B. Mersch, L. Wiesmann, J. Gall, J. Behley, and C. Stachniss, “Moving object segmentation in 3D LiDAR data: A learning-based approach exploiting sequential data,” IEEE Robotics and Automation Letters (RA-L), 2021.

The ability to detect and segment moving objects in a scene is essential for building consistent maps, making future state predictions, avoiding collisions, and planning.

In this paper, we address the problem of moving object segmentation from 3D LiDAR scans.

We propose a novel approach that pushes the current state of the art in LiDAR-only moving object segmentation forward to provide relevant information for autonomous robots and other vehicles.

Instead of segmenting the point cloud semantically, i.e., predicting semantic classes such as vehicles, pedestrians, roads, etc., our approach accurately segments the scene into moving and static objects, for example distinguishing moving cars from parked cars.

Our proposed approach exploits sequential range images from a rotating 3D LiDAR sensor as an intermediate representation combined with a convolutional neural network and runs faster than the frame rate of the sensor.

We compare our approach to several other state-of-the-art methods showing superior segmentation quality in urban environments.

Additionally, we created a new benchmark for LiDAR-based moving object segmentation based on SemanticKITTI.

We published it to allow other researchers to compare their approaches transparently and we furthermore published our code.
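
The range-image representation mentioned in this abstract can be sketched as a spherical projection of each LiDAR scan onto a 2D grid. The version below is a generic projection under stated assumptions, not the authors' code: the image size, the vertical field of view and the function name `project_to_range_image` are all hypothetical defaults chosen for illustration.

```python
import math


def project_to_range_image(points, height=64, width=1024,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (x, y, z) points into a height x width image of ranges.

    Empty pixels hold -1.0. When several points fall into one pixel,
    the closest one is kept.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[-1.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        yaw = math.atan2(y, x)
        pitch = math.asin(z / r)
        # Map yaw in [-pi, pi] to columns and pitch in [fov_down, fov_up] to rows.
        u = int(0.5 * (1.0 - yaw / math.pi) * width)
        v = int((1.0 - (pitch - fov_down) / fov) * height)
        u = min(max(u, 0), width - 1)
        v = min(max(v, 0), height - 1)
        if image[v][u] < 0.0 or r < image[v][u]:
            image[v][u] = r
    return image
```

A sequence of such images, one per scan, would then form the sequential input that a convolutional network can consume.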

Poisson surface reconstruction for LiDAR odometry and mapping

I. Vizzo, X. Chen, N. Chebrolu, J. Behley, and C. Stachniss, “Poisson surface reconstruction for LiDAR odometry and mapping,” IEEE International Conference on Robotics and Automation (ICRA), 2021.

Accurately localizing in and mapping an environment are essential building blocks of most autonomous systems.

In this paper, we present a novel approach for LiDAR odometry and mapping, focusing on improving the mapping quality and at the same time estimating the pose of the vehicle.

Our approach performs frame-to-mesh ICP, but in contrast to other SLAM approaches, we represent the map as a triangle mesh computed via Poisson surface reconstruction.

We perform the surface reconstruction in a sliding window fashion over a sequence of past scans.

In this way, we obtain accurate local maps that are well suited for registration and can also be combined into a global map.

This enables us to build a 3D map showing more geometric details than common mapping approaches relying on a truncated signed distance function or surfels.

Our experimental evaluation shows quantitatively and qualitatively that our maps offer higher geometric accuracies than these other map representations.

We also show that our maps are compact and can be used for LiDAR-based odometry estimation with a novel ray-casting-based data association.

Simple but effective redundant odometry for autonomous vehicles

A. Reinke, X. Chen, and C. Stachniss, “Simple but effective redundant odometry for autonomous vehicles,” IEEE International Conference on Robotics and Automation (ICRA), 2021.

Robust and reliable ego-motion is a key component of most autonomous mobile systems.

Many odometry estimation methods have been developed using different sensors such as cameras or LiDARs.

In this work, we present a resilient approach that exploits the redundancy of multiple odometry algorithms using a 3D LiDAR scanner and a monocular camera to provide reliable state estimation for autonomous vehicles.

Our system utilizes a stack of odometry algorithms that run in parallel.

It chooses from them the most promising pose estimation considering sanity checks using dynamic and kinematic constraints of the vehicle as well as a score computed between the current LiDAR scan and a locally built point cloud map.

In this way, our method can exploit the advantages of different existing ego-motion estimating approaches.

We evaluate our method on the KITTI Odometry dataset.

The experimental results suggest that our approach is resilient to failure cases and achieves an overall better performance than individual odometry methods employed by our system.
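
The selection step described in this abstract, choosing the most promising pose among parallel odometry estimates after sanity checks, might look roughly like the sketch below. It is a minimal illustration under assumptions: the motion limits, the convention that a higher `map_score` means better scan-to-map agreement, and all names (`OdometryCandidate`, `passes_sanity_checks`, `select_pose`) are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class OdometryCandidate:
    name: str
    forward_m: float   # forward displacement since the previous frame
    lateral_m: float   # sideways displacement since the previous frame
    yaw_rad: float     # heading change since the previous frame
    map_score: float   # assumed: higher means better scan-to-map agreement


def passes_sanity_checks(c, dt_s, max_speed_mps=40.0, max_yaw_rate_rps=1.0):
    """Reject estimates that imply physically implausible vehicle motion."""
    speed = math.hypot(c.forward_m, c.lateral_m) / dt_s
    yaw_rate = abs(c.yaw_rad) / dt_s
    return speed <= max_speed_mps and yaw_rate <= max_yaw_rate_rps


def select_pose(candidates, dt_s):
    """Return the plausible candidate with the best map score, or None."""
    valid = [c for c in candidates if passes_sanity_checks(c, dt_s)]
    if not valid:
        return None
    return max(valid, key=lambda c: c.map_score)
```

The point of the design is that a single failing algorithm, such as one reporting an implausible jump, is filtered out before its score is even considered.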

A Shared-control Teleoperation Architecture for Nonprehensile Object Transportation

M. Selvaggio, J. Cacace, C. Pacchierotti, F. Ruggiero and P. Robuffo Giordano, “A shared-control teleoperation architecture for nonprehensile object transportation,” IEEE Transactions on Robotics (TRO), 2021.

This article proposes a shared-control teleoperation architecture for robot manipulators transporting an object on a tray.

Unlike many existing studies about remotely operated robots with firm grasping capabilities, we consider the case in which, in principle, the object can break its contact with the robot end-effector.

The proposed shared-control approach automatically regulates the remote robot motion commanded by the user and the end-effector orientation to prevent the object from sliding over the tray.

Furthermore, the human operator is provided with haptic cues informing about the discrepancy between the commanded and executed robot motion, which assist the operator throughout the task execution.

We carried out trajectory tracking experiments employing an autonomous 7 degree-of-freedom (DoF) manipulator and compared the results obtained using the proposed approach with two different control schemes (i.e., constant tray orientation and no motion adjustment).

We also carried out a human-subjects study involving eighteen participants, in which a 3-DoF haptic device was used to teleoperate the robot linear motion and display haptic cues to the operator.

In all experiments, the results clearly show that our control approach outperforms the other solutions in terms of sliding prevention, robustness, command tracking, and user preference.

TSDF++: A Multi-Object Formulation for Dynamic Object Tracking and Reconstruction

M. Grinvald, F. Tombari, R. Siegwart and J. Nieto, “TSDF++: A multi-object formulation for dynamic object tracking and reconstruction,” IEEE International Conference on Robotics and Automation (ICRA), 2021.

The ability to simultaneously track and reconstruct multiple objects moving in the scene is of the utmost importance for robotic tasks such as autonomous navigation and interaction.

Virtually all of the previous attempts to map multiple dynamic objects have evolved to store individual objects in separate reconstruction volumes and track the relative pose between them. While simple and intuitive, such a formulation does not scale well with respect to the number of objects in the scene and introduces the need for an explicit occlusion handling strategy.

In contrast, we propose a map representation that allows maintaining a single volume for the entire scene and all the objects therein. To this end, we introduce a novel multi-object TSDF formulation that can encode multiple object surfaces at any given location in the map.

In a multiple dynamic object tracking and reconstruction scenario, our representation allows maintaining accurate reconstruction of surfaces even while they become temporarily occluded by other objects moving in their proximity.

We evaluate the proposed TSDF++ formulation on a public synthetic dataset and demonstrate its ability to preserve reconstructions of occluded surfaces when compared to the standard TSDF map representation.
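
To make the idea of encoding multiple object surfaces at one map location concrete, the sketch below stores a separate truncated signed distance and weight per object inside a single voxel, using the standard weighted running-average TSDF update. The per-voxel dictionary layout and the names `MultiObjectVoxel`, `integrate` and `distance` are assumptions for illustration; the abstract does not describe the authors' actual data structure.

```python
class MultiObjectVoxel:
    """One voxel holding an independent TSDF value for each object seen there."""

    def __init__(self):
        # object_id -> (signed distance, accumulated weight)
        self.entries = {}

    def integrate(self, object_id, distance, weight=1.0):
        """Fuse a new distance observation for one object (weighted average)."""
        old_d, old_w = self.entries.get(object_id, (0.0, 0.0))
        new_w = old_w + weight
        self.entries[object_id] = ((old_d * old_w + distance * weight) / new_w, new_w)

    def distance(self, object_id):
        """Return the fused distance for an object, or None if never observed."""
        entry = self.entries.get(object_id)
        return None if entry is None else entry[0]


voxel = MultiObjectVoxel()
voxel.integrate("cup", 0.5)
voxel.integrate("cup", 1.0)
# Another object passing through the same voxel does not overwrite the cup.
voxel.integrate("hand", -0.25)
```

Because each object keeps its own entry, the surface of the first object survives while a second object temporarily occupies the same location, which is the behaviour the abstract contrasts with a standard single-value TSDF map.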

Harmony at ERF2021

An introduction to Project Harmony presented at the European Robotics Forum 2021.

Where to go next: Learning a Subgoal Recommendation Policy for Navigation in Dynamic Environments

B. Brito, M. Everett, J. P. How and J. Alonso-Mora, “Where to go next: Learning a subgoal recommendation policy for navigation in dynamic environments,” IEEE Robotics and Automation Letters (RA-L), 2021.

Robotic navigation in environments shared with other robots or humans remains challenging because the intentions of the surrounding agents are not directly observable and the environment conditions are continuously changing.

Local trajectory optimization methods, such as model predictive control (MPC), can deal with those changes but require global guidance, which is not trivial to obtain in crowded scenarios.

This paper proposes to learn, via deep Reinforcement Learning (RL), an interaction-aware policy that provides long-term guidance to the local planner. In particular, in simulations with cooperative and non-cooperative agents, we train a deep network to recommend a subgoal for the MPC planner.

The recommended subgoal is expected to help the robot in making progress towards its goal and accounts for the expected interaction with other agents. Based on the recommended subgoal, the MPC planner then optimizes the inputs for the robot satisfying its kino-dynamic and collision avoidance constraints.

Our approach is shown to substantially improve the navigation performance in terms of number of collisions as compared to prior MPC frameworks, and in terms of both travel time and number of collisions compared to deep RL methods in cooperative, competitive and mixed multiagent scenarios.