Lane Detection Transformer Based on Multi-Frame Horizontal and Vertical Attention and Visual Transformer Module
ProposalContrast: Unsupervised Pre-training for LiDAR-Based 3D Object Detection
PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map
Master of All: Simultaneous Generalization of Urban-Scene Segmentation to All Adverse Weather Conditions
LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds
Visual Cross-View Metric Localization with Dense Uncertainty Estimates
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
DevNet: Self-Supervised Monocular Depth Learning via Density Volume Construction
Action-Based Contrastive Learning for Trajectory Prediction
Radatron: Accurate Detection Using Multi-Resolution Cascaded MIMO Radar
LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection
Efficient Point Cloud Segmentation with Geometry-Aware Sparse Networks
FH-Net: A Fast Hierarchical Network for Scene Flow Estimation on Real-World Point Clouds
SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention
Pixel-Wise Energy-Biased Abstention Learning for Anomaly Segmentation on Complex Urban Driving Scenes
Rethinking Closed-Loop Training for Autonomous Driving
SLiDE: Self-Supervised LiDAR De-Snowing through Reconstruction Difficulty
Generative Meta-Adversarial Network for Unseen Object Navigation
Object Manipulation via Visual Target Localization
MoDA: Map Style Transfer for Self-Supervised Domain Adaptation of Embodied Agents
Housekeep: Tidying Virtual Households Using Commonsense Reasoning
Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects
Resolving Copycat Problems in Visual Imitation Learning via Residual Action Prediction
OPD: Single-View 3D Openable Part Detection
AirDet: Few-Shot Detection without Fine-Tuning for Autonomous Exploration
TransGrasp: Grasp Pose Estimation of a Category of Objects by Transferring Grasps from Only One Labeled Instance
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
TIDEE: Tidying Up Novel Rooms Using Visuo-Semantic Commonsense Priors
Learning Efficient Multi-agent Cooperative Visual Exploration
Zero-Shot Category-Level Object Pose Estimation
Sim-to-Real 6D Object Pose Estimation via Iterative Self-Training for Robotic Bin Picking
Active Audio-Visual Separation of Dynamic Sound Sources
DexMV: Imitation Learning for Dexterous Manipulation from Human Videos
Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments
Style-Agnostic Reinforcement Learning
Self-Supervised Interactive Object Segmentation through a Singulation-and-Grasping Approach
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
BodySLAM: Joint Camera Localisation, Mapping, and Human Motion Tracking
FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion
Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning
Video Dialog As Conversation about Objects Living in Space-Time