Model Card for TRANSIC Policies
This model card accompanies the CoRL 2024 paper TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction. It includes robot policies trained in simulation and transferred to the real world for complex, contact-rich manipulation tasks.
Model Details
Model Description
This model repository includes three parts: 1) teacher policies trained in simulation with reinforcement learning; 2) student policies distilled from successful trajectories generated by the teacher policies; and 3) residual policies learned in the real world to augment the simulation policies.
The first part can be found in the rl directory. We provide RL teacher policies for 8 different tasks. The second part can be found in the student directory. We provide 5 student policies corresponding to the 5 skills used in assembling the square table from FurnitureBench. The third part can be found in the residual directory. These residual policies augment the 5 simulation base policies (the student policies).
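As a rough illustration of the layout described above, the sketch below lists whatever files sit under the rl, student, and residual directories of a local copy of this repository. It assumes the repository has already been downloaded; the helper name `list_checkpoints` and the file names in the usage note are hypothetical, and the sketch does not assume any particular checkpoint file extension.

```python
from pathlib import Path
from typing import Dict, List, Union

# The three top-level directories named in this model card.
PARTS = ("rl", "student", "residual")


def list_checkpoints(root: Union[str, Path]) -> Dict[str, List[str]]:
    """Map each part to the sorted relative paths of the files found under it.

    `root` is assumed to be a local copy of this model repository.
    A part whose directory is missing maps to an empty list.
    """
    root = Path(root)
    found: Dict[str, List[str]] = {}
    for part in PARTS:
        part_dir = root / part
        if part_dir.is_dir():
            found[part] = sorted(
                p.relative_to(root).as_posix()
                for p in part_dir.rglob("*")
                if p.is_file()
            )
        else:
            found[part] = []
    return found
```

For example, `list_checkpoints("path/to/local/copy")` returns a dictionary with the keys `rl`, `student`, and `residual`. Loading a listed file is left to the codebase, since the checkpoint format is not documented here.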
- Developed by: Yunfan Jiang
- Model type: PyTorch Checkpoints
- License: MIT
Model Sources
- Repositories: TRANSIC, TRANSIC-Envs
- Paper: TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
- Demo: Provided on our website
Uses & How to Get Started with the Model
Please see our codebase for detailed usage.
Training Details
Training Data
We provide training data in our 🤗Hugging Face data repository.
Training Procedure
- Teacher policies are first trained from scratch in simulation with reinforcement learning.
- We then roll out the teacher policies to generate successful trajectories.
- The generated data are used to train student policies through behavior cloning. Student policies take point-cloud and proprioceptive observations and output joint actions.
- We then deploy the student policies on the real robot. A human operator monitors the execution, intervenes when necessary, and provides online correction through teleoperation. The resulting teleoperation data are collected.
- We use the collected correction data to learn residual policies, which then augment the simulation policies for successful sim-to-real transfer.
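The last two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the TRANSIC implementation: it assumes the residual is an additive correction in action space (human action minus base action), it iterates over a fixed list of observations instead of a closed-loop robot rollout, and every name (`Correction`, `collect_corrections`, `residual_targets`, `apply_residual`) is hypothetical. See the codebase for the actual method.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Observation = List[float]
Action = List[float]


@dataclass
class Correction:
    """One recorded human intervention (hypothetical record layout)."""
    observation: Observation
    base_action: Action
    human_action: Action


def collect_corrections(
    observations: Sequence[Observation],
    base_policy: Callable[[Observation], Action],
    human: Callable[[Observation, Action], Optional[Action]],
) -> List[Correction]:
    """Record a sample whenever the operator overrides the base policy.

    `human` returns None when the operator does not intervene.
    """
    data: List[Correction] = []
    for obs in observations:
        base_action = base_policy(obs)
        human_action = human(obs, base_action)
        if human_action is not None:
            data.append(Correction(obs, base_action, human_action))
    return data


def residual_targets(data: Sequence[Correction]) -> List[Action]:
    """Assumed regression targets: human action minus base action."""
    return [
        [h - b for h, b in zip(c.human_action, c.base_action)]
        for c in data
    ]


def apply_residual(base_action: Action, residual: Action) -> Action:
    """Augment a base action with a residual (assumed additive form)."""
    return [b + r for b, r in zip(base_action, residual)]
```

In this sketch, a residual policy would be fit to map each recorded observation to its target, and at deployment its output would be passed to `apply_residual` together with the student policy's action.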
Evaluation
Policies are evaluated in simulation and the real world. We use task success rate as the metric.
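For concreteness, task success rate can be computed as the fraction of evaluation episodes that end in success. The helper below is a small sketch; the function name and the boolean per-episode outcome format are assumptions, not part of the released evaluation code.

```python
from typing import Iterable


def success_rate(outcomes: Iterable[bool]) -> float:
    """Return the fraction of episodes marked successful.

    `outcomes` holds one boolean per evaluation episode (assumed format).
    """
    results = list(outcomes)
    if not results:
        raise ValueError("need at least one episode to compute a success rate")
    return sum(1 for ok in results if ok) / len(results)
```

For instance, three successes out of four episodes give `success_rate([True, False, True, True]) == 0.75`.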
Citation
BibTeX:
@inproceedings{jiang2024transic,
title = {TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction},
author = {Yunfan Jiang and Chen Wang and Ruohan Zhang and Jiajun Wu and Li Fei-Fei},
booktitle = {Conference on Robot Learning},
year = {2024}
}
Model Card Contact
Yunfan Jiang, email: yunfanj[at]cs[dot]stanford[dot]edu