[Project page] [Paper] [Hardware Guide] [Deployment Guide]
Mengda Xu*,1,2,3, Han Zhang*,1, Yifan Hou1, Zhenjia Xu5, Linxi Fan5, Manuela Veloso3,4, Shuran Song1,2
1Stanford University, 2Columbia University, 3J.P. Morgan AI Research, 4Carnegie Mellon University, 5NVIDIA
*Indicates Equal Contribution
Tested on Ubuntu 22.04. We recommend Miniforge for faster installation:
cd DexUMI
mamba env create -f environment.yml
mamba activate dexumi
DexUMI utilizes SAM2 and ProPainter to track and remove the exoskeleton and hand. Our system uses Record3D to track the wrist pose. To make Record3D compatible with Python 3.10, please follow the instructions here. Alternatively, you can directly install our forked version, which already integrates the solution. Please clone the above three packages into the same directory as DexUMI. The final folder structure should be:
.
├── DexUMI
├── sam2
├── ProPainter
└── record3D
Download the SAM2 checkpoint sam2.1_hiera_large.pt into sam2/checkpoints/.
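For reference, a minimal sketch of the checkout and checkpoint download, assuming the upstream sam2 and ProPainter repositories (the record3D URL is a placeholder; use the forked version linked above):

```bash
# Clone the three dependencies next to DexUMI
git clone https://github.com/facebookresearch/sam2.git
git clone https://github.com/sczhou/ProPainter.git
git clone https://github.com/<record3d-fork>/record3D.git  # placeholder URL

# sam2 ships a checkpoint download script; only sam2.1_hiera_large.pt is needed here
cd sam2/checkpoints && ./download_ckpts.sh && cd ../..
```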
You also need to install Record3D on your iPhone. We use an iPhone 15 Pro Max to track the wrist pose. You can use any iPhone model with ARKit support, but you might need to modify some CAD models to fit other iPhone dimensions.
Please check our hardware guide to download the CAD model and assembly tutorial for both Inspire Hand and XHand exoskeletons.
Please check the data recording and processing tutorial before data collection.
Record data with the exoskeletons:
python DexUMI/real_script/data_collection/record_exoskeleton.py -et -ef --fps 45 --reference-dir /path/to/reference_folder --hand_type xhand/inspire --data-dir /path/to/data
If you do not have a force sensor installed, simply omit the -ef flag. The data will be stored in /path/to/data. Each episode structure should be:
├── episode_0
│   ├── camera_0
│   │   └── camera_0.mp4
│   ├── camera_1
│   │   └── camera_1.mp4
│   ├── numeric_0
│   ├── numeric_1
│   ├── numeric_2
│   └── numeric_3
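Before processing, it can be worth a quick pass to confirm that every episode was written out completely. A minimal sketch (a hypothetical helper, not part of the repo; adjust EXPECTED if your camera or sensor configuration differs):

```python
# check_episodes.py -- sanity-check recorded episodes against the layout above
from pathlib import Path

EXPECTED = ["camera_0", "camera_1", "numeric_0", "numeric_1", "numeric_2", "numeric_3"]

def check(data_dir: str) -> None:
    for episode in sorted(Path(data_dir).glob("episode_*")):
        missing = [name for name in EXPECTED if not (episode / name).exists()]
        # each camera folder should contain its encoded video
        missing += [
            f"{cam}/{cam}.mp4"
            for cam in ("camera_0", "camera_1")
            if not (episode / cam / f"{cam}.mp4").exists()
        ]
        print(f"{episode.name}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")

if __name__ == "__main__":
    check("/path/to/data")
```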
After collecting the dataset, modify the following parameters in real_script/data_generation_pipeline/process.sh:
DATA_DIR="path/to/data"
TARGET_DIR="path/to/data_replay"
REFERENCE_DIR="/path/to/reference_folder"
If you do not have a force sensor installed, remove the --enable-fsr flag from the command on line 19.
Then run:
./process.sh
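If you prefer to set these from the shell instead of an editor, something like the following works (assuming the assignments sit at the start of their lines, as above):

```bash
cd DexUMI/real_script/data_generation_pipeline
sed -i 's|^DATA_DIR=.*|DATA_DIR="/path/to/data"|' process.sh
sed -i 's|^TARGET_DIR=.*|TARGET_DIR="/path/to/data_replay"|' process.sh
sed -i 's|^REFERENCE_DIR=.*|REFERENCE_DIR="/path/to/reference_folder"|' process.sh
# no force sensor installed? drop the flag as well:
sed -i 's| --enable-fsr||' process.sh
```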
The scripts will replay the exoskeleton hand actions on the dexterous hand and record the corresponding videos.
The replay data will be stored in path/to/data_replay. Each episode structure should be:
├── dex_camera_0.mp4
├── exo_camera_0.mp4
├── fsr_values_interp_1
├── fsr_values_interp_2
├── fsr_values_interp_3
├── hand_motor_value
├── joint_angles_interp
├── pose_interp
└── valid_indices
After replay is complete, update the following fields in config/render/render_all_dataset.yaml:
data_buffer_path: path/to/data_replay
reference_dir: /path/to/reference_folder
Then start dataset generation, which converts exoskeleton data into robot hand data:
python DexUMI/real_script/data_generation_pipeline/render_all_dataset.py
We provide sample data here so that you can test the data generation pipeline.
The generated data will be stored in path/to/data_replay. Each episode structure should be:
├── combined.mp4
├── debug_combined.mp4
├── dex_camera_0.mp4
├── dex_finger_seg_mask
├── dex_img
├── dex_seg_mask
├── dex_thumb_seg_mask
├── exo_camera_0.mp4
├── exo_finger_seg_mask
├── exo_img
├── exo_seg_mask
├── exo_thumb_seg_mask
├── fsr_values_interp
├── fsr_values_interp_1
├── fsr_values_interp_2
├── fsr_values_interp_3
├── hand_motor_value
├── inpainted
├── joint_angles_interp
├── maskout_baseline.mp4
├── pose_interp
└── valid_indices
Finally, run the following command to generate the dataset for policy training:
python 6_generate_dataset.py -d path/to/data_replay -t path/to/final_dataset --force-process total --force-adjust
If you do not have a force sensor installed, you can drop the last two flags.
The final dataset will be stored in path/to/final_dataset. Each episode structure should be:
├── camera_0
├── fsr
├── hand_action
├── pose
└── proprioception
Modify the following items in config/diffusion_policy/train_diffusion_policy.yaml:
dataset:
  data_dirs: [
    "path/to/final_dataset",
  ]
  enable_fsr: True/False
  fsr_binary_cutoff: [10, 10, 10] # we use this value for XHand; the Inspire Hand cutoff depends on installation
model:
  global_cond_dim: 384 + <number of force inputs>
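For example, with enable_fsr: True and the three binary force inputs above, the conditioning dimension works out to 384 + 3 (and presumably stays at 384 when force input is disabled):

```yaml
dataset:
  data_dirs: [
    "path/to/final_dataset",
  ]
  enable_fsr: True
  fsr_binary_cutoff: [10, 10, 10]
model:
  global_cond_dim: 387  # 384 + 3 force inputs
```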
Then run:
accelerate launch DexUMI/real_script/train_diffusion_policy.py
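If you train on multiple GPUs, the stock Hugging Face Accelerate launcher options apply (these are Accelerate CLI flags, not DexUMI ones), e.g.:

```bash
# one-time interactive setup of the training environment
accelerate config
# or pin the process count explicitly at launch
accelerate launch --num_processes 2 DexUMI/real_script/train_diffusion_policy.py
```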
Start the server:
python DexUMI/real_script/open_server.py --dexhand --ur5
Evaluate the policy:
python DexUMI/real_script/eval_policy/eval_xhand.py --model_path path/to/model --ckpt N # for xhand
# or
python DexUMI/real_script/eval_policy/eval_inspire.py --model_path path/to/model --ckpt N # for inspire hand
Modify the transformation matrix before conducting evaluation. Please check our tutorial for calibrating the matrix.
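For intuition, the calibration is a 4x4 homogeneous transform that maps tracked wrist poses into the robot base frame. A minimal illustrative sketch (the names are ours, not the repo's API):

```python
import numpy as np

# Illustrative only: T maps a 4x4 homogeneous wrist pose expressed in the
# tracking (iPhone/Record3D) frame into the robot base frame.
T_base_from_track = np.eye(4)  # replace with your calibrated matrix

def to_base_frame(pose_track: np.ndarray) -> np.ndarray:
    """pose_track: 4x4 homogeneous wrist pose in the tracking frame."""
    return T_base_from_track @ pose_track
```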
For hardware optimization, please create a new virtual environment to avoid package dependency conflicts:
cd DexUMI
mamba env create -f environment_design.yml
mamba activate dexumi_design
The goal of hardware optimization is twofold: 1) find equivalent mechanical structures that replace the target robot hand's design to improve wearability, and 2) use motion capture data to recover the target robot hand's mechanical structure (closed-loop kinematics) when that information is unavailable in the URDF.
We use a motion capture system to record the fingertip trajectories of all five fingers on the Inspire Hand and store them in DexUMI/linkage_optimization/hardware_design_data/inspire_mocap. You can visualize the trajectories by running:
python DexUMI/linkage_optimization/viz_multi_fingertips_trajectory.py
We first simulate four-bar linkages with different link lengths and joint positions, and record the corresponding fingertip pose trajectories:
python DexUMI/linkage_optimization/sweep_valid_linkage_design.py --type finger/thumb --save_path path/to/store_sim
We solve an optimization problem to find the best linkage design that matches the target (mocap) fingertip trajectory:
# For index, middle, ring, and pinky fingers
python DexUMI/linkage_optimization/get_equivalent_finger.py -r path/to/store_sim -b path/to/mocap
# For thumb
python DexUMI/linkage_optimization/get_equivalent_thumb.py -r path/to/store_sim -b path/to/mocap
This will output the optimal linkage parameters that best approximate the desired fingertip motion. We provide our optimization results at DexUMI/linkage_optimization/hardware_design_data/inspire_optimization_results. We recommend running all scripts on a multi-core CPU for faster execution. One future research direction is to optimize exoskeleton designs more efficiently with generative models.
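Conceptually, the matching step scores every candidate linkage by how closely its swept fingertip trajectory tracks the mocap reference and keeps the argmin. A minimal sketch of one reasonable scoring rule, mean nearest-neighbor distance (the actual scripts may use a different objective):

```python
import numpy as np

def trajectory_error(sim_xyz: np.ndarray, mocap_xyz: np.ndarray) -> float:
    """Mean distance from each mocap fingertip point to its nearest simulated point.

    sim_xyz: (N, 3) fingertip positions swept from one candidate linkage design.
    mocap_xyz: (M, 3) target fingertip positions recorded by motion capture.
    """
    # pairwise distances (M, N), then nearest simulated point per mocap sample
    d = np.linalg.norm(mocap_xyz[:, None, :] - sim_xyz[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

def best_design(candidates: dict[str, np.ndarray], mocap_xyz: np.ndarray) -> str:
    """Return the key of the candidate whose trajectory best matches the mocap data."""
    return min(candidates, key=lambda k: trajectory_error(candidates[k], mocap_xyz))
```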
You can visualize the optimization results by running:
python DexUMI/linkage_optimization/viz_full_fk.py
This repository is released under the MIT license.
- The diffusion policy implementation is adapted from Diffusion Policy
- Many useful utilities are adapted from UMI
- Many hardware designs are adapted from DOGlove
- Thanks to Huy Ha for helping us set up our tutorial videos on YouTube.