This robot learning role is with a seriously exciting scale-up. The platform is mature, the data is flowing, and the team is ready to scale its most promising research directions into production-grade manipulation policies.
They need someone to lead the development and deployment of large behaviour models, taking diffusion transformers, VLAs, and language-conditioned policies from the literature onto a real bi-manual humanoid.
This is not a research-only role. You'll inherit a mature policy training codebase, a VR teleoperation pipeline producing high-frequency multi-modal data, and a Gymnasium environment wrapping a real robot. The work you ship runs on hardware.
The Role
You will architect, train, and deploy end-to-end large behaviour models for bi-manual and mobile manipulation, and lead the maturing of the early-stage RL pipeline.
The key responsibilities
- Architect, train, and evaluate end-to-end large behaviour models for bi-manual and mobile manipulation
- Advance diffusion transformer policies, mature VLA integration, and develop language conditioning for true multi-task generalisation
- Apply RL to refine pre-trained policies: RL token fine-tuning, residual RL, off-policy RL with reference-action regularisation, RL-based fine-tuning of diffusion policies
- Build a systematic sim-to-real transfer pipeline, connecting existing simulation infrastructure to training
- Deploy and iterate learned policies on physical robot hardware
- Mentor junior researchers and engineers, and publish at top-tier venues
What We're Looking For
Essential:
- PhD/MSc in ML, Robotics, CS, or related field with 4+ years of equivalent industry research experience
- Demonstrated expertise training and deploying learned manipulation policies on real robots
- Strong background in at least two of: behaviour cloning, diffusion policies, VLA/VLM architectures, RL for manipulation
- PyTorch and large-scale (multi-GPU, distributed) training
- Track record of publications at top-tier venues (CoRL, RSS, ICRA, NeurIPS, ICML, ICLR), or equivalent demonstrated research impact through deployed systems, patents, or significant open-source contributions
- Strong Python; production-quality research code with proper testing, type hints, and documentation
Useful:
- Hands-on experience with humanoid or bi-manual manipulation platforms
- Diffusion transformer, ACT, or VLA architectures specifically
- Pre-trained vision/language models for robot control (CLIP, DINOv2, PaliGemma)
- MuJoCo, Isaac Sim, or ManiSkill for sim-to-real policy training
- RL fine-tuning of pre-trained policies (residual RL, DPPO, or similar)
- 3D perception for policy conditioning (point clouds, keypoints, NeRFs)
Key contribution areas
Policy Architecture & Training
- End-to-end large behaviour models for bi-manual and mobile manipulation
- Scale and evolve diffusion transformer policies, VLA integration, and language conditioning
- Extend the imitation learning pipeline to leverage growing teleoperation datasets
- Apply RL to push beyond what imitation alone can reach
- Target sub-millimetre precision and contact-rich manipulation
Generalisation & Scaling
- Develop policies that generalise across tasks, object categories, and environments
- Move from single-task to multi-task and task-conditioned architectures
- Design hierarchical behaviour systems for long-horizon manipulation
- Investigate data-efficient learning: few-shot adaptation, transfer learning, multi-dataset training
- Drive systematic ablations across architectures
Sim-to-Real & Deployment
- Build the sim-to-real transfer pipeline: domain randomisation, rendering augmentation, sim-to-real benchmarking
- Deploy and iterate learned policies on physical robot hardware
- Extend the Gymnasium environment wrapper and integrate with the robot's control stack
- Leverage perception team outputs (keypoints, learned features, 3D point clouds) for policy conditioning
Research Leadership
- Track the literature and bring relevant advances back to the team
- Identify and propose new research directions aligned with the manipulation roadmap
- Mentor junior researchers and engineers
- Publish at top-tier venues — conference attendance and open-source contributions are actively supported
What's On Offer
- Join a team of world-class applied research scientists, ML engineers, and robotics software engineers
- A mature platform that ships to physical hardware, not slides
- Active support for conference attendance and open-source contributions
- Competitive compensation
Apply or send your CV to —