Arnav Balaji

Computer Science & Mathematics · UT Austin

I am an undergraduate studying Computer Science and Mathematics at The University of Texas at Austin.

My interests are in deep learning, especially modern generative models and representation learning across language and vision; reinforcement learning; and building systems that are robust, reliable, and aligned with real-world constraints.

I’ve explored these interests in the Robot Interactive Intelligence (RobIn) group at UT Austin, advised by Prof. Roberto Martín-Martín, applying learning methods to problems like learning from human videos, safer policy learning and evaluation, and human-robot interaction.

I also defended my undergraduate honors thesis on safety in robot learning, focused on how to measure and benchmark safety-relevant behavior in manipulation settings.

Research

OopsieVerse: A Safety Benchmark with Damage-Aware Simulation for Robot Manipulation
Arnav Balaji*, Arpit Bahety*, Sriniket Ambatipudi, Daniel Lam, Junhong Xu, Roberto Martín-Martín
Robotics: Science and Systems (RSS), 2026
Paper / Code / Website coming soon

TL;DR: A unified, damage-aware simulation framework for household manipulation that makes physical safety measurable, enabling safer data collection, imitation learning, reinforcement learning, and VLA evaluation.

MiCoBot
Albert Yu, Chengshu Li, Luca Macesanu, Arnav Balaji, Ruchira Ray, Raymond Mooney, Roberto Martín-Martín
International Conference on Robotics and Automation (ICRA), 2026

TL;DR: A system for human-robot collaboration that uses mixed-initiative natural-language dialog so both agents can propose, accept, or reject who completes each step of a task, improving task success and user experience in physical robot trials.

Arnav Balaji
UTCS Undergraduate Honors Thesis, 2025

TL;DR: A benchmark for physics-grounded damage detection in manipulation, providing a unified mechanism for quantifying the safety of robot actions. Policies trained with damage-based safety metrics learn safer strategies with substantially lower risk.

Robotics: Science and Systems (RSS), 2025
Website / Paper / Video

TL;DR: A framework that enables robots to safely and autonomously learn multi-step mobile manipulation tasks from a single human video by segmenting, translating, and adapting the demonstrated actions to their own morphology.

* indicates equal contribution