ARRC: Advanced Reasoning Robot Control

Ammar Jaleel Mahmood, Eugene Vorobiov, Salim Rezvani, and Robin Chhabra

Toronto Metropolitan University


We present ARRC (Advanced Reasoning Robot Control), a practical system that connects natural language instructions to safe, local robotic control by combining Retrieval-Augmented Generation (RAG) with RGB–D perception and guarded execution on an affordable robot arm. The system indexes curated robot knowledge (movement patterns, task templates, and safety heuristics) in a vector database, retrieves task-relevant context for each instruction, and conditions a large language model (LLM) to synthesize JSON-structured action plans. These plans are executed on a UFactory xArm 850 fitted with a Dynamixel-driven parallel gripper and an Intel RealSense D435 camera. Perception fuses AprilTag detections with depth to produce object-centric metric poses; execution is enforced by a set of software safety gates (workspace bounds, speed/force caps, timeouts, and bounded retries). We describe the architecture, knowledge design, integration choices, and a reproducible evaluation protocol for tabletop scan/approach/pick–place tasks. We report experimental results that demonstrate the efficacy of the proposed approach. Our design shows that RAG-based planning can substantially improve plan validity and adaptability while keeping perception and low-level control local to the robot.

ARRC Tasks

We evaluate ARRC in a tabletop manipulation setting with common items that are easy to identify. Scenarios include scanning with fallback arcs, approach and grasp with a guarded Z retreat, and structured placement at a target pose. The setup is designed to exercise retrieval prompts, plan synthesis, and safety gates.
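The scan-with-fallback behaviour can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: the `detect` callback, the arc widths, and the confidence threshold are hypothetical and are not taken from the ARRC code.

```python
from typing import Callable, Optional, Sequence


def scan_with_fallback(
    detect: Callable[[float], Optional[float]],
    arcs_deg: Sequence[float] = (30.0, 60.0, 90.0),  # assumed arc widths
    min_confidence: float = 0.8,                     # assumed threshold
) -> Optional[float]:
    """Try progressively wider scan arcs, a bounded number of times.

    `detect(arc)` is a hypothetical callback that sweeps the camera over
    `arc` degrees and returns a detection confidence, or None if nothing
    was seen. Returns the first arc that yields a confident detection,
    or None so the caller can fall back to a safe stop.
    """
    for arc in arcs_deg:
        confidence = detect(arc)
        if confidence is not None and confidence >= min_confidence:
            return arc
    return None
```

Because the list of arcs is finite, the number of retries is bounded by construction, and a `None` result gives the caller an explicit signal to stop rather than keep searching.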

Method Overview

ARRC has three components: 1) Dialog and retrieval: user goals and environment notes condition retrieval over a curated knowledge base, then an LLM drafts a subtask plan with task-space waypoints. 2) Plan validation: environment checks verify reachability and test for collisions, and invalid steps trigger a short in-context repair. 3) Execution: a motion layer translates poses to xArm waypoints with guarded speeds and force limits.
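As a rough illustration of the plan-validation step, the sketch below checks a JSON plan against workspace bounds and a speed cap and returns a list of problems that could be fed back to the LLM for repair. The plan schema (`steps`, `pose`, `speed_mm_s`), the bounds, and the speed limit are all assumptions chosen for the example; the sketch does not attempt the reachability or collision checks described above.

```python
import json

# Hypothetical limits in the robot base frame; not values from the paper.
WORKSPACE_M = {"x": (0.15, 0.70), "y": (-0.40, 0.40), "z": (0.02, 0.60)}
MAX_SPEED_MM_S = 150.0


def validate_plan(plan_json):
    """Return a list of problems; an empty list means the plan passed."""
    try:
        plan = json.loads(plan_json)
    except json.JSONDecodeError as exc:
        return [f"plan is not valid JSON: {exc}"]

    steps = plan.get("steps") if isinstance(plan, dict) else None
    if not isinstance(steps, list) or not steps:
        return ["plan has no non-empty 'steps' list"]

    problems = []
    for i, step in enumerate(steps):
        pose = step.get("pose") if isinstance(step, dict) else None
        if not isinstance(pose, dict):
            problems.append(f"step {i}: missing 'pose' object")
            continue
        # Every waypoint must lie inside the assumed workspace box.
        for axis, (lo, hi) in WORKSPACE_M.items():
            value = pose.get(axis)
            if not isinstance(value, (int, float)):
                problems.append(f"step {i}: missing or non-numeric {axis}")
            elif not lo <= value <= hi:
                problems.append(f"step {i}: {axis}={value} outside [{lo}, {hi}]")
        # Every step must state a positive speed under the assumed cap.
        speed = step.get("speed_mm_s")
        if not isinstance(speed, (int, float)) or not 0 < speed <= MAX_SPEED_MM_S:
            problems.append(f"step {i}: speed_mm_s must be in (0, {MAX_SPEED_MM_S}]")
    return problems
```

Returning messages instead of raising keeps the validator usable inside a repair loop: the text of each problem can be appended to the prompt for the next attempt.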

Technical Summary Video

Example Episodes

Task: Scan and Detect


Task: Approach and Pick


Task: Place and Recover


Real World Experiments

We validate ARRC on a real setup with a UFactory xArm 850 and an Intel RealSense D435. We show success on scan, approach, and pick-and-place tasks, with safe fallbacks when detection is uncertain. The main failure mode is an incorrect detection, from which the system recovers after a rescan.

Successful episode

Recovery after low confidence detection

ARRC Details

Hardware and software

UFactory xArm 850 with Dynamixel-driven gripper
Intel RealSense D435 RGB–D camera
Ubuntu 22.04, Python 3.10, optional ROS 2
Sentence Transformers embeddings with ChromaDB or FAISS
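To make the retrieval step concrete without depending on any of the libraries above, here is a toy sketch that swaps in bag-of-words vectors and cosine similarity for Sentence Transformers embeddings and a ChromaDB or FAISS index. The knowledge entries, the stopword list, and the function names are invented for illustration; a real deployment would use learned embeddings and a proper vector store.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge entries, standing in for the curated knowledge base.
KNOWLEDGE = [
    "scan: sweep the camera in an arc and retry with a wider arc if no tag is found",
    "pick: approach from above, descend slowly along z, close the gripper, retreat along z",
    "place: move above the target pose, descend, open the gripper, retreat",
    "safety: stop if the pose leaves the workspace or a timeout expires",
]
STOPWORDS = {"a", "an", "and", "the", "if", "is", "or", "up", "from", "with", "for", "to"}


def embed(text):
    """Toy 'embedding': word counts with stopwords removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return the k knowledge entries most similar to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]
```

The retrieved entries would then be placed in the LLM prompt ahead of the user instruction, which is the conditioning role that retrieval plays in a RAG pipeline.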

Execution and safety

Workspace limits and velocity caps at runtime
Guarded Z approach and retreat sequences
Timeouts with bounded retries and safe stop
Pose transforms from camera to robot base
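The camera-to-base pose transform and the workspace check listed above can be sketched as follows. This assumes an ideal pinhole camera with no lens distortion; the intrinsics, the extrinsic matrix (a camera looking straight down at the table), and the workspace box are made-up numbers, where a real setup would take them from calibration.

```python
# Hypothetical pinhole intrinsics in pixels; real values come from calibration.
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

# Hypothetical 4x4 extrinsics mapping camera-frame points into the base frame:
# camera 0.60 m above the table, optical axis pointing straight down.
T_BASE_CAM = [
    [1.0, 0.0, 0.0, 0.40],
    [0.0, -1.0, 0.0, 0.00],
    [0.0, 0.0, -1.0, 0.60],
    [0.0, 0.0, 0.0, 1.0],
]

# Hypothetical workspace box in the base frame, in metres.
WORKSPACE_M = {"x": (0.15, 0.70), "y": (-0.40, 0.40), "z": (0.02, 0.60)}


def deproject(u, v, depth_m):
    """Back-project pixel (u, v) at a given depth into a camera-frame point."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)


def camera_to_base(point_cam, transform=T_BASE_CAM):
    """Apply a homogeneous transform to a 3D point (rotation plus translation)."""
    x, y, z = point_cam
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in transform[:3]
    )


def in_workspace(point_base):
    """Safety gate: accept a target only if it lies inside the workspace box."""
    return all(
        lo <= value <= hi
        for value, (lo, hi) in zip(point_base, WORKSPACE_M.values())
    )
```

For example, a tag centre detected at pixel (380, 240) with a depth of 0.5 m maps to roughly (0.45, 0.0, 0.10) m in the base frame under these assumed parameters, which `in_workspace` accepts.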

BibTeX

@misc{arrc2025,
  title={ARRC: Advanced Reasoning Robot Control—Knowledge-Driven Autonomous Manipulation Using Retrieval-Augmented Generation},
  author={Ammar Jaleel Mahmood and Eugene Vorobiov and Salim Rezvani and Robin Chhabra},
  year={2025},
  eprint={2510.05547},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}

Contact

Please reach out to ammar.j.mahmood@torontomu.ca or robin.chhabra@torontomu.ca