Spatial Concepts-Based Prompts With Large Language Models for Robot Action Planning

1Ritsumeikan Univ. 2Soka Univ. 3Kyoto Univ.
IEEE Access

*Corresponding Author

Abstract

Daily life instructions often include contextual cues that relate not to the target object itself but to its surrounding environment (e.g., objects placed around the target). It is therefore essential for service robots to interpret such surrounding information and translate it into action plans. Recently, there has been growing interest in using large language models (LLMs) to generate actions from instructions, and some of these approaches leverage semantic maps to better handle environmental context. However, previous studies face two limitations: 1) semantic maps are limited to object-level data, such as object class and position, and 2) because LLMs refer to semantic maps by object class, they cannot exploit the surrounding context, which leads to misidentifications. To address these limitations, we propose an action planning method that integrates a spatial concept model with LLMs. The spatial concept model categorizes observations, learns the parameters of probability distributions, and then infers place names and object arrangements through Bayesian inference. By using these inference results as prompts, our method enables context-aware action planning. In simulation, we designed open-vocabulary object-search tasks in a household environment in Gazebo. The robot received two types of instructions: 1) instructions that specify the target and the items surrounding it, and 2) instructions that specify the target and its location. Our method achieved a success rate of over 0.8, outperforming the baselines. In contrast, real-world experiments revealed challenges in providing feedback when manipulation or object detection failed.
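
To illustrate the core idea of using spatial concept inference results as prompts, the following Python snippet is a minimal sketch. The data format, field names (place, objects, prob), and prompt wording are assumptions for illustration only and are not taken from the paper's implementation.

# Minimal sketch (assumed data format): serialize spatial concept inference
# results (place names and object arrangements with probabilities) into a
# text prompt for an LLM-based action planner.

# Hypothetical inference output: each entry links a place name to the
# objects observed there and the model's confidence in that association.
spatial_concepts = [
    {"place": "kitchen counter", "objects": ["mug", "coffee maker"], "prob": 0.82},
    {"place": "dining table",    "objects": ["mug", "newspaper"],    "prob": 0.11},
]

def build_prompt(instruction, concepts):
    """Embed inference results as context lines before the instruction."""
    lines = ["Known places and their object arrangements:"]
    for c in concepts:
        lines.append(
            "- {}: {} (probability {:.2f})".format(
                c["place"], ", ".join(c["objects"]), c["prob"]
            )
        )
    lines.append("Instruction: " + instruction)
    lines.append("Output a step-by-step action plan for the robot.")
    return "\n".join(lines)

print(build_prompt("Bring me the mug next to the coffee maker.", spatial_concepts))

In this sketch, the surrounding-object context ("next to the coffee maker") can be matched against the listed arrangements, which is the kind of context-aware disambiguation the method targets.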

Video Demonstration (Full)

BibTeX

@article{hasegawa2025spcorap,
  title={{Spatial Concepts-Based Prompts With Large Language Models for Robot Action Planning}},
  author={Hasegawa, Shoichi and Hagiwara, Yoshinobu and Taniguchi, Akira and El Hafi, Lotfi and Taniguchi, Tadahiro},
  journal={{IEEE Access}},
  volume={13},
  pages={216937--216955},
  year={2025}
}