The ability to understand a variety of verbal instructions and carry out the corresponding tasks is essential for daily-life support robots. Speech directed at a robot may include greetings and demonstratives, and the robot needs to respond appropriately in such cases. For instructions containing demonstratives, such as "Bring me that cup," exophora resolution (ER) is required to identify the specific object that "that cup" refers to. Recently, action planning based on large language models (LLMs) has been studied intensively as a way for robots to decide how to act. However, previous action-planning approaches have had difficulty identifying the target object from instructions containing demonstratives and producing the plan the user intends. This study aims to plan the actions a user expects from diverse language queries, including greetings and instructions containing demonstratives. We propose a method for action planning from such varied instructions by combining a task classification module, an ER framework, and LLMs. In the experiments, we gave the robot a range of queries, including queries containing demonstratives and queries that require only a verbal response rather than any motion, and compared planning accuracy against a baseline without the ER framework. The results show that the proposed method is approximately 1.3 times more accurate than the baseline.
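The pipeline described above can be summarized as: classify the query, resolve any demonstrative to a concrete object via ER, then ask an LLM to plan. The sketch below illustrates that flow only; all function names, the keyword-based classifier, and the stubbed ER and LLM calls are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an ECRAP-style pipeline (assumed structure, not the paper's code).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PlanStep:
    action: str             # e.g. "move_to", "grasp", "hand_over"
    target: Optional[str]   # resolved object name, if any

DEMONSTRATIVES = {"this", "that", "these", "those", "it"}
NON_TASK_PHRASES = {"hello", "hi", "good morning", "thank you"}

def classify_query(query: str) -> str:
    """Task classification module: returns 'not_a_task', 'with_demonstrative',
    or 'without_demonstrative' (stubbed here with simple keyword rules)."""
    q = query.lower().strip(" .!?")
    if q in NON_TASK_PHRASES:
        return "not_a_task"
    if any(w in DEMONSTRATIVES for w in q.split()):
        return "with_demonstrative"
    return "without_demonstrative"

def resolve_exophora(query: str) -> str:
    """Exophora resolution: map a demonstrative phrase to a concrete object
    in the scene. A real ER framework would fuse pointing/gaze cues with
    object detections; this placeholder returns a fixed, assumed object ID."""
    return "bottle_02"

def plan_with_llm(query: str, target: Optional[str]) -> List[PlanStep]:
    """LLM-based action planning. A real system would prompt an LLM with the
    instruction and the resolved target; here we return a fixed plan."""
    return [PlanStep("move_to", target),
            PlanStep("grasp", target),
            PlanStep("hand_over", "user")]

def handle_query(query: str) -> List[PlanStep]:
    category = classify_query(query)
    if category == "not_a_task":
        return []  # only a verbal response is needed; no robot motion
    target = resolve_exophora(query) if category == "with_demonstrative" else None
    return plan_with_llm(query, target)

if __name__ == "__main__":
    for q in ["Hello", "Bring me that bottle", "Bring a bottle"]:
        print(q, "->", handle_query(q))
```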
We prepared three types of instructions. "Not a task" denotes greetings or soliloquies (e.g., "Hello"). "w/ demonstrative" denotes an instruction with a demonstrative (e.g., "Bring me that bottle"). "w/o demonstrative" denotes an instruction without a demonstrative (e.g., "Bring a bottle"). The proposed method achieved a higher score than the other methods.
@inproceedings{oyama2024ecrap,
author={Oyama, Akira and Hasegawa, Shoichi and Hagiwara, Yoshinobu and Taniguchi, Akira and Taniguchi, Tadahiro},
title={ECRAP: Exophora Resolution and Classifying User Commands for Robot Action Planning by Large Language Models},
booktitle={IEEE International Conference on Robotic Computing (IRC)},
year={2024},
note={Accepted}
}
This work was supported by JSPS KAKENHI Grants-in-Aid for Scientific Research (Grant Numbers JP23K16975, 22K12212) and the JST Moonshot Research & Development Program (Grant Number JPMJMS2011).