ECRAP: Exophora Resolution and Classifying User Commands for Robot Action Planning by Large Language Models

Ritsumeikan University, Soka University, Kyoto University
Accepted at IEEE IRC 2024.


Abstract

The ability to understand a variety of verbal instructions and perform tasks is important for daily life support robots. People's speech to a robot may include greetings and demonstratives, and the robot needs to act appropriately in such cases. For instructions that include demonstratives, such as "Bring me that cup," exophora resolution (ER) is needed to find the specific object corresponding to "that cup." Recently, action planning based on large language models (LLMs) has become essential for robot behavior and has been studied intensively. However, previous action planning research has had difficulty identifying the target object from language instructions containing demonstratives and producing the plan the user intends. This study aims to plan the actions expected by the user from various user language queries, such as greetings or queries containing demonstratives. We propose a method for action planning from a variety of language instructions by combining a task classification module, an ER framework, and LLMs. In the experiment, various queries were given, including queries containing demonstratives and queries that require the robot only to respond rather than to move, and the planning accuracy was compared with a baseline without an ER framework. The results show that the proposed method is approximately 1.3 times more accurate than the baseline.

Overview


An overview of our research. The robot classifies tasks based on the user's query and determines the requisite processing. If the query necessitates exophora resolution (ER), the robot executes ER using the query from the user and skeleton information to identify the target object. If the query does not require ER or following the identification of the target object, the robot proceeds with action planning.
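As a rough illustration of the task-classification step, the sketch below shows how a user query might be sorted into the three cases the robot has to handle (no task, task with a demonstrative, task without a demonstrative) by prompting an LLM. The prompt wording, the `classify_query` helper, the label names, and the use of the OpenAI chat API are our own assumptions for illustration, not the exact implementation used in the paper.

```python
# Hypothetical sketch of the task-classification step:
# an LLM sorts the user's query into one of three categories.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CLASSIFY_PROMPT = (
    "Classify the user's utterance into exactly one label:\n"
    "  NOT_A_TASK                 - greeting or soliloquy, no robot action needed\n"
    "  TASK_WITH_DEMONSTRATIVE    - an instruction containing 'this', 'that', etc.\n"
    "  TASK_WITHOUT_DEMONSTRATIVE - an instruction naming the object directly\n"
    "Answer with the label only."
)

def classify_query(query: str) -> str:
    """Return one of the three labels for a user query."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system", "content": CLASSIFY_PROMPT},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content.strip()

# e.g. classify_query("Bring me that cup") might return "TASK_WITH_DEMONSTRATIVE"
```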

Exophora Resolution and Classifying User Commands for Robot Action Planning (ECRAP)


An overview of ECRAP. The proposed method consists of four modules: (a) task classification, (b) response generation, (c) ER framework, and (d) robot action planning. First, the system performs task classification based on user queries. The robot only replies to queries for which it has no task to execute. If ER is required, action planning is conducted after identifying the target object using the ER framework. If ER is not necessary, action planning proceeds directly without it.
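The control flow across the four modules can be summarized as in the following sketch. The callables passed in (`classify_query` from the sketch above, plus `generate_response`, `resolve_exophora`, and `plan_actions`) and the skeleton-information argument are hypothetical names used only to make the branching explicit; they stand in for the paper's actual components.

```python
# Hypothetical glue code tying the four ECRAP modules together.
# The callables are placeholders for (a) task classification,
# (b) response generation, (c) the ER framework, and
# (d) LLM-based action planning.

def ecrap(query, skeleton_info, classify_query, generate_response,
          resolve_exophora, plan_actions):
    label = classify_query(query)                        # (a) task classification

    if label == "NOT_A_TASK":
        return generate_response(query)                  # (b) reply only, no action

    if label == "TASK_WITH_DEMONSTRATIVE":
        target = resolve_exophora(query, skeleton_info)  # (c) identify "that cup" etc.
        return plan_actions(query, target=target)        # (d) plan with resolved object

    return plan_actions(query, target=None)              # (d) plan directly, no ER needed
```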

Success rate of action planning when the query is in English

We prepared different types of instructions. "Not a task" refers to greetings or soliloquies (e.g., "Hello"). "w/ demonstrative" refers to instructions containing a demonstrative (e.g., "Bring me that bottle"). "w/o demonstrative" refers to instructions without a demonstrative (e.g., "Bring a bottle"). The proposed method achieved a higher success rate than the other methods.


Success rate of action planning when the query is in Japanese

We prepared different types of instructions. "Not a task" refers to greetings or soliloquies (e.g., "Hello"). "w/ demonstrative" refers to instructions containing a demonstrative (e.g., "Bring me that bottle"). "w/o demonstrative" refers to instructions without a demonstrative (e.g., "Bring a bottle"). The proposed method achieved a higher success rate than the other methods.


BibTeX


      @inproceedings{oyama2024ecrap,
        author={Oyama, Akira and Hasegawa, Shoichi and Hagiwara, Yoshinobu and Taniguchi, Akira and Taniguchi, Tadahiro},
        title={ECRAP: Exophora Resolution and Classifying User Commands for Robot Action Planning by Large Language Models},
        booktitle={IEEE International Conference on Robotic Computing (IRC)},
        year={2024}
      }
    

Laboratory Information

Funding

This work was supported by JSPS KAKENHI Grants-in-Aid for Scientific Research (Grant Numbers JP23K16975, 22K12212) and the JST Moonshot Research & Development Program (Grant Number JPMJMS2011).