ECRAP: Exophora Resolution and Classifying User Commands for Robot Action Planning by Large Language Models

Ritsumeikan University, Soka University, Kyoto University
Accepted at IEEE IRC 2024.


Abstract

The ability to understand a variety of verbal instructions and perform tasks is important for daily life support robots. People's speech to a robot may include greetings and demonstratives, and the robot needs to act appropriately in such cases. For instructions that include demonstratives, such as "Bring me that cup," exophora resolution (ER) is needed to find the specific object corresponding to "that cup." Recently, action planning based on large language models (LLMs) has become essential for robot behavior and has been studied intensively. However, previous action planning research has had difficulty identifying the target object from language instructions containing demonstratives and producing the plan the user intends. This study aims to plan the actions expected by the user from various user language queries, such as greetings or queries containing demonstratives. We propose a method for action planning from a variety of language instructions by combining a task classification module, an ER framework, and LLMs. In the experiment, various queries were given, including queries containing demonstratives and queries that require the robot only to respond rather than to move, and the planning accuracy was compared with a baseline without an ER framework. The results show that the proposed method is approximately 1.3 times more accurate than the baseline.

Overview


An overview of our research. The robot classifies tasks based on the user's query and determines the requisite processing. If the query necessitates exophora resolution (ER), the robot executes ER using the query from the user and skeleton information to identify the target object. If the query does not require ER or following the identification of the target object, the robot proceeds with action planning.
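As a rough illustration of the task-classification step, the sketch below shows how a user query might be sorted into the three cases the robot has to handle (no task, task with a demonstrative, task without a demonstrative) by prompting an LLM. The prompt wording, the `classify_query` helper, the label names, and the use of the OpenAI chat API are our own assumptions for illustration, not the exact implementation used in the paper.

```python
# Hypothetical sketch of the task-classification step:
# an LLM sorts the user's query into one of three categories.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CLASSIFY_PROMPT = (
    "Classify the user's utterance into exactly one label:\n"
    "  NOT_A_TASK                 - greeting or soliloquy, no robot action needed\n"
    "  TASK_WITH_DEMONSTRATIVE    - an instruction containing 'this', 'that', etc.\n"
    "  TASK_WITHOUT_DEMONSTRATIVE - an instruction naming the object directly\n"
    "Answer with the label only."
)

def classify_query(query: str) -> str:
    """Return one of the three labels for a user query."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system", "content": CLASSIFY_PROMPT},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content.strip()

# e.g. classify_query("Bring me that cup") might return "TASK_WITH_DEMONSTRATIVE"
```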

Exophora Resolution and Classifying User Commands for Robot Action Planning (ECRAP)


An overview of ECRAP. The proposed method consists of four modules: (a) task classification, (b) response generation, (c) ER framework, and (d) robot action planning. First, the system performs task classification based on user queries. The robot only replies to queries for which it has no task to execute. If ER is required, action planning is conducted after identifying the target object using the ER framework. If ER is not necessary, action planning proceeds directly without it.
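The control flow across the four modules can be summarized as in the following sketch. The callables passed in (`classify_query` from the sketch above, plus `generate_response`, `resolve_exophora`, and `plan_actions`) and the skeleton-information argument are hypothetical names used only to make the branching explicit; they stand in for the paper's actual components.

```python
# Hypothetical glue code tying the four ECRAP modules together.
# The callables are placeholders for (a) task classification,
# (b) response generation, (c) the ER framework, and
# (d) LLM-based action planning.

def ecrap(query, skeleton_info, classify_query, generate_response,
          resolve_exophora, plan_actions):
    label = classify_query(query)                        # (a) task classification

    if label == "NOT_A_TASK":
        return generate_response(query)                  # (b) reply only, no action

    if label == "TASK_WITH_DEMONSTRATIVE":
        target = resolve_exophora(query, skeleton_info)  # (c) identify "that cup" etc.
        return plan_actions(query, target=target)        # (d) plan with resolved object

    return plan_actions(query, target=None)              # (d) plan directly, no ER needed
```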

Success rate of action planning when the query is in English

We prepared different types of instructions. "Not a task" refers to greetings or soliloquies (e.g., "Hello"). "w/ demonstrative" refers to instructions containing a demonstrative (e.g., "Bring me that bottle"). "w/o demonstrative" refers to instructions without a demonstrative (e.g., "Bring a bottle"). The proposed method achieved a higher success rate than the other methods.


Success rate of action planning when the query is in Japanese

We prepared different types of instructions. "Not a task" refers to greetings or soliloquies (e.g., "Hello"). "w/ demonstrative" refers to instructions containing a demonstrative (e.g., "Bring me that bottle"). "w/o demonstrative" refers to instructions without a demonstrative (e.g., "Bring a bottle"). The proposed method achieved a higher success rate than the other methods.


BibTeX


      @inproceedings{oyama2024ecrap,
        author={Oyama, Akira and Hasegawa, Shoichi and Hagiwara, Yoshinobu and Taniguchi, Akira and Taniguchi, Tadahiro},
        title={ECRAP: Exophora Resolution and Classifying User Commands for Robot Action Planning by Large Language Models},
        booktitle={IEEE International Conference on Robotic Computing (IRC)},
        year={2024}
      }
    

Laboratory Information

Funding

This work was supported by JSPS KAKENHI Grants-in-Aid for Scientific Research (Grant Numbers JP23K16975, 22K12212) and the JST Moonshot Research & Development Program (Grant Number JPMJMS2011).