Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera

Bao, J; Jia, Y; Cheng, Y; Tang, H; Xi, N

Links for full text (may require subscription):
Publisher website: 10.3390/s16122117
Scopus: eid_2-s2.0-85006695062
WOS: WOS:000391303000136

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Industrial & Manufacturing Systems Engineering: Journal/Magazine Articles

Title	Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera
Authors	Bao, J; Jia, Y; Cheng, Y; Tang, H; Xi, N
Keywords	Natural language control; Natural language processing; Object grounding; Object recognition; Robotic manipulation system; Target object detection
Issue Date	2016
Publisher	Molecular Diversity Preservation International. The Journal's web site is located at http://www.mdpi.net/sensors
Citation	Sensors, 2016, v. 16, p. 2117
DOI	http://dx.doi.org/10.3390/s16122117
Abstract	Controlling robots by natural language (NL) is attracting increasing attention because it is versatile and convenient and does not require extensive user training. Grounding, which enables robots to understand NL instructions from humans, is a crucial challenge in this problem. This paper mainly explores the object grounding problem and specifically studies how to detect target objects from NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. From the metric information of all segmented objects, object attributes and the relations between objects are then extracted. NL instructions that incorporate multiple cues for object specification are parsed into domain-specific annotations. The annotations from the NL instructions and the information extracted from the RGB-D camera are matched in a computational state estimation framework that searches all possible object grounding states. The final grounding is accomplished by selecting the states with the maximum probabilities. An RGB-D scene dataset, associated with different groups of NL instructions based on different cognition levels of the robot, is collected. Quantitative evaluations on the dataset illustrate the advantages of the proposed method. Experiments on NL-controlled object manipulation and NL-based task programming using a mobile manipulator show its effectiveness and practicality in robotic applications.
Persistent Identifier	http://hdl.handle.net/10722/262334
ISSN	1424-8220
2021 Impact Factor	3.847
2020 SCImago Journal Rankings	0.636
ISI Accession Number ID	WOS:000391303000136
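The grounding step summarised in the abstract above (matching parsed NL annotations against the attributes and spatial relations of segmented objects, enumerating every candidate grounding state, and keeping the most probable one) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the SegmentedObject record, the 0.9/0.1 color likelihoods, the logistic left_of/right_of relation model, the annotation dictionary format and all function names are hypothetical assumptions, and the full text should be consulted for the actual state estimation framework.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class SegmentedObject:
    # Hypothetical record for one object segmented from the RGB-D data.
    name: str
    color: str
    x: float  # metric horizontal position in the camera frame (metres)


def attribute_likelihood(obj, annotation):
    # Assumed likelihood: 0.9 if the annotated color matches, 0.1 otherwise.
    if "color" not in annotation:
        return 1.0
    return 0.9 if obj.color == annotation["color"] else 0.1


def relation_likelihood(relation, target, anchor, scale=0.05):
    # Assumed soft spatial relation: logistic function of the metric x offset.
    if relation == "left_of":
        offset = anchor.x - target.x
    elif relation == "right_of":
        offset = target.x - anchor.x
    else:
        raise ValueError(f"unsupported relation: {relation}")
    return 1.0 / (1.0 + math.exp(-offset / scale))


def ground(objects, target_ann, relation=None, anchor_ann=None):
    # Enumerate every grounding state (target, or target plus anchor) and score it.
    if relation is None:
        states = [((t.name,), attribute_likelihood(t, target_ann)) for t in objects]
    else:
        states = [
            ((t.name, a.name),
             attribute_likelihood(t, target_ann)
             * attribute_likelihood(a, anchor_ann or {})
             * relation_likelihood(relation, t, a))
            for t, a in itertools.permutations(objects, 2)
        ]
    total = sum(p for _, p in states)
    # Normalise scores into a distribution over states, then take the argmax.
    return max(((s, p / total) for s, p in states), key=lambda sp: sp[1])


scene = [
    SegmentedObject("cup_1", "red", x=-0.20),
    SegmentedObject("cup_2", "red", x=0.25),
    SegmentedObject("box_1", "blue", x=0.00),
]
# Hypothetical parsed form of "the red object to the left of the blue object".
state, prob = ground(scene, {"color": "red"}, "left_of", {"color": "blue"})
print(state)  # ('cup_1', 'box_1')
```

Normalising does not change which state wins; it only makes the returned score readable as a probability over the enumerated states. Using permutations keeps the target and the anchor distinct, so an object is never grounded as being to the left of itself.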

DC Field	Value	Language
dc.contributor.author	Bao, J	-
dc.contributor.author	Jia, Y	-
dc.contributor.author	Cheng, Y	-
dc.contributor.author	Tang, H	-
dc.contributor.author	Xi, N	-
dc.date.accessioned	2018-09-28T04:57:31Z	-
dc.date.available	2018-09-28T04:57:31Z	-
dc.date.issued	2016	-
dc.identifier.citation	Sensors, 2016, v. 16, p. 2117	-
dc.identifier.issn	1424-8220	-
dc.identifier.uri	http://hdl.handle.net/10722/262334	-
dc.language	eng	-
dc.publisher	Molecular Diversity Preservation International. The Journal's web site is located at http://www.mdpi.net/sensors	-
dc.relation.ispartof	Sensors	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject	Natural language control	-
dc.subject	Natural language processing	-
dc.subject	Object grounding	-
dc.subject	Object recognition	-
dc.subject	Robotic manipulation system	-
dc.subject	Target object detection	-
dc.title	Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera	-
dc.type	Article	-
dc.identifier.email	Xi, N: xining@hku.hk	-
dc.identifier.authority	Xi, N=rp02044	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.3390/s16122117	-
dc.identifier.scopus	eid_2-s2.0-85006695062	-
dc.identifier.hkuros	292804	-
dc.identifier.volume	16	-
dc.identifier.spage	2117	-
dc.identifier.epage	2117	-
dc.identifier.isi	WOS:000391303000136	-
dc.publisher.place	Switzerland	-
dc.identifier.issnl	1424-8220	-
