A Crowdsourcing Framework for Collecting Tabular Data

Shan, C; Mamoulis, N; Li, G; Cheng, R; HUANG, Z; ZHENG, Y

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1109/TKDE.2019.2914903
Scopus: eid_2-s2.0-85092536844
WOS: WOS:000576417000001

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: A Crowdsourcing Framework for Collecting Tabular Data

Title	A Crowdsourcing Framework for Collecting Tabular Data
Authors	Shan, C; Mamoulis, N; Li, G; Cheng, R; HUANG, Z; ZHENG, Y
Keywords	Task analysis; Crowdsourcing; Computers; Cleaning; Data models
Issue Date	2019
Publisher	Institute of Electrical and Electronics Engineers. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/RecentIssue.jsp/?punumber=69
Citation	IEEE Transactions on Knowledge and Data Engineering, 2019, v. 32 n. 11, p. 2060-2074. DOI: http://dx.doi.org/10.1109/TKDE.2019.2914903
Abstract	In crowdsourcing, human workers are employed to tackle problems that are traditionally difficult for computers (e.g., data cleaning, missing value filling, and sentiment analysis). In this paper, we study the effective use of crowdsourcing in filling missing values in a given relation (e.g., a table containing different attributes of celebrity stars, such as nationality and age). A task given to a worker typically consists of questions about the missing attribute values (e.g., What is the age of Jet Li?). Although this problem has been studied before, existing work often treats related attributes independently, leading to suboptimal performance. In this paper, we present T-Crowd, which is a crowdsourcing system that considers attribute relationships. Particularly, T-Crowd integrates each worker's answers on different attributes to effectively learn his/her trustworthiness and the true data values. The attribute relationship information is used to guide task allocation to workers. Our solution seamlessly supports categorical and continuous attributes. Our extensive experiments on real and synthetic datasets show that T-Crowd outperforms state-of-the-art methods, improving the quality of truth inference and reducing the monetary cost of crowdsourcing.
Persistent Identifier	http://hdl.handle.net/10722/291248
ISSN	1041-4347 (2023 Impact Factor: 8.9; 2023 SCImago Journal Rankings: 2.867)
ISI Accession Number ID	WOS:000576417000001
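
The abstract describes learning each worker's trustworthiness together with the true data values. As a rough illustration of that general idea only, the sketch below alternates between weighted voting and re-estimating worker weights for a single categorical attribute. It is not T-Crowd's algorithm, which also exploits attribute relationships and handles continuous attributes; the function name `infer_truth`, the smoothing rule, and the sample answers are all hypothetical.

```python
from collections import defaultdict


def infer_truth(answers, iterations=10):
    """Estimate cell values and worker weights from categorical answers.

    answers: list of (worker, cell, value) tuples (hypothetical format).
    Returns (truths, weights). A generic iterative weighted vote,
    NOT the method from the paper.
    """
    workers = {w for w, _, _ in answers}
    weights = {w: 1.0 for w in workers}  # start by trusting everyone equally
    truths = {}
    for _ in range(iterations):
        # Step 1: each cell takes the value with the largest total weight.
        votes = defaultdict(lambda: defaultdict(float))
        for w, cell, value in answers:
            votes[cell][value] += weights[w]
        # sorted() makes tie-breaking deterministic (smallest value wins).
        truths = {cell: max(sorted(vals), key=vals.get)
                  for cell, vals in votes.items()}

        # Step 2: a worker's weight is the smoothed fraction of their
        # answers that agree with the current estimates.
        agree = defaultdict(int)
        total = defaultdict(int)
        for w, cell, value in answers:
            total[w] += 1
            agree[w] += int(truths[cell] == value)
        weights = {w: (agree[w] + 1) / (total[w] + 2) for w in workers}
    return truths, weights


# Hypothetical answers: three workers, three cells of one attribute.
answers = [
    ("w1", "r1.nationality", "A"),
    ("w2", "r1.nationality", "A"),
    ("w3", "r1.nationality", "B"),
    ("w1", "r2.nationality", "B"),
    ("w2", "r2.nationality", "B"),
    ("w3", "r2.nationality", "A"),
    ("w1", "r3.nationality", "A"),
    ("w3", "r3.nationality", "B"),
]

truths, weights = infer_truth(answers)
print(truths)
print(weights)
```

In this toy input, workers w1 and w3 disagree on the third cell; because w1 agreed with the majority on the other cells, it ends up with the larger weight and its answer is preferred.
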

DC Field	Value	Language
dc.contributor.author	Shan, C	-
dc.contributor.author	Mamoulis, N	-
dc.contributor.author	Li, G	-
dc.contributor.author	Cheng, R	-
dc.contributor.author	HUANG, Z	-
dc.contributor.author	ZHENG, Y	-
dc.date.accessioned	2020-11-07T13:54:26Z	-
dc.date.available	2020-11-07T13:54:26Z	-
dc.date.issued	2019	-
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering, 2019, v. 32 n. 11, p. 2060-2074	-
dc.identifier.issn	1041-4347	-
dc.identifier.uri	http://hdl.handle.net/10722/291248	-
dc.description.abstract	In crowdsourcing, human workers are employed to tackle problems that are traditionally difficult for computers (e.g., data cleaning, missing value filling, and sentiment analysis). In this paper, we study the effective use of crowdsourcing in filling missing values in a given relation (e.g., a table containing different attributes of celebrity stars, such as nationality and age). A task given to a worker typically consists of questions about the missing attribute values (e.g., What is the age of Jet Li?). Although this problem has been studied before, existing work often treats related attributes independently, leading to suboptimal performance. In this paper, we present T-Crowd, which is a crowdsourcing system that considers attribute relationships. Particularly, T-Crowd integrates each worker's answers on different attributes to effectively learn his/her trustworthiness and the true data values. The attribute relationship information is used to guide task allocation to workers. Our solution seamlessly supports categorical and continuous attributes. Our extensive experiments on real and synthetic datasets show that T-Crowd outperforms state-of-the-art methods, improving the quality of truth inference and reducing the monetary cost of crowdsourcing.	-
dc.language	eng	-
dc.publisher	Institute of Electrical and Electronics Engineers. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/RecentIssue.jsp/?punumber=69	-
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering	-
dc.rights	IEEE Transactions on Knowledge and Data Engineering. Copyright © Institute of Electrical and Electronics Engineers.	-
dc.rights	©20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	-
dc.subject	Task analysis	-
dc.subject	Crowdsourcing	-
dc.subject	Computers	-
dc.subject	Cleaning	-
dc.subject	Data models	-
dc.title	A Crowdsourcing Framework for Collecting Tabular Data	-
dc.type	Article	-
dc.identifier.email	Shan, C: sxdtgg@hku.hk	-
dc.identifier.email	Mamoulis, N: nikos@cs.hku.hk	-
dc.identifier.email	Cheng, R: ckcheng@cs.hku.hk	-
dc.identifier.authority	Mamoulis, N=rp00155	-
dc.identifier.authority	Cheng, R=rp00074	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/TKDE.2019.2914903	-
dc.identifier.scopus	eid_2-s2.0-85092536844	-
dc.identifier.hkuros	318666	-
dc.identifier.volume	32	-
dc.identifier.issue	11	-
dc.identifier.spage	2060	-
dc.identifier.epage	2074	-
dc.identifier.isi	WOS:000576417000001	-
dc.publisher.place	United States	-
dc.identifier.issnl	1041-4347	-
