Links for fulltext (May Require Subscription):
- Publisher Website: 10.1109/TCSVT.2015.2389413
- Scopus: eid_2-s2.0-84964324864
Article: DEFEATnet - A deep conventional image representation for image classification
Field | Value
---|---
Title | DEFEATnet - A deep conventional image representation for image classification
Authors | Gao, Shenghua; Duan, Lixin; Tsang, Ivor W.
Keywords | Conventional Image Representation; Deep Architecture; Feature Encoding; Local Max Pooling
Issue Date | 2016
Citation | IEEE Transactions on Circuits and Systems for Video Technology, 2016, v. 26, n. 3, p. 494-505
Abstract | To study the underlying possibilities for the successes of conventional image representation and deep neural networks (DNNs) in image representation, we propose a deep feature extraction, encoding, and pooling network (DEFEATnet) architecture, which is a marriage between conventional image representation approaches and DNNs. In particular, in DEFEATnet, each layer consists of three components: feature extraction, feature encoding, and pooling. The primary advantage of DEFEATnet is twofold. First, it consolidates the prior knowledge (e.g., translation invariance) from extracting, encoding, and pooling handcrafted features, as in the conventional feature representation approaches. Second, it represents the object parts at different granularities by gradually increasing the local receptive fields in different layers, as in DNNs. Moreover, DEFEATnet is a generalized framework that can readily incorporate all types of local features as well as all kinds of well-designed feature encoding and pooling methods. Since prior knowledge is preserved in DEFEATnet, it is especially useful for image representation on small/medium-size data sets, where DNNs usually fail due to the lack of sufficient training data. Promising experimental results clearly show that DEFEATnets outperform shallow conventional image representation approaches by a large margin when the same type of features, feature encoding, and pooling are used. The extensive experiments also demonstrate the effectiveness of the deep architecture of our DEFEATnet in improving the robustness of image representation.
Persistent Identifier | http://hdl.handle.net/10722/345213
ISSN | 1051-8215 (2023 Impact Factor: 8.3; 2023 SCImago Journal Rankings: 2.299)
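The abstract above describes each DEFEATnet layer as a stack of three stages: local feature extraction, feature encoding, and local max pooling, with receptive fields that grow from layer to layer. Below is a minimal, hypothetical NumPy sketch of one such layer. The function names, the raw-patch features, the soft-assignment encoder, and all parameter values are illustrative assumptions, not the authors' implementation; the paper's framework can plug in other handcrafted features and encoders.

```python
# Hypothetical sketch of one DEFEATnet-style layer:
# (1) extract local features, (2) encode them against a codebook,
# (3) max-pool over local spatial cells. Names and defaults are illustrative.
import numpy as np

def extract_patches(image, patch_size, stride):
    """Densely extract raw local patches as the handcrafted features."""
    H, W = image.shape
    feats, coords = [], []
    for y in range(0, H - patch_size + 1, stride):
        for x in range(0, W - patch_size + 1, stride):
            patch = image[y:y + patch_size, x:x + patch_size].ravel()
            feats.append(patch / (np.linalg.norm(patch) + 1e-8))  # L2-normalize
            coords.append((y, x))
    return np.array(feats), coords

def encode(feats, codebook):
    """Soft-assignment encoding: positive similarity to each codeword, normalized."""
    sims = np.maximum(feats @ codebook.T, 0.0)            # (n_feats, n_codewords)
    return sims / (sims.sum(axis=1, keepdims=True) + 1e-8)

def max_pool(codes, coords, cell):
    """Local max pooling over spatial cells, giving translation invariance."""
    pooled = {}
    for code, (y, x) in zip(codes, coords):
        key = (y // cell, x // cell)
        pooled[key] = np.maximum(pooled.get(key, code), code)
    return pooled  # dict: spatial cell -> pooled code vector

def defeat_layer(image, codebook, patch_size=6, stride=2, cell=8):
    """One extraction-encoding-pooling layer; deeper layers would repeat this
    on the pooled maps with larger patches, enlarging the receptive field."""
    feats, coords = extract_patches(image, patch_size, stride)
    codes = encode(feats, codebook)
    return max_pool(codes, coords, cell)

# Example usage with random data (purely illustrative):
rng = np.random.default_rng(0)
image = rng.random((64, 64))
codebook = rng.random((128, 6 * 6))   # 128 codewords over 6x6 patches
pooled = defeat_layer(image, codebook)
```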
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Gao, Shenghua | - |
dc.contributor.author | Duan, Lixin | - |
dc.contributor.author | Tsang, Ivor W. | - |
dc.date.accessioned | 2024-08-15T09:25:57Z | - |
dc.date.available | 2024-08-15T09:25:57Z | - |
dc.date.issued | 2016 | - |
dc.identifier.citation | IEEE Transactions on Circuits and Systems for Video Technology, 2016, v. 26, n. 3, p. 494-505 | - |
dc.identifier.issn | 1051-8215 | - |
dc.identifier.uri | http://hdl.handle.net/10722/345213 | - |
dc.description.abstract | To study the underlying possibilities for the successes of conventional image representation and deep neural networks (DNNs) in image representation, we propose a deep feature extraction, encoding, and pooling network (DEFEATnet) architecture, which is a marriage between conventional image representation approaches and DNNs. In particular, in DEFEATnet, each layer consists of three components: feature extraction, feature encoding, and pooling. The primary advantage of DEFEATnet is twofold. First, it consolidates the prior knowledge (e.g., translation invariance) from extracting, encoding, and pooling handcrafted features, as in the conventional feature representation approaches. Second, it represents the object parts at different granularities by gradually increasing the local receptive fields in different layers, as in DNNs. Moreover, DEFEATnet is a generalized framework that can readily incorporate all types of local features as well as all kinds of well-designed feature encoding and pooling methods. Since prior knowledge is preserved in DEFEATnet, it is especially useful for image representation on small/medium-size data sets, where DNNs usually fail due to the lack of sufficient training data. Promising experimental results clearly show that DEFEATnets outperform shallow conventional image representation approaches by a large margin when the same type of features, feature encoding, and pooling are used. The extensive experiments also demonstrate the effectiveness of the deep architecture of our DEFEATnet in improving the robustness of image representation. | -
dc.language | eng | - |
dc.relation.ispartof | IEEE Transactions on Circuits and Systems for Video Technology | - |
dc.subject | Conventional Image Representation | - |
dc.subject | Deep Architecture | - |
dc.subject | Feature Encoding | - |
dc.subject | Local Max Pooling | - |
dc.title | DEFEATnet - A deep conventional image representation for image classification | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/TCSVT.2015.2389413 | - |
dc.identifier.scopus | eid_2-s2.0-84964324864 | - |
dc.identifier.volume | 26 | - |
dc.identifier.issue | 3 | - |
dc.identifier.spage | 494 | - |
dc.identifier.epage | 505 | - |