Graph over-parameterization: Why the graph helps the training of deep graph convolutional network

Lin, Yucong; Li, Silu; Xu, Jiaxing; Xu, Jiawei; Huang, Dong; Zheng, Wendi; Cao, Yuan; Lu, Junwei

Links for fulltext

Publisher Website: 10.1016/j.neucom.2023.02.054
Scopus: eid_2-s2.0-85149840724
WOS: WOS:000951824200001

Supplementary

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Statistics & Actuarial Science: Journal/Magazine Articles

Article: Graph over-parameterization: Why the graph helps the training of deep graph convolutional network

Title	Graph over-parameterization: Why the graph helps the training of deep graph convolutional network
Authors	Lin, Yucong; Li, Silu; Xu, Jiaxing; Xu, Jiawei; Huang, Dong; Zheng, Wendi; Cao, Yuan; Lu, Junwei
Keywords	Graph convolutional neural network; Over-parameterization
Issue Date	10-Mar-2023
Publisher	Elsevier
Citation	Neurocomputing, 2023, v. 534, p. 77-85. DOI: http://dx.doi.org/10.1016/j.neucom.2023.02.054
Abstract	Recent studies show that gradient descent can train a deep neural network (DNN) to achieve small training and test errors when the DNN is sufficiently wide. This result applies to various over-parameterized neural network models, including fully-connected neural networks and convolutional neural networks. However, existing theory does not apply to graph convolutional networks (GCNs), as GCNs are built according to the topological structures of the data. It has been empirically observed that GCNs can outperform vanilla neural networks when the underlying graph captures geometric information of the data. However, there is little theoretical justification for this observation. In this paper, we establish theoretical guarantees of the high-probability convergence of gradient descent for training over-parameterized GCNs. Specifically, we introduce a novel measurement of the relation between the graph and the data, called the “graph disparity coefficient”, and show that the convergence of GCN is faster when the graph disparity coefficient is smaller. Our analysis provides novel insights into how the graph convolution operation in a GCN helps training, and provides useful guidance for GCN training in practice.
Persistent Identifier	http://hdl.handle.net/10722/338369
ISSN	0925-2312 (2023 Impact Factor: 5.5; 2023 SCImago Journal Rankings: 1.815)
ISI Accession Number ID	WOS:000951824200001

DC Field	Value	Language
dc.contributor.author	Lin, Yucong	-
dc.contributor.author	Li, Silu	-
dc.contributor.author	Xu, Jiaxing	-
dc.contributor.author	Xu, Jiawei	-
dc.contributor.author	Huang, Dong	-
dc.contributor.author	Zheng, Wendi	-
dc.contributor.author	Cao, Yuan	-
dc.contributor.author	Lu, Junwei	-
dc.date.accessioned	2024-03-11T10:28:21Z	-
dc.date.available	2024-03-11T10:28:21Z	-
dc.date.issued	2023-03-10	-
dc.identifier.citation	Neurocomputing, 2023, v. 534, p. 77-85	-
dc.identifier.issn	0925-2312	-
dc.identifier.uri	http://hdl.handle.net/10722/338369	-
dc.description.abstract	Recent studies show that gradient descent can train a deep neural network (DNN) to achieve small training and test errors when the DNN is sufficiently wide. This result applies to various over-parameterized neural network models, including fully-connected neural networks and convolutional neural networks. However, existing theory does not apply to graph convolutional networks (GCNs), as GCNs are built according to the topological structures of the data. It has been empirically observed that GCNs can outperform vanilla neural networks when the underlying graph captures geometric information of the data. However, there is little theoretical justification for this observation. In this paper, we establish theoretical guarantees of the high-probability convergence of gradient descent for training over-parameterized GCNs. Specifically, we introduce a novel measurement of the relation between the graph and the data, called the “graph disparity coefficient”, and show that the convergence of GCN is faster when the graph disparity coefficient is smaller. Our analysis provides novel insights into how the graph convolution operation in a GCN helps training, and provides useful guidance for GCN training in practice.	-
dc.language	eng	-
dc.publisher	Elsevier	-
dc.relation.ispartof	Neurocomputing	-
dc.subject	Graph convolutional neural network	-
dc.subject	Over-parameterization	-
dc.title	Graph over-parameterization: Why the graph helps the training of deep graph convolutional network	-
dc.type	Article	-
dc.identifier.doi	10.1016/j.neucom.2023.02.054	-
dc.identifier.scopus	eid_2-s2.0-85149840724	-
dc.identifier.volume	534	-
dc.identifier.spage	77	-
dc.identifier.epage	85	-
dc.identifier.eissn	1872-8286	-
dc.identifier.isi	WOS:000951824200001	-
dc.identifier.issnl	0925-2312	-
