Conference Paper: Two-level Graph Caching for Expediting Distributed GNN Training
Title | Two-level Graph Caching for Expediting Distributed GNN Training |
---|---|
Authors | Zhang, Zhe; Luo, Ziyue; Wu, Chuan |
Issue Date | 17-May-2023 |
Abstract | Graph Neural Networks (GNNs) are increasingly popular due to excellent performance on learning graph-structured data in various domains. With fast expanding graph sizes and feature dimensions, distributed GNN training has been adopted, with multiple concurrent workers learning on different portions of a large graph. It has been observed that a main bottleneck in distributed GNN training lies in graph feature fetching across servers, which dominates the training time of each training iteration at each worker. This paper studies efficient feature caching on each worker to minimize feature fetching overhead, in order to expedite distributed GNN training. Current distributed GNN training systems largely adopt static caching of fixed neighbor nodes. We propose a novel two-level dynamic cache design exploiting both GPU memory and host memory at each worker, and design efficient two-level dynamic caching algorithms based on online optimization and a lookahead batching mechanism. Our dynamic caching algorithms consider node requesting probabilities and heterogeneous feature fetching costs from different servers, achieving an O(log³ k) competitive ratio in terms of overall feature-fetching communication cost (where k is the cache capacity). We evaluate practical performance of our caching design with testbed experiments, and show that our design achieves up to 5.4x convergence speed-up. |
Persistent Identifier | http://hdl.handle.net/10722/333890 |
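The abstract describes a two-level feature cache that spans GPU memory and host memory at each worker, with misses falling back to remote fetches whose costs differ across servers. The sketch below is only a minimal illustrative reading of that idea under our own assumptions: the class and method names (TwoLevelFeatureCache, lookup, fetch_remote), the LRU eviction policy, and the cost model are hypothetical and are not the paper's algorithm, which is based on online optimization and a lookahead batching mechanism.

```python
from collections import OrderedDict


class TwoLevelFeatureCache:
    """Toy two-level (GPU + host) feature cache with LRU eviction.

    Illustrative sketch only: the paper's dynamic caching algorithm uses
    online optimization and lookahead batching, not plain LRU.
    """

    def __init__(self, gpu_capacity, host_capacity, fetch_cost):
        self.gpu = OrderedDict()       # node_id -> feature (fast tier)
        self.host = OrderedDict()      # node_id -> feature (slow tier)
        self.gpu_capacity = gpu_capacity
        self.host_capacity = host_capacity
        self.fetch_cost = fetch_cost   # node_id -> cost of fetching from its server
        self.total_fetch_cost = 0.0    # accumulated remote-fetch communication cost

    def lookup(self, node_id, fetch_remote):
        """Return the feature for node_id, promoting or caching it as needed."""
        if node_id in self.gpu:        # GPU hit: no communication
            self.gpu.move_to_end(node_id)
            return self.gpu[node_id]
        if node_id in self.host:       # host hit: promote to the GPU tier
            feat = self.host.pop(node_id)
        else:                          # miss: pay the heterogeneous fetch cost
            feat = fetch_remote(node_id)
            self.total_fetch_cost += self.fetch_cost.get(node_id, 1.0)
        self._insert_gpu(node_id, feat)
        return feat

    def _insert_gpu(self, node_id, feat):
        self.gpu[node_id] = feat
        self.gpu.move_to_end(node_id)
        if len(self.gpu) > self.gpu_capacity:
            evicted_id, evicted_feat = self.gpu.popitem(last=False)
            self._insert_host(evicted_id, evicted_feat)   # demote, don't discard

    def _insert_host(self, node_id, feat):
        self.host[node_id] = feat
        self.host.move_to_end(node_id)
        if len(self.host) > self.host_capacity:
            self.host.popitem(last=False)   # falls out of the cache entirely


if __name__ == "__main__":
    # Hypothetical per-node fetch costs modelling heterogeneous servers.
    costs = {0: 1.0, 1: 2.5, 2: 2.5, 3: 1.0}
    cache = TwoLevelFeatureCache(gpu_capacity=2, host_capacity=2, fetch_cost=costs)
    remote = lambda nid: f"feature_of_{nid}"   # stand-in for a real RPC
    for nid in [0, 1, 0, 2, 3, 1]:
        cache.lookup(nid, remote)
    print("accumulated fetch cost:", cache.total_fetch_cost)
```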
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zhang, Zhe | - |
dc.contributor.author | Luo, Ziyue | - |
dc.contributor.author | Wu, Chuan | - |
dc.date.accessioned | 2023-10-06T08:39:56Z | - |
dc.date.available | 2023-10-06T08:39:56Z | - |
dc.date.issued | 2023-05-17 | - |
dc.identifier.uri | http://hdl.handle.net/10722/333890 | - |
dc.description.abstract | Graph Neural Networks (GNNs) are increasingly popular due to excellent performance on learning graph-structured data in various domains. With fast expanding graph sizes and feature dimensions, distributed GNN training has been adopted, with multiple concurrent workers learning on different portions of a large graph. It has been observed that a main bottleneck in distributed GNN training lies in graph feature fetching across servers, which dominates the training time of each training iteration at each worker. This paper studies efficient feature caching on each worker to minimize feature fetching overhead, in order to expedite distributed GNN training. Current distributed GNN training systems largely adopt static caching of fixed neighbor nodes. We propose a novel two-level dynamic cache design exploiting both GPU memory and host memory at each worker, and design efficient two-level dynamic caching algorithms based on online optimization and a lookahead batching mechanism. Our dynamic caching algorithms consider node requesting probabilities and heterogeneous feature fetching costs from different servers, achieving an O(log³ k) competitive ratio in terms of overall feature-fetching communication cost (where k is the cache capacity). We evaluate practical performance of our caching design with testbed experiments, and show that our design achieves up to 5.4x convergence speed-up. | -
dc.language | eng | - |
dc.relation.ispartof | IEEE International Conference on Computer Communications (INFOCOM) 2023 (17/05/2023-20/05/2023, New York) | - |
dc.title | Two-level Graph Caching for Expediting Distributed GNN Training | - |
dc.type | Conference_Paper | - |
dc.identifier.doi | 10.1109/INFOCOM53939.2023.10228911 | - |
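For context, the O(log³ k) guarantee quoted in the abstract is a competitive-ratio bound on the online caching algorithm's feature-fetching communication cost relative to an optimal offline strategy. A standard way to state such a bound (notation is ours, not taken from the paper) is:

```latex
\[
\mathbb{E}\!\left[\mathrm{cost}_{\mathrm{ALG}}(\sigma)\right]
\;\le\; O(\log^{3} k)\cdot \mathrm{cost}_{\mathrm{OPT}}(\sigma) \;+\; c
\qquad \text{for every request sequence } \sigma,
\]
```

where cost denotes the overall feature-fetching communication cost, OPT is the optimal offline caching strategy, k is the cache capacity, and c is a constant independent of the request sequence.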