File Download
There are no files associated with this item.
Links for fulltext (May Require Subscription):
- Publisher Website: 10.1109/TWC.2025.3544333
- Scopus: eid_2-s2.0-85219583356

Citations:
- Scopus: 0
Article: Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge
| Title | Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge |
|---|---|
| Authors | Wu, Hai; Chen, Xu; Huang, Kaibin |
| Issue Date | 1-Jan-2025 |
| Publisher | Institute of Electrical and Electronics Engineers |
| Citation | IEEE Transactions on Wireless Communications, 2025, v. 24, n. 6, p. 4839-4852 |
| Abstract | The emergence of large-scale foundation models (FoMo’s) that can perform human-like intelligence motivates their deployment at the network edge for devices to access state-of-the-art artificial intelligence (AI). For better user experiences, the pre-trained FoMo’s need to be adapted to specialized downstream tasks through fine-tuning techniques. To transcend a single device’s memory and computation limitations, we advocate multi-device cooperation within the device-edge cooperative fine-tuning (DEFT) paradigm, where edge devices cooperate to simultaneously optimize different parts of fine-tuning parameters within a FoMo. The edge server is responsible for coordination and gradient aggregation. However, the parameter blocks reside at different depths within a FoMo architecture, leading to varied computation latency-and-memory cost due to gradient backpropagation-based calculations. The heterogeneous on-device computation and memory capacities and channel conditions necessitate an integrated communication-and-computation (C2) allocation of local computation loads and uplink communication resources to achieve low-latency (LoLa) DEFT. To this end, we consider the depth-aware DEFT block allocation problem. The involved optimal block-device matching is tackled by the proposed low-complexity Cutting-RecoUNting-CHecking (CRUNCH) algorithm, which is designed by exploiting the monotone-increasing property between block depth and computation latency-and-memory cost. Next, the joint bandwidth-and-block allocation (JBBA) makes the problem more sophisticated, i.e., mathematically NP-hard. We observe a splittable Lagrangian expression through the transformation and analysis of the original problem, where the variables indicating device involvement are introduced to decouple the block and bandwidth allocation. Then, the dual ascent method is employed to tackle the JBBA problem iteratively. Within each iteration, block allocation and bandwidth allocation are optimized concurrently. The optimal block allocation sub-problem is solved efficiently by applying the Hungarian method facilitated by the proposed CRUNCH algorithm. On the other hand, the bandwidth allocation sub-problem is solved in closed form, shedding light on favorable allocations to resource-limited devices. Through extensive experiments conducted on the GLUE benchmark, our results demonstrate significant latency reduction achievable by LoLa DEFT for fine-tuning a RoBERTa model. |
| Persistent Identifier | http://hdl.handle.net/10722/362004 |
| ISSN | 1536-1276 (2023 Impact Factor: 8.9; 2023 SCImago Journal Rankings: 5.371) |
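
The abstract describes an iterative dual-ascent procedure for joint bandwidth-and-block allocation (JBBA): in each iteration, block-device matching is solved as an assignment problem (the Hungarian method, accelerated by the CRUNCH algorithm) and bandwidth shares are updated in closed form. The paper's actual cost model, closed-form bandwidth rule, and CRUNCH routine are not reproduced in this record, so the sketch below only illustrates that iteration structure under assumed placeholder models; the function `lola_deft_jbba_sketch` and inputs `comp_latency` and `uplink_gain` are hypothetical names introduced here for illustration.

```python
# Hedged sketch of the iteration structure outlined in the abstract:
# per dual-ascent iteration, (i) solve block-device matching as an
# assignment problem (Hungarian method), (ii) update bandwidth shares.
# The cost model, bandwidth rule, and dual update below are placeholders,
# NOT the paper's actual expressions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def lola_deft_jbba_sketch(comp_latency, uplink_gain, total_bw,
                          num_iters=50, step=0.1):
    """comp_latency[d, b]: assumed computation latency of block b on device d.
    uplink_gain[d]: assumed effective uplink channel gain of device d."""
    D, B = comp_latency.shape
    lam = 1.0                          # dual variable (scalar placeholder)
    bw = np.full(D, total_bw / D)      # start from an equal bandwidth split

    for _ in range(num_iters):
        # (i) Block allocation: assignment minimizing a latency proxy that
        # combines computation latency and a simple uplink-time term.
        comm_latency = 1.0 / (bw * uplink_gain)          # placeholder model
        cost = comp_latency + lam * comm_latency[:, None]
        dev_idx, blk_idx = linear_sum_assignment(cost)   # Hungarian method

        # (ii) Bandwidth allocation: the paper reports a closed-form rule;
        # a simple proportional stand-in is used here, favouring devices
        # with weaker uplink channels (the "resource-limited" devices).
        weight = np.sqrt(lam / uplink_gain[dev_idx])
        bw[dev_idx] = total_bw * weight / weight.sum()

        # Dual ascent on the (placeholder) bandwidth-budget residual.
        lam = max(1e-6, lam + step * (bw.sum() - total_bw))

    return dict(zip(dev_idx.tolist(), blk_idx.tolist())), bw
```

A standard library assignment solver stands in for the Hungarian-plus-CRUNCH step here; per the abstract, the actual CRUNCH algorithm accelerates that matching by exploiting the monotone-increasing relation between block depth and computation latency-and-memory cost.
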
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Wu, Hai | - |
| dc.contributor.author | Chen, Xu | - |
| dc.contributor.author | Huang, Kaibin | - |
| dc.date.accessioned | 2025-09-18T00:36:13Z | - |
| dc.date.available | 2025-09-18T00:36:13Z | - |
| dc.date.issued | 2025-01-01 | - |
| dc.identifier.citation | IEEE Transactions on Wireless Communications, 2025, v. 24, n. 6, p. 4839-4852 | - |
| dc.identifier.issn | 1536-1276 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/362004 | - |
| dc.description.abstract | The emergence of large-scale foundation models (FoMo’s) that can perform human-like intelligence motivates their deployment at the network edge for devices to access state-of-the-art artificial intelligence (AI). For better user experiences, the pre-trained FoMo’s need to be adapted to specialized downstream tasks through fine-tuning techniques. To transcend a single device’s memory and computation limitations, we advocate multi-device cooperation within the device-edge cooperative fine-tuning (DEFT) paradigm, where edge devices cooperate to simultaneously optimize different parts of fine-tuning parameters within a FoMo. The edge server is responsible for coordination and gradient aggregation. However, the parameter blocks reside at different depths within a FoMo architecture, leading to varied computation latency-and-memory cost due to gradient backpropagation-based calculations. The heterogeneous on-device computation and memory capacities and channel conditions necessitate an integrated communication-and-computation (C2) allocation of local computation loads and uplink communication resources to achieve low-latency (LoLa) DEFT. To this end, we consider the depth-aware DEFT block allocation problem. The involved optimal block-device matching is tackled by the proposed low-complexity Cutting-RecoUNting-CHecking (CRUNCH) algorithm, which is designed by exploiting the monotone-increasing property between block depth and computation latency-and-memory cost. Next, the joint bandwidth-and-block allocation (JBBA) makes the problem more sophisticated, i.e., mathematically NP-hard. We observe a splittable Lagrangian expression through the transformation and analysis of the original problem, where the variables indicating device involvement are introduced to decouple the block and bandwidth allocation. Then, the dual ascent method is employed to tackle the JBBA problem iteratively. Within each iteration, block allocation and bandwidth allocation are optimized concurrently. The optimal block allocation sub-problem is solved efficiently by applying the Hungarian method facilitated by the proposed CRUNCH algorithm. On the other hand, the bandwidth allocation sub-problem is solved in closed form, shedding light on favorable allocations to resource-limited devices. Through extensive experiments conducted on the GLUE benchmark, our results demonstrate significant latency reduction achievable by LoLa DEFT for fine-tuning a RoBERTa model. | - |
| dc.language | eng | - |
| dc.publisher | Institute of Electrical and Electronics Engineers | - |
| dc.relation.ispartof | IEEE Transactions on Wireless Communications | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.title | Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1109/TWC.2025.3544333 | - |
| dc.identifier.scopus | eid_2-s2.0-85219583356 | - |
| dc.identifier.volume | 24 | - |
| dc.identifier.issue | 6 | - |
| dc.identifier.spage | 4839 | - |
| dc.identifier.epage | 4852 | - |
| dc.identifier.eissn | 1558-2248 | - |
| dc.identifier.issnl | 1536-1276 | - |
