Conference Paper: SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization
| Title | SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization |
|---|---|
| Authors | |
| Issue Date | 3-Sep-2025 |
| Persistent Identifier | http://hdl.handle.net/10722/366603 |

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Wu, Chuan | - |
| dc.date.accessioned | 2025-11-25T04:20:24Z | - |
| dc.date.available | 2025-11-25T04:20:24Z | - |
| dc.date.issued | 2025-09-03 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/366603 | - |
| dc.language | eng | - |
| dc.relation.ispartof | IEEE Cluster (02/09/2025-05/09/2025, London) | - |
| dc.title | SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization | - |
| dc.type | Conference_Paper | - |
