Conference Paper: SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization

Title: SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization
Authors: Wu, Chuan
Issue Date: 3-Sep-2025
Persistent Identifier: http://hdl.handle.net/10722/366603

DC Field               | Value | Language
dc.contributor.author  | Wu, Chuan | -
dc.date.accessioned    | 2025-11-25T04:20:24Z | -
dc.date.available      | 2025-11-25T04:20:24Z | -
dc.date.issued         | 2025-09-03 | -
dc.identifier.uri      | http://hdl.handle.net/10722/366603 | -
dc.language            | eng | -
dc.relation.ispartof   | IEEE Cluster (02/09/2025-05/09/2025, London) | -
dc.title               | SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization | -
dc.type                | Conference_Paper | -