SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization

Title	SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization
Authors	Wu, Chuan
Issue Date	3-Sep-2025
Persistent Identifier	http://hdl.handle.net/10722/366603

DC Field	Value	Language
dc.contributor.author	Wu, Chuan	-
dc.date.accessioned	2025-11-25T04:20:24Z	-
dc.date.available	2025-11-25T04:20:24Z	-
dc.date.issued	2025-09-03	-
dc.identifier.uri	http://hdl.handle.net/10722/366603	-
dc.language	eng	-
dc.relation.ispartof	IEEE Cluster (02/09/2025-05/09/2025, London)	-
dc.title	SplitQuant: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and Adaptive Quantization	-
dc.type	Conference_Paper	-