A Model-Based Software Solution for Simultaneous Multiple Kernels on GPUs

WU, H; Liu, WZ; LIN, H; Wang, CL

Publisher Website: 10.1145/3377138
Scopus: eid_2-s2.0-85081572012
WOS: WOS:000582614800007

Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title	A Model-Based Software Solution for Simultaneous Multiple Kernels on GPUs
Authors	WU, H; Liu, WZ; LIN, H; Wang, CL
Keywords	Multitasking; Program processors; Supercomputers; Computing resource; concurrent kernel execution
Issue Date	2020
Publisher	Association for Computing Machinery, Inc. The Journal's web site is located at http://taco.acm.org
Citation	ACM Transactions on Architecture and Code Optimization, 2020, v. 17 n. 1, p. article no. 7. DOI: http://dx.doi.org/10.1145/3377138
Abstract	As a critical computing resource in multiuser systems such as supercomputers, data centers, and cloud services, a GPU contains multiple compute units (CUs). GPU multitasking is an intuitive solution to underutilization in GPGPU computing. Recently proposed approaches to GPU multitasking fall into two categories: (1) spatially partitioned sharing (SPS), which co-executes different kernels on disjoint sets of CUs, and (2) simultaneous multikernel (SMK), which runs multiple kernels simultaneously within a CU. Compared to SPS, SMK can improve resource utilization further by interleaving instructions from kernels with low dynamic resource contention. However, SMK is hard to implement on current GPU architectures because (1) techniques for applying SMK on top of the GPU hardware scheduling policy are scarce and (2) finding an efficient SMK scheme is difficult due to the complex interference among concurrently executed kernels. In this article, we propose a lightweight and effective performance model to evaluate the complex interference of SMK. Based on the probability of independent events, our performance model is built from a new angle and has few parameters. We then propose a metric, the symbiotic factor, which evaluates an SMK scheme so that kernels with complementary resource utilization can corun within a CU. We also analyze the advantages and disadvantages of the kernel slicing and kernel stretching techniques and integrate them to apply SMK on real GPUs rather than in simulators. We validate our model on 18 benchmarks. Compared to optimized hardware-based concurrent kernel execution, whose kernel launch order yields fast execution time, corunning kernel pairs achieve average speedups of 11%, 18%, and 12% on the AMD R9 290X, RX 480, and Vega 64, respectively. Compared to Warped-Slicer, the average speedups are 29%, 18%, and 51% on the AMD R9 290X, RX 480, and Vega 64, respectively.
Persistent Identifier	http://hdl.handle.net/10722/283311
ISSN	1544-3566; 2021 Impact Factor: 1.444; 2020 SCImago Journal Rankings: 0.263
ISI Accession Number ID	WOS:000582614800007

DC Field	Value	Language
dc.contributor.author	WU, H	-
dc.contributor.author	Liu, WZ	-
dc.contributor.author	LIN, H	-
dc.contributor.author	Wang, CL	-
dc.date.accessioned	2020-06-22T02:54:52Z	-
dc.date.available	2020-06-22T02:54:52Z	-
dc.date.issued	2020	-
dc.identifier.citation	ACM Transactions on Architecture and Code Optimization, 2020, v. 17 n. 1, p. article no. 7	-
dc.identifier.issn	1544-3566	-
dc.identifier.uri	http://hdl.handle.net/10722/283311	-
dc.language	eng	-
dc.publisher	Association for Computing Machinery, Inc. The Journal's web site is located at http://taco.acm.org	-
dc.relation.ispartof	ACM Transactions on Architecture and Code Optimization	-
dc.rights	ACM Transactions on Architecture and Code Optimization. Copyright © Association for Computing Machinery, Inc.	-
dc.rights	©ACM, 2020. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Architecture and Code Optimization, v. 17, n. 1 (2020), http://doi.acm.org/10.1145/3377138	-
dc.subject	Multitasking	-
dc.subject	Program processors	-
dc.subject	Supercomputers	-
dc.subject	Computing resource	-
dc.subject	concurrent kernel execution	-
dc.title	A Model-Based Software Solution for Simultaneous Multiple Kernels on GPUs	-
dc.type	Article	-
dc.identifier.email	Wang, CL: clwang@cs.hku.hk	-
dc.identifier.authority	Wang, CL=rp00183	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1145/3377138	-
dc.identifier.scopus	eid_2-s2.0-85081572012	-
dc.identifier.hkuros	310353	-
dc.identifier.volume	17	-
dc.identifier.issue	1	-
dc.identifier.spage	article no. 7	-
dc.identifier.epage	article no. 7	-
dc.identifier.isi	WOS:000582614800007	-
dc.publisher.place	United States	-
dc.identifier.issnl	1544-3566	-
