File Download
Links for fulltext (may require subscription):
- Publisher Website (DOI): 10.1109/ICPP.2013.62
- Scopus: eid_2-s2.0-84893224794
- Web of Science: WOS:000330046000052
Conference Paper: Java with Auto-parallelization on Graphics Coprocessing Architecture
Field | Value |
---|---|
Title | Java with Auto-parallelization on Graphics Coprocessing Architecture |
Authors | Han, G; Zhang, C; Lam, KT; Wang, CL |
Issue Date | 2013 |
Publisher | IEEE Computer Society. The proceedings web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000540 |
Citation | The 42nd International Conference on Parallel Processing (ICPP), Lyon, France, 1-4 October 2013. In International Conference on Parallel Processing, 2013, p. 504-509 |
Abstract | GPU-based many-core accelerators have gained a footing in supercomputing. Their widespread adoption yet hinges on better parallelization and load scheduling techniques to utilize the hybrid system of CPU and GPU cores easily and efficiently. This paper introduces a new user-friendly compiler framework and runtime system, dubbed Japonica, to help Java applications harness the full power of a heterogeneous system. Japonica unveils an all-round system design unifying the programming style and language for transparent use of both CPU and GPU resources, automatically parallelizing all kinds of loops and scheduling workloads efficiently across the CPU-GPU border. By means of simple user annotations, sequential Java source code will be analyzed, translated and compiled into a dual executable consisting of CUDA kernels and multiple Java threads running on GPU and CPU cores respectively. Annotated loops will be automatically split into loop chunks (or tasks) being scheduled to execute on all available GPU/CPU cores. Implementing a GPU-tailored thread-level speculation (TLS) model, Japonica supports speculative execution of loops with moderate dependency densities and privatization of loops having only false dependencies on the GPU side. Our scheduler also supports task stealing and task sharing algorithms that allow swift load redistribution across GPU and CPU. Experimental results show that Japonica, on average, can run 10x, 2.5x and 2.14x faster than the best serial (1-thread CPU), GPU-alone and CPU-alone versions respectively. |
Persistent Identifier | http://hdl.handle.net/10722/189651 |
ISSN | 0190-3918 (2020 SCImago Journal Rankings: 0.269) |
ISI Accession Number ID | WOS:000330046000052 |
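The abstract describes an annotation-driven workflow: the programmer keeps ordinary sequential Java loops, marks them with simple annotations, and the framework analyzes, translates, chunks, and schedules them across CPU threads and GPU (CUDA) kernels. The sketch below only illustrates that style of usage; the `@Parallelize` marker and the loop shown are hypothetical placeholders for illustration, not Japonica's actual API, which this record does not document.

```java
// Hypothetical illustration of annotation-driven auto-parallelization in the
// style the abstract describes. Japonica's real annotation names and tooling
// are not given in this record; @Parallelize below is a made-up placeholder
// and is left as a comment so the file compiles with a plain JDK.
public class SaxpyExample {

    // @Parallelize  // hypothetical marker: "split this loop into chunks
    //               // and schedule them across available CPU/GPU cores"
    public static void saxpy(float a, float[] x, float[] y, float[] out) {
        for (int i = 0; i < x.length; i++) {
            out[i] = a * x[i] + y[i];   // iterations are independent, so chunking is safe
        }
    }

    public static void main(String[] args) {
        int n = 1 << 20;
        float[] x = new float[n], y = new float[n], out = new float[n];
        java.util.Arrays.fill(x, 1.0f);
        java.util.Arrays.fill(y, 2.0f);
        saxpy(2.0f, x, y, out);          // sequential semantics are the baseline a framework must preserve
        System.out.println(out[0]);      // prints 4.0
    }
}
```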
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Han, G | en_US |
dc.contributor.author | Zhang, C | en_US |
dc.contributor.author | Lam, KT | en_US |
dc.contributor.author | Wang, CL | en_US |
dc.date.accessioned | 2013-09-17T14:50:38Z | - |
dc.date.available | 2013-09-17T14:50:38Z | - |
dc.date.issued | 2013 | en_US |
dc.identifier.citation | The 42nd International Conference on Parallel Processing (ICPP), Lyon, France, 1-4 October 2013. In International Conference on Parallel Processing, 2013, p. 504-509 | en_US |
dc.identifier.issn | 0190-3918 | - |
dc.identifier.uri | http://hdl.handle.net/10722/189651 | - |
dc.description.abstract | GPU-based many-core accelerators have gained a footing in supercomputing. Their widespread adoption yet hinges on better parallelization and load scheduling techniques to utilize the hybrid system of CPU and GPU cores easily and efficiently. This paper introduces a new user-friendly compiler framework and runtime system, dubbed Japonica, to help Java applications harness the full power of a heterogeneous system. Japonica unveils an all-round system design unifying the programming style and language for transparent use of both CPU and GPU resources, automatically parallelizing all kinds of loops and scheduling workloads efficiently across the CPU-GPU border. By means of simple user annotations, sequential Java source code will be analyzed, translated and compiled into a dual executable consisting of CUDA kernels and multiple Java threads running on GPU and CPU cores respectively. Annotated loops will be automatically split into loop chunks (or tasks) being scheduled to execute on all available GPU/CPU cores. Implementing a GPU-tailored thread-level speculation (TLS) model, Japonica supports speculative execution of loops with moderate dependency densities and privatization of loops having only false dependencies on the GPU side. Our scheduler also supports task stealing and task sharing algorithms that allow swift load redistribution across GPU and CPU. Experimental results show that Japonica, on average, can run 10x, 2.5x and 2.14x faster than the best serial (1-thread CPU), GPU-alone and CPU-alone versions respectively. | - |
dc.language | eng | en_US |
dc.publisher | IEEE Computer Society. The proceedings web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000540 | -
dc.relation.ispartof | International Conference on Parallel Processing | en_US |
dc.title | Java with Auto-parallelization on Graphics Coprocessing Architecture | en_US |
dc.type | Conference_Paper | en_US |
dc.identifier.email | Lam, KT: kingtin@hku.hk | en_US |
dc.identifier.email | Wang, CL: clwang@cs.hku.hk | en_US |
dc.identifier.authority | Wang, CL=rp00183 | en_US |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/ICPP.2013.62 | - |
dc.identifier.scopus | eid_2-s2.0-84893224794 | - |
dc.identifier.hkuros | 225162 | en_US |
dc.identifier.spage | 504 | - |
dc.identifier.epage | 509 | - |
dc.identifier.isi | WOS:000330046000052 | - |
dc.publisher.place | United States | - |
dc.identifier.issnl | 0190-3918 | - |