Postgraduate thesis: A software shared virtual memory system with three way coherence protocols on the Intel Single-chip Cloud Computer

Title: A software shared virtual memory system with three way coherence protocols on the Intel Single-chip Cloud Computer
Authors: Hung, Chit-ho, Dominic (熊哲皓)
Issue Date: 2015
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Hung, C. D. [熊哲皓]. (2015). A software shared virtual memory system with three way coherence protocols on the Intel Single-chip Cloud Computer. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5570812
Abstract: With advances in the design and fabrication of high-performance integrated circuits, it is foreseeable that processors with more than 1,000 cores per die will appear in the near future. However, such many-core architectures introduce many challenges at the memory-system level, such as complex cache coherence and limited memory access speed. This thesis focuses on one prominent many-core prototype, Intel's Single-chip Cloud Computer (SCC). The SCC does not provide hardware cache coherence; instead, it relies on on-chip programmable memory. The baseline coherence protocol for the SCC is the Software Managed Coherence (SMC) layer, which achieves memory consistency by accessing shared memory while bypassing part of the typical cache hierarchy, so that invalidation and flushing stay cheap. We found that this approach performs sub-optimally because every shared-memory access turns into a data-update message on the on-chip mesh network. Because cache locality cannot be fully exploited, the execution pipelines stall more often while waiting for off-chip memory fetches. This research addresses the performance problem of maintaining shared virtual memory consistency on this cache-incoherent architecture. With the aim of keeping data on-chip as much as possible and thereby reducing off-chip memory accesses, we propose two techniques: one makes full use of the cache hierarchy, and the other places data in the on-chip scratchpad memory. First, tailored to the specific architecture of the hardware, we redesigned traditional software distributed shared memory (SDSM) so that shared data can be treated transparently like private memory, allowing the cache hierarchy to be fully utilised without sacrificing memory consistency.
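Traditional SDSM systems typically let each core cache shared pages as if they were private and enforce consistency only at synchronization points. A common technique (used, for example, in TreadMarks) is to take a "twin" copy of a page before its first write and, at release time, propagate only the modified bytes (a "diff"). The abstract does not say which mechanism the thesis uses, so the following is a hypothetical, single-threaded sketch of the home-based twin/diff idea only; the class `SharedPage` and its methods are invented for illustration and are not the author's implementation.

```python
from typing import List, Optional, Tuple


class SharedPage:
    """One shared page as seen by one core (hypothetical model for illustration)."""

    def __init__(self, home: bytearray) -> None:
        self.home = home                       # authoritative copy kept at the page's home
        self.local = bytearray(len(home))      # cacheable copy, used like private memory
        self.twin: Optional[bytearray] = None  # snapshot taken before the first write
        self.valid = False                     # is the local copy up to date?

    def acquire(self) -> None:
        # At a lock acquire, other cores may have released changes, so the local
        # copy is invalidated and refetched on the next access.
        # Assumes this core released its own writes before acquiring again.
        self.valid = False

    def _ensure_valid(self) -> None:
        if not self.valid:
            self.local[:] = self.home
            self.valid = True

    def read(self, offset: int) -> int:
        self._ensure_valid()
        return self.local[offset]  # ordinary cached access, no per-access message

    def write(self, offset: int, value: int) -> None:
        self._ensure_valid()
        if self.twin is None:
            self.twin = bytearray(self.local)  # copy-on-first-write "twin"
        self.local[offset] = value

    def release(self) -> List[Tuple[int, int]]:
        # At a lock release, compare against the twin and push only the
        # modified bytes (the "diff") to the home copy.
        if self.twin is None:
            return []
        diff = [(i, new) for i, (new, old) in enumerate(zip(self.local, self.twin))
                if new != old]
        for i, new in diff:
            self.home[i] = new
        self.twin = None
        return diff


if __name__ == "__main__":
    home = bytearray(8)
    a, b = SharedPage(home), SharedPage(home)
    a.acquire()
    a.write(2, 7)
    a.write(5, 9)
    print(a.release())  # [(2, 7), (5, 9)]
    b.acquire()
    print(b.read(5))    # 9
```

The design point this illustrates is the contrast drawn in the abstract: between synchronization points, reads and writes hit the locally cached copy, and coherence traffic is limited to one diff per release instead of one message per shared-memory access.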
Second, we propose a distance-aware page allocation scheme that samples access frequencies and selects the most frequently and recently used pages to be stored in the on-chip scratchpad memory. Our experimental results show that the first technique, the ordinary SDSM, outperforms the existing SMC approach by a factor of 5. Moreover, in some cases, the second, scratchpad-based technique improves performance by a further factor of 1.57. Our experiments also demonstrate that the SMC approach does not scale, because the coherence traffic it generates congests the mesh network, whereas the two new approaches continue to scale well. The main contribution of this research is the implementation of a software cache-coherence library for an architecture that has non-coherent cache hardware and relies solely on a software-defined cache. This new cache hierarchy opens the door to smarter and faster data sharing between processor cores without the need for complex cache-coherence hardware.
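The abstract describes the second technique only at a high level: access frequencies are sampled, the most frequently and recently used pages go to the on-chip scratchpad, and placement is distance-aware. One plausible reading is sketched below under stated assumptions that do not come from the thesis: an exponentially decayed per-page counter blends frequency and recency, each tile's scratchpad holds a fixed number of page slots, and a page is placed in the tile that minimises the access-weighted hop count to the tiles that use it. The SCC's 24 tiles do form a 6x4 mesh; the row-major tile numbering, `PageSampler`, `decay`, and `slots_per_tile` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MESH_W, MESH_H = 6, 4  # SCC tile mesh: 6 columns x 4 rows (24 tiles)


def tile_xy(tile: int) -> Tuple[int, int]:
    # Hypothetical tile numbering: row-major across the 6x4 mesh.
    return tile % MESH_W, tile // MESH_W


def hops(a: int, b: int) -> int:
    # Manhattan distance, as with dimension-ordered routing on a 2D mesh.
    (ax, ay), (bx, by) = tile_xy(a), tile_xy(b)
    return abs(ax - bx) + abs(ay - by)


class PageSampler:
    """Blends access frequency and recency with an exponentially decayed counter."""

    def __init__(self, decay: float = 0.5) -> None:
        self.decay = decay
        # page -> accessing tile -> decayed access score
        self.score: Dict[int, Dict[int, float]] = defaultdict(lambda: defaultdict(float))

    def end_epoch(self, samples: List[Tuple[int, int]]) -> None:
        # Age the old counts, then add this epoch's sampled (page, tile) accesses.
        for per_tile in self.score.values():
            for t in per_tile:
                per_tile[t] *= self.decay
        for page, tile in samples:
            self.score[page][tile] += 1.0

    def place(self, slots_per_tile: int) -> Dict[int, int]:
        """Return {page: tile} for the hottest pages that fit in the scratchpads."""
        hottest = sorted(self.score, key=lambda p: sum(self.score[p].values()),
                         reverse=True)
        free = {t: slots_per_tile for t in range(MESH_W * MESH_H)}
        placement: Dict[int, int] = {}
        for page in hottest:
            candidates = [t for t, n in free.items() if n > 0]
            if not candidates:
                break  # scratchpads full; remaining pages stay in off-chip memory
            # Distance-aware choice: minimise access-weighted hops to the users.
            best = min(candidates, key=lambda t: sum(
                w * hops(t, r) for r, w in self.score[page].items()))
            placement[page] = best
            free[best] -= 1
        return placement


if __name__ == "__main__":
    s = PageSampler(decay=0.5)
    s.end_epoch([(10, 0), (10, 0), (10, 1), (20, 23)])
    s.end_epoch([(20, 23), (30, 5)])
    print(s.place(slots_per_tile=1))  # {10: 0, 20: 23, 30: 5}
```

Exponential decay is just one simple way to rank pages by both frequency and recency; the greedy, hottest-first placement means that when a nearby scratchpad is full, a page falls back to the next-closest tile with a free slot.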
Degree: Master of Philosophy
Subject: Distributed shared memory; Cloud computing
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/219988
HKU Library Item ID: b5570812

 

DC Field | Value | Language
dc.contributor.author | Hung, Chit-ho, Dominic | -
dc.contributor.author | 熊哲皓 | -
dc.date.accessioned | 2015-10-08T23:12:17Z | -
dc.date.available | 2015-10-08T23:12:17Z | -
dc.date.issued | 2015 | -
dc.identifier.citation | Hung, C. D. [熊哲皓]. (2015). A software shared virtual memory system with three way coherence protocols on the intel single-chip cloud computer. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5570812 | -
dc.identifier.uri | http://hdl.handle.net/10722/219988 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.subject.lcsh | Distributed shared memory | -
dc.subject.lcsh | Cloud computing | -
dc.title | A software shared virtual memory system with three way coherence protocols on the intel single-chip cloud computer | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5570812 | -
dc.description.thesisname | Master of Philosophy | -
dc.description.thesislevel | Master | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5570812 | -
dc.identifier.mmsid | 991011109529703414 | -
