Performance analysis of access latency in distributed storage systems

Shuai, Qiqi; 帥奇奇

DOI: 10.5353/th_b5801616

Appears in Collections:
- Electrical & Electronic Engineering: Theses
- HKU Theses Online

postgraduate thesis: Performance analysis of access latency in distributed storage systems

Title	Performance analysis of access latency in distributed storage systems
Authors	Shuai, Qiqi 帥奇奇
Issue Date	2016
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Shuai, Q. [帥奇奇]. (2016). Performance analysis of access latency in distributed storage systems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801616.
Abstract	Access latency is a key metric in distributed storage systems because it greatly affects user experience, yet existing codes mainly focus on improving other metrics such as storage overhead and repair cost. In this work, we design new XOR-based erasure codes, HTSC and FH HTSC, that reduce access latency in distributed storage systems by generating parity nodes from parity nodes. Comparing them with other popular and representative codes, we show that, under the same repair cost, HTSC and FH HTSC reduce access latency while maintaining favorable performance in other metrics. In particular, under the same repair cost, FH HTSC achieves lower access latency, higher or equal failure tolerance, and lower computation cost than the representative codes, with similar storage overhead. FH HTSC is therefore a superior choice for applications that require both low access latency and outstanding failure tolerance. Both direct reads and k-access reads are common in distributed storage systems. However, much previous research considers only k-access reads, and many schemes, such as Redundant Scheme, have been shown to reduce latency only for k-access reads. It is not known whether those schemes also work for direct reads. Studies of the latency characteristics of direct reads, and of schemes suited to reducing their latency, are still lacking. In this work, we study the latency performance of direct reads and its correlation with degraded reads. We illustrate the relationship between degraded reads and bandwidth cost, and answer important questions such as when degraded reads can help reduce latency. We then propose a scheme, DRALB, to reduce latency for direct reads. DRALB can be easily added to existing schemes and can greatly reduce the latency of hot data.
We also conduct trace-driven simulations to verify that DRALB significantly outperforms existing schemes in the latency of direct reads. Until now, almost all previous studies have analyzed access latency for the case where a user reads all the files in a codeword. Our research extends these studies by analyzing access latency in the general case where users request different sizes of files from a codeword. We also characterize the latency-cost tradeoffs for this general case. In addition, we study the latency performance of coding and replication under non-uniform data popularity in practical storage systems. Accounting for practical conditions, and through extensive simulations using real service time traces from Amazon S3, we compare the latency performance of coding and replication. In contrast to previous results, we find that, under the same storage cost, neither is clearly better: the outcome depends on many conditions, especially on whether data popularity is uniform.
Degree	Doctor of Philosophy
Subject	Storage area networks (Computer networks); Electronic data processing - Distributed processing
Dept/Program	Electrical and Electronic Engineering
Persistent Identifier	http://hdl.handle.net/10722/246681
HKU Library Item ID	b5801616

DC Field	Value	Language
dc.contributor.author	Shuai, Qiqi	-
dc.contributor.author	帥奇奇	-
dc.date.accessioned	2017-09-22T03:40:11Z	-
dc.date.available	2017-09-22T03:40:11Z	-
dc.date.issued	2016	-
dc.identifier.citation	Shuai, Q. [帥奇奇]. (2016). Performance analysis of access latency in distributed storage systems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801616.	-
dc.identifier.uri	http://hdl.handle.net/10722/246681	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Storage area networks (Computer networks)	-
dc.subject.lcsh	Electronic data processing - Distributed processing	-
dc.title	Performance analysis of access latency in distributed storage systems	-
dc.type	PG_Thesis	-
dc.identifier.hkul	b5801616	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Electrical and Electronic Engineering	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_b5801616	-
dc.identifier.mmsid	991043959797403414	-
