Distributed Storage Allocation Problems
Derek Leong, Alexandros G. Dimakis, Tracey Ho
California Institute of Technology
NetCod 2009, 2009-06-16
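Motivation
[Figure: a source with total storage budget 2 distributes data over 5 storage nodes in unknown amounts ("?"); each node is unavailable with probability 0.1; can the data collector gather a total of at least 1 (Σ ≥ 1?)]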
Motivation
Three candidate allocations over the 5 storage nodes:
A: (1, 1, 0, 0, 0)
B: (2/5, 2/5, 2/5, 2/5, 2/5)
C: (1/2, 1/2, 1/2, 1/2, 0)
Motivation
Allocation A: (1, 1, 0, 0, 0)
Success probability
= 0.9^0 × 0.1^5 × 0 successful 0-subsets
+ 0.9^1 × 0.1^4 × 2 successful 1-subsets
+ 0.9^2 × 0.1^3 × 7 successful 2-subsets
+ 0.9^3 × 0.1^2 × 9 successful 3-subsets
+ 0.9^4 × 0.1^1 × 5 successful 4-subsets
+ 0.9^5 × 0.1^0 × 1 successful 5-subsets
= 0.99
Motivation
Allocation B: (2/5, 2/5, 2/5, 2/5, 2/5)
Success probability
= 0.9^0 × 0.1^5 × 0 successful 0-subsets
+ 0.9^1 × 0.1^4 × 0 successful 1-subsets
+ 0.9^2 × 0.1^3 × 0 successful 2-subsets
+ 0.9^3 × 0.1^2 × 10 successful 3-subsets
+ 0.9^4 × 0.1^1 × 5 successful 4-subsets
+ 0.9^5 × 0.1^0 × 1 successful 5-subsets
= 0.99144
Motivation
Allocation C: (1/2, 1/2, 1/2, 1/2, 0)
Success probability
= 0.9^0 × 0.1^5 × 0 successful 0-subsets
+ 0.9^1 × 0.1^4 × 0 successful 1-subsets
+ 0.9^2 × 0.1^3 × 6 successful 2-subsets
+ 0.9^3 × 0.1^2 × 10 successful 3-subsets
+ 0.9^4 × 0.1^1 × 5 successful 4-subsets
+ 0.9^5 × 0.1^0 × 1 successful 5-subsets
= 0.9963
Motivation
A: (1, 1, 0, 0, 0), success probability 0.99
B: (2/5, 2/5, 2/5, 2/5, 2/5), success probability 0.99144
C: (1/2, 1/2, 1/2, 1/2, 0), success probability 0.9963
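The three success probabilities above can be reproduced by brute force. The following is a minimal sketch, assuming each node is accessed independently with probability 0.9 and that recovery succeeds when the accessed amounts sum to at least 1; the function name `success_probability` is chosen here for illustration and does not come from the slides. Exact fractions are used so that borderline sums such as 1/2 + 1/2 are compared without rounding error.

```python
from fractions import Fraction
from itertools import combinations

def success_probability(allocation, p):
    """Probability that the accessed nodes hold at least 1 unit in total,
    when each node is accessed independently with probability p."""
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        # Count the k-subsets of nodes whose stored amounts sum to at least 1.
        successful = sum(
            1 for subset in combinations(allocation, k) if sum(subset) >= 1
        )
        # Every specific k-subset is the accessed set with probability p^k (1-p)^(n-k).
        total += p**k * (1 - p) ** (n - k) * successful
    return total

p = Fraction(9, 10)
A = [Fraction(1), Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
B = [Fraction(2, 5)] * 5
C = [Fraction(1, 2)] * 4 + [Fraction(0)]

for name, alloc in (("A", A), ("B", B), ("C", C)):
    print(name, float(success_probability(alloc, p)))
```

Enumerating all 2^n subsets is only practical for small n, which is enough for this 5-node example.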
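Motivation
[Figure: the same setup (budget 2, five nodes with allocations "?", unavailability probability 0.1, Σ ≥ 1?), now labeled with its two components: allocation model and access model]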
Problem Description
How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?
• Storage Allocation
  • Source s has a data object of unit size
  • It can use n storage nodes to store x1, x2, …, xn amount of data
  • But it faces an aggregate storage budget T, i.e. x1 + x2 + … + xn ≤ T
• Access by the Data Collector
  • Data collector t attempts to recover the data object by accessing a subset r of storage nodes
  • It succeeds when the total amount of data accessed is at least the size of the data object, i.e. Σ_{i ∈ r} xi ≥ 1
• Objective
  • We seek the optimal allocation (x1, x2, …, xn) that maximizes the probability of successful recovery
Problem Description
• Difficulty
  • Problem is nonconvex
  • Large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise)
[1] Deterministic Allocation with Probabilistic Access
Data collector accesses each storage node independently with constant probability p
• Symmetric allocations can be suboptimal
  • †Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, the nonsymmetric allocation [allocation not preserved in this transcript] performs better than the optimal symmetric allocation
• Finding the optimal symmetric allocation is also nontrivial
†Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley
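The suboptimality claim can be checked numerically for n = 5, T = 12/5, p = 0.9. The sketch below enumerates the symmetric allocations that spend the whole budget on m nodes, then evaluates one nonsymmetric candidate. The candidate (7/10, 7/10, 1/3, 1/3, 1/3) is an assumption picked here for illustration; it is not necessarily the allocation shown on the original slide. The helper names are likewise hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def success_probability(allocation, p):
    """Exact success probability under independent access with probability p."""
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        successful = sum(
            1 for subset in combinations(allocation, k) if sum(subset) >= 1
        )
        total += p**k * (1 - p) ** (n - k) * successful
    return total

def symmetric(n, m, T):
    """Symmetric allocation: m nodes store T/m each, the other n - m store nothing."""
    return [T / m] * m + [Fraction(0)] * (n - m)

n, T, p = 5, Fraction(12, 5), Fraction(9, 10)

# Best symmetric allocation, found by trying every number of nonempty nodes.
best_m = max(range(1, n + 1), key=lambda m: success_probability(symmetric(n, m, T), p))
best_symmetric = success_probability(symmetric(n, best_m, T), p)

# Illustrative nonsymmetric candidate (an assumption, not taken from the slides).
candidate = [Fraction(7, 10)] * 2 + [Fraction(1, 3)] * 3
candidate_value = success_probability(candidate, p)

print("best symmetric: m =", best_m, "->", float(best_symmetric))
print("nonsymmetric candidate ->", float(candidate_value))
```

The candidate uses exactly the budget 12/5 and succeeds with two large nodes, one large plus one small node, or three small nodes, which is how it edges past the best symmetric choice.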
[2] Deterministic Allocation with Fixed Access
Data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant
• Equivalently, we can seek the allocation that minimizes the budget T, among all allocations that achieve a given probability of successful recovery
[2] Deterministic Allocation with Fixed Access
• Example: (n, r) = (6, 2)
• Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?
[2] Deterministic Allocation with Fixed Access
• Question: What is the optimal symmetric allocation?
  • For most choices of (n, r, T), the optimal allocation either concentrates the budget over a minimal number of nodes, or spreads it out maximally
  • An example of an exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes
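The exception (n, r, T) = (15, 3, 4.6) can be checked directly. A minimal sketch, assuming a symmetric allocation spends the full budget on m nodes (T/m each) and the data collector draws an r-subset uniformly at random, so the number of nonempty nodes accessed is hypergeometric. The function name `symmetric_success` is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb

def symmetric_success(n, r, T, m):
    """Success probability when m of n nodes store T/m each and a uniformly
    random r-subset of the n nodes is accessed."""
    # Smallest number of nonempty nodes whose combined storage reaches 1.
    need = ceil(m / T)
    # Count r-subsets containing at least `need` of the m nonempty nodes.
    hits = sum(
        comb(m, k) * comb(n - m, r - k) for k in range(need, min(m, r) + 1)
    )
    return Fraction(hits, comb(n, r))

n, r, T = 15, 3, Fraction(46, 10)
values = {m: symmetric_success(n, r, T, m) for m in range(1, n + 1)}
best_m = max(values, key=values.get)

for m, v in values.items():
    print(m, float(v))
print("best number of nodes:", best_m)
```

The success probability is not monotone in m: it rises while the number of nodes needed stays fixed and drops each time one more node becomes necessary, which is why an interior value of m can win.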
[2] Deterministic Allocation with Fixed Access
• For Probability-1 Recovery, the problem reduces to a simple LP
• Result 1: If we require all possible r-subsets to allow successful recovery, then we need a minimum budget of T = n/r, which corresponds to the allocation x1 = x2 = … = xn = 1/r, i.e. it is optimal to spread the budget maximally
• We can also bound the success probability above which this allocation is optimal
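A short counting argument shows why probability-1 recovery needs a budget of at least n/r. It is reconstructed here as a sketch and is not taken from the slides. Every r-subset S must satisfy the recovery constraint, and each node belongs to exactly C(n−1, r−1) of the C(n, r) subsets, so summing the constraints over all subsets gives:

```latex
\begin{align*}
\sum_{S:\,|S|=r}\ \sum_{i\in S} x_i \;\ge\; \binom{n}{r}
&\;\Longrightarrow\; \binom{n-1}{r-1}\sum_{i=1}^{n} x_i \;\ge\; \binom{n}{r} \\
&\;\Longrightarrow\; T \;\ge\; \sum_{i=1}^{n} x_i \;\ge\; \binom{n}{r}\Big/\binom{n-1}{r-1} \;=\; \frac{n}{r}.
\end{align*}
```

The allocation that stores 1/r in every node meets each constraint with equality and uses exactly n/r, so the bound is tight.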
[3] Symmetric Probabilistic Allocation with Fixed Access
Each storage node is used independently with constant probability s/n to store the same amount of data 1/ℓ, and the total storage used must be at most budget T in expectation
[3] Symmetric Probabilistic Allocation with Fixed Access
• Probability of successful recovery can be written as [equation not preserved in this transcript], where "Bin(n, p)" denotes the binomial random variable with n trials and success probability p
• Reparameterizing in terms of budget T gives the success probability P(r, T, ℓ) [expression not preserved in this transcript]
[Plot annotation: each nonempty node stores 1/ℓ amount of data]
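One natural reading of this model can be sketched in code. The assumptions, which go beyond what the transcript preserves, are: the data collector accesses r nodes, each of which is nonempty independently with probability s/n; recovery succeeds when at least ℓ of them are nonempty, i.e. with probability P[Bin(r, s/n) ≥ ℓ]; and the expected storage s/ℓ is set equal to the budget T, so s/n = ℓT/n (capped at 1). The parameter n and the function names are introduced here for illustration.

```python
from math import comb

def binomial_tail(trials, prob, threshold):
    """P[Bin(trials, prob) >= threshold]."""
    return sum(
        comb(trials, k) * prob**k * (1 - prob) ** (trials - k)
        for k in range(threshold, trials + 1)
    )

def success_probability(n, r, T, ell):
    """Success probability when each nonempty node stores 1/ell.

    Assumption: the expected storage n * (s/n) * (1/ell) = s/ell equals T,
    so each node is nonempty with probability ell*T/n, capped at 1.
    """
    use_prob = min(ell * T / n, 1.0)
    # The collector needs at least ell nonempty nodes among the r it accesses.
    return binomial_tail(r, use_prob, ell)

# Hypothetical parameters for illustration.
print(success_probability(n=100, r=5, T=1, ell=1))
print(success_probability(n=100, r=5, T=20, ell=5))
```

With ℓ = 1 the expression reduces to 1 − (1 − T/n)^r, the chance that at least one accessed node holds the whole object.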
[3] Symmetric Probabilistic Allocation with Fixed Access
• Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability P(r, T, ℓ) > 0.9 for some ℓ, the choice of ℓ = r is optimal, i.e. it is best to spread the budget maximally
[Plot annotation: each nonempty node stores 1/ℓ amount of data]
[3] Symmetric Probabilistic Allocation with Fixed Access
• As we increase the budget T, we observe a sharp change in the optimal allocation
  • For small budgets and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes
  • For large budgets and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them
[Plot: r = 5]
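The change in the optimal ℓ with increasing budget can be explored numerically. The sketch below assumes the success probability P[Bin(r, min(ℓT/n, 1)) ≥ ℓ], which is one reading of the model rather than a formula preserved in the transcript, and it uses a hypothetical n = 100 together with r = 5. It scans a few budgets and reports the maximizing ℓ for each.

```python
from math import comb

def success_probability(n, r, T, ell):
    """P[Bin(r, min(ell*T/n, 1)) >= ell]; the formula is an assumed reading of the model."""
    q = min(ell * T / n, 1.0)
    return sum(comb(r, k) * q**k * (1 - q) ** (r - k) for k in range(ell, r + 1))

def best_ell(n, r, T):
    """Value of ell in 1..r that maximizes the assumed success probability."""
    return max(range(1, r + 1), key=lambda ell: success_probability(n, r, T, ell))

n, r = 100, 5  # n is a hypothetical choice; r = 5 matches the plot label
for T in (1, 5, 10, 15, 19, 20):
    ell = best_ell(n, r, T)
    print(f"T={T:>2}  best ell={ell}  P={success_probability(n, r, T, ell):.4f}")
```

At the low end (T = 1) the best choice is ℓ = 1, and at the high end (T = 19 or 20) it is ℓ = r; where the switch happens in between depends on n and r.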
[3] Symmetric Probabilistic Allocation with Fixed Access
• We conjecture that for any r and T, the optimal choice of ℓ that maximizes success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r
[Plot: r = 5; each nonempty node stores 1/ℓ amount of data; annotations: "store less", "store more", "increasing budget", "per node"]
Summary & Future Work
[1] Deterministic Allocation with Probabilistic Access
• Suboptimality of symmetric allocations
[2] Deterministic Allocation with Fixed Access
• Optimal allocation for high probability recovery
• Extreme point solutions not necessarily optimal for symmetric allocations
• Is there always a symmetric optimal allocation?
[3] Symmetric Probabilistic Allocation with Fixed Access
• Optimal allocation in high-probability regime
• Is there a phase transition in optimal allocation with increasing budget?