MogileFS, a simple and reliable storage solution

MogileFS: a simple and reliable storage solution. TWJUG Meetup Nov. 2016. kaif@kaif (member of mogilefs-moji)

Upload: hua-chu

Posted on 22-Jan-2018


TRANSCRIPT

Page 1: MogileFS, a simple and reliable storage solution

MogileFS: a simple and reliable storage solution

TWJUG Meetup Nov. 2016

kaif@kaif (member of mogilefs-moji)

Page 2

Outline

• MogileFS

• Moji

• State of the art in MogileFS reliability

Page 3

Quick facts

“Open source distributed object storage”, a.k.a. cloud storage, software-defined storage…

• High availability, horizontal scalability

• Multi-replica file storage and repair

• Simple architecture, easy to use

• Many proven production deployments

Page 4

Brad Fitzpatrick

• Golang

• OpenID

• LiveJournal

– Memcached

– MogileFS

– …

Page 5

Simplicity

Page 6

Easy-to-use

• Command line tool

• Config file

Page 7

Easy-to-use

• Admin tool

Page 8

(Diagram: client, trackers, stores and MySQL)

1. Create open

create_open domain=toast&class=triple&debug_profile=0&fid=0&multi_dest=1&key=qoo3

OK path_1=http://127.0.0.20:7500/dev2/0/000/000/0000000014.fid&path_3=http://127.0.0.25:7500/dev3/0/000/000/0000000014.fid&devid_1=2&devid_3=3&fid=14&path_2=http://127.0.0.25:7500/dev4/0/000/000/0000000014.fid&dev_count=3&devid_2=4

2. Write data (WebDAV)

PUT /dev208/0/068/050/0068050934.fid HTTP/1.0
Content-length: 9

some data

200 OK

3. Create close

create_close domain=toast&fid=14&devid=2&path=http://127.0.0.20:7500/dev2/0/000/000/0000000014.fid&size=1048576&key=qoo3&devid_2=3&path_2=http://127.0.0.25:7500/dev3/0/000/000/0000000014.fid&multi_dest=1
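The tracker side of this flow is a plain-text, line-based protocol: the client sends a command name followed by URL-encoded arguments, and the tracker answers with `OK` plus URL-encoded key/value pairs (or `ERR` on failure). A minimal sketch of building a `create_open` request and parsing a reply like the one on this slide, assuming `\r\n` line termination and Java 11+; the class and method names are hypothetical, and this is not Moji's implementation.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class TrackerProtocolSketch {

    /** Builds one request line: "<command> k1=v1&k2=v2\r\n" with URL-encoded keys and values. */
    static String buildRequest(String command, Map<String, String> args) {
        StringBuilder sb = new StringBuilder(command).append(' ');
        boolean first = true;
        for (Map.Entry<String, String> e : args.entrySet()) {
            if (!first) sb.append('&');
            first = false;
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.append("\r\n").toString();
    }

    /** Parses "OK k1=v1&k2=v2" into an ordered map; throws on any non-OK (e.g. "ERR ...") reply. */
    static Map<String, String> parseResponse(String line) {
        String trimmed = line.strip();
        if (!trimmed.startsWith("OK")) {
            throw new IllegalStateException("tracker error: " + trimmed);
        }
        Map<String, String> result = new LinkedHashMap<>();
        String body = trimmed.substring(2).strip();
        if (body.isEmpty()) return result;
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) continue; // tolerate a trailing '&'
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            result.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> open = new LinkedHashMap<>();
        open.put("domain", "toast");
        open.put("class", "triple");
        open.put("key", "qoo3");
        open.put("multi_dest", "1");
        System.out.print(buildRequest("create_open", open));

        Map<String, String> reply = parseResponse(
            "OK devid_1=2&fid=14&path_1=http://127.0.0.20:7500/dev2/0/000/000/0000000014.fid&dev_count=1\r\n");
        System.out.println(reply.get("fid") + " -> " + reply.get("path_1"));
    }
}
```

The client then PUTs the data to one of the returned `path_N` URLs and reports it back with `create_close`, as in steps 2 and 3.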

Page 9

Availability

Page 10

1WNR, memcached…

Scalability

Page 11

User testimonials

Page 12

KKBOX

Page 13

KKBOX

• Over 30 million songs (files)

• Over 75 storage servers

• Over 2,300 hard drives in total

• Over 10 PB of total storage

• 8 racks in use

(Source: "KKBOX's music file storage technology", posted on August 2, 2016 by Chris Yuan)

Page 14

My production experience

• File count: KKBOX × 10 × N

• Node count: 10^2 × N

• Complex workloads (backup, streaming, IoT, web, logs… orz)

• Java ♥

Page 15

Moji

• A file-like MogileFS client for Java developers

• Production-ready features

– Connection pooling, load balancing, fault tolerance…

• Quality

– Spring-friendly, integration-tested, well documented, actively developed…

https://github.com/mogilefs-moji/moji

Page 16

Configuration

• Using plain old Java

SpringMojiBean moji = new SpringMojiBean();
moji.setAddressesCsv("192.168.0.1:7001,192.168.0.2:7001");
moji.setDomain("testdomain");
moji.initialise();
moji.setTestOnBorrow(true);

• Using the Spring framework

moji.tracker.address=192.168.0.1:7001,192.168.0.2:7001
moji.domain=testdomain

<import resource="moji-context.xml" />

Page 17

Usage

• Create/update a remote file

MojiFile rickRoll = moji.getFile("rick-astley");
moji.copyToMogile(new File("never-gonna-give-you-up.mp3"), rickRoll);

• Download a remote file

rickRoll.copyToFile(new File("foo-fighters.mp3"));

Page 18

Usage

• IO stream

MojiFile fooFighters = moji.getFile("stacked-actors");

// Read: try-with-resources closes the stream even if an exception is thrown
try (InputStream in = fooFighters.getInputStream()) {
    // Do something streamy
    // in.read();
}

// Write: flush explicitly; the stream is then closed automatically
try (OutputStream out = fooFighters.getOutputStream()) {
    // Do something streamy
    // out.write(...);
    out.flush();
}

Page 19

Call to action!

• Set up the environment manually

– MogileFS: https://code.google.com/p/mogilefs/wiki/QuickStartGuide

– Maven dependency

<dependency>
  <groupId>fm.last</groupId>
  <artifactId>moji</artifactId>
  <version>2.0.0</version>
</dependency>

• Quickstart feat. Docker

docker run -d --name mogile-node jeffutter/mogile-node
docker run -it --link mogile-node:mogile-node hrchu/mogile-moji

Page 20

Let's talk about reliability

Page 21

MogileFS reliability measures

• Single copy ACK

• Multiple host replication policy

• MD5 checksum

• Basic disk health check

• Multiple zone plugin

• Reaper/fsck
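The MD5 checksum measure works like this: if a checksum is recorded when a file is written, any replica can later be re-read and compared against it, which is what lets fsck-style checks catch silent corruption. A minimal sketch using the JDK's `MessageDigest`; the class and method names are hypothetical, and how MogileFS itself stores and compares checksums is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ReplicaChecksumSketch {

    /** Streams the replica through MD5 and returns the digest as lowercase hex. */
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True if the replica's content still matches the checksum recorded at write time. */
    static boolean replicaIsIntact(InputStream replica, String expectedMd5Hex)
            throws IOException, NoSuchAlgorithmException {
        return md5Hex(replica).equalsIgnoreCase(expectedMd5Hex);
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "some data".getBytes(StandardCharsets.US_ASCII);
        String recorded = md5Hex(new ByteArrayInputStream(original));

        byte[] corrupted = original.clone();
        corrupted[0] ^= 0x01; // flip one bit to simulate silent corruption

        System.out.println(replicaIsIntact(new ByteArrayInputStream(original), recorded));  // true
        System.out.println(replicaIsIntact(new ByteArrayInputStream(corrupted), recorded)); // false
    }
}
```

Streaming through a fixed buffer keeps memory flat regardless of file size, which matters when a scrubber walks every replica.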

Page 22

And the files lived happily ever after~

… ?

Page 23

Possible directions for improving reliability

• Multiple sites

• Scrubber

• Modern durable write

Page 24

Multiple Sites

• MogileFS::Network plugin

• Assign a different subnet to each data center

• Map each zone to its subnet

• Replication policy
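The replication-policy idea behind multi-site placement can be sketched as a greedy choice: for each copy, prefer a device in a zone that holds no copy yet, then one on a host that holds no copy yet. This is a minimal sketch under those assumptions, with a hypothetical `Device` record (Java 16+); it is not the actual algorithm of the MogileFS::Network HostsPerNetwork policy.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ZoneAwarePlacementSketch {

    /** Hypothetical device descriptor: device id, owning host, and network zone. */
    record Device(int devId, String host, String zone) {}

    /**
     * Greedily picks up to `copies` devices. Each round takes the first remaining
     * device with the best score; a new zone outweighs a new host.
     */
    static List<Device> choose(List<Device> candidates, int copies) {
        List<Device> chosen = new ArrayList<>();
        List<Device> remaining = new ArrayList<>(candidates);
        Set<String> usedZones = new HashSet<>();
        Set<String> usedHosts = new HashSet<>();
        while (chosen.size() < copies && !remaining.isEmpty()) {
            Device best = null;
            int bestScore = -1;
            for (Device d : remaining) {
                int score = (usedZones.contains(d.zone()) ? 0 : 2)
                          + (usedHosts.contains(d.host()) ? 0 : 1);
                if (score > bestScore) {
                    best = d;
                    bestScore = score;
                }
            }
            chosen.add(best);
            remaining.remove(best);
            usedZones.add(best.zone());
            usedHosts.add(best.host());
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Device> devs = List.of(
            new Device(2, "node1", "near"),
            new Device(3, "node1", "near"),
            new Device(4, "node2", "near"),
            new Device(5, "node3", "far"));
        for (Device d : choose(devs, 3)) {
            System.out.println(d); // devices 2, 5, 4: both zones first, then a new host
        }
    }
}
```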

Page 25

Multiple Sites

• Given a network of 10.10.0.0/16

• All machines are configured with a netmask of 255.255.0.0 (/16). When assigning IP addresses to machines, pick them from 10.10.5.0/24 (storage node 3, in the far zone, uses 10.10.8.0/24)

• IP assignments

– web1: 10.10.5.1 (netmask 255.255.0.0 or /16)

– web2: 10.10.5.2

– tracker1: 10.10.5.3

– tracker2: 10.10.5.4

– storage node 1: 10.10.5.5

– storage node 2: 10.10.5.6

– storage node 3: 10.10.8.1

• MogileFS zones, configured as:

– near=10.10.5.0/24 far=10.10.8.0/24

(Diagram: web1, web2, tracker1, tracker2 and storage nodes 1 to 3 grouped into zones "near" and "far")
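Mapping a host to a zone is a CIDR membership test: an IPv4 address belongs to a zone when its top prefix-length bits equal those of the zone's network address. A minimal IPv4-only sketch using the near/far ranges from this slide; the class and method names are hypothetical, and MogileFS::Network's own matching code is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ZoneResolverSketch {

    /** Parses a dotted-quad IPv4 address into an unsigned 32-bit value held in a long. */
    static long ipv4ToLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException("not IPv4: " + ip);
        long value = 0;
        for (String p : parts) {
            int octet = Integer.parseInt(p);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException("bad octet: " + ip);
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if ip falls inside cidr, e.g. "10.10.5.0/24". */
    static boolean inCidr(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        long mask = prefix == 0 ? 0 : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (ipv4ToLong(ip) & mask) == (ipv4ToLong(parts[0]) & mask);
    }

    /** Returns the first zone whose range contains ip, or null if none does. */
    static String zoneOf(String ip, Map<String, String> zones) {
        for (Map.Entry<String, String> z : zones.entrySet()) {
            if (inCidr(ip, z.getValue())) return z.getKey();
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> zones = new LinkedHashMap<>();
        zones.put("near", "10.10.5.0/24");
        zones.put("far", "10.10.8.0/24");
        System.out.println(zoneOf("10.10.5.5", zones)); // near (storage node 1)
        System.out.println(zoneOf("10.10.8.1", zones)); // far  (storage node 3)
    }
}
```

Because every host sits inside the shared 10.10.0.0/16, the /24 zone ranges (not the machines' netmask) are what separate the sites.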

Page 26

Scrubber

• Run routine FSCK as a scrubber

• Modified algorithm

– Remove the exhaustive search

– Improve performance at large scale

https://github.com/mogilefs/MogileFS-Network/blob/master/lib/MogileFS/ReplicationPolicy/HostsPerNetwork.pm#L84

Command (e.g. run from cron) that restarts FSCK when it is not running:

mogadm fsck status |grep " Yes " || (mogadm fsck reset; mogadm fsck clearlog; mogadm fsck start) >/var/log/mogadm.fsck 2>&1

Page 27

Modern durable write

• AS-IS

(Diagram: client, trackers, stores and MySQL)

4. Write other copies asynchronously

Assume that a file should have at least three replicas in the system to meet the durability requirement.

Page 28

Modern durable write

• TO-BE

(Diagram: client, trackers, stores and MySQL)

2. Write at least two copies before ACK

4. Write other copies asynchronously

Assume that a file should have at least three replicas in the system to meet the durability requirement.

mogilefs-moji#25

mogilefs/MogileFS-Server#39
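The TO-BE step "write at least two copies before ACK" is a quorum acknowledgement: the write is reported as done only once a minimum number of replicas are durable, while any remaining copies may still be in flight. A minimal sketch of that acknowledgement rule with `CompletableFuture`, where each replica write is a hypothetical `Supplier<Boolean>`; it simplifies by starting all writes at once and does not reproduce the changes in mogilefs-moji#25 or MogileFS-Server#39.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class DurableWriteSketch {

    /**
     * Starts every replica write and returns a future that completes with true as soon as
     * minAcks writes have succeeded, or with false once too many have failed for minAcks
     * to still be reachable. Writes still in flight keep running after the future completes.
     */
    static CompletableFuture<Boolean> writeWithQuorumAck(
            List<Supplier<Boolean>> replicaWrites, int minAcks, ExecutorService pool) {
        int total = replicaWrites.size();
        if (minAcks <= 0) return CompletableFuture.completedFuture(true);
        if (minAcks > total) return CompletableFuture.completedFuture(false);

        CompletableFuture<Boolean> ack = new CompletableFuture<>();
        AtomicInteger succeeded = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        for (Supplier<Boolean> write : replicaWrites) {
            CompletableFuture.supplyAsync(write, pool)
                .exceptionally(error -> false) // a thrown exception counts as a failed copy
                .thenAccept(ok -> {
                    if (Boolean.TRUE.equals(ok)) {
                        if (succeeded.incrementAndGet() >= minAcks) ack.complete(true);
                    } else if (total - failed.incrementAndGet() < minAcks) {
                        ack.complete(false); // quorum can no longer be reached
                    }
                });
        }
        return ack;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Supplier<Boolean>> writes = List.of(
                () -> true,                                   // copy on store 1
                () -> true,                                   // copy on store 2
                () -> { throw new RuntimeException("down"); } // store 3 unreachable
            );
            // Two of three copies are durable, so the write is acknowledged.
            System.out.println(writeWithQuorumAck(writes, 2, pool).get()); // true
        } finally {
            pool.shutdown();
        }
    }
}
```

With `minAcks = 1` this degenerates to the AS-IS single-copy ACK, which is the window the following analysis tries to quantify.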

Page 29

Analysis

• Disk failure pattern

– MTTF?

– Poisson distribution?

• Mark-out: the window before a failure is detected

• Rep latency: the window while asynchronous replication is still in progress

• Disk size and file size also affect the results

Page 30

Analysis

• Combinatorial analysis model

– Assume that each disk fails independently

– Assume that after x hours of operation each block survives with probability $P(x_i) = p$

– Probability of failure $q = 1 - p$

– For replication, a naive formula for the probability that at least one of $n$ replicas survives is $1 - q^n$
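The naive formula can be evaluated directly: with independent failures, data is lost only if all n replicas fail, which happens with probability q^n. A minimal sketch; the per-replica survival probability of 0.99 in `main` is an illustrative assumption, not a measured figure.

```java
public class NaiveReplicationSketch {

    /** Probability that at least one of n independent replicas survives, each surviving with probability p. */
    static double survival(double p, int n) {
        if (p < 0 || p > 1 || n < 1) throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        double q = 1 - p;           // probability that a single replica fails
        return 1 - Math.pow(q, n);  // data is lost only if all n replicas fail
    }

    public static void main(String[] args) {
        double p = 0.99; // illustrative assumption
        for (int n = 1; n <= 3; n++) {
            System.out.printf("n=%d: P(survive)=%.8f, P(loss)=%.1e%n",
                n, survival(p, n), Math.pow(1 - p, n));
        }
    }
}
```

Each extra replica multiplies the loss probability by q, which is why the model is called naive: it ignores repair, detection windows and correlated failures.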

Page 31

Analysis

• If we also take into account

– Non-Recoverable Errors (NREs)

– drive failure events that follow a Poisson process

– site failures (e.g. due to regional disasters)

– rep latency, mark-out time

– …

• Analysis of system durability is commonly done with Markov models

Page 32

Analysis

• Example of durable write

– Assume mean disk life is 500K hrs

– 2 replicas, no NRE

(Chart: "Diff of MTTDL in hr", y-axis roughly 249,960 to 250,080, versus mu, x-axis values 1, 0.041666667, 0.020833333, 0.013888889)

The lower the replication rate, the larger the improvement from durable write.
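A worked version of this example using the simplest Markov model for two replicas: states 2 (both copies), 1 (one copy) and 0 (data lost); each disk fails at rate λ = 1/500,000 per hour and a lost copy is restored at rate μ. Assuming exponential failures and repairs and no NRE, the expected times to data loss satisfy T2 = 1/(2λ) + T1 and T1 = 1/(λ+μ) + μ/(λ+μ)·T2, which give T2 = (3λ+μ)/(2λ²) and T1 = (2λ+μ)/(2λ²). This textbook birth-death model is chosen for illustration and is not necessarily the model behind the chart; the μ values mirror the chart's x-axis.

```java
public class MttdlSketch {

    /** Expected hours until data loss when the file starts with both copies (state 2). */
    static double mttdlFromTwoCopies(double lambda, double mu) {
        return (3 * lambda + mu) / (2 * lambda * lambda);
    }

    /** Expected hours until data loss when the file starts with a single copy (state 1). */
    static double mttdlFromOneCopy(double lambda, double mu) {
        return (2 * lambda + mu) / (2 * lambda * lambda);
    }

    public static void main(String[] args) {
        double lambda = 1.0 / 500_000;                      // mean disk life of 500K hours
        double[] mus = {1.0, 1.0 / 24, 1.0 / 48, 1.0 / 72}; // repair rates per hour
        for (double mu : mus) {
            double withDurableWrite = mttdlFromTwoCopies(lambda, mu);  // ACK after two copies
            double withoutDurableWrite = mttdlFromOneCopy(lambda, mu); // ACK after one copy
            System.out.printf("mu=%.6f diff=%.1f h ratio=%.6f%n",
                mu, withDurableWrite - withoutDurableWrite, withDurableWrite / withoutDurableWrite);
        }
    }
}
```

In this simplified model the absolute gain T2 − T1 is exactly 1/(2λ) = 250,000 hours for every μ, while the ratio (3λ+μ)/(2λ+μ) grows as μ shrinks, so durable write helps relatively more when replication is slow.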

Page 33

Analysis

• Example of probability of data loss

(Chart: series "P of data loss 72", "P of data loss 48", "P of data loss 24" and "P of data loss 1", plotted over x = 1 to 14; y-axis from 0 to 8.0E-05)

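Page 34

Recap

Storage in the architecture: project requirements determine the choice of storage architecture.

When sensitive data, client requirements, cost, or legacy systems are in play, MogileFS may be a suitable storage choice~

What I want to say about MogileFS… it is a simple, scalable storage system for unstructured data

On a Java stack, we recommend taking it with Moji

If your business is huge, you have a rich backer, and you can hire specialists or consultants, Ceph/Swift are more advanced and more complex options!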

Page 35

Thank you~

【About me】

https://kaif.io/u/kaif

https://github.com/hrchu


【About Moji】

https://github.com/mogilefs-moji/moji

FIN~