![Page 3: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/3.jpg)
Presentation
Business
Data
![Page 5: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/5.jpg)
Why run on Mesos?
● Services are decoupled from the nodes
● Automatic failover
● Easier to manage/maintain
● Simpler version management
● Simpler environments, staging → deployment
● Less complexity across the set of systems
![Page 6: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/6.jpg)
Transition
![Page 7: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/7.jpg)
Challenges
● Packaging/deployment
● Naming/finding services
● Dependency on persistent state
![Page 9: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/9.jpg)
The problem
Examples:
● Legacy apps
● Single node SQL databases (mysql, postgres)
● Apps that depend on local storage
![Page 10: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/10.jpg)
Potential Solutions
● Local storage
● Shared storage
● Network block device
● Mesos persistent resource primitives
● Application specific distributed solutions
![Page 11: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/11.jpg)
Local storage (option 1)
● Pin to node
● On failure
○ Manually bring the node up
○ Rely on existing process
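Pinning can be sketched as a placement constraint. The slides do not name a scheduler, so this example assumes Marathon, whose `CLUSTER` operator on the `hostname` field restricts a task to one agent. The app id, command, and hostname are hypothetical.

```python
import json


def pinned_app(app_id, cmd, hostname):
    """Build a Marathon-style app definition that only runs on `hostname`."""
    return {
        "id": app_id,
        "cmd": cmd,
        "instances": 1,  # a single-node database must never run twice
        # Marathon's CLUSTER operator on the hostname field restricts
        # placement to the one agent whose hostname matches.
        "constraints": [["hostname", "CLUSTER", hostname]],
    }


# Hypothetical names, for illustration only.
app = pinned_app("/legacy/mysql", "mysqld --datadir=/var/lib/mysql", "node-1.example.com")
print(json.dumps(app, indent=2))
```

With this constraint the task waits for that one node to come back, which is exactly the coupling this option accepts.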
![Page 12: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/12.jpg)
Local storage (option 1)
● Pros
○ Easiest (~ no changes)
○ Share free resources from node
● Cons
○ No auto failover
○ Service still coupled to node
○ Feels like cheating!
![Page 15: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/15.jpg)
Local storage (option 2)
● Periodic backups to central location
● On failure:
○ Restore last known good state to local storage
○ Proceed as usual
[Diagram: backup, restore]
![Page 17: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/17.jpg)
Local storage (option 2)
● When and where to backup?
● When and where to restore?
○ Which node?
○ Which backup?
“Automated scripted restore at process start.”
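A minimal sketch of such a scripted restore, under assumptions not stated in the slides: backups are `*.tar.gz` archives in a directory reachable from every node, "last known good" is taken to mean the newest archive by modification time, and a restore is only needed when the local data directory is empty. The function names are hypothetical.

```python
import tarfile
from pathlib import Path


def latest_backup(backup_dir):
    """Return the newest *.tar.gz in backup_dir by modification time, or None."""
    candidates = sorted(
        Path(backup_dir).glob("*.tar.gz"), key=lambda p: p.stat().st_mtime
    )
    return candidates[-1] if candidates else None


def restore_if_empty(backup_dir, data_dir):
    """Restore the newest backup into data_dir when it holds no data yet.

    Returns the archive that was restored, or None when nothing was done.
    """
    data = Path(data_dir)
    data.mkdir(parents=True, exist_ok=True)
    if any(data.iterdir()):
        return None  # existing local state: leave it alone
    backup = latest_backup(backup_dir)
    if backup is None:
        return None  # first start, nothing to restore
    with tarfile.open(backup) as tar:
        tar.extractall(data)
    return backup
```

A startup wrapper would call `restore_if_empty(...)` and then hand over to the original process, for example with `os.execvp`. This answers "which node?" with "whichever node the task lands on".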
![Page 18: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/18.jpg)
Local storage (option 2)
● Pros:
○ Easy to set up
○ Auto failover
○ Share free resources
● Cons:
○ Scripted restore complexity
○ Adversely affected by system & data volume/type
○ Time to restore
○ Data loss
![Page 19: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/19.jpg)
Shared file system - centralized
![Page 20: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/20.jpg)
Shared file system - centralized
● POSIX compliant centralized shared FS
● Example: NFS
● Mounted to same path across all nodes
● On failure:
○ Let Mesos start new instance on any available node
![Page 21: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/21.jpg)
Shared file system - centralized
What can go wrong?
● What did we just do?
○ Added network between the process and the storage
![Page 22: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/22.jpg)
[Diagram: masters and nodes]
Node disconnects from master
![Page 23: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/23.jpg)
[Diagram: masters and nodes]
Node disconnects and reconnects
![Page 24: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/24.jpg)
[Diagram: masters and nodes, scaleTo = 2]
Task is scaled to >1
![Page 25: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/25.jpg)
[Diagram: masters and nodes]
Node disconnects from FS
![Page 26: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/26.jpg)
Shared file system - centralized
To summarize, we could end up with…
● Possibly corrupted data if
○ Node disconnects from master but is connected to FS
○ Node disconnects from network & then connects back
○ Somehow the task is “scaled” to >1 instances
● Possibly undesired state of process/service if
○ Node is connected to master but disconnects from FS
![Page 28: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/28.jpg)
Shared file system - centralized
How do we fix this?
[Diagram: masters, zookeeper, lock node]
● Use a zookeeper exclusive lock
● The process should
○ start only if it has acquired the zk lock (exit otherwise)
○ exit at any point it loses the zk lock
● Check for the FS mount and exit if it is not available
![Page 29: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/29.jpg)
Shared file system - centralized
● How do we do this without changing the original app?
○ New startup app/script (wrapper)
○ entrypoint/startup → wrapper → original app
[Diagram: zookeeper lock node]
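One possible shape for such a wrapper is sketched below. It is not the authors' implementation. It assumes the third-party `kazoo` client for ZooKeeper, treats a suspended or lost session as a lost lock, and polls the mount point while the original app runs. The function names and the polling interval are hypothetical.

```python
import os
import socket
import subprocess
import threading
import time


def zk_lock_and_lost_event(hosts, lock_path):
    """Create a ZooKeeper lock plus an event that is set when the session drops.

    Assumes the third-party kazoo library is installed.
    """
    from kazoo.client import KazooClient, KazooState

    lost = threading.Event()
    zk = KazooClient(hosts=hosts)

    def on_state(state):
        # SUSPENDED or LOST: we can no longer prove that we hold the lock.
        if state in (KazooState.SUSPENDED, KazooState.LOST):
            lost.set()

    zk.add_listener(on_state)
    zk.start()
    return zk.Lock(lock_path, identifier=socket.gethostname()), lost


def run_guarded(lock, lost, mount_path, argv, poll=1.0):
    """Run argv only while the lock is held and mount_path is mounted.

    Returns the exit code the wrapper should exit with.
    """
    if not os.path.ismount(mount_path):
        return 1  # shared FS not available: do not start
    if not lock.acquire(blocking=False):
        return 1  # another instance holds the lock: exit
    child = subprocess.Popen(argv)
    try:
        while child.poll() is None:
            if lost.is_set() or not os.path.ismount(mount_path):
                child.terminate()  # stop the app before it can write again
                child.wait()
                return 1
            time.sleep(poll)
        return child.returncode
    finally:
        # After a dropped session the ephemeral lock node expires by itself.
        if not lost.is_set():
            lock.release()
```

The container entrypoint would build the lock with `zk_lock_and_lost_event(...)`, call `run_guarded(...)` with the original command line, and exit with the returned code so that Mesos can reschedule the task elsewhere.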
![Page 30: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/30.jpg)
Shared file system - centralized
Check:
● Possibly corrupted data if
○ Node disconnects from master but is connected to FS
○ Node disconnects from network & then connects back
○ Somehow the task is “scaled” to >1 instances
● Possibly undesired state of process/service if
○ Node is connected to master but disconnects from FS
![Page 31: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/31.jpg)
Shared file system - centralized
● Pros:
○ Easy to set up
○ Process benefits from most features (except scaling)
● Cons:
○ Handle mutual exclusion (but this is fairly simple)
○ Depends on network speed/latency
![Page 32: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/32.jpg)
Shared file system - distributed
● POSIX compliant distributed shared FS
● Examples: glusterfs, MooseFS, Lustre
● Mounted to same path across all nodes
● On failure:
○ Let Mesos start new instance on any available node
![Page 33: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/33.jpg)
Shared file system - distributed
● Similar to centralized shared FS
● Pros:
○ Process benefits from most features (except scaling)
● Cons:
○ Same as for the centralized shared FS
○ Setup may be complex
○ Replication, data distribution, processing overhead, etc.
![Page 34: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/34.jpg)
Network Block Device
![Page 35: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/35.jpg)
Network Block Device
● Somewhat between local and shared FS
● Device mounted to only 1 node at a time
● On node failure:
○ Repair & mount device to new node
○ Proceed as usual
![Page 36: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/36.jpg)
Network Block Device
● Pros
○ Less overhead than a high level protocol like NFS.
● Cons
○ Slightly more difficult to manage.
○ Failover is not automatic
■ Need to mount to new node (scripted).
○ May need to repair the FS on the NBD at startup (run fsck before mount)
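The "fsck before mount" step could be scripted roughly as follows. This sketch assumes the block device is already attached to the new node, since attaching it depends on the NBD tooling in use. The function name and the injectable `run` parameter are hypothetical.

```python
import subprocess


def repair_and_mount(device, mount_point, run=subprocess.run):
    """fsck the device, then mount it; return True when the mount succeeded."""
    # -p asks the filesystem checker to fix safe problems without prompting.
    check = run(["fsck", "-p", device])
    # fsck exit codes form a bitmask: 0 = clean, 1 = errors corrected.
    # Anything else (reboot needed, uncorrected errors, ...) is treated as fatal.
    if check.returncode not in (0, 1):
        return False
    mounted = run(["mount", device, mount_point])
    return mounted.returncode == 0
```

Refusing to mount after an unrepaired filesystem error leaves the task failed, which seems safer than starting a database on a damaged volume.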
![Page 37: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/37.jpg)
Persistent State Resource Primitives
● New features
○ Storage as a resource
○ Keep data across process restarts
○ Process affinity to the node that holds its data (on node restarts)
● Easier to work with storage
![Page 38: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/38.jpg)
Application Specific Solutions
● For mysql:
○ Vitess
○ Mysos (Apache Cotton)
● Pros
○ Replication and availability built in
○ Scalable
● Cons
○ Relatively more involved setup
○ Not available for most applications
![Page 39: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/39.jpg)
Stateful services we’re running
● mysql
● postgresql
● mongodb (single, clustered soon)
● redis
● rethinkdb
● elasticsearch (single, clustered)
![Page 40: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/40.jpg)
Best Practices / Lessons Learnt
● Mount the directory at the same point (path) on every node
● Multi-level backups, as storage may be a single point of failure (SPOF)
○ Disk based ones like RAID
○ App specific ones like mysqldump
● Leverage services like zookeeper for mutual exclusion
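An app specific backup level could look like the sketch below. It assumes `mysqldump` is on the PATH and reads its credentials from the usual client configuration. The file naming scheme and function names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def dump_command():
    # --single-transaction takes a consistent snapshot of InnoDB tables
    # without locking them; --all-databases dumps every schema.
    return ["mysqldump", "--single-transaction", "--all-databases"]


def backup_path(backup_dir, now=None):
    """Timestamped target file so that older dumps are never overwritten."""
    now = now or datetime.now(timezone.utc)
    return Path(backup_dir) / f"mysql-{now:%Y%m%dT%H%M%SZ}.sql"


def run_backup(backup_dir, run=subprocess.run):
    """Write a dump into backup_dir; return its path, or None on failure."""
    target = backup_path(backup_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".partial")
    with open(partial, "wb") as out:
        result = run(dump_command(), stdout=out)
    if result.returncode != 0:
        partial.unlink()  # never leave a truncated dump behind
        return None
    partial.rename(target)  # only complete dumps get the final name
    return target
```

Writing to a `.partial` file first means a restore script that looks for `*.sql` never picks up a dump that was interrupted halfway.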
![Page 41: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/41.jpg)
Best Practices / Lessons Learnt
● Isolate applications at this layer
○ Based on
■ disk space & usage
■ disk iops & usage
■ network bandwidth & usage
○ Use multiple mounts, specific allocation, etc.
● Set up adequate monitoring & alerting
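As a small illustration of the monitoring point, the sketch below checks disk space per mount against a threshold. It covers only the disk space dimension, not iops or network bandwidth. The 80% default and the function name are assumptions.

```python
import shutil


def disk_alerts(mounts, max_used_fraction=0.8):
    """Return (path, used_fraction) for every mount above the threshold."""
    alerts = []
    for path in mounts:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        used = usage.used / usage.total
        if used > max_used_fraction:
            alerts.append((path, used))
    return alerts
```

With one mount per application, a call such as `disk_alerts(["/mnt/mysql", "/mnt/redis"])` (hypothetical paths) points directly at the service that is running out of space.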
![Page 42: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/42.jpg)
Conclusion
● Although not a natural fit, it is possible to gainfully run stateful services in Mesos.
● Should be approached as an engineering problem rather than one with a generic or ideal solution.
![Page 43: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/43.jpg)
Performance Test
● Disclaimer
○ Very much dependent on the setup, network, etc.
○ YMMV!
● Setup
○ local*: ~ 2000r / 1000w IOPS
○ nfs500: ~ 500 IOPS
○ nfs1000: ~ 1000 IOPS
* 24 10k SAS disks in RAID 10
![Page 44: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/44.jpg)
Performance Test
● System
○ Single node mysql server
○ Buffer pool size: 128 M
● Tests
○ sysbench tests run for 300 seconds
■ default RO & RW tests
■ custom WO tests with no reads
■ single thread
![Page 45: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/45.jpg)
Performance Test
● Read only queries
● No Begin/Commit
![Page 46: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/46.jpg)
Performance Test
● Read only queries
● With Begin/Commit
![Page 47: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/47.jpg)
Performance Test
● Read/Write queries
● With Begin/Commit
● 26% write queries
![Page 48: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/48.jpg)
Performance Test
● Write only queries
● With Begin/Commit
![Page 49: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/49.jpg)
Performance Test
● For read heavy queries
○ increasing buffer pool size may compensate for performance decrease with network FS.
● For write heavy queries
○ memory size is less relevant as these are disk bound.
![Page 50: Stateful Services on Mesos - events.static.linuxfound.org Services on Mesos Ankan Mukherjee (ankan@moz.com) Arunabha Ghosh (agh@moz.com)](https://reader034.vdocuments.pub/reader034/viewer/2022042708/5ad019587f8b9a56098df926/html5/thumbnails/50.jpg)
Thanks!