20171027 モニタリング勉強会

Operating Prometheus モニタリング勉強会 2017/10/27 @kfdm

Upload: paul-traylor

Post on 22-Jan-2018


TRANSCRIPT

Page 1: 20171027 モニタリング勉強会

Operating Prometheus
モニタリング勉強会 (Monitoring Study Group), 2017/10/27
@kfdm

Page 2: 20171027 モニタリング勉強会

Self Introduction
• Paul Traylor
• LINE Fukuoka Development Office
• Currently responsible for updating the monitoring environment at LINE Fukuoka
• https://github.com/line/promgen
• https://promcon.io/2017-munich/talks/prometheus-as-a-internal-service/

Page 3: 20171027 モニタリング勉強会

Operating Prometheus at LINE Fukuoka
• 4 HA Pairs
• ~2000 targets per machine
• ~800k samples per machine
• ~3.5 million samples
• ~7000 exporters

https://github.com/line/promgen

Page 4: 20171027 モニタリング勉強会

Scaling Prometheus ‒ HA
• Run multiple Prometheus instances with the same targets
• Alerts are de-duplicated by Alertmanager
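As a rough illustration of why an HA pair does not page twice: an alert is identified by its label set, so two servers that send the same labels produce a single alert. The sketch below is not Alertmanager's implementation. The `replica` label and the `ignore` parameter are hypothetical, standing in for a per-server label that would have to be dropped before alerts are sent.

```python
# Sketch only: label-based de-duplication, not Alertmanager's real code.

def fingerprint(labels, ignore=("replica",)):
    """Identity of an alert: its labels minus any per-server label.

    The 'replica' label name is a hypothetical example.
    """
    return tuple(sorted((k, v) for k, v in labels.items() if k not in ignore))


def deduplicate(alerts):
    """Keep the first alert seen for each fingerprint."""
    unique = {}
    for alert in alerts:
        unique.setdefault(fingerprint(alert), alert)
    return list(unique.values())


# Both servers of an HA pair fire the same alert; a third alert is unrelated.
from_a = {"alertname": "HighErrorRate", "job": "myjob", "replica": "a"}
from_b = {"alertname": "HighErrorRate", "job": "myjob", "replica": "b"}
other = {"alertname": "DailyTest", "replica": "a"}

result = deduplicate([from_a, from_b, other])
print(len(result))  # 2
```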

Page 5: 20171027 モニタリング勉強会

Scaling Prometheus ‒ Shard
• Split targets across multiple servers
• Alertmanager de-duplicates alerts
• Proxy or remote read
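One way to picture splitting targets across servers is a stable hash of each target address taken modulo the number of shards. This is a minimal sketch under that assumption, not necessarily how the targets in the talk were divided. The host names are hypothetical. Prometheus's `hashmod` relabel action is a built-in way to get a comparable split.

```python
import hashlib


def shard_for(target, shard_count):
    """Map a target address to a shard index using a stable hash."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big") % shard_count


def split_targets(targets, shard_count):
    """Return one list of targets per shard."""
    shards = [[] for _ in range(shard_count)]
    for target in targets:
        shards[shard_for(target, shard_count)].append(target)
    return shards


# Hypothetical node exporter addresses.
targets = [f"host{i}.example.com:9100" for i in range(8)]
shards = split_targets(targets, 2)
for index, members in enumerate(shards):
    print(index, members)
```

Because the hash depends only on the address, a target keeps its shard across restarts, but changing the shard count moves many targets at once.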

Page 6: 20171027 モニタリング勉強会

Prometheus 1.8 ‒ Storage Format

https://promcon.io/2016-berlin/talks/the-prometheus-time-series-database/

http://labs.gree.jp/blog/2017/10/16614/

• One series per file
• Rewrites may have to touch millions of files
• Queries also may touch millions of files
• No easy way to back up

Page 7: 20171027 モニタリング勉強会

Prometheus 2.0 ‒ New Storage Format

https://promcon.io/2017-munich/slides/storing-16-bytes-at-scale.pdf
https://fabxc.org/blog/2017-04-10-writing-a-tsdb/

• Chunks stored in buckets by time
• Chunks past retention setting are just deleted
• Easier to back up
• Easier to compress
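The bucket-by-time idea can be sketched as follows: samples are grouped by the time window they fall into, and retention removes whole windows instead of rewriting per-series files. This is a toy model, not the TSDB's code. The two-hour bucket width and all function names are assumptions made for illustration.

```python
HOUR_MS = 60 * 60 * 1000
BLOCK_MS = 2 * HOUR_MS  # assumed bucket width for this sketch


def block_start(timestamp_ms, width=BLOCK_MS):
    """Start of the time bucket that contains the timestamp."""
    return timestamp_ms - (timestamp_ms % width)


def group_into_blocks(samples, width=BLOCK_MS):
    """Group (timestamp_ms, value) samples by bucket start time."""
    blocks = {}
    for ts, value in samples:
        blocks.setdefault(block_start(ts, width), []).append((ts, value))
    return blocks


def apply_retention(blocks, now_ms, retention_ms, width=BLOCK_MS):
    """Drop whole blocks whose time range ends at or before the cutoff."""
    cutoff = now_ms - retention_ms
    return {start: data for start, data in blocks.items() if start + width > cutoff}


samples = [(0, 1.0), (HOUR_MS, 2.0), (3 * HOUR_MS, 3.0), (5 * HOUR_MS, 4.0)]
blocks = group_into_blocks(samples)
kept = apply_retention(blocks, now_ms=6 * HOUR_MS, retention_ms=3 * HOUR_MS)
print(sorted(kept))  # [7200000, 14400000]
```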

Page 8: 20171027 モニタリング勉強会

Prometheus 2.0 ‒ Backups

├── 01BX40G8TA6T1MNSS8JJE7ENPY/
│   ├── chunks/
│   ├── index
│   ├── meta.json
│   └── tombstones
├── 01BX5Y9SSE10VBZK4CMZ86WDR6/
│   ├── chunks/
│   ├── index
│   ├── meta.json
│   └── tombstones
├── lock
└── wal/
    ├── 000760
    └── 000761

• https://github.com/Gouthamve/agni
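Given the layout above, a backup could in principle copy each block directory (any sub-directory that contains a `meta.json`) and leave `lock` and `wal/` alone. The sketch below shows only that idea and assumes the data directory is not being modified while it runs, for example because Prometheus is stopped. The function names are hypothetical, data still in `wal/` is not captured, and this is not how the linked tool works.

```python
import json
import shutil
from pathlib import Path


def list_blocks(data_dir):
    """Return (directory, meta) pairs for each sub-directory holding a meta.json."""
    blocks = []
    for entry in sorted(Path(data_dir).iterdir()):
        meta_file = entry / "meta.json"
        if entry.is_dir() and meta_file.is_file():
            blocks.append((entry, json.loads(meta_file.read_text())))
    return blocks


def copy_blocks(data_dir, backup_dir):
    """Copy every block directory into backup_dir; lock and wal/ are skipped."""
    copied = []
    for block_dir, _meta in list_blocks(data_dir):
        shutil.copytree(block_dir, Path(backup_dir) / block_dir.name)
        copied.append(block_dir.name)
    return copied
```

A call such as `copy_blocks("/path/to/data", "/path/to/backup")` returns the names of the copied blocks; both paths here are placeholders.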

Page 9: 20171027 モニタリング勉強会

Prometheus 2.0 ‒ Flag Changes
• Most flags move from single dash to double dash
• Many storage settings move to tsdb settings
• -config.file -> --config.file
• -storage.local.path -> --storage.tsdb.path
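The two examples above could be applied to an existing command line with a small helper like the one below. It is a sketch that knows only the single rename shown on the slide; every other single-dash flag just gains a second dash, which will not be right for 1.x flags that were removed or renamed in 2.0. The function names and file paths are hypothetical.

```python
# Rename taken from the slide; extend this table for other storage flags.
RENAMES = {"-storage.local.path": "--storage.tsdb.path"}


def convert_flag(arg):
    """Convert one 1.x style argument to the 2.0 style."""
    name, sep, value = arg.partition("=")
    if name in RENAMES:
        name = RENAMES[name]
    elif name.startswith("-") and not name.startswith("--"):
        name = "-" + name  # single dash becomes double dash
    return name + sep + value


def convert_args(args):
    """Convert a whole argument list, leaving non-flag values untouched."""
    return [convert_flag(arg) for arg in args]


old = ["-config.file=/etc/prometheus/prometheus.yml", "-storage.local.path=/data"]
new = convert_args(old)
print(new)
# ['--config.file=/etc/prometheus/prometheus.yml', '--storage.tsdb.path=/data']
```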

Page 10: 20171027 モニタリング勉強会

Prometheus 2.0 ‒ Rule Format Changes

https://www.robustperception.io/converting-rules-to-the-prometheus-2-0-format/

groups:
- name: alert.rules
  rules:
  - alert: HighErrorRate
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    annotations:
      summary: High request latency
  - alert: DailyTest
    expr: vector(1)
    for: 1m
    annotations:
      summary: Daily alert test

• ./promtool update rules /path/to/rules

Page 11: 20171027 モニタリング勉強会

Prometheus 2.0 ‒ Migration

Page 12: 20171027 モニタリング勉強会

Prometheus 2.0 ‒ Remote Read
• Prometheus 1.8 (Read)
• InfluxDB (Read and Write)
• Graphite (Write)
• OpenTSDB (Write)
• TimescaleDB (Read and Write)
• https://prometheus.io/docs/operating/integrations/
• https://github.com/prometheus/prometheus/tree/master/documentation/examples/remote_storage/remote_storage_adapter

Page 13: 20171027 モニタリング勉強会

Open Metrics
• https://github.com/RichiH/OpenMetrics
• https://github.com/RichiH/OpenMetrics/blob/master/CONTRIBUTORS.md

Page 14: 20171027 モニタリング勉強会

Questions?