![Page 1: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/1.jpg)
Prometheus on AWS
![Page 2: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/2.jpg)
About me
• Mitsuhiro Tanda
• Infrastructure Engineer @ GREE
• Use Prometheus on AWS (1 year)
• Grafana committer
• @mtanda
![Page 3: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/3.jpg)
Features
• multi-dimensional data model
• flexible query language
• pull model over HTTP
• service discovery
• Prometheus values reliability
![Page 4: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/4.jpg)
AWS Monitoring Problems
• Instance lifecycles are short
• Instances are launched and terminated by ASG (Auto Scaling groups)
• Instance workload is not the same across AZs, …
![Page 5: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/5.jpg)
Why we use Prometheus
• multi-dimensional data model & flexible query language
  – aggregate metrics by Role/AZ and compare the results
  – detect instances whose workload differs from the others in the same Role
• pull model over HTTP & service discovery
  – specify monitoring targets by Role, ...
  – easily adapt as the number of monitoring targets increases
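The second sub-point, spotting an instance whose workload differs from the rest of its Role, can be sketched outside Prometheus as follows. This is only an illustration: the `find_outliers` helper, the sample values, the use of the median as the per-Role baseline, and the 50% threshold are all assumptions, not the setup described in the slides.

```python
from collections import defaultdict
from statistics import median

def find_outliers(samples, threshold=0.5):
    """Return instance IDs whose value deviates from the median of
    their role by more than `threshold` (relative deviation)."""
    by_role = defaultdict(list)
    for s in samples:
        by_role[s["role"]].append(s["value"])
    # One baseline per role; the median is less sensitive to the outlier itself.
    baseline = {role: median(values) for role, values in by_role.items()}

    outliers = []
    for s in samples:
        base = baseline[s["role"]]
        if base and abs(s["value"] - base) / base > threshold:
            outliers.append(s["instance_id"])
    return outliers

# Hypothetical CPU usage (%) per instance.
samples = [
    {"instance_id": "i-aaa", "role": "web", "value": 20.0},
    {"instance_id": "i-bbb", "role": "web", "value": 22.0},
    {"instance_id": "i-ccc", "role": "web", "value": 90.0},
    {"instance_id": "i-ddd", "role": "db", "value": 40.0},
    {"instance_id": "i-eee", "role": "db", "value": 42.0},
]
print(find_outliers(samples))
```

With these sample values only `i-ccc` is flagged, because it sits far above the other web instances while both db instances stay close to their own baseline.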
![Page 6: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/6.jpg)
multi-dimensional data model
• record instance metadata as labels

| key | value |
|---|---|
| instance_id | i-1234abcd |
| instance_type | ec2, rds, elasticache, elb, … |
| instance_model | t2.large, m4.large, c4.large, r3.large, … |
| region | ap-northeast-1, us-east-1, … |
| availability_zone | ap-northeast-1a, ap-northeast-1c, … |
| role (instance tag) | web, db, … |
| environment (instance tag) | production, staging, … |
![Page 7: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/7.jpg)
avg(cpu) by (availability_zone)
![Page 8: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/8.jpg)
cpu{role="web"}
![Page 9: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/9.jpg)
avg(cpu) by (role)
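To make the three queries above concrete, here is a rough Python model of what a label selector and `avg(...) by (...)` do to labeled samples. It is a simplified sketch, not Prometheus's implementation; the `select` and `avg_by` helpers and the sample values are hypothetical.

```python
from collections import defaultdict

def select(samples, **matchers):
    """Keep samples whose labels equal every matcher, like cpu{role="web"}."""
    return [
        s for s in samples
        if all(s["labels"].get(k) == v for k, v in matchers.items())
    ]

def avg_by(samples, label):
    """Average sample values grouped by one label, like avg(cpu) by (label)."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["labels"].get(label, "")].append(s["value"])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Hypothetical "cpu" samples, one per instance.
cpu = [
    {"labels": {"role": "web", "availability_zone": "ap-northeast-1a"}, "value": 30.0},
    {"labels": {"role": "web", "availability_zone": "ap-northeast-1c"}, "value": 50.0},
    {"labels": {"role": "db", "availability_zone": "ap-northeast-1a"}, "value": 10.0},
]

print(avg_by(cpu, "availability_zone"))
print(len(select(cpu, role="web")))
print(avg_by(cpu, "role"))
```

Because every sample carries its metadata as labels, the same data can be sliced by AZ or by Role without changing what is collected.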
![Page 10: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/10.jpg)
Service Discovery
• automatically detects monitoring targets
• Prometheus provides several SD mechanisms
  – ec2_sd, consul_sd, kubernetes_sd, file_sd
• (a fundamental feature for a pull architecture)
![Page 11: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/11.jpg)
ec2_sd
• detects monitoring targets via the ec2:DescribeInstances API
• specify monitoring targets by AZ, instance tags, ...
• example settings that select Web Role targets:

    - job_name: 'job_name'
      ec2_sd_configs:
        - region: ap-northeast-1
          port: 9100
      relabel_configs:
        - source_labels: [__meta_ec2_tag_Role]
          regex: web.*
          action: keep
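The effect of the `keep` relabel rule in the settings above can be modeled roughly like this. The sketch assumes that the regex must match the whole label value (Prometheus anchors relabel regexes) and that a missing tag behaves like an empty string; the `keep` helper and the target list are hypothetical.

```python
import re

def keep(targets, source_label, regex):
    """Keep only targets whose source label fully matches the regex."""
    pattern = re.compile(regex)
    return [
        t for t in targets
        if pattern.fullmatch(t.get(source_label, ""))
    ]

# Hypothetical discovered targets with their EC2 "Role" tag as a meta label.
discovered = [
    {"instance": "10.0.0.1:9100", "__meta_ec2_tag_Role": "web"},
    {"instance": "10.0.0.2:9100", "__meta_ec2_tag_Role": "web-api"},
    {"instance": "10.0.0.3:9100", "__meta_ec2_tag_Role": "db"},
    {"instance": "10.0.0.4:9100"},  # instance without a Role tag
]

for target in keep(discovered, "__meta_ec2_tag_Role", r"web.*"):
    print(target["instance"])
```

Only the `web` and `web-api` instances remain; the db instance and the untagged instance are dropped before scraping.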
![Page 12: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/12.jpg)
How we deploy settings
[Diagram: workflow steps labeled edit, pack, upload, and deploy; Prometheus (for web) monitors Role=web and Prometheus (for db) monitors Role=db.]
This logo belongs to the Jenkins project (https://jenkins.io/).
![Page 13: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/13.jpg)
CloudWatch support
• We store CloudWatch metrics in Prometheus
• We don't use cloudwatch_exporter because it depends on Java
• We built an in-house CloudWatch exporter with aws-sdk-go
• Recording timestamps causes some problems
  – CloudWatch metrics are emitted with a delay of several minutes
  – Prometheus treats these metrics as stale and drops them
  – I gave up recording timestamps for some metrics
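The timestamp problem can be illustrated with a small sketch. It assumes a staleness window of 5 minutes, meaning a query only considers samples newer than that; the exact value depends on the Prometheus version and flags, so treat `STALENESS_SECONDS`, the `is_visible` helper, and the 10-minute delay as assumptions.

```python
STALENESS_SECONDS = 300  # assumed staleness window (5 minutes)

def is_visible(sample_ts, query_ts, staleness=STALENESS_SECONDS):
    """Return True if a sample at sample_ts is still considered fresh
    by a query evaluated at query_ts."""
    age = query_ts - sample_ts
    return 0 <= age <= staleness

now = 1_000_000            # hypothetical scrape/query time (Unix seconds)
cloudwatch_delay = 600     # hypothetical: datapoint arrives 10 minutes late

# Exporter exposes the original CloudWatch timestamp: the sample is
# already older than the staleness window when it is scraped.
print(is_visible(now - cloudwatch_delay, now))

# Exporter omits the timestamp: the scrape time is used instead,
# so the sample is fresh, at the cost of a shifted time axis.
print(is_visible(now, now))
```

This is the trade-off behind dropping timestamps for some metrics: the values show up in queries, but they are recorded at scrape time rather than at the time CloudWatch measured them.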
![Page 14: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/14.jpg)
Instance Spec we use
• we use t2.micro to t2.medium instances
• we use gp2 EBS volumes of 50-100GB
• for 50-100 monitoring targets, t2.medium is enough
• I recommend t2.small or larger
  – t2.micro does not have enough memory
  – need to change storage.local.memory-chunks
• sudden load increases can be handled by burst
  – t2 instance burst
  – EBS (gp2) burst
![Page 15: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/15.jpg)
Disk write workload
![Page 16: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/16.jpg)
Disk usage
• calculated per monitored instance
• we have 150-300 metrics per instance
• scrape interval is 15 seconds
• disk usage is approximately 200MB per month
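As a sanity check on these figures, the arithmetic below derives the implied storage cost per sample. It assumes a 30-day month, 1MB = 10^6 bytes, and that each metric corresponds to one time series; none of these are stated on the slide.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumed 30-day month
SCRAPE_INTERVAL = 15                    # seconds, from the slide
DISK_BYTES_PER_MONTH = 200 * 10**6      # ~200MB, assuming 1MB = 10^6 bytes

# Samples stored per series in one month.
samples_per_series = SECONDS_PER_MONTH // SCRAPE_INTERVAL

def bytes_per_sample(series_count):
    """Implied on-disk bytes per sample for a given number of series."""
    return DISK_BYTES_PER_MONTH / (series_count * samples_per_series)

print(samples_per_series)               # 172800
print(round(bytes_per_sample(150), 2))  # 7.72
print(round(bytes_per_sample(300), 2))  # 3.86
```

Under these assumptions, 150-300 series per instance at 200MB per month works out to roughly 4 to 8 bytes per sample.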
![Page 17: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/17.jpg)
Long term metrics storage
• Prometheus doesn't support summarizing metrics the way rrdtool does
• The data size becomes large if you set a long retention period
• The default retention period is 15 days
• Prometheus is not designed for long-term metrics storage
• To store metrics for the long term
  – Use remote storage (e.g. Graphite)
  – Launch another Prometheus for long-term storage and store summarized metrics data (we created a metrics-summarizing exporter)
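The idea of storing summarized data can be sketched as averaging raw samples into fixed time buckets, similar in spirit to rrdtool consolidation. This is a generic illustration, not the in-house summarizing exporter mentioned above; the `downsample` helper and the sample data are hypothetical.

```python
from collections import defaultdict

def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-size time buckets.

    Returns a list of (bucket_start, average) sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - ts % bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (start, sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    ]

# Hypothetical raw samples at a 15-second interval.
raw = [(0, 1.0), (15, 3.0), (30, 5.0), (45, 7.0), (60, 10.0), (75, 20.0)]

# Summarize into 60-second buckets before sending to long-term storage.
print(downsample(raw, 60))
```

Six raw samples become two summarized points, which is why a second, long-retention Prometheus that scrapes only summarized values needs far less disk.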
![Page 18: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/18.jpg)
After using it for 1 year
• daily operation
  – Prometheus workload is very stable
  – almost no operational work is required
• upgrading Prometheus
  – the configuration file needs changes when its format changes
  – breaking changes will keep coming until version 1.0
• supporting new middleware as monitoring targets
  – create an exporter for each middleware
  – thanks to Prometheus's powerful query language, exporters can be very simple
![Page 19: Prometheus on AWS](https://reader036.vdocuments.pub/reader036/viewer/2022062223/58706e2a1a28ab48378b6f6d/html5/thumbnails/19.jpg)
Reference URL
• http://www.robustperception.io/automatically-monitoring-ec2-instances/
• http://www.robustperception.io/how-to-have-labels-for-machine-roles/
• http://www.robustperception.io/life-of-a-label/
• http://www.slideshare.net/FabianReinartz/prometheus-storage-57557499