
Open Source Logging and Metrics Tools

CapitalCamp and Gov Days 2014

Introduction

Steven Merrill

Director of Engineering, Phase2

Twitter: @stevenmerrill

About This Talk

• Show how to visualize your data with OSS tools

• Information on customizing logs from common daemons

• Strong focus on log aggregation, parsing, and search

• Information about drupal.org's logging setup

• Some information on performance metrics tools

• Two-machine demo of Drupal and logging tools

Demo: ELK Stack in Action

Demo Setup

• 2 Google Compute Engine g1-small instances

• All instances run collectd to grab system metrics

• 1 'drupal' instance with Apache, Varnish, MySQL, PHP

• 1 'utility' instance with rsyslog host, Jenkins, Graphite, Grafana, ElasticSearch, Logstash, Kibana, bucky

Logs

Ceci n'est pas une log ("This is not a log")

"Logs are time + data."

— Jordan Sissel, creator of Logstash

What Are Logs

• Ultimately, logs are about keeping track of events

• Logs vary widely; some use custom formats, while others are pure XML or JSON

• Some are one line, some are many, like Java stacktraces or MySQL slow query logs

Who Produces Logs

• Drupal

• nginx

• Apache

• Varnish

• Jenkins

• SOLR

• MySQL

• cron

• sudo

• ...

Types of Logs

• Error Logs

• Transaction Logs

• Trace Logs

• Debug Logs

Issues With Logs

• Legal retention requirements

• Require shell access to view

• Not often human-parseable

• Cyborg-friendly tooling

Solving Problems With Log Data

• Find slow pages or queries

• Sort through Drupal logs to trace user action on a site

• Get a sense of average traffic to a particular area of a site

• Track new PHP error types (see the example queries below)
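Once logs are parsed into fields, each of these tasks becomes a search. As an illustration, two hypothetical Kibana (Lucene) queries; the field names program, d_type, and duration assume the Logstash parsing configuration shown later in this deck:

  # New PHP error types logged by Drupal
  program:"drupal" AND d_type:"php"

  # nginx requests slower than one second (duration is normalized to milliseconds)
  program:"nginx_access" AND duration:[1000 TO *]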

Shipping Logs

Ship Those Logs!

• syslog-ng

• rsyslogd

  • Ship syslog

  • Ship other log files

• Lumberjack (logstash-forwarder)

• Beaver

Shipping Concerns

• Queueing

  • Behavior when shipping to remote servers

  • Max spool disk usage

  • Retries?

• Security

  • Encrypted channel

  • Encrypted at rest

  • Access to sensitive data

Configuring rsyslogd Clients

• Ship logs to another rsyslog server over TCP

• *.* @@utility:514

• By default this ships everything that rsyslog would normally write to /var/log/syslog or /var/log/messages
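A minimal client-side sketch of the full /etc/rsyslog.conf addition (the host name utility comes from the demo setup; a single @ would ship over UDP instead of TCP):

  # /etc/rsyslog.conf (client): forward everything to the log host over TCP
  *.* @@utility:514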

Configuring rsyslogd Servers

• Prevent remote logs from showing up in /var/log/messages

• if $source != 'utility' then ~

• Store logs coming in based on hostname and date

• $template DailyPerHostLogs,"/var/log/rsyslog/%HOSTNAME%/%HOSTNAME%.%$YEAR%-%$MONTH%-%$DAY%.log"

• *.* -?DailyPerHostLogs;RSYSLOG_TraditionalFileFormat
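For completeness, the receiving server also needs a TCP listener, and rule order matters: store remote logs first, then discard them before the default rules run. A minimal sketch assuming the stock imtcp module:

  # /etc/rsyslog.conf (server): accept syslog over TCP
  $ModLoad imtcp
  $InputTCPServerRun 514

  # Store each host's logs in a per-host, per-day file
  $template DailyPerHostLogs,"/var/log/rsyslog/%HOSTNAME%/%HOSTNAME%.%$YEAR%-%$MONTH%-%$DAY%.log"
  *.* -?DailyPerHostLogs;RSYSLOG_TraditionalFileFormat

  # Discard remote messages so they skip /var/log/messages
  if $source != 'utility' then ~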

Configuring rsyslogd Shipping

• Read lines from a particular file and ship over syslog

• $ModLoad imfile
  $InputFileName /var/log/httpd/access_log
  $InputFileTag apache_access:
  $InputFileStateFile state-apache_access
  $InputFileSeverity info
  $InputFileFacility local0
  $InputFilePollInterval 10
  $InputRunFileMonitor

Configuring rsyslogd Spooling

• Configure spooling and queueing behavior

• $WorkDirectory /var/lib/rsyslog        # where to place spool files
  $ActionQueueFileName fwdRule1          # unique name prefix for spool files
  $ActionQueueMaxDiskSpace 1g            # 1gb space limit
  $ActionQueueSaveOnShutdown on          # save messages to disk on shutdown
  $ActionQueueType LinkedList            # run asynchronously
  $ActionResumeRetryCount -1             # infinite retries if host is down

Syslog-shipped Log Files

Mar 11 15:38:14 drupal drupal: http://192.168.32.3|1394566694|system|192.168.32.1|http://192.168.32.3/admin/modules/list/confirm|http://192.168.32.3/admin/modules|1||php module installed.

Jul 30 15:04:14 drupal varnish_access: 156.40.118.178 - - [30/Jul/2014:15:04:09 +0000] "GET http://23.251.149.143/misc/tableheader.js?n9j5uu HTTP/1.1" 200 1848 "http://23.251.149.143/admin/modules" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 0.000757 miss

Log Formats

Syslog

Apr 11 18:35:53 shiftiest dnsmasq-dhcp[23185]: DHCPACK(br100) 192.168.32.4 fa:16:3e:c4:2f:fd varnish4

Mar 11 15:38:14 drupal drupal: http://192.168.32.3|1394566694|system|192.168.32.1|http://192.168.32.3/admin/modules/list/confirm|http://192.168.32.3/admin/modules|1||php module installed.

Apache

127.0.0.1 - - [08/Mar/2014:00:36:44 -0500] "GET /dashboard HTTP/1.0" 302 20 "https://68.232.187.42/dashboard/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36"

nginx

192.168.32.1 - - [11/Apr/2014:10:44:36 -0400] "GET /kibana/font/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572 "http://192.168.32.6/kibana/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36"

Varnish

192.168.32.1 - - [11/Apr/2014:10:47:52 -0400] "GET http://192.168.32.3/themes/seven/images/list-item.png HTTP/1.1" 200 195 "http://192.168.32.3/admin/config" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36"

Additional Features

• Apache, nginx, and Varnish all support additional output

• Varnish can log cache hit/miss

• With Logstash's grok filter we can normalize these formats

  • A regex engine with built-in named patterns

  • Online tools exist to parse sample logs

Apache

• Configurable log formats are available – http://httpd.apache.org/docs/2.2/mod/mod_log_config.html

• A LogFormat directive that redefines a named format (such as combined) in any Apache configuration file overrides that format everywhere

• The default NCSA combined log format is as follows

• LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

Apache

• Additional useful information:

• %D Time taken to serve request in microseconds

• %{Host}i Value of the Host HTTP header

• %p Port

• New LogFormat line:

• LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %{Host}i %p" combined
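To put the new format into service, reference its nickname from a CustomLog directive; a minimal sketch (the log path is an assumption):

  # httpd.conf or a vhost: redefine 'combined', then log with it
  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %{Host}i %p" combined
  CustomLog /var/log/httpd/access_log combined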

nginx

• Log formats are defined with the log_format directive – http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format

• You cannot redefine the built-in NCSA combined format; it is predefined as follows

• log_format combined '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent"';

Apache

127.0.0.1 - - [29/Jul/2014:22:03:07 +0000] "GET /admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"

127.0.0.1 - - [29/Jul/2014:22:03:07 +0000] "GET /admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 45304 23.251.149.143 80

nginx

• Additional useful information:

• $request_time Time taken to serve request in seconds with millisecond resolution (e.g. 0.073)

• $http_host Value of the Host HTTP header

• $server_port Port

nginx

• New log_format line and example config for a vhost:

• log_format logstash '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent" ' '$request_time $http_host $server_port';

• access_log /var/log/nginx/access.log logstash;

nginx

70.42.157.6 - - [22/Jul/2014:22:03:30 +0000] "POST /logstash-2014.07.22/_search HTTP/1.0" 200 281190 "http://146.148.34.62/kibana/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"

70.42.157.6 - - [22/Jul/2014:22:03:30 +0000] "POST /logstash-2014.07.22/_search HTTP/1.0" 200 281190 "http://146.148.34.62/kibana/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 0.523 146.148.34.62 80

Varnish

• The varnishncsa daemon outputs NCSA-format logs

• You may pass a different log format to the varnishncsa daemon; most format specifiers are the same as Apache's

Varnish

• Additional useful information:

• %D Time taken to serve request in seconds with microsecond precision (e.g. 0.000884)

• %{Varnish:hitmiss}x The text "hit" or "miss"

• varnishncsa daemon argument:

• -F '%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %D %{Varnish:hitmiss}x'

Varnish

70.42.157.6 - - [29/Jul/2014:22:03:07 +0000] "GET http://23.251.149.143/admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"

70.42.157.6 - - [29/Jul/2014:22:03:07 +0000] "GET http://23.251.149.143/admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 0.045969 miss
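A sketch of launching the daemon by hand with this format (the output path is an assumption; on most distributions you would set these flags in the varnishncsa init script instead):

  # -D daemonizes, -a appends, -w names the output file
  varnishncsa -D -a -w /var/log/varnish/varnishncsa.log \
    -F '%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %D %{Varnish:hitmiss}x'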

Automated Tools

Proprietary Tools

• Third-party SaaS systems are plentiful in this area

• Splunk

• SumoLogic

• Loggly

• LogEntries

Logstash

• http://logstash.net/

• Great tool to work with logs of ALL sorts

• Has input, filter, and output pipelines

• Inputs can be parsed with different codecs (JSON, netflow)

• http://logstash.net/docs/1.4.2/ describes many options

ElasticSearch

• http://www.elasticsearch.com/

• A Java search engine based on Lucene, similar to SOLR

• Offers a nicer REST API and easy discovery for clustering

Kibana

• Great viewer for Logstash logs

• Needs direct HTTP access to ElasticSearch

• You may need to protect this with nginx or the like (see the sketch after this list)

• Uses ElasticSearch features to show statistical information

• Can show any ElasticSearch data, not just Logstash
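A minimal sketch of fronting both Kibana and ElasticSearch with nginx plus HTTP basic auth (the paths, port, and htpasswd file are assumptions):

  server {
      listen 80;
      auth_basic           "Restricted";
      auth_basic_user_file /etc/nginx/htpasswd;

      # Kibana 3 is a static HTML/JavaScript app
      location /kibana/ {
          alias /usr/share/kibana/;
      }

      # Expose ElasticSearch to the browser through the same host
      location /es/ {
          proxy_pass http://127.0.0.1:9200/;
      }
  }

Kibana 3's config.js would then point its elasticsearch setting at /es/ on the same host rather than at port 9200 directly.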

Grok

• A tool for pulling semantic data out of logs, available as a Logstash filter

• A regex engine with built-in named patterns

• Online tools to parse sample logs

• http://grokdebug.herokuapp.com/

• http://grokconstructor.appspot.com/

Example: Grokking nginx Logs

192.168.32.1 - - [11/Apr/2014:10:44:36 -0400] "GET /kibana/font/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572 "http://192.168.32.6/kibana/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36"
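Since this is standard NCSA combined output, grok's stock COMBINEDAPACHELOG pattern already matches it; a minimal filter sketch (the program field assumes the logs arrive tagged nginx_access over syslog, as in the demo):

  filter {
    if [program] == "nginx_access" {
      grok {
        # COMBINEDAPACHELOG is a built-in pattern for NCSA combined logs
        match => [ "message", "%{COMBINEDAPACHELOG}" ]
      }
    }
  }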

Configuring Logstash

Logstash Config

• By default Logstash looks in /etc/logstash/conf.d/*.conf

• You may include multiple files

• Each must have at least an input, filter, or output stanza

Logstash Config

input {
  file {
    path => "/var/log/rsyslog/*/*.log"
    exclude => "*.bz2"
    type => syslog
    sincedb_path => "/var/run/logstash/sincedb"
    sincedb_write_interval => 10
  }
}

Logstash Config

filter {
  if [type] == "syslog" {
    mutate {
      add_field => [ "syslog_message", "%{message}" ]
      remove_field => "message"
    }
    grok {
      match => [ "syslog_message", "%{SYSLOGLINE}" ]
    }
    date {
      match => [ "timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
    # Parse Drupal logs that are logged to syslog.

Logstash Config

    if [program] == "drupal" {
      grok {
        match => [ "message", "https?://%{HOSTNAME:vhost}?\|%{NUMBER:d_timestamp}\|(?<d_type>[^\|]*)\|%{IP:d_ip}\|(?<d_request_uri>[^\|]*)\|(?<d_referer>[^\|]*)\|(?<d_uid>[^\|]*)\|(?<d_link>[^\|]*)\|(?<d_message>.*)" ]
      }
    }

Logstash Config

    if [program] == "nginx_access" {
      ruby {
        code => "event['duration'] = event['duration'].to_f * 1000.0"
      }
    }
    if [program] == "varnish_access" {
      ruby {
        code => "event['duration'] = event['duration'].to_f * 1000.0"
      }
    }
  }
}
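These slides show only the input and filter stanzas; for completeness, a minimal output stanza for Logstash 1.4 might look like this sketch (it assumes ElasticSearch runs on the same host):

  output {
    elasticsearch {
      # Index parsed events into the local ElasticSearch instance
      host => "localhost"
    }
  }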

Monitoring and Performance Metrics

Logs vs Performance Counters

• Generally, logs capture data at a particular time

• You may also want to keep information about how your servers are running and performing

• A separate set of tools is often used to monitor and manage system performance

• This data can then be trended to chart resource usage and capacity

Proprietary Tools

• Third-party SaaS systems are also plentiful in this area

• DataDog

• Librato Metrics

• Circonus

• New Relic / AppNeta

Time-Series Data

• Performance counters are generally sampled at a regular interval, producing what is known as time-series data

• Several OSS tools exist to store and query time-series data:

• RRDTool

• Whisper

• InfluxDB

First Wave: RRD-based Tools

• Many tools can collect metrics and store and plot them as RRD files

• Munin

• Cacti

• Ganglia

• collectd

Second Wave: Graphite

• Graphite is a more general tool; it does not collect metrics itself (anything can feed it, as shown below)

• It uses an advanced storage engine called Whisper

• It can buffer data and cache it under heavy load

• It does not require data points to arrive at a fixed interval

• It is purpose-built to store and graph time-series data
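Under the hood, Graphite's carbon daemon accepts a plaintext protocol of "metric-path value unix-timestamp" lines, typically on port 2003, so a single shell line can record a data point; a sketch (the host name comes from the demo, the metric name is made up):

  echo "demo.drupal.requests.count 42 $(date +%s)" | nc utility 2003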

Grafana

• Grafana is to Graphite as Kibana is to ElasticSearch

• HTML / JavaScript app

• Needs direct HTTP access to Graphite

• You may need to protect this with nginx or the like

Collectd

• http://collectd.org/

• Collectd is a tool that makes it easy to capture many system-level statistics

• It can write to RRD databases or to Graphite (see the config sketch below)

• Collectd is written in C and is efficient; it can remain resident in memory and report on a regular interval
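A minimal collectd.conf sketch that samples CPU, memory, and load and ships them to Graphite via the write_graphite plugin (assumes collectd 5.3 or later; the host name comes from the demo setup):

  LoadPlugin cpu
  LoadPlugin load
  LoadPlugin memory
  LoadPlugin write_graphite

  <Plugin write_graphite>
    <Node "graphite">
      # Send metrics to carbon's plaintext listener on the utility host
      Host "utility"
      Port "2003"
      Prefix "collectd."
    </Node>
  </Plugin>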

Demo: Graphite / collectd / Grafana

The Drupal.org Logging Setup

Single Log Host Machine

• CentOS 5

• Dual quad-core Gulftown Xeons (8 cores, 16 threads)

• 16 GB RAM

• 600 GB of HDD storage dedicated to Logstash

Software

• ElasticSearch 0.90

• Logstash 1.2

• Kibana 3.0.0m3

• Curator 0.6.2

Stats

• Consolidating logs from ≈ 10 web servers

• Incoming syslog (Drupal), Apache, nginx, and Varnish logs

• Non-syslog logs are synced to the log host every hour with rsync

• > 2 billion logs processed per month

• Indexing is spiky rather than constant; load average around 0.5

Questions?

Resources

Links

• http://logstash.net/

• http://elasticsearch.com/

• https://github.com/elasticsearch/kibana/

• http://graphite.wikidot.com/

• http://grafana.org/

Links

• https://collectd.org/

• https://www.drupal.org/documentation/modules/syslog

• https://github.com/elasticsearch/logstash-forwarder

PHASE2TECHNOLOGY.COM