Open Source Logging and Metric Tools
CapitalCamp and GovDays 2014 (transcript)
About This Talk
• How to visualize your data with OSS tools
• Information on customizing logs from common daemons
• Strong focus on log aggregation, parsing, and search
• Information about drupal.org's logging setup
• Some information on performance metrics tools
• Two-machine demo of Drupal and logging tools
Demo Setup
• 2 Google Cloud Engine g1.small instances
• All instances run collectd to grab system metrics
• 1 'drupal' instance with Apache, Varnish, MySQL, PHP
• 1 'utility' instance with rsyslog host, Jenkins, Graphite, Grafana, ElasticSearch, Logstash, Kibana, bucky
What Are Logs
• Ultimately, logs are about keeping track of events
• Logs vary widely; some use custom formats, while others are pure XML or JSON
• Some are one line, some are many, like Java stacktraces or MySQL slow query logs
Issues With Logs
• Legal retention requirements
• Require shell access to view
• Not often human-parseable
• Cyborg-friendly tooling
Solving Problems With Log Data
• Find slow pages or queries
• Sort through Drupal logs to trace user action on a site
• Get an average idea of traffic to a particular area
• Track new PHP error types
Ship Those Logs!
• syslog-ng
• rsyslogd
• Ship syslog
• Ship other log files
• Lumberjack (logstash-forwarder)
• Beaver
Shipping Concerns
• Queueing
• Behavior when shipping to remote servers
• Max spool disk usage
• Retries?
• Security
• Encrypted channel
• Encrypted at rest
• Access to sensitive data
Configuring rsyslogd Clients
• Ship logs to another rsyslog server over TCP
• *.* @@utility:514
• By default this ships everything that would normally be logged to /var/log/syslog or /var/log/messages
Configuring rsyslogd Servers
• Prevent remote logs from showing up in /var/log/messages
• if $source != 'utility' then ~
• Store logs coming in based on hostname and date
• $template DailyPerHostLogs,"/var/log/rsyslog/%HOSTNAME%/%HOSTNAME%.%$YEAR%-%$MONTH%-%$DAY%.log"
• *.* -?DailyPerHostLogs;RSYSLOG_TraditionalFileFormat
Configuring rsyslogd Shipping
• Read lines from a particular file and ship over syslog
• $ModLoad imfile
• $InputFileName /var/log/httpd/access_log
• $InputFileTag apache_access:
• $InputFileStateFile state-apache_access
• $InputFileSeverity info
• $InputFileFacility local0
• $InputFilePollInterval 10
• $InputRunFileMonitor
Configuring rsyslogd Spooling
• Configure spooling and queueing behavior
• $WorkDirectory /var/lib/rsyslog # where to place spool files
• $ActionQueueFileName fwdRule1 # unique name prefix for spool files
• $ActionQueueMaxDiskSpace 1g # 1gb space limit
• $ActionQueueSaveOnShutdown on # save messages to disk on shutdown
• $ActionQueueType LinkedList # run asynchronously
• $ActionResumeRetryCount -1 # infinite retries if host is down
Syslog-shipped Log Files
Mar 11 15:38:14 drupal drupal: http://192.168.32.3|1394566694|system|192.168.32.1|http://192.168.32.3/admin/modules/list/confirm|http://192.168.32.3/admin/modules|1||php module installed.
Jul 30 15:04:14 drupal varnish_access: 156.40.118.178 - - [30/Jul/2014:15:04:09 +0000] "GET http://23.251.149.143/misc/tableheader.js?n9j5uu HTTP/1.1" 200 1848 "http://23.251.149.143/admin/modules" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 0.000757 miss
Syslog
Apr 11 18:35:53 shiftiest dnsmasq-dhcp[23185]: DHCPACK(br100) 192.168.32.4 fa:16:3e:c4:2f:fd varnish4
Mar 11 15:38:14 drupal drupal: http://192.168.32.3|1394566694|system|192.168.32.1|http://192.168.32.3/admin/modules/list/confirm|http://192.168.32.3/admin/modules|1||php module installed.
Apache
127.0.0.1 - - [08/Mar/2014:00:36:44 -0500] "GET /dashboard HTTP/1.0" 302 20 "https://68.232.187.42/dashboard/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36"
nginx
192.168.32.1 - - [11/Apr/2014:10:44:36 -0400] "GET /kibana/font/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572 "http://192.168.32.6/kibana/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36"
Varnish
192.168.32.1 - - [11/Apr/2014:10:47:52 -0400] "GET http://192.168.32.3/themes/seven/images/list-item.png HTTP/1.1" 200 195 "http://192.168.32.3/admin/config" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36"
Additional Features
• Apache, nginx, and Varnish all support additional output fields in their log formats
• Varnish can log cache hit/miss
• With Logstash we can look at how to normalize these
• A regex engine with built-in named patterns
• Online tools to parse sample logs
Apache
• Configurable log formats are available – http://httpd.apache.org/docs/2.2/mod/mod_log_config.html
• Redefining a named LogFormat (such as combined) in any Apache configuration file overrides the earlier definition everywhere it is used
• The default NCSA combined log format is as follows
• LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
Apache
• Additional useful information:
• %D Time taken to serve request in microseconds
• %{Host}i Value of the Host HTTP header
• %p Port
• New LogFormat line:
• LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %{Host}i %p" combined
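To sanity-check the extended format, a small parser is handy. A minimal sketch in Python; the regex and field names are illustrative choices of mine, not anything Apache ships:

```python
import re

# Hypothetical regex for the extended combined format shown above:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %D %{Host}i %p
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<micros>\d+) (?P<vhost>\S+) (?P<port>\d+)'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

sample = ('127.0.0.1 - - [29/Jul/2014:22:03:07 +0000] '
          '"GET /admin/config HTTP/1.0" 200 3500 "-" "Mozilla/5.0" '
          '45304 23.251.149.143 80')
fields = parse_line(sample)
print(fields['status'], fields['micros'])  # 200 45304
```

Logstash's grok filter does the same job with reusable named patterns, as shown later in the talk.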
nginx
• Log formats are defined with the log_format directive – http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
• You may not override the default NCSA combined format
• log_format combined '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent"';
Apache
127.0.0.1 - - [29/Jul/2014:22:03:07 +0000] "GET /admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
127.0.0.1 - - [29/Jul/2014:22:03:07 +0000] "GET /admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 45304 23.251.149.143 80
nginx
• Additional useful information:
• $request_time Time taken to serve request in seconds with millisecond resolution (e.g. 0.073)
• $http_host Value of the Host HTTP header
• $server_port Port
nginx
• New log_format line and example config for a vhost:
• log_format logstash '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent" ' '$request_time $http_host $server_port';
• access_log /var/log/nginx/access.log logstash;
nginx
70.42.157.6 - - [22/Jul/2014:22:03:30 +0000] "POST /logstash-2014.07.22/_search HTTP/1.0" 200 281190 "http://146.148.34.62/kibana/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
70.42.157.6 - - [22/Jul/2014:22:03:30 +0000] "POST /logstash-2014.07.22/_search HTTP/1.0" 200 281190 "http://146.148.34.62/kibana/index.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 0.523 146.148.34.62 80
Varnish
• The varnishncsa daemon outputs NCSA-format logs
• You may pass a different log format to the varnishncsa daemon; most format specifiers are shared with Apache's
Varnish
• Additional useful information:
• %D Time taken to serve request in seconds with microsecond precision (e.g. 0.000884)
• %{Varnish:hitmiss}x The text "hit" or "miss"
• varnishncsa daemon argument:
• -F '%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %D %{Varnish:hitmiss}x'
Varnish
70.42.157.6 - - [29/Jul/2014:22:03:07 +0000] "GET http://23.251.149.143/admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
70.42.157.6 - - [29/Jul/2014:22:03:07 +0000] "GET http://23.251.149.143/admin/config/development/performance HTTP/1.0" 200 3500 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36" 0.045969 miss
Proprietary Tools
• Third-party SaaS systems are plentiful in this area
• Splunk
• SumoLogic
• Loggly
• LogEntries
Logstash
• http://logstash.net/
• Great tool to work with logs of ALL sorts
• Has input, filter, and output pipelines
• Inputs can be parsed with different codecs (JSON, netflow)
• http://logstash.net/docs/1.4.2/ describes many options
ElasticSearch
• http://www.elasticsearch.com/
• A Java search engine built on Lucene, similar to Solr
• Offers a nicer REST API; easy discovery for clustering
Kibana
• Great viewer for Logstash logs
• Needs direct HTTP access to ElasticSearch
• You may need to protect this with nginx or the like
• Uses ElasticSearch features to show statistical information
• Can show any ElasticSearch data, not just Logstash
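Since the browser talks to ElasticSearch directly, a reverse proxy with HTTP basic auth is a common way to lock both Kibana and ElasticSearch down. A minimal nginx sketch; the hostname, paths, and the /es/ prefix are assumptions (Kibana's config.js would need its elasticsearch setting pointed at that prefix):

```nginx
server {
    listen 80;
    # Placeholder hostname for the utility instance.
    server_name utility.example.com;

    # Require a login for everything below; create the
    # password file with the htpasswd utility.
    auth_basic           "Logs";
    auth_basic_user_file /etc/nginx/htpasswd;

    # Serve the static Kibana app.
    location /kibana/ {
        root /var/www;
    }

    # Proxy ElasticSearch so the browser never reaches :9200 directly.
    location /es/ {
        proxy_pass http://127.0.0.1:9200/;
    }
}
```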
Grok
• Tool for pulling semantic data from logs; logstash filter
• A regex engine with built-in named patterns
• Online tools to parse sample logs
• http://grokdebug.herokuapp.com/
• http://grokconstructor.appspot.com/
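References like %{IP:client} ultimately expand into named regex groups. The same idea in plain Python, with a handful of simplified stand-ins for grok's pattern library (the pattern definitions here are mine, much looser than grok's real ones):

```python
import re

# Simplified stand-ins for a few grok library patterns.
PATTERNS = {
    'IPORHOST': r'\S+',
    'USER':     r'\S+',
    'HTTPDATE': r'[^\]]+',
    'WORD':     r'\S+',
    'NUMBER':   r'\d+',
}

def grok_compile(expr):
    """Expand %{PATTERN:name} references into named regex groups."""
    def repl(m):
        pattern, name = m.group(1), m.group(2)
        return '(?P<%s>%s)' % (name, PATTERNS[pattern])
    return re.compile(re.sub(r'%\{(\w+):(\w+)\}', repl, expr))

rx = grok_compile(
    r'%{IPORHOST:clientip} - %{USER:auth} \[%{HTTPDATE:timestamp}\] '
    r'"%{WORD:verb} (?P<request>\S+) HTTP/(?P<httpversion>\S+)" '
    r'%{NUMBER:response} %{NUMBER:bytes}'
)
line = ('192.168.32.1 - - [11/Apr/2014:10:44:36 -0400] '
        '"GET /kibana/font/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572')
m = rx.match(line)
print(m.group('clientip'), m.group('response'))  # 192.168.32.1 200
```

Grok's value over raw regexes is exactly this library of pre-named, battle-tested patterns plus the online debuggers linked above.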
Example: Grokking nginx Logs
192.168.32.1 - - [11/Apr/2014:10:44:36 -0400] "GET /kibana/font/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572 "http://192.168.32.6/kibana/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko)
Logstash Config
• By default Logstash looks in /etc/logstash/conf.d/*.conf
• You may include multiple files
• Each must have at least an input, filter, or output stanza
Logstash Config
input {
  file {
    path => "/var/log/rsyslog/*/*.log"
    exclude => "*.bz2"
    type => syslog
    sincedb_path => "/var/run/logstash/sincedb"
    sincedb_write_interval => 10
  }
}
Logstash Config
filter {
  if [type] == "syslog" {
    mutate {
      add_field => [ "syslog_message", "%{message}" ]
      remove_field => "message"
    }
    grok {
      match => [ "syslog_message", "%{SYSLOGLINE}" ]
    }
    date {
      match => [ "timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
    # Parse Drupal logs that are logged to syslog.
    if [program] == "drupal" {
      grok {
        match => [ "message", "https?://%{HOSTNAME:vhost}?\|%{NUMBER:d_timestamp}\|(?<d_type>[^\|]*)\|%{IP:d_ip}\|(?<d_request_uri>[^\|]*)\|(?<d_referer>[^\|]*)\|(?<d_uid>[^\|]*)\|(?<d_link>[^\|]*)\|(?<d_message>.*)" ]
      }
    }
    if [program] == "nginx_access" {
      ruby {
        code => "event['duration'] = event['duration'].to_f * 1000.0"
      }
    }
    if [program] == "varnish_access" {
      ruby {
        code => "event['duration'] = event['duration'].to_f * 1000.0"
      }
    }
  }
}
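The two ruby filters above just rescale the captured duration field from seconds to milliseconds, so nginx and Varnish request times land in the same unit. The same arithmetic as a standalone sketch (the function name is mine):

```python
def seconds_to_ms(value):
    """Mirror of the ruby filter: parse the duration string, scale to ms."""
    return float(value) * 1000.0

# nginx $request_time and varnishncsa %D samples from earlier slides:
print(seconds_to_ms('0.523'))
print(seconds_to_ms('0.000757'))
```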
Logs vs Performance Counters
• Generally, logs capture data at a particular time
• You may also want to keep information about how your servers are running and performing
• A separate set of tools is often used to monitor and manage system performance
• This data can then be trended to chart resource usage and capacity
Proprietary Tools
• Third-party SaaS systems are also plentiful in this area
• DataDog
• Librato Metrics
• Circonus
• New Relic / AppNeta
Time-Series Data
• Generally, performance counters are taken with regular sampling at an interval, known as time-series data
• Several OSS tools exist to store and query time-series data:
• RRDTool
• Whisper
• InfluxDB
First Wave: RRD-based Tools
• Many tools can collect metrics, store them in RRD files, and plot them
• Munin
• Cacti
• Ganglia
• collectd
Second Wave: Graphite
• Graphite is a more general tool; it does not collect metrics
• It uses an advanced storage engine called Whisper
• It can buffer data and cache it under heavy load
• It does not require data to be inserted all the time
• It's fully designed to take time-series data and graph it
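Anything that can open a TCP socket can feed Graphite: Carbon's plaintext protocol is one "path value timestamp" line per metric, by default on port 2003. A minimal sketch; the metric path and the 'utility' hostname are assumptions from the demo setup:

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return '%s %s %d\n' % (path, value, timestamp)

def send_metric(path, value, host='utility', port=2003):
    """Ship a single metric to Carbon over TCP (host/port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(graphite_line(path, value).encode('ascii'))

# Example line, printed rather than sent:
print(graphite_line('drupal.apache.requests', 42, 1406671200), end='')
# drupal.apache.requests 42 1406671200
```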
Grafana
• Grafana is to Graphite as Kibana is to ElasticSearch
• HTML / JavaScript app
• Needs direct HTTP access to Graphite
• You may need to protect this with nginx or the like
Collectd
• http://collectd.org/
• Collectd is a tool that makes it easy to capture many system-level statistics
• It can write to RRD databases or to Graphite
• Collectd is written in C and is efficient; it can remain resident in memory and report on a regular interval
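To wire collectd into the Graphite instance from the demo, the write_graphite plugin is the usual route. A minimal sketch of the collectd.conf stanza; the 'utility' hostname and the prefix are assumptions:

```
LoadPlugin write_graphite

<Plugin write_graphite>
  <Node "graphite">
    # Carbon's plaintext listener on the demo utility instance.
    Host "utility"
    Port "2003"
    Protocol "tcp"
    # Namespace all collectd metrics under one prefix.
    Prefix "collectd."
  </Node>
</Plugin>
```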
Single Log Host Machine
• CentOS 5
• Dual quad-core Gulftown Xeons (8 cores, 16 threads)
• 16 GB RAM
• 600 GB of HDD storage dedicated to Logstash
Stats
• Consolidating logs from ≈ 10 web servers
• Incoming syslog (Drupal), Apache, nginx, and Varnish logs
• Non-syslog logs are updated every hour with rsync
• > 2 billion logs processed per month
• Indexing load is spiky rather than sustained; load average around 0.5
Links
• http://logstash.net/
• http://elasticsearch.com/
• https://github.com/elasticsearch/kibana/
• http://graphite.wikidot.com/
• http://grafana.org/
Links
• https://collectd.org/
• https://www.drupal.org/documentation/modules/syslog
• https://github.com/elasticsearch/logstash-forwarder