Data Onboarding


Posted on 25-Jan-2017


Category: Technology


TRANSCRIPT

  • Copyright 2014 Splunk Inc.

    Data Onboarding

    Ingestion without the Indigestion

    Jeff Meyers, Sales Engineer

  • Major components involved in data indexing

    What happens to data within Splunk

    What the data pipeline is & how to influence it

    Shaping data understanding via props.conf

    Configuring data inputs via inputs.conf

    What goes where

    Heavy Forwarders vs. Universal Forwarders

    How to get your data into Splunk (mostly correctly)

    ~60 minutes from now...

  • Systematic way to bring new data sources into Splunk

    Make sure that new data is instantly usable & has maximum value for users

    Goes hand-in-hand with the User Onboarding process (sold separately)

    What is the Data Onboarding Process?

  • Machine Data > Business Value

    Index Untapped Data: Any Source, Type, Volume

    [Diagram: data sources feeding Splunk across on-premises, private cloud and public cloud: online services, web services, servers, security, GPS location, storage, desktops, networks, packaged applications, custom applications, messaging, telecoms, online shopping carts, web clickstreams, databases, energy meters, call detail records, smartphones and devices, RFID]

    [Use cases: Ask Any Question, Application Delivery, Security, Compliance and Fraud, IT Operations, Business Analytics, Industrial Data and the Internet of Things]

  • Flavors of Machine Data

    [Sample event screenshots: Order Processing, Twitter, Care IVR, Middleware Error]

  • Getting Data Into Splunk

    Agent and Agent-less Approach for Flexibility

    Agent-less data input:

    syslog (TCP/UDP) -- syslog-compatible hosts and network devices

    WMI (Event Logs, Performance, Active Directory) -- Windows hosts

    Mounted file systems (\\hostname\mount)

    shell code / perf -- custom apps and scripted API connections

    Splunk Forwarder inputs (Unix, Linux and Windows hosts):

    Local file monitoring -- log files, config files, dumps and trace files

    Windows inputs -- Event Logs, performance counters, registry monitoring, Active Directory monitoring

    Scripted inputs -- shell scripts, custom parsers, batch loading

  • Splunk Data Ingest

    [Diagram: multiple Universal Forwarders (UF) and a Heavy Forwarder (HF) sending data to an Indexer (IDX), searched by a Search Head (SH)]

    Splunk Enterprise (with optional configs) vs. Splunk Universal Forwarder

    Summary: when it comes to "core" Splunk, there are two distinct products: the Splunk Universal Forwarder and Splunk Enterprise. "Everything else" (Indexer, Search Head, License Server, Deployment Server, Cluster Master, Deployer, Heavy Forwarder, etc.) is just an instance of Splunk Enterprise with varying configs.
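To make the topology concrete, a Universal Forwarder is pointed at the indexer tier via outputs.conf. A minimal sketch -- the hostnames are illustrative, and 9997 is Splunk's conventional receiving port:

```ini
# outputs.conf on a Universal Forwarder (hostnames are illustrative)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# data is auto-load-balanced across the listed indexers
server = idx1.example.com:9997, idx2.example.com:9997
```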

  • Data Pipeline (what the what?)

  • The Data Pipeline

  • The Data Pipeline

    Any Questions?

  • The Data Pipeline

  • Input Processors: Monitor, FIFO, UDP, TCP, Scripted

    No events yet-- just a stream of bytes

    Break data stream into 64KB blocks

    Annotate stream with metadata keys (host, source, sourcetype, index, etc.)

    Can happen on UF, HF or indexer

    Inputs: Where it all starts
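The input phase is where those metadata keys get attached. A sketch of a monitor input in inputs.conf, assuming a hypothetical log path and sourcetype name:

```ini
# inputs.conf -- monitor input annotating the stream with metadata keys
# (path, sourcetype and index names are illustrative)
[monitor:///var/log/myapp/app.log]
sourcetype = myapp:app      # assigned here, at input time
index = myapp               # route to a specific index
disabled = false
```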

  • Check character set

    Break lines

    Process headers

    Can happen on HF or indexer

    Parsing
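The parsing steps above map to per-sourcetype settings in props.conf. A sketch, assuming the hypothetical sourcetype `myapp:app`:

```ini
# props.conf -- parsing-phase settings (sourcetype name is illustrative)
[myapp:app]
CHARSET = UTF-8              # check character set
LINE_BREAKER = ([\r\n]+)     # break the byte stream into lines on newlines
```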

  • Merge lines for multi-line events

    Identify events (finally!)

    Extract timestamps

    Exclude events based on timestamp (MAX_DAYS_AGO, ...)

    Can happen on HF or indexer

    Aggregation/Merging
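Timestamp extraction is also controlled per sourcetype in props.conf. A sketch for an Apache-style timestamp (sourcetype name and values are illustrative):

```ini
# props.conf -- timestamp handling for the aggregation/merging phase
[myapp:app]
TIME_PREFIX = \[                    # timestamp follows the first "["
TIME_FORMAT = %d/%b/%Y:%H:%M:%S     # e.g. 23/Feb/2016:18:22:16
MAX_TIMESTAMP_LOOKAHEAD = 30        # scan at most 30 chars past the prefix
MAX_DAYS_AGO = 30                   # treat extracted dates older than 30 days as invalid
```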

  • Do regex replacement (field extraction, punctuation extraction, event routing, host/source/sourcetype overrides)

    Annotate events with metadata keys (host, source, sourcetype, ...)

    Can happen on HF or indexer

    Typing
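The regex-replacement step is configured by pairing props.conf with transforms.conf. A sketch of a host override (stanza, sourcetype and regex are illustrative):

```ini
# props.conf -- attach a transform to the sourcetype
[myapp:app]
TRANSFORMS-set_host = myapp_set_host

# transforms.conf -- rewrite the host metadata key from the event text
[myapp_set_host]
REGEX = host=(\S+)
DEST_KEY = MetaData:Host
FORMAT = host::$1
```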

  • Output processors: TCP, syslog, HTTP, indexAndForward

    Sign blocks

    Calculate license volume and throughput metrics

    Index [write to disk] / [forward elsewhere] / ...

    Can happen on HF or indexer

    Indexing

  • The Data Pipeline

  • Data Pipeline: UF & Indexer

  • Data Pipeline: HF & Indexer

  • Data Pipeline: UF, IF & Indexer

  • UF vs. HF

    UF emits chunks of data (annotated with metadata, but not yet broken into events):

    sourcetype=access_combined, index=foo, host=bar,

    209.160.24.63 - - [23/Feb/2016:18:22:16] "GET /oldlink?itemId=EST-6&JSESSIONID=SD0SL6FF7AD...
    209.160.24.63 - - [23/Feb/2016:18:22:17] "GET /product.screen?productId=BS-AG-G09&JSESSION...
    209.160.24.63 - - [23/Feb/2016:18:22:19] "POST /category.screen?categoryId=STRATEGY&JSESSI...
    209.160.24.63 - - [23/Feb/2016:18:22:20] "GET /product.screen?productId=FS-SG-G03&JSESSION...
    209.160.24.63 - - [23/Feb/2016:18:22:20] "POST /cart.do?action=addtocart&itemId=EST-21&pro...
    209.160.24.63 - - [23/Feb/2016:18:22:21] "POST /cart.do?action=purchase&itemId=EST-21&JSES...
    209.160.24.63 - - [23/Feb/2016:18:22:22] "POST /cart/success.do?JSESSIONID=SD0SL6FF7ADFF49...
    209.160.24.63 - - [23/Feb/2016:18:22:21] "GET /cart.do?action=remove&itemId=EST-11&product...
    209.160.24.63 - - [23/Feb/2016:18:22:22] "GET /oldlink?itemId=EST-14&JSESSIONID=SD0SL6FF7A...
    112.111.162.4 - - [23/Feb/2016:18:26:36] "GET /product.screen?productId=WC-SH-G04&JSESSION...

    HF emits events (each parsed and timestamped individually):

    sourcetype=access_combined, _time=1456251739, index=foo, host=bar,
    209.160.24.63 - - [23/Feb/2016:18:22:16] "GET /oldlink?itemId=EST-6&JSESSIONID=SD0SL6FF7AD...

    sourcetype=access_combined, _time=1456251739, index=foo, host=bar,
    209.160.24.63 - - [23/Feb/2016:18:22:17] "GET /product.screen?productId=BS-AG-G09&SSN=xxxyyyzzz...

  • Splunk Data Ingest

    [Diagram: UFs and a HF feeding the IDX, searched by the SH; the HF and IDX parse ("Parsing"), the UFs do not ("Not Parsing")]

    Note: the data is parsed at the first component that has a parsing engine, and not again. This affects where you put certain props.conf and transforms.conf files (i.e., sometimes they go on the forwarder).
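For example, if a Heavy Forwarder sits in front of the indexers, index-time props for a sourcetype must live on the HF, since that is the first parsing engine the data hits; placing them only on the indexer would have no effect for that stream. A sketch (sourcetype and values are illustrative):

```ini
# props.conf deployed on the Heavy Forwarder, not (only) the indexer,
# because the HF is the first component with a parsing engine
[myapp:app]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_FORMAT = %d/%b/%Y:%H:%M:%S
```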

  • Data Onboarding Process (bringing it together)

  • Identify the specific sourcetype(s) -- onboard each separately

    Check for a pre-existing app/TA on splunk.com -- don't reinvent the wheel!

    Gather info:

    Where does this data originate/reside? How will Splunk collect it?

    Which users/groups will need access to this data? Access controls?

    Determine the indexing volume and data retention requirements

    Will this data need to drive existing dashboards (ES, PCI, etc.)?

    Who is the SME for this data?

    Map it out:

    Get a "big enough" sample of the event data

    Identify and map out fields

    Assign sourcetype and TA names according to CIM conventions

    On-boarding Process

  • On-boarding Process

    1. Dev: Create (or use) an app; props/inputs definition; sourcetype definition; use the data import wizard; import, tweak, repeat; oneshot [hook up monitor]

    2. Test: Deploy app; oneshot; validate; hook up monitor; validate

    3. Prod: Deploy app; validate; monitor

  • General: Use apps for configs

    Use TAs / add-ons from Splunk if possible

    Use dev, test, prod (dev can be laptop, test can be ephemeral)

    UF when possible; HF only if filtering / transforming is required in foreign land

    Unique sourcetype per event stream

    Don't send data through Search Heads

    Don't send data direct to Indexers

    Good Hygiene

  • inputs.conf: Be as specific as possible; set sourcetype, if possible

    Don't let Splunk auto-sourcetype (no ...-too_small); specify index if possible

    props.conf: Set TIME_PREFIX, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD

    Optimally: SHOULD_LINEMERGE = false, LINE_BREAKER, TRUNCATE

    Good Hygiene
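Put together, a hygienic input-plus-sourcetype definition might look like this (all names, paths and values are illustrative):

```ini
# inputs.conf -- as specific as possible; explicit sourcetype and index
[monitor:///var/log/myapp/app.log]
sourcetype = myapp:app
index = myapp

# props.conf -- explicit timestamp and line-breaking settings
[myapp:app]
TIME_PREFIX = \[
TIME_FORMAT = %d/%b/%Y:%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 30
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 10000
```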

  • Data Onboarding Process (details)

  • Identify the specific sourcetype(s) -- onboard each separately

    Check for a pre-existing app/TA on splunk.com -- don't reinvent the wheel!

    Gather info:

    Where does this data originate/reside? How will Splunk collect it?

    Which users/groups will need access to this data? Access controls?

    Determine the indexing volume and data retention requirements

    Will this data need to drive existing dashboards (ES, PCI, etc.)?

    Who is the SME for this data?

    Map it out:

    Get a "big enough" sample of the event data

    Identify and map out fields

    Assign sourcetype and TA names according to CIM conventions

    Pre-Board

  • The Common Information Model (CIM) defines relationships in the underlying data, while leaving the raw machine data intact

    A naming convention for fields, event types & tags

    More advanced reporting and correlation requires that the data be normalized, categorized, and parsed

    CIM-compliant data sources can drive CIM-based dashboards (ES, PCI, others)

    Tangent: What is the CIM and why should I care?

  • Identify necessary configs (inputs, props and transforms) to properly handle:

    timestamp extraction, timezone, event breaking, sourcetype/host/source assignments

    Do events contain sensitive data (e.g., PII, PAN)? Create masking transforms if necessary

    Package all index-time configs into the TA

    Build the index-time configs
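One way to mask sensitive values at index time is a SEDCMD in props.conf, which applies a sed-style substitution before the event is written to disk. A sketch, assuming a hypothetical SSN field in the raw events:

```ini
# props.conf -- mask a 9-digit SSN value before indexing
# (sourcetype and pattern are illustrative)
[myapp:app]
SEDCMD-mask_ssn = s/SSN=\d{9}/SSN=XXXXXXXXX/g
```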

  • Assign sourcetype according to event format; events with similar format should have the same sourcetype

    When do I need a separate index?

    When the data volume will be very large, or when it will be searched exclusively a lot

    When access to the data needs to be controlled

    When the data requires a specific data retention policy

    Resist the temptation to create lots of indexes

    Tangent: Best & Worst Practices
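When a separate index is justified, retention is set per index in indexes.conf. A sketch (index name, paths and retention value are illustrative):

```ini
# indexes.conf -- dedicated index with a 90-day retention policy
[myapp]
homePath   = $SPLUNK_DB/myapp/db
coldPath   = $SPLUNK_DB/myapp/colddb
thawedPath = $SPLUNK_DB/myapp/thaweddb
frozenTimePeriodInSecs = 7776000   # 90 days, then buckets freeze (deleted by default)
```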

  • Always specify a sourcetype