Apache HBase 0.98

Apache HBase 0.98. Andrew Purtell, Committer, Apache HBase, Apache Software Foundation; Big Data US Research and Development, Intel. Chinese translation: Tianyou Li

Upload: apurtell

Post on 20-Oct-2015


DESCRIPTION

An introduction to new features in Apache HBase, with Chinese translation.

TRANSCRIPT

  • Apache HBase 0.98

    Andrew Purtell Committer, Apache HBase, Apache Software Foundation

    Big Data US Research And Development, Intel

    Chinese translation: Tianyou Li

  • Who Am I?

    Committer and PMC Member, Apache HBase project

    Member of the Big Data Research And Development Group at Intel

    Release manager for Apache HBase 0.98

  • What is Apache HBase?

    A high-performance, horizontally scalable datastore engine for Big Data, suitable as the store of record for mission-critical data

    Apache Software Foundation community project

    Open source

    Free license

  • HBase and Big Data

    1994-2006: Large Internet companies first encounter Big Data

    (Today: 94% corporate data growth YoY)

  • HBase and Big Data

    2006-today: The openness of the early leaders provides a blueprint for motivated and talented open source communities

    Role                              Google       Apache
    Distributed filesystem            GFS          HDFS
    Horizontally scalable database    BigTable     HBase
    Parallel programming model        MapReduce    Hadoop
    Distributed lock manager          Chubby       ZooKeeper

  • HBase and Big Data

    Now: HBase is a foundation of Big Data use cases

  • HBase and Hadoop

    [Diagram: the Hadoop ecosystem stack]
    Sqoop: RDBMS data collector
    Flume: log data collector
    ZooKeeper: coordination
    YARN (MRv2): cluster resource manager / MapReduce
    HDFS 2.0: Hadoop Distributed File System
    Giraph: graph analysis framework
    HBase Coprocessors: data execution engine
    HBase: distributed database
    The Java Virtual Machine; Hadoop Common; JNI
    Spark: iterative in-memory computation
    Mahout: data mining
    Pig: data manipulation
    Hive: structured query
    Oozie: data flow
    Shark: structured query
    R: statistics

  • The HBase Data Model

    (Tablespaces)

    Not a spreadsheet; think of a distributed sorted map
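
    The "distributed sorted map" view can be sketched with nested Java TreeMaps. Each row key maps to columns (family:qualifier), each column maps to timestamps, and each timestamp maps to a value. This is a conceptual model only and assumes String keys for readability. HBase itself stores byte arrays and spreads the outer map across regions. SortedMapModel is a hypothetical name.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;

// Conceptual model: row -> column ("family:qualifier") -> timestamp -> value.
// Strings are used for readability; HBase itself stores byte arrays.
final class SortedMapModel {
    // Outer map sorted by row key, as rows are ordered in an HBase table.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            // Versions are kept newest first.
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Newest version of one cell, or null if the cell does not exist.
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Row keys in sorted order: the order a forward scan visits them.
    NavigableSet<String> rowKeys() {
        return rows.navigableKeySet();
    }
}
```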

  • How HBase Achieves Scalability

    [Diagram: Table A and Table B are split into Regions, and Regions are assigned to RegionServers]
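
    Here is a minimal sketch of how row-key ranges map to regions. It assumes regions are tracked by their start key in a TreeMap, so the region holding a row is the one with the greatest start key that is less than or equal to the row. RegionMap and the server names are hypothetical. The split handling is simplified: the upper half just gets a new server, whereas real split and assignment handling in HBase is more involved.

```java
import java.util.TreeMap;

// Toy region directory for one table: region start key -> assigned server.
final class RegionMap {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionMap(String initialServer) {
        // The first region starts at the empty key, so it initially covers every row.
        regionsByStartKey.put("", initialServer);
    }

    // The region containing a row is the one with the greatest start key <= row.
    String serverFor(String row) {
        return regionsByStartKey.floorEntry(row).getValue();
    }

    // Split the region containing splitKey; rows >= splitKey now belong to a new region.
    void split(String splitKey, String newServer) {
        regionsByStartKey.put(splitKey, newServer);
    }

    int regionCount() {
        return regionsByStartKey.size();
    }
}
```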

  • HBase as a Data Application Platform

    HBase Coprocessors

    In-process system extension framework

    Observers (like triggers)

    Endpoints (like stored procedures)

    System integrators can deploy application code that runs where the data resides
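
    The observer/endpoint distinction can be made concrete with a toy sketch. ToyObserver and ToyRegion are hypothetical types, not the HBase coprocessor API. An observer runs before each put and can veto it, like a trigger. An endpoint-style call runs a computation against one region's data in-process, and the client combines the per-region results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical types for illustration; NOT the HBase coprocessor API.
interface ToyObserver {
    // Called before a put is applied; returning false vetoes it, like a trigger.
    boolean prePut(String row, String value);
}

final class ToyRegion {
    private final TreeMap<String, String> data = new TreeMap<>();
    private final List<ToyObserver> observers = new ArrayList<>();

    void addObserver(ToyObserver observer) {
        observers.add(observer);
    }

    // Applies the put unless some registered observer vetoes it.
    boolean put(String row, String value) {
        for (ToyObserver o : observers) {
            if (!o.prePut(row, value)) {
                return false;
            }
        }
        data.put(row, value);
        return true;
    }

    // Endpoint-style call: the computation runs next to this region's data,
    // and only its (small) result goes back to the caller.
    <R> R invoke(Function<Map<String, String>, R> endpoint) {
        return endpoint.apply(Collections.unmodifiableMap(data));
    }
}
```

    Shipping the computation to each region and returning only a small result is what lets application code "run where the data resides."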

  • HBase Differentiators

    Feature                 RDBMS                            HBase
    Data layout             Row oriented                     Column oriented
    Transactions            Multi-row ACID                   Multi-row ACID within a region only
    Query language          Native SQL                       No native query language
    Security                AuthN and AuthZ (ACL)            AuthN and AuthZ (ACL; visibility labels new in 0.98)
    Indexes                 On arbitrary columns             Row key index only
    Max data size           Terabytes                        Petabytes
    R/W throughput limits   1000s of operations per second   Millions of operations per second

  • New In Apache HBase 0.98.0

    New security features and improvements

    Cell tags

    HFile v3

    Transparent server side encryption (HBASE-7544)

    Per-cell ACLs (HBASE-7662)

    Cell level visibility labels (HBASE-7663)

    EXEC access permission checks for Endpoints (HBASE-6104)

  • New In Apache HBase 0.98.0

    New features

    Reverse scans (HBASE-4811)

    MapReduce over snapshots (HBASE-8369)

    Performance improvements

    Improved WAL write threading model (HBASE-8755)

    Stripe compactions (HBASE-7667)

    REST streaming scans (HBASE-9343)

  • Cell Tags

    All values written to HBase are stored in cells

    Cells can now also carry one or more tags

    Tags are metadata, considered distinct from the key and the value

    We use tags to implement per-cell ACLs and visibility labels
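
    Here is a toy representation of a tagged cell. It assumes String fields and made-up tag type codes: ToyTag.ACL and ToyTag.VISIBILITY are hypothetical, not HBase's constants. It shows the key idea that tags travel with the cell but are not part of its key or value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy tag: a type code plus a value. The type codes are hypothetical.
final class ToyTag {
    static final byte ACL = 1;
    static final byte VISIBILITY = 2;

    final byte type;
    final String value;

    ToyTag(byte type, String value) {
        this.type = type;
        this.value = value;
    }
}

// Toy cell: key (row, column, timestamp) and value, plus attached tags.
final class ToyCell {
    final String row;
    final String column;
    final long timestamp;
    final String value;
    private final List<ToyTag> tags;

    ToyCell(String row, String column, long timestamp, String value, List<ToyTag> tags) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.value = value;
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
    }

    // All tag values of the given type, in the order the tags were attached.
    List<String> tagValues(byte type) {
        List<String> out = new ArrayList<>();
        for (ToyTag t : tags) {
            if (t.type == type) {
                out.add(t.value);
            }
        }
        return out;
    }
}
```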

  • HFile Version 3

    New file format, supporting cell tags and block encryption

    Enabled with a site configuration file change

    hfile.format.version = 3

    HFile v2 data is transparently migrated over time as new files are written by flushes and compactions

  • Transparent Encryption (HBASE-7544)

    Built on a new cryptographic codec and key management framework inside HBase

    Transparent encryption of HBase on-disk data

    Supports schema design that places sensitive information in only a subset of column families
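
    This sketch shows a two-tier key idea using only the JDK's javax.crypto. It assumes a per-column-family data key encrypts data blocks and a master key wraps (encrypts) that data key. The cipher choices are assumptions for illustration: AES/CTR with a random IV stored as a prefix, and the JDK's "AESWrap" for key wrapping. TwoTierEncryptionSketch is hypothetical and is not HBase's codec implementation.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Two-tier sketch: a data key encrypts blocks; a master key wraps the data key.
// Cipher choices are assumptions for illustration, not HBase's codec code.
final class TwoTierEncryptionSketch {
    private static final SecureRandom RNG = new SecureRandom();
    private static final int IV_LEN = 16;

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypt (wrap) the data key under the master key with AES key wrap.
    static byte[] wrapKey(SecretKey master, SecretKey dataKey) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.WRAP_MODE, master);
            return c.wrap(dataKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey unwrapKey(SecretKey master, byte[] wrapped) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.UNWRAP_MODE, master);
            return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypt one block with AES/CTR under a fresh random IV, stored as a prefix.
    static byte[] encryptBlock(SecretKey dataKey, byte[] plain) {
        try {
            byte[] iv = new byte[IV_LEN];
            RNG.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, dataKey, new IvParameterSpec(iv));
            byte[] body = c.doFinal(plain);
            byte[] out = Arrays.copyOf(iv, IV_LEN + body.length);
            System.arraycopy(body, 0, out, IV_LEN, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Read the IV prefix, then decrypt the remainder with the same data key.
    static byte[] decryptBlock(SecretKey dataKey, byte[] ivAndBody) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.DECRYPT_MODE, dataKey, new IvParameterSpec(ivAndBody, 0, IV_LEN));
            return c.doFinal(ivAndBody, IV_LEN, ivAndBody.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    Keeping the data key wrapped means only the master key has to be held outside the stored data. Rotating the master key then requires re-wrapping the data keys, not re-encrypting every block.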

  • Transparent Encryption (HBASE-7544)

  • Per-Cell ACLs (HBASE-7662)

    Extends the existing HBase ACL model with support for persisting and checking per-cell ACL data in tags

    Backwards compatible

    We timestamp ACLs on a cell like any other HBase data, allowing straightforward policy evolution
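
    This is a simplified model of per-cell ACLs carried with each cell version. CellAction and VersionedCellAcl are hypothetical types. Because each version carries its own ACL, changing the policy for later reads amounts to writing a new version. The rule used here is a simplification: access is allowed if either the family-level grant or the version's own ACL allows it. HBase's actual evaluation is more involved.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

enum CellAction { READ, WRITE, EXEC, CREATE, ADMIN }

// Simplified model: each cell version carries its own ACL (user -> actions).
final class VersionedCellAcl {
    private final TreeMap<Long, Map<String, Set<CellAction>>> aclByTimestamp = new TreeMap<>();

    // Convenience builder for a single-user ACL.
    static Map<String, Set<CellAction>> grant(String user, CellAction first, CellAction... rest) {
        Map<String, Set<CellAction>> acl = new HashMap<>();
        acl.put(user, EnumSet.of(first, rest));
        return acl;
    }

    // Writing a new version with a new ACL is how the policy evolves.
    void putVersion(long timestamp, Map<String, Set<CellAction>> acl) {
        aclByTimestamp.put(timestamp, acl);
    }

    // Allowed if the family-level grant covers the action, or if the ACL of the
    // newest version at or before readTs grants it to this user.
    boolean allowed(Set<CellAction> familyGrant, String user, CellAction action, long readTs) {
        if (familyGrant.contains(action)) {
            return true;
        }
        Map.Entry<Long, Map<String, Set<CellAction>>> version = aclByTimestamp.floorEntry(readTs);
        if (version == null) {
            return false;
        }
        Set<CellAction> granted = version.getValue().get(user);
        return granted != null && granted.contains(action);
    }
}
```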

  • Visibility Labels (HBASE-7663)

    Visibility expression support via new security coprocessor

    Labels: arbitrary strings

    Expressions: labels joined in boolean expressions

    Operators: &, |, !, ( )

    secret

    secret | topsecret

    ( secret | topsecret ) & !probationary
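
    Here is a minimal recursive-descent evaluator for visibility expressions like the examples above. It checks an expression against a set of authorizations. It rests on several assumptions: whitespace is ignored, and labels consist of letters, digits, '_' or '.'. It also gives '&' higher precedence than '|'; the examples parenthesize mixed operators, so HBase's own precedence rules are not relied on. VisibilityExpr is a hypothetical name.

```java
import java.util.Set;

// Evaluates e.g. "( secret | topsecret ) & !probationary" against a label set.
// Grammar: or := and ('|' and)* ; and := unary ('&' unary)* ;
//          unary := '!' unary | '(' or ')' | label
final class VisibilityExpr {
    private final String s;
    private final Set<String> auths;
    private int pos;

    private VisibilityExpr(String s, Set<String> auths) {
        this.s = s;
        this.auths = auths;
    }

    static boolean visible(String expression, Set<String> authorizations) {
        VisibilityExpr p = new VisibilityExpr(expression, authorizations);
        boolean result = p.orExpr();
        p.skipSpaces();
        if (p.pos != p.s.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return result;
    }

    private boolean orExpr() {
        boolean v = andExpr();
        while (peek() == '|') {
            pos++;
            boolean rhs = andExpr(); // always parse the right side before combining
            v = v || rhs;
        }
        return v;
    }

    private boolean andExpr() {
        boolean v = unary();
        while (peek() == '&') {
            pos++;
            boolean rhs = unary();
            v = v && rhs;
        }
        return v;
    }

    private boolean unary() {
        char c = peek();
        if (c == '!') {
            pos++;
            return !unary();
        }
        if (c == '(') {
            pos++;
            boolean v = orExpr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++;
            return v;
        }
        int start = pos;
        while (pos < s.length() && isLabelChar(s.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected label at " + pos);
        }
        return auths.contains(s.substring(start, pos));
    }

    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '.';
    }

    // Skip whitespace and return the next character, or '\0' at end of input.
    private char peek() {
        skipSpaces();
        return pos < s.length() ? s.charAt(pos) : '\0';
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) {
            pos++;
        }
    }
}
```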

  • Visibility Labels (HBASE-7663)

    New client APIs and new shell commands for label management, similar to those of Apache Accumulo, for easy migration

    Users specify visibility expressions on cells

    Users ask for authorizations on Gets and Scans

    The server decides which authorizations are valid

    Scan results are filtered according to the user's valid authorizations
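
    Here is a toy sketch of the server-side step. It assumes each cell's visibility is a plain set of labels that must all be held, with no '|' or '!'. The effective authorizations are the requested ones the user was actually granted. Scan results keep only cells those authorizations satisfy, and unlabeled cells stay visible. AuthFilterSketch is hypothetical. In the 0.98 client, visibility and authorizations are attached through the CellVisibility and Authorizations classes, which the sketch does not use.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy server-side filtering with conjunction-only cell visibility.
final class AuthFilterSketch {
    // Requested labels the user was not granted are silently dropped.
    static Set<String> effectiveAuths(Set<String> requested, Set<String> granted) {
        Set<String> effective = new HashSet<>(requested);
        effective.retainAll(granted);
        return effective;
    }

    // Keep rows whose required labels are all present (empty set = unlabeled).
    static List<String> filter(Map<String, Set<String>> requiredLabelsByRow, Set<String> effective) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : requiredLabelsByRow.entrySet()) {
            if (effective.containsAll(e.getValue())) {
                visible.add(e.getKey());
            }
        }
        return visible;
    }
}
```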

  • Endpoint EXEC Grants (HBASE-6104)

    HBase ACLs grant a familiar set of privileges to users and groups: (R)ead, (W)rite, E(X)ecute, (C)reate, (A)dmin

    However, versions prior to 0.98.0 ignore X

    Now access to coprocessor Endpoint invocations can be controlled on a global, per-table, or per-column-family basis
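
    Here is a toy permission table for the five privileges at global, table, and column-family scope. It checks whether a user holds E(X)ecute anywhere along that chain before an Endpoint call. The Perm and GrantTable types are hypothetical, as is the scope encoding: "" for global, "t1" for a table, and "t1:cf" for a family.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// The five privileges: (R)ead, (W)rite, E(X)ecute, (C)reate, (A)dmin.
enum Perm { READ, WRITE, EXEC, CREATE, ADMIN }

// Toy grant table: user -> scope -> granted privileges.
final class GrantTable {
    private final Map<String, Map<String, Set<Perm>>> grantsByUser = new HashMap<>();

    void grant(String user, String scope, Perm perm) {
        grantsByUser.computeIfAbsent(user, u -> new HashMap<String, Set<Perm>>())
                    .computeIfAbsent(scope, s -> EnumSet.noneOf(Perm.class))
                    .add(perm);
    }

    // EXEC at global, table, or family scope permits the Endpoint call.
    boolean canInvokeEndpoint(String user, String table, String family) {
        Map<String, Set<Perm>> scopes =
                grantsByUser.getOrDefault(user, Collections.<String, Set<Perm>>emptyMap());
        for (String scope : new String[] {"", table, table + ":" + family}) {
            if (scopes.getOrDefault(scope, Collections.<Perm>emptySet()).contains(Perm.EXEC)) {
                return true;
            }
        }
        return false;
    }
}
```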

  • Reverse Scans (HBASE-4811)

    A new scanner type that seeks to the end of a range and then steps backwards

    No longer necessary to manually maintain reverse index tables for descending sorts

    Exposed at the client with a new Scan option: Scan#setReversed(boolean reversed)

    Performance is on par with normal (forward) scanning
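
    Here is a sketch of reverse-scan semantics over a sorted map. In the 0.98 client this mode is enabled with Scan#setReversed(true). The bound convention modeled here is an assumption to check against the HBase documentation: the start row is the higher key and is inclusive, and the stop row is the lower key and is exclusive. ReverseScanSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Toy reverse scan: walk sorted row keys backwards without a reverse index table.
final class ReverseScanSketch {
    // Rows from startRow (inclusive) down to stopRow (exclusive), highest first.
    // Requires stopRow <= startRow, mirroring a reversed scan's bounds.
    static List<String> reverseScan(NavigableMap<String, String> rows, String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(stopRow, false, startRow, true).descendingKeySet());
    }
}
```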

  • MapReduce Over Snapshots (HBASE-8369)

    Adds MapReduce utilities supporting jobs over snapshots of table data

    Clients can skip the HBase API and read HFiles directly on disk from a table snapshot

    Can increase throughput ~5x by skipping many system layers

    Not recommended from a security perspective

    Built-in access control is completely bypassed

  • Improved WAL Write Throughput (HBASE-8755)

    Introduces a new threading model for WAL writes that reduces lock contention

    Provides better write throughput under load: a ~15% improvement in write ops/sec at high write concurrency
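
    Here is a generic group-commit sketch of the idea behind reducing WAL lock contention. Writer threads only enqueue an edit and wait on a future. A single background thread drains whatever has accumulated, performs one simulated sync per batch, and then releases every waiter. GroupCommitLog is hypothetical and much simpler than HBase's WAL code; the "sync" is only a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Generic group-commit sketch (hypothetical; not HBase's WAL implementation).
// Writers never take a shared lock: they enqueue an edit and wait on a future.
final class GroupCommitLog {
    private static final class Pending {
        final String edit;
        final CompletableFuture<Void> synced = new CompletableFuture<>();

        Pending(String edit) {
            this.edit = edit;
        }
    }

    private final LinkedBlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger editsWritten = new AtomicInteger();
    private final AtomicInteger syncCount = new AtomicInteger();

    GroupCommitLog() {
        Thread syncer = new Thread(this::syncLoop, "wal-syncer");
        syncer.setDaemon(true); // lets the JVM exit without an explicit shutdown
        syncer.start();
    }

    // Returns a future that completes once the edit has been "synced".
    CompletableFuture<Void> append(String edit) {
        Pending p = new Pending(edit);
        queue.add(p);
        return p.synced;
    }

    private void syncLoop() {
        List<Pending> batch = new ArrayList<>();
        try {
            while (true) {
                batch.add(queue.take()); // wait for at least one edit
                queue.drainTo(batch);    // then take everything else already queued
                editsWritten.addAndGet(batch.size()); // stand-in for writing the batch
                syncCount.incrementAndGet();          // one simulated sync per batch
                for (Pending p : batch) {
                    p.synced.complete(null);
                }
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int writtenEdits() {
        return editsWritten.get();
    }

    int syncs() {
        return syncCount.get();
    }
}
```

    Under high concurrency many edits share each sync, which is where a throughput gain of this general kind comes from. The exact figure depends on the workload and the real implementation.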

  • Stripe Compactions (HBASE-7667)

    Stripe compactions split the data inside the region by row key and create sub-ranges of data

    Sub-ranges are compacted independently

    Can reduce read latency variability and reduce compaction data volume (write amplification)

    Some use cases can benefit, but the feature is complex to configure and tune; consult the documentation for details
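
    Here is a toy sketch of how stripe boundaries partition a region's row keys into sub-ranges that can be compacted independently. StripeSketch and the boundary keys are hypothetical. HBase chooses and adjusts stripe boundaries itself, and none of its configuration is modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy stripes: boundaries b1 < b2 < ... give stripes [-inf, b1), [b1, b2), ..., [bn, +inf).
final class StripeSketch {
    private final TreeSet<String> boundaries;

    StripeSketch(String... boundaries) {
        this.boundaries = new TreeSet<>(Arrays.asList(boundaries));
    }

    // Stripe index for a row: the number of boundaries <= row.
    int stripeOf(String row) {
        return boundaries.headSet(row, true).size();
    }

    // Group rows by stripe so each group could be compacted on its own.
    List<List<String>> partition(List<String> rows) {
        List<List<String>> stripes = new ArrayList<>();
        for (int i = 0; i <= boundaries.size(); i++) {
            stripes.add(new ArrayList<>());
        }
        for (String row : rows) {
            stripes.get(stripeOf(row)).add(row);
        }
        return stripes;
    }
}
```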

  • REST Streaming Scans (HBASE-9343)

    Introduces a new scanning mode to the REST API for stateless scanning

    The client manages paging and limits

    Instead of batching results into multiple HTTP transactions as they come back from the RegionServers, the stateless scanner can stream all results back to the client over one HTTP connection
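
    Here is a sketch of client-managed paging against a stateless scanner, modeled over an in-memory sorted map. Each request asks for at most `limit` rows starting at a given row key. The client then resumes from the smallest key after the last row it received, which is that row key plus a zero character. StatelessPager is hypothetical and does not use the HBase REST API or its URL parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical client-side paging over a stateless scanner (not the REST API).
final class StatelessPager {
    // One stateless request: at most 'limit' row keys that are >= startRow.
    static List<String> page(NavigableMap<String, String> rows, String startRow, int limit) {
        List<String> out = new ArrayList<>();
        for (String key : rows.tailMap(startRow, true).keySet()) {
            if (out.size() == limit) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    // The client, not the server, tracks where to resume.
    static List<String> scanAll(NavigableMap<String, String> rows, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> all = new ArrayList<>();
        String start = "";
        while (true) {
            List<String> batch = page(rows, start, limit);
            all.addAll(batch);
            if (batch.size() < limit) {
                return all;
            }
            // Smallest possible key after the last row received.
            start = batch.get(batch.size() - 1) + "\0";
        }
    }
}
```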

  • Upgrading to HBase 0.98.0

    Direct upgrade possible from 0.94 to 0.98 using an offline data migration procedure

    Upgrade from 0.96 to 0.98 is seamless

    Wire compatibility: mixed client-server and server-server operation with 0.96 is possible as long as no 0.98-specific features are enabled

    Binary API compatibility not guaranteed; some applications may need minor changes

  • Future of HBase 0.98.x Branch

    Minor releases (0.98.1, 0.98.2, etc.) are expected; these will contain:

    Bug fixes

    Performance improvements

    Deprecations of some APIs for HBase 1.0

    Tag compression in HFile

    Performance improvements for encryption

  • End

    Questions?