building apps with hbase - big data techcon boston

52
1 Headline Goes Here Speaker Name or Subhead Goes Here Building applica2ons with HBase Amandeep Khurana | Solu7ons Architect Big Data TechCon, Boston, April 2013 Friday, April 12, 13

Upload: amansk

Post on 08-May-2015

1.261 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Building apps with HBase - Big Data TechCon Boston

1

Headline  Goes  HereSpeaker  Name  or  Subhead  Goes  Here

Building  applica2ons  with  HBaseAmandeep  Khurana  |  Solu7ons  ArchitectBig  Data  TechCon,  Boston,  April  2013

Friday, April 12, 13

Page 2: Building apps with HBase - Big Data TechCon Boston

About  me

•Solu2ons  Architect,  Cloudera  Inc•Amazon  Web  Services•Interested  in  large  scale  distributed  systems•Co-­‐author,  HBase  In  Ac2on•TwiHer:  amansk

2

M A N N I N G

Nick DimidukAmandeep Khurana

Friday, April 12, 13

Page 3: Building apps with HBase - Big Data TechCon Boston

About  the  talk

•Setup  local  HBase  environment•Simple  client•Deeper  dive  on  API•Data  model

3Friday, April 12, 13

Page 4: Building apps with HBase - Big Data TechCon Boston

What  is  HBase?

•Scalable,  distributed  data  store•Open  source  avatar  of  Google’s  Bigtable•Sorted  map  of  maps•Sparse•Mul2  dimensional•Tightly  integrated  with  Hadoop•Not  a  RDBMS

4Friday, April 12, 13

Page 5: Building apps with HBase - Big Data TechCon Boston

So,  what’s  the  big  deal?

•Give  up  system  complexity  and  sophis2cated  features  for  ease  of  scale  and  manageability

•Simple  system  !=  easy  to  build  applica2ons  on  top•Goal:  predictable  read  and  write  performance  at  scale

•Effec2ve  API  usage•Op2mal  schema  design•Understand  internals  and  leverage  at  scale

5Friday, April 12, 13

Page 6: Building apps with HBase - Big Data TechCon Boston

Use  cases

•Canonical  use  case:  storing  crawl  data  and  indices  for  search

6

1

1 Crawlers constantly scour the Internet for new pages.Those pages are stored as individual records in Bigtable. 3

2 A MapReduce job runs over the entire table, generatingsearch indexes for the Web Search application.

4

2

5

Web Searchpowered by Bigtable

Indexing the Internet

3 The user initiates a Web Search request.

4 The Web Search application queries the Search Indexesand retries matching documents directly from Bigtable.

5 Search results are presented to the user.

Searching the Internet

BigtableInternetsCrawlers

CrawlersCrawlers

Crawlers

MapReduce

You

Search IndexSearch

IndexSearch Index

Web Search

Friday, April 12, 13

Page 7: Building apps with HBase - Big Data TechCon Boston

Use  cases  (con2nued)

•Capturing  incremental  trickle  data•Content  serving  /  OLTP•Blob  store

7Friday, April 12, 13

Page 8: Building apps with HBase - Big Data TechCon Boston

Installing  HBase

•Get  it  here  -­‐>  hHp://hbase.apache.org/•Downloads  -­‐>  Choose  a  mirror  -­‐>  Choose  a  release•Download  the  tarball•Untar  it  and  run  it

8Friday, April 12, 13

Page 9: Building apps with HBase - Big Data TechCon Boston

Basics

•Modes  of  opera2on•Standalone•Pseudo-­‐distributed•Fully  distributed

•Start  up  in  standalone  mode•cd  hbase-­‐0.94.1•bin/start-­‐hbase.sh•Logs  are  in  ./logs/

•Check  out  hHp://localhost:60010  in  your  browser

9Friday, April 12, 13

Page 10: Building apps with HBase - Big Data TechCon Boston

Ways  to  interact

•Shell•Java  client•Thrid  interface•REST  interface•Asynchbase

10Friday, April 12, 13

Page 11: Building apps with HBase - Big Data TechCon Boston

Shell

•Basic  interac2on  tool•WriHen  in  JRuby•Uses  the  Java  client  library•Good  place  to  start  poking  around

11Friday, April 12, 13

Page 12: Building apps with HBase - Big Data TechCon Boston

Shell  (con2nued)

•bin/hbase  shell•Create  table

•create  ‘users’  ,  ‘info’•List  tables

•list•Describe  table

•describe  ‘users’

12Friday, April 12, 13

Page 13: Building apps with HBase - Big Data TechCon Boston

Shell  (con2nued)

•Put  a  row•put  ‘users’  ,  ‘u1’,  ‘info:name’  ,  ‘Mr.  Rogers’

•Get  a  row•get  ‘users’  ,  ‘u1’

•Put  more•put  ‘users’  ,  ‘u1’  ,  ‘info:email’  ,  ‘[email protected]’•put  ‘users’  ,  ‘u1’  ,  ‘info:password’  ,  ‘foobar’

•Get  a  row•get  ‘users’  ,  ‘u1’

•Scan  table• scan  ‘users’

13Friday, April 12, 13

Page 14: Building apps with HBase - Big Data TechCon Boston

14

Demo-­‐  Standalone  mode-­‐  Shell

Friday, April 12, 13

Page 15: Building apps with HBase - Big Data TechCon Boston

15

...  can’t  really  build  an  applica7on  just  with  the  shell

Java  API

Friday, April 12, 13

Page 16: Building apps with HBase - Big Data TechCon Boston

Important  terms

• Table• Consists  of  rows  and  columns

• Row• Has  a  bunch  of  columns.• Iden2fied  by  a  rowkey  (primary  key)

• Column  Qualifier• Dynamic  column  name

• Column  Family• Column  groups  -­‐  logical  and  physical  (Similar  access  paHern)

• Cell• The  actual  element  that  contains  the  data  for  a  row-­‐column  intersec2on

• Version• Every  cell  has  mul2ple  versions.

16Friday, April 12, 13

Page 17: Building apps with HBase - Big Data TechCon Boston

Introducing  TwitBase

•TwiHer  clone  built  on  HBase•Designed  to  scale• It’s  an  example,  not  a  produc2on  app•Start  with  users  and  twits  for  now•Users

•Have  profiles•Post  twits

•Twits•Have  text  posted  by  users

•Get  sample  code  at:  hHps://github.com/hbaseinac2on/twitbase

17Friday, April 12, 13

Page 18: Building apps with HBase - Big Data TechCon Boston

Java  API

•Configura)on  -­‐  holds  config  informa2on  about  the  cluster.  Roughly  equivalent  to  JDBC  connec2on  string

•HConnec)on  -­‐  represents  connec2on  to  a  cluster•HBaseAdmin  -­‐  admin  tool  to  handle  DDL  opera2ons

•HTablePool  -­‐  pool  of  connec2ons  to  cluster•HTable  (HTableInterface)  -­‐  handle  on  a  single  HBase  table.  Sends  commands  to  table

18Friday, April 12, 13

Page 19: Building apps with HBase - Big Data TechCon Boston

Connec2ng  to  HBase

•Using default confs:HTableInterface usersTable = new HTable("users");

•Passing custom conf:Configuration myConf = HBaseConfiguration.create();myConf.set("parameter_name", "parameter_value");HTableInterface usersTable = new HTable(myConf, "users");

•Connection poolingHTablePool pool = new HTablePool();HTableInterface usersTable = pool.getTable("users"); // work with the tableusersTable.close();

19Friday, April 12, 13

Page 20: Building apps with HBase - Big Data TechCon Boston

Inser2ng  data

•Put  call  on  HTablePut p = new Put(Bytes.toBytes("TheRealMT")); p.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Mark Twain")); p.add(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes("[email protected]")); p.add(Bytes.toBytes("info"), Bytes.toBytes("password"), Bytes.toBytes("Langhorne"));

usersTable.put(p);

20Friday, April 12, 13

Page 21: Building apps with HBase - Big Data TechCon Boston

Data  coordinates

•Row  is  addressed  using  rowkey•Cell  is  addressed  using              [rowkey  +  family  +  qualifier]

21Friday, April 12, 13

Page 22: Building apps with HBase - Big Data TechCon Boston

Upda2ng  data

•Update  ==  WritePut p = new Put(Bytes.toBytes("TheRealMT"));

p.add(Bytes.toBytes("info"),

Bytes.toBytes("password"),

Bytes.toBytes("abc123"));

usersTable.put(p);

22Friday, April 12, 13

Page 23: Building apps with HBase - Big Data TechCon Boston

Write  path

23

Memstore

HFile

HFile

WAL

Write goes to the write-ahead log (WAL)as well as the in memory write buffer called

the Memstore.Client's don't interact directly with the

underlying HFiles during writes.Client

Memstore

New HFile

HFile

Memstore gets flushed to disk toform a new HFile

when filled up with writes.

HFile

Friday, April 12, 13

Page 24: Building apps with HBase - Big Data TechCon Boston

Reading  data

•Get  the  en2re  rowGet g = new Get(Bytes.toBytes("TheRealMT"));

Result r = usersTable.get(g);

•Get  selected  columnsGet g = new Get(Bytes.toBytes("TheRealMT"));

g.addColumn(Bytes.toBytes("info"),

Bytes.toBytes("password"));

Result r = usersTable.get(g);

•Extract  the  value  out  of  the  Result  objectbyte[] b = r.getValue(Bytes.toBytes("info"),

Bytes.toBytes("email"));

String email = Bytes.toString(b);

24Friday, April 12, 13

Page 25: Building apps with HBase - Big Data TechCon Boston

Reading  data  (con2nued)

•ScanScan s = new Scan(startRow, stopRow);

ResultsScanner rs = twits.getScanner(s);

for(Result r : rs) {

// iterate over scanner results

}

•Single threaded iteration over defined set of rows. Could be the entire table too.

25Friday, April 12, 13

Page 26: Building apps with HBase - Big Data TechCon Boston

What  happens  when  you  read  data?

26

Memstore

HFile

HFile

Client

BlockCache

Friday, April 12, 13

Page 27: Building apps with HBase - Big Data TechCon Boston

Dele2ng  data

•Another  simple  API  call•Delete  en2re  rowDelete d = new Delete(Bytes.toBytes("TheRealMT"));

usersTable.delete(d);

•Delete  selected  columnsDelete d = new Delete(Bytes.toBytes("TheRealMT"));

d.deleteColumns(Bytes.toBytes("info"),

Bytes.toBytes("email"));

usersTable.delete(d);

27Friday, April 12, 13

Page 28: Building apps with HBase - Big Data TechCon Boston

Delete  internals

•Tombstone  markers•Dele2on  only  happens  during  major  compac2ons•Compac2ons

•Minor•Major

28Friday, April 12, 13

Page 29: Building apps with HBase - Big Data TechCon Boston

Compac2ons

29

Memstore

HFile

HFile

Memstore

CompactedHFile

Two or more HFiles get combinedto form a single HFile during

a compaction.

HFile HFile

Friday, April 12, 13

Page 30: Building apps with HBase - Big Data TechCon Boston

DAO

•Like  any  other  applica2on•Makes  data  access  management  easy

•Cleaner  code•Table  info  at  a  single  place

•Op2miza2ons  like  connec2on  pooling  etc•Easier  management  of  database  configs

30Friday, April 12, 13

Page 31: Building apps with HBase - Big Data TechCon Boston

AdminHBaseAdmin

• Administra2ve  API  to  the  cluster.

• Provides  an  interface  to  manage  HBase  database  tables  metadata  and  general  administra2ve  func2ons.

Table  Opera2ons

• createTable()

• deleteTable()

• addColumn()

• modifyColumn()

• deleteColumn()

• listTables()

Friday, April 12, 13

Page 32: Building apps with HBase - Big Data TechCon Boston

32

Demo-­‐  Java  client

Friday, April 12, 13

Page 33: Building apps with HBase - Big Data TechCon Boston

More  API  features

•Filters•Compare  and  Swap•Coprocessors

33Friday, April 12, 13

Page 34: Building apps with HBase - Big Data TechCon Boston

34

It’s  not  a  rela7onal  database  system

Data  Model

Friday, April 12, 13

Page 35: Building apps with HBase - Big Data TechCon Boston

HBase  is  ...

•Column  family  oriented  database•Column  family  oriented•Tables  consis2ng  of  rows  and  columns

•Persisted  Map•Sparse•Mul2  dimensional•Sorted• Indexed  by  rowkey,  column  and  2mestamp

•Key  Value  store• [rowkey,  col  family,  col  qualifier,  2mestamp]  -­‐>  cell  value

35Friday, April 12, 13

Page 36: Building apps with HBase - Big Data TechCon Boston

HBase  is  not  ...

•A  rela2onal  database•No  SQL  query  language•No  joins•No  secondary  indexing•No  transac2ons

36Friday, April 12, 13

Page 37: Building apps with HBase - Big Data TechCon Boston

Tabular  representa2on

37

Column Family - Info

password

abc123

abc123

abc123

[email protected]

TheRealMT

HMS_Surprise

[email protected]

Sir Arthur Conan Doyle [email protected]

[email protected]

SirDoyle

GrandpaD

Fyodor Dostoyevsky

Patrick O'Brien

Mark Twain

Rowkey name email

Cells

Each cell has multiple versions,

typically represented by the timestamp

of when they were inserted into the table

(ts2>ts1)

ts1=1329088321289 ts2=1329088818321

The table is lexicographicallysorted on the rowkeys

Langhorneabc123

12

3

4

The coordinates used to identify data in an HBase table are:(1) rowkey, (2) column family, (3) column qualifier, (4) version

Friday, April 12, 13

Page 38: Building apps with HBase - Big Data TechCon Boston

Key-­‐Value  store

38

[TheRealMT, info, password, 1329088818321]

[TheRealMT, info, password, 1329088321289]

abc123

Langhorne

Keys Values

A single KeyValue instance

Friday, April 12, 13

Page 39: Building apps with HBase - Big Data TechCon Boston

Key-­‐Value  store

39

[TheRealMT, info, password, 1329088818321] abc123

[TheRealMT, info, password] 1329088818321 : "abc123",1329088321289 : "Langhorne"

}

{

[TheRealMT, info] },

"name" : { 1329088321289 : "Mark Twain"

"email" : { 1329088321289 : "[email protected]" },

"password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" } }

{

},

"info" : {

"name" : { 1329088321289 : "Mark Twain"

"email" : { 1329088321289 : "[email protected]" },

"password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" } } }

{

[TheRealMT]

Keys

Values

Start with coordinates of full precision1

Drop version and you're left with a map of version to values2

Omit qualifier and you have a map of qualifiers to the previous maps3

Finally, drop the column family and you have a row, a map of maps4

Friday, April 12, 13

Page 40: Building apps with HBase - Big Data TechCon Boston

Sorted  map  of  maps

40

Rowkey

Column family

Column qualifiers

Versions

Values

{

},

"TheRealMT" : { "info" : {

"name" : { 1329088321289 : "Mark Twain"

"email" : { 1329088321289 : "[email protected]" },

"password" : { 1329088818321 : "abc123", 1329088321289 : "Langhorne" } } }}

Friday, April 12, 13

Page 41: Building apps with HBase - Big Data TechCon Boston

HFiles  and  physical  data  model

•HFiles  are•Immutable•Sorted  on  rowkey  +  qualifier  +  2mestamp•In  the  context  of  a  column  family  per  region

41

, , , "TheRealMT" "info" "password" , 1329088321289 "Langhorne", , , "TheRealMT" "info" "password" , 1329088818321 "abc123",

, , , , "TheRealMT" "info" "email" 1329088321289 "[email protected]", , , , "TheRealMT" "info" "name" 1329088321289 "Mark Twain"

HFile for the info column family in the users table

Friday, April 12, 13

Page 42: Building apps with HBase - Big Data TechCon Boston

42

...  what  really  goes  on  under  the  hood

HBase  in  distributed  mode

Friday, April 12, 13

Page 43: Building apps with HBase - Big Data TechCon Boston

Distributed  mode

•Table  is  split  into  Regions

43

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-998800005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-443200009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Full table T1 Table T1 split into 3 regions - R1, R2 and R3

T1R1

T1R2

T1R3

Friday, April 12, 13

Page 44: Building apps with HBase - Big Data TechCon Boston

Distributed  mode  (con2nued)

•Regions  are  served  by  RegionServers

•RegionServers  are  Java  processes,  typically  collocated  with  the  DataNodes

44

DataNode RegionServer

DataNode RegionServer

DataNode RegionServer

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Host1

Host2

Host3

T1R1

T1R2

T1R3

Friday, April 12, 13

Page 45: Building apps with HBase - Big Data TechCon Boston

Finding  your  region

•Two  special  tables:  -­‐ROOT-­‐  and  .META.

45

-ROOT-

M1 M2 M3

-ROOT- table, consisting of only 1 regionhosted on region server RS1

.META. table, consisting of 3 regions,hosted on RS1, RS2 and RS3

User tables T1 and T2, consisting of4 and 3 regions respectively,

distributed amongst RS1, RS2 and RS3

RS1

RS2

T2R1

RS2

T2R2

RS2

T2R3

RS1

RS1RS3

T1R1

RS3

T1R3

RS3

T1R2

RS1

T2R4

RS1

Friday, April 12, 13

Page 46: Building apps with HBase - Big Data TechCon Boston

Regions  distributed  across  the  cluster

46

DataNode RegionServer

00001 John 415-111-123400002 Paul 408-432-992200003 Ron 415-993-212400004 Bob 818-243-9988

00005 Carly 206-221-912300006 Scott 818-231-256600007 Simon 425-112-987700008 Lucas 415-992-4432

00009 Steve 530-288-983200010 Kelly 916-992-123400011 Betty 650-241-119200012 Anne 206-294-1298

Host1

Host2

M:T100001-M:T100009 M1 RS2M:T100010-M:T100012 M2 RS1

T1:00001-T1:00004 R1 RS1T1:00005-T1:00008 R2 RS2

T1:00009-T1:00012 R3 RS2

T1-R1

T1-R2

T1-R3

.META. - M1

-ROOT-

.META. - M2

RegionServer 1 (RS1), hosting regions R1 of the user table T1 and M2 of .META.

RegionServer 2 (RS2), hosting regions R2, R3 of the user table and M1 of .META.

RegionServer 3 (RS3), hosting just -ROOT-

Host3

DataNode RegionServer

DataNode RegionServer

Friday, April 12, 13

Page 47: Building apps with HBase - Big Data TechCon Boston

What  the  client  does

47

ZooKeeper

TheRealMT

-ROOT-

.META.

MyTable

Friday, April 12, 13

Page 48: Building apps with HBase - Big Data TechCon Boston

What  the  client  does

47

ZooKeeper

TheRealMT

-ROOT-

.META.

MyTable

Friday, April 12, 13

Page 49: Building apps with HBase - Big Data TechCon Boston

What  the  client  does

47

ZooKeeper

TheRealMT

-ROOT-

.META.

MyTable

Row per META region

Friday, April 12, 13

Page 50: Building apps with HBase - Big Data TechCon Boston

What  the  client  does

47

ZooKeeper

TheRealMT

-ROOT-

.META.

MyTable

Row per META region

Row per table region

Friday, April 12, 13

Page 51: Building apps with HBase - Big Data TechCon Boston

What  the  client  does

47

ZooKeeper

TheRealMT

-ROOT-

.META.

MyTable

Row per META region

Row per table region

Friday, April 12, 13

Page 52: Building apps with HBase - Big Data TechCon Boston

48Friday, April 12, 13