Intro to NoSQL and MongoDB

Upload: dataversity

Post on 20-Aug-2015

TRANSCRIPT

Page 1: Intro to NoSQL and MongoDB

1

NoSQL: Introduction

Asya Kamsky

Page 4: Intro to NoSQL and MongoDB

4

• 1970s: Relational databases invented

– Storage is expensive

– Data is normalized

– Data storage is abstracted away from app

• 1980s: RDBMS commercialized

– Client/server model

– SQL becomes the standard

• 1990s: Things begin to change

– Client/server => 3-tier architecture

– Rise of the Internet and the Web

Page 6: Intro to NoSQL and MongoDB

6

• 2000s: Web 2.0

– Rise of "Social Media"

– Acceptance of E-Commerce

– Constant decrease of hardware prices

– Massive increase of collected data

• Result

– Constant need to scale dramatically

– How can we scale?

Page 9: Intro to NoSQL and MongoDB

9

Computers in 1985

• x286 5-35 MHz

• 56 kbps

• 64 KB RAM

• 10 MB HDD

Computers in 1995

• Pentium 100 MHz

• 20-50 Mbps

• 16 MB RAM

• 200 MB HDD

Phone in 2012

• Dual core 1.2 GHz

• WiFi 802.11n - 300+Mbps

• 1 GB RAM

• 48 GB SSD

Page 10: Intro to NoSQL and MongoDB

10

Computers in 2012

• Dual core 1.8 GHz

• WiFi 802.11n - 300+Mbps

• 180+ Gbps

• 8 GB RAM

• 512 GB SSD

Page 12: Intro to NoSQL and MongoDB

12

• Agile Development Methodology

– Shorter development cycles

– Constant evolution of requirements

– Flexibility at design time

• Relational Schema

– Hard to evolve

– Long, painful migrations

– Must stay in sync with application

– Few developers interact directly

Page 13: Intro to NoSQL and MongoDB

13

OLTP / operational

+ complex transactions

+ tabular data

+ ad hoc queries

- O<->R mapping hard

- speed/scale problems

- not super agile

BI / reporting

+ ad hoc queries

+ SQL standard protocol between clients and servers

+ scales horizontally better than operational DBs

- some scale limits at massive scale

- schemas are rigid

- no real time; great at bulk nightly data loads

fewer issues here

a lot more issues here

Page 14: Intro to NoSQL and MongoDB

14

OLTP / operational

BI / reporting

caching

flat files

map/reduce

app layer partitioning

Page 19: Intro to NoSQL and MongoDB

19

• Horizontal scaling

• Run anywhere

• Flexible data model

• Faster development

• Low upfront cost

• Low cost of ownership

Page 20: Intro to NoSQL and MongoDB

20

Relational

vs

Non-Relational

What is NoSQL?

Page 21: Intro to NoSQL and MongoDB

21

scalable nonrelational

("nosql")

OLTP / operational

BI / reporting

+ speed and scale

- ad hoc query limited

- not very transactional

- no sql/no standard

+ fits OO well

+ agile

Page 22: Intro to NoSQL and MongoDB

22

Non-relational next generation operational data stores and databases

A collection of very different products

• Different data models (Not relational)

• Most are not using SQL for queries

• No predefined schema

• Some allow flexible data structures

Page 26: Intro to NoSQL and MongoDB

26

• Relational

– ACID

– Two-phase commit

– Joins

• Key-Value, Document, XML, Graph, Column

– BASE

– Some ACID properties

– Atomic transactions on document level

– No Joins

Page 28: Intro to NoSQL and MongoDB

28

• Fits your use case

• Reliability

• Maintainability

• Ease of Use

• Scalability

• Cost

Page 29: Intro to NoSQL and MongoDB

29

MongoDB: Introduction

Page 31: Intro to NoSQL and MongoDB

31

• Designed and developed by founders of DoubleClick, ShopWiki, Gilt Groupe, etc.

• GOAL: create a high-performance, fully consistent, horizontally scalable general-purpose data store.

• Coding started fall 2007

• Open Source – AGPL, written in C++

• First production site March 2008 - businessinsider.com

• Currently version 2.2 – August 2012

Page 32: Intro to NoSQL and MongoDB

32

MongoDB

Design Goals

Page 34: Intro to NoSQL and MongoDB

34

• Document-oriented Storage

• Based on JSON Documents

• Data serialized to BSON

• Flexible Schema

• Scalable Architecture

• Replication

• High availability

• Auto-sharding

• Extensive use of memory-mapped files

• Durable

• Strong Consistency

• Key Features Include:

• Full featured indexes

• Ad-hoc Query Language

• Interactive shell

• Aggregation queries

• Map/Reduce

Page 35: Intro to NoSQL and MongoDB

35

• Rich data models

• Seamlessly map to native programming language types

• Flexible for dynamic data

• Better data locality

Page 36: Intro to NoSQL and MongoDB

36

Blogging website:

Register users

Users post blog entries

Comment on others' entries

Considering:

Tagging, Voting, ???

Page 37: Intro to NoSQL and MongoDB

37

join

table

Page 38: Intro to NoSQL and MongoDB

38

{

_id : ObjectId("4e2e3f92268cdda473b628f6"),

title : "My Very Important Thoughts",

published: ISODate("2011-07-26T19:49:00.147Z"),

author : { name:"Asya Kamsky", username:"asya" },

text : "It was a long and stormy night ..."

}

Page 39: Intro to NoSQL and MongoDB

39

{

_id : ObjectId("4e2e3f92268cdda473b628f6"),

title : "My Very Important Thoughts",

published: ISODate("2011-07-26T19:49:00.147Z"),

author : { name:"Asya Kamsky", username:"asya" },

text : "It was a long and stormy night ...",

tags : ["business", "news", "north america"]

}

> db.posts.ensureIndex( { tags : 1 } )

Page 40: Intro to NoSQL and MongoDB

40

> db.posts.find( { tags : "news" } )
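Because tags is an array, the index on it holds one entry per array element, which is why a query for a single tag can use it. A minimal sketch of that idea in plain JavaScript follows. The helpers `buildMultikeyIndex` and `findByTag` and the extra sample posts are hypothetical, written only for illustration; this is not how MongoDB implements its B-tree.

```javascript
// Hypothetical illustration of a multikey index: one index entry per array element.
function buildMultikeyIndex(docs, field) {
  const index = new Map(); // indexed value -> array of matching _id values
  for (const doc of docs) {
    const raw = doc[field];
    if (raw === undefined) continue; // documents without the field are skipped in this sketch
    const values = Array.isArray(raw) ? raw : [raw];
    for (const value of new Set(values)) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

// Look up a single tag without scanning every document.
function findByTag(index, tag) {
  return index.get(tag) || [];
}

// Sample data: ids and the second and third posts are made up for the example.
const posts = [
  { _id: 1, title: "My Very Important Thoughts", tags: ["business", "news", "north america"] },
  { _id: 2, title: "Untagged draft" },
  { _id: 3, title: "Market update", tags: ["business"] },
];

const tagIndex = buildMultikeyIndex(posts, "tags");
console.log(findByTag(tagIndex, "news"));
console.log(findByTag(tagIndex, "business"));
```

The lookup touches only the index entry for the requested tag, not every post.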

Page 41: Intro to NoSQL and MongoDB

41

> db.posts.find( { tags : "news" } ).explain()

{ "cursor" : "BtreeCursor tags_1",

"isMultiKey" : true,

"n" : 1,

"nscannedObjects" : 1,

"scanAndOrder" : false,

"indexOnly" : false,

"nYields" : 0,

"nChunkSkips" : 0,

"millis" : 0,

"indexBounds" : {

"tags" : [

[

"news",

"news"

Page 42: Intro to NoSQL and MongoDB

42

{

_id : ObjectId("4e2e3f92268cdda473b628f6"),

title : "My Very Important Thoughts",

published: ISODate("2011-07-26T19:49:00.147Z"),

author : { name:"Asya Kamsky", username:"asya" },

text : "It was a long and stormy night ...",

tags : ["business", "news", "north america"],

votes : 3,

voters : ["dmerr", "sj", "jane" ]

}

> db.posts.update( { },  // query for documents to update

{ }  // update to perform

)

Page 43: Intro to NoSQL and MongoDB

43

> db.posts.update( {_id:..., voters:{$ne:"asya"} },

{ $push: {voters:"asya"},

$inc : {votes: 1}

} )
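The update above only matches a post the user has not yet voted on, then records the voter and bumps the counter in a single document-level operation. The same logic can be sketched in plain JavaScript on an in-memory object. `voteOnce` is a hypothetical helper for illustration and is not MongoDB code.

```javascript
// In-memory sketch of the guarded vote; voteOnce is a hypothetical helper.
function voteOnce(post, username) {
  // Query part: { voters: { $ne: username } } matches only if the user is not in the array.
  if (post.voters.includes(username)) return false;
  // Update part: $push the voter and $inc the counter together.
  post.voters.push(username);
  post.votes += 1;
  return true;
}

const post = { votes: 3, voters: ["dmerr", "sj", "jane"] };
console.log(voteOnce(post, "asya")); // first attempt applies the vote
console.log(voteOnce(post, "asya")); // repeat attempt matches nothing and changes nothing
console.log(post.votes, post.voters);
```

Putting the check in the query rather than in application code means the test and the modification cannot be separated by another writer.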

Page 44: Intro to NoSQL and MongoDB

44

{

_id : ObjectId("4e2e3f92268cdda473b628f6"),

title : "My Very Important Thoughts",

published: ISODate("2011-07-26T19:49:00.147Z"),

author : { name:"Asya Kamsky", username:"asya" },

text : "It was a long and stormy night ...",

tags : ["business", "news", "north america"],

votes : 4,

voters : ["dmerr", "sj", "jane", "asya" ],

comments : [

{ by : "tim157", text : "great story", ... },

{ by : "gora", text : "i don’t think so", ... },

{ by : "dmerr", text : "also check out..." }

]

}

Page 45: Intro to NoSQL and MongoDB

45

> db.posts.ensureIndex( { "comments.by" : 1 } )

Page 46: Intro to NoSQL and MongoDB

46

> db.posts.find( { "comments.by" : "gora" } )
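The dotted field name "comments.by" reaches into each embedded document of the comments array, and the post matches if any element has the requested value. A rough sketch of that matching rule follows. `matchesEmbedded` is a hypothetical helper that handles only a two-part path such as "comments.by", which is a deliberate simplification.

```javascript
// Hypothetical helper: does any element of doc[arrayField] have subField === value?
// Only handles a two-part path like "comments.by" (simplification for illustration).
function matchesEmbedded(doc, path, value) {
  const [arrayField, subField] = path.split(".");
  const items = doc[arrayField];
  return Array.isArray(items) && items.some((item) => item[subField] === value);
}

const blogPost = {
  title: "My Very Important Thoughts",
  comments: [
    { by: "tim157", text: "great story" },
    { by: "gora", text: "i don't think so" },
    { by: "dmerr", text: "also check out..." },
  ],
};

console.log(matchesEmbedded(blogPost, "comments.by", "gora"));
console.log(matchesEmbedded(blogPost, "comments.by", "nobody"));
```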

Page 47: Intro to NoSQL and MongoDB

47

Seek = 5+ ms

Read = really really fast

Post

Author Comment

Page 48: Intro to NoSQL and MongoDB

48

Post

Author

Comment Comment Comment Comment Comment

Disk seeks and data locality
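A back-of-the-envelope calculation shows why locality matters. It uses the 5 ms seek time from the slide and the five comments in the diagram. The assumption that every separately stored record costs one seek (cold cache, nothing adjacent on disk) is mine and is a worst case, not a measurement.

```javascript
// Back-of-the-envelope seek cost; everything except SEEK_MS is an assumption.
const SEEK_MS = 5; // from the slide: "Seek = 5+ ms"

// Assumption: each separately stored record costs one disk seek (cold cache).
function seekCostMs(separateRecords) {
  return separateRecords * SEEK_MS;
}

const commentCount = 5; // five Comment boxes in the diagram
const normalizedMs = seekCostMs(1 + 1 + commentCount); // post + author + each comment
const embeddedMs = seekCostMs(1); // one document holding post, author and comments

console.log(`normalized: ${normalizedMs} ms, embedded: ${embeddedMs} ms`);
// prints "normalized: 35 ms, embedded: 5 ms"
```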

Page 49: Intro to NoSQL and MongoDB

49

• High Availability

• Data Redundancy

• Increase capacity with no downtime

• Transparent to the application

Page 50: Intro to NoSQL and MongoDB

50

• A cluster of N servers

• Any (one) node can be primary

• All writes to primary

• Reads go to the primary by default, optionally to a secondary

• Consensus election of primary

• Automatic failover

• Automatic recovery

Node 3

Node 1

Node 2

Primary

Pick me!
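The consensus election means a node becomes primary only if a majority of the voting members can take part. A small sketch of that majority rule follows. `majority` and `canElectPrimary` are hypothetical helper names, and the sketch ignores priorities, arbiters and other election details.

```javascript
// Hypothetical helpers illustrating the majority rule behind primary election.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A primary can be elected only while a majority of voting members is reachable.
function canElectPrimary(reachableMembers, votingMembers) {
  return reachableMembers >= majority(votingMembers);
}

console.log(majority(3));           // votes needed in a three-node set
console.log(canElectPrimary(2, 3)); // one node down
console.log(canElectPrimary(1, 3)); // two nodes down
```

This is also why an odd number of nodes is the usual choice: a fourth node raises the majority to three without letting the set survive more failures.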

Page 51: Intro to NoSQL and MongoDB

51

Replica Sets

• High Availability/Automatic Failover

• Data Redundancy

• Disaster Recovery

• Transparent to the application

• Perform maintenance with no downtime

Page 52: Intro to NoSQL and MongoDB

52

Asynchronous

Replication

Page 56: Intro to NoSQL and MongoDB

56

Automatic

Election

Page 58: Intro to NoSQL and MongoDB

58

• Increase capacity with no downtime

• Transparent to the application

• Range based partitioning

• Partitioning and balancing are automatic

Page 62: Intro to NoSQL and MongoDB

62

Diagram: sharded cluster

• Four shards, each a replica set (Primary, Secondary, Secondary)

• Key ranges, one per shard: min..25, 26..50, 51..75, 76..max

• MongoS routers between the Applications and the shards

• Three Config servers
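With range-based partitioning, the router only needs a sorted table of key ranges to decide which shard owns a key. The sketch below uses the four ranges from the diagram (min..25, 26..50, 51..75, 76..max). The shard names and the `routeKey` helper are hypothetical, and the inclusive integer bounds follow the diagram rather than MongoDB's actual chunk metadata.

```javascript
// Range table taken from the diagram; shard names are made up for the example.
const keyRanges = [
  { max: 25, shard: "shard1" },       // min..25
  { max: 50, shard: "shard2" },       // 26..50
  { max: 75, shard: "shard3" },       // 51..75
  { max: Infinity, shard: "shard4" }, // 76..max
];

// Hypothetical router lookup: the first range whose upper bound covers the key wins.
function routeKey(key) {
  return keyRanges.find((range) => key <= range.max).shard;
}

console.log(routeKey(10));
console.log(routeKey(26));
console.log(routeKey(1000));
```

Because the application talks only to the router, adding a shard and moving a range changes this table but not the application code.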

Page 63: Intro to NoSQL and MongoDB

63

• Few configuration options

• Does the right thing out of the box

• Easy to deploy and manage

Page 64: Intro to NoSQL and MongoDB

64

Better data locality

Relational MongoDB

In-Memory

Caching

Auto-Sharding

Write scaling

Read scaling

"We just can't get any faster than the way MongoDB handles our data."

Tony Tam, CTO, Wordnik

Page 65: Intro to NoSQL and MongoDB

65

• Supported Platforms:

– Linux, Windows, Solaris, Mac OS X

– Packages available for all popular distributions

• No external/third-party software dependencies

• 10gen maintains drivers for over a dozen languages

Page 66: Intro to NoSQL and MongoDB

66

• User Data Management

• High Volume Data Feeds

• Content Management

• Operational Intelligence

• E-Commerce

Page 68: Intro to NoSQL and MongoDB

68

Open source, high performance database