MongoDB Schema Design Basics

Upload: alvin-john-richards

Post on 08-Apr-2015

DESCRIPTION

What is document design in MongoDB? In this talk we will cover the history of normalization, how data design changes from a relational to a document design, and basic patterns for handling one-to-many and many-to-many relationships, trees, and stacks.

TRANSCRIPT

Page 1: MongoDB schema design basics

Open-source, high-performance, document-oriented database

Page 2: MongoDB schema design basics

Schema Design Basics

Alvin Richards [email protected]

Page 3: MongoDB schema design basics

This talk

Part One

‣  Intro
‣  Terms / Definitions
‣  Getting a flavor
‣  Creating a Schema
‣  Indexes
‣  Evolving the Schema

Part Two

‣  Data modeling
‣  DBRef
‣  Single Table Inheritance
‣  Many - Many
‣  Trees
‣  Lists / Queues / Stacks

Page 4: MongoDB schema design basics

So why model data?

Page 5: MongoDB schema design basics

A brief history of normalization

•  1970 E. F. Codd introduces 1st Normal Form (1NF)

•  1971 E. F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)

•  1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)

•  2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)

Goals:

•  Avoid anomalies when inserting, updating or deleting

•  Minimize redesign when extending the schema

•  Make the model informative to users

•  Avoid bias towards a particular style of query

* source : wikipedia

Page 6: MongoDB schema design basics

Relational made normalized data look like this

Page 7: MongoDB schema design basics

Document databases make normalized data look like this

Page 8: MongoDB schema design basics

Some terms before we proceed

RDBMS            Document DBs
---------------  ------------------------------------
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking across documents
Partition        Shard
Partition Key    Shard Key

Page 9: MongoDB schema design basics

DB Considerations

How can we manipulate this data?

•  Dynamic Queries
•  Secondary Indexes
•  Atomic Updates
•  Map Reduce

Access Patterns?

•  Read / Write Ratio
•  Types of updates
•  Types of queries
•  Data life-cycle

Considerations

•  No Joins
•  Document writes are atomic

Page 10: MongoDB schema design basics

Design Session

Design documents that simply map to your application

post = {author: "kyle",
        date: new Date(),
        text: "my blog post...",
        tags: ["mongodb", "intro"]}

>db.posts.save(post)

Page 11: MongoDB schema design basics

Find the document

>db.posts.find()

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "kyle",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "My first blog",
  tags : [ "mongodb", "intro" ] }

Notes:
•  ID must be unique, but can be anything you’d like
•  MongoDB will generate a default ID if one is not supplied

Page 12: MongoDB schema design basics

Secondary index for “author”

// 1 means ascending, -1 means descending

>db.posts.ensureIndex({author: 1})

>db.posts.find({author: 'kyle'})

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", ... }

Add an index, find via index

Page 13: MongoDB schema design basics

Verifying indexes exist

>db.system.indexes.find()

// Index on ID
{ name : "_id_", ns : "test.posts", key : { "_id" : 1 } }

// Index on author
{ _id : ObjectId("4c4ba6c5672c685e5e8aabf4"), ns : "test.posts", key : { "author" : 1 }, name : "author_1" }

Page 16: MongoDB schema design basics

Query operators

Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte

// find posts with any tags
>db.posts.find({tags: {$exists: true}})

Regular expressions:
// posts where author starts with k
>db.posts.find({author: /^k/i})

Counting:
// posts written by mike
>db.posts.find({author: "mike"}).count()
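A quick check in plain JavaScript (no database needed) shows why the shape of the pattern matters for a "starts with" match. The author names below are made up for illustration. A pattern of the form ^k requires a leading k, while a pattern such as ^k* accepts zero or more k's and therefore matches every string.

```javascript
// Hypothetical author names, used only for illustration.
const authors = ["kyle", "Kristina", "mike", "fred"];

// ^k : the string must start with "k" (case-insensitive because of the i flag).
const startsWithK = authors.filter((name) => /^k/i.test(name));

// ^k* : zero or more "k" at the start, which every string satisfies.
const zeroOrMoreK = authors.filter((name) => /^k*/i.test(name));

console.log(startsWithK); // only the names beginning with k or K
console.log(zeroOrMoreK); // every name in the list
```

An anchored prefix pattern is also the kind of regular expression that can make use of an index on the field, although case-insensitive matching generally limits how much the index helps.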

Page 17: MongoDB schema design basics

Extending the Schema

new_comment = {author: "fred", date: new Date(), text: "super duper"}

new_info = {"$push": {comments: new_comment}, "$inc": {comments_count: 1}}

>db.posts.update({_id: "..."}, new_info)

Page 18: MongoDB schema design basics

Extending the Schema

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "kyle",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "My first blog",
  tags : [ "mongodb", "intro" ],
  comments_count: 1,
  comments : [
    { author : "Fred",
      date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
      text : "Super Duper" }
  ]}
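To make the effect of the two update operators concrete, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB's implementation: applyUpdate is a hypothetical helper that mimics only $push and $inc on top-level fields, and the sample document is illustrative.

```javascript
// Hypothetical helper: mimics $push and $inc on top-level fields only.
function applyUpdate(doc, update) {
  const pushes = update.$push || {};
  for (const field of Object.keys(pushes)) {
    // $push appends to an array, creating the array when the field is missing.
    if (doc[field] === undefined) doc[field] = [];
    doc[field].push(pushes[field]);
  }
  const incs = update.$inc || {};
  for (const field of Object.keys(incs)) {
    // $inc adds to a number, treating a missing field as 0.
    doc[field] = (doc[field] || 0) + incs[field];
  }
  return doc;
}

const post = { author: "kyle", text: "My first blog", tags: ["mongodb", "intro"] };
const newComment = { author: "fred", date: new Date(), text: "super duper" };

applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count);
```

Neither comments nor comments_count had to exist beforehand, which is what lets the schema evolve without a migration step. Keeping the counter next to the array is a design choice that makes "most commented" queries a simple sort.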

Page 21: MongoDB schema design basics

Extending the Schema

// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})

>db.posts.find({"comments.author": "kyle"})

// find last 5 posts:
>db.posts.find().sort({date: -1}).limit(5)

// most commented post:
>db.posts.find().sort({comments_count: -1}).limit(1)

When sorting, check if you need an index
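The sketch below models, in plain JavaScript and on made-up data, what a query on a dotted path into an embedded array means: a post matches when at least one of its comments has the requested author. The function names are hypothetical and the code only approximates the semantics; it does not use a MongoDB driver.

```javascript
// Hypothetical sample data, shaped like the posts in the slides.
const posts = [
  { author: "kyle", comments: [{ author: "fred", text: "super duper" }], comments_count: 1 },
  { author: "mike", comments: [{ author: "kyle", text: "nice" }, { author: "fred", text: "+1" }], comments_count: 2 },
  { author: "fred", comments_count: 0 }, // no comments array at all
];

// Rough equivalent of find({"comments.author": name}):
// a post matches when at least one embedded comment has that author.
function findByCommentAuthor(docs, name) {
  return docs.filter((doc) =>
    Array.isArray(doc.comments) && doc.comments.some((c) => c.author === name)
  );
}

// Rough equivalent of find().sort({comments_count: -1}).limit(1).
function mostCommented(docs) {
  return [...docs].sort((a, b) => b.comments_count - a.comments_count).slice(0, 1);
}

console.log(findByCommentAuthor(posts, "kyle").map((p) => p.author));
console.log(mostCommented(posts).map((p) => p.author));
```

The in-memory sort has to look at every document before it can return one, which is the cost an index on the sort key is meant to avoid.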

Page 22: MongoDB schema design basics

Map Reduce

Aggregation and batch manipulation

Collection in, Collection out

Parallel in sharded environments

Page 23: MongoDB schema design basics

Map reduce

mapFunc = function () {
    this.tags.forEach(function (z) { emit(z, {count: 1}); });
}

reduceFunc = function (k, v) {
    var total = 0;
    for (var i = 0; i < v.length; i++) { total += v[i].count; }
    return {count: total};
}

// later server versions require an output option as a third argument,
// e.g. {out: "tag_counts"} (the collection name is only an example)
res = db.posts.mapReduce(mapFunc, reduceFunc)

>db[res.result].find()
{ _id : "intro", value : { count : 1 } }
{ _id : "mongodb", value : { count : 1 } }
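The data flow of the tag count can be traced without a server. The following plain JavaScript sketch uses a hypothetical runMapReduce driver and made-up posts. Unlike the shell version, emit is passed in as an argument and the document is a parameter rather than `this`.

```javascript
// Hypothetical sample posts.
const posts = [
  { tags: ["mongodb", "intro"] },
  { tags: ["mongodb", "schema"] },
  { tags: [] },
];

// Same logic as the slide's map function; emit is supplied by the driver below.
function mapFunc(doc, emit) {
  doc.tags.forEach((tag) => emit(tag, { count: 1 }));
}

// Same logic as the slide's reduce function.
function reduceFunc(key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
}

// Hypothetical driver: group emitted values by key, then reduce each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const result = runMapReduce(posts, mapFunc, reduceFunc);
console.log(result);
```

Note that reduce returns a value of the same shape as the emitted values ({count: n}). That matters on a real server, where reduce may be applied to partial results more than once; this sketch calls it exactly once per key.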

Page 24: MongoDB schema design basics

Review So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more

Page 25: MongoDB schema design basics

Wordnik

9B records, 100M queries / week, 1.2TB

{
entry : { header: { id: 0, headword: "m", sourceDictionary: "GCide", textProns : [ {text: "(em)", seq:0} ], syllables: [ {id: 0, text: "m"} ], sourceDictionary: "1913 Webster", headWord: "m", id: 1, definitions: [ {text: "M, the thirteenth letter..."}, {text: "As a numeral, M stands for 1000"}] } }
}

Page 26: MongoDB schema design basics

Review So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more

Observations:
- Using Rich Documents works well
- Simplify relations by embedding them
- Iterative development is easy with MongoDB

Page 27: MongoDB schema design basics
Page 28: MongoDB schema design basics

Single Table Inheritance

>db.shapes.find()
{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}

// find shapes where radius > 0
>db.shapes.find({radius: {$gt: 0}})

// create index
>db.shapes.ensureIndex({radius: 1})

Page 31: MongoDB schema design basics

One to Many

- Embedded Array / Array Keys
  - $slice operator to return subset of array
  - hard to find latest comments across all documents

- Embedded tree
  - Single document
  - Natural
  - Hard to query

- Normalized (2 collections)
  - most flexible
  - more queries

Page 32: MongoDB schema design basics

Many - Many

Example:

- Product can be in many categories
- Category can have many products

Products
- product_id

Category
- category_id

Prod_Categories
- id
- product_id
- category_id

Page 36: MongoDB schema design basics

Many - Many

products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Sumatra Dark Roast",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af") ]}

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Indonesia",
  product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
                 ObjectId("4c4ca30433fb5941681b9130"),
                 ObjectId("4c4ca30433fb5941681b913a") ]}

// All categories for a given product
>db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})

// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

Page 39: MongoDB schema design basics

Alternative

products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Sumatra Dark Roast",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af") ]}

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Indonesia"}

// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

// All categories for a given product
product = db.products.findOne({_id: some_id})
>db.categories.find({_id: {$in: product.category_ids}})
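When the references live on only one side, one direction of the relationship is a single lookup and the other takes two steps. The plain JavaScript sketch below illustrates that asymmetry on made-up data. Short strings stand in for ObjectId values, and every name other than "Sumatra Dark Roast" and "Indonesia" is invented for the example.

```javascript
// Hypothetical data; plain strings stand in for ObjectId values.
const products = [
  { _id: "p1", name: "Sumatra Dark Roast", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "Kenya AA", category_ids: ["c3"] },
];
const categories = [
  { _id: "c1", name: "Indonesia" },
  { _id: "c2", name: "Dark Roast" },
  { _id: "c3", name: "Africa" },
];

// One lookup: products whose category_ids array contains the category id.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two lookups: load the product, then fetch categories whose _id is in its list
// (the second step plays the role of the $in query).
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory("c1").map((p) => p.name));
console.log(categoriesOfProduct("p1").map((c) => c.name));
```

The trade-off is one extra round trip for the second direction in exchange for having a single place to update when a product changes category.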

Page 40: MongoDB schema design basics

Trees

Full Tree in Document

{ comments: [
    { author: "rpb", text: "...",
      replies: [
        { author: "Fred", text: "...", replies: [] }
      ] }
  ] }

Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 4MB document size limit (raised to 16MB in later MongoDB releases)
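One way to read "hard to search": a dotted path reaches only a fixed depth, so finding a comment at an arbitrary depth tends to mean loading the whole document and walking it in application code. The plain JavaScript sketch below shows such a walk. The sample tree and the findByAuthor helper are hypothetical.

```javascript
// Hypothetical post with a nested reply tree, shaped like the slide's document.
const post = {
  comments: [
    { author: "rpb", text: "first", replies: [
      { author: "Fred", text: "reply", replies: [
        { author: "rpb", text: "reply to reply", replies: [] },
      ] },
    ] },
  ],
};

// Collect every comment by the given author, at any depth (depth-first).
function findByAuthor(comments, author, found = []) {
  for (const c of comments) {
    if (c.author === author) found.push(c);
    findByAuthor(c.replies || [], author, found);
  }
  return found;
}

console.log(findByAuthor(post.comments, "rpb").map((c) => c.text));
```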

Page 41: MongoDB schema design basics

Trees

Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)

Page 44: MongoDB schema design basics

Array of Ancestors
- Store Ancestors of a node

{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }

// find all descendants of b:
>db.tree2.find({ancestors: "b"})

// find all ancestors of f:
>ancestors = db.tree2.findOne({_id: "f"}).ancestors
>db.tree2.find({_id: {$in: ancestors}})
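The two tree queries can be checked against the sample data in plain JavaScript. This is only an in-memory model of the lookups, with hypothetical function names. It shows that both directions are answered without recursion because each node carries its full ancestor list.

```javascript
// The tree from the slide, held in memory.
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
  { _id: "g", ancestors: ["a", "b", "d"], parent: "d" },
];

// Rough equivalent of find({ancestors: id}): nodes whose ancestor list contains id.
function descendantsOf(id) {
  return tree.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);
}

// Rough equivalent of reading the node's ancestors, then find({_id: {$in: ancestors}}).
function ancestorsOf(id) {
  const node = tree.find((n) => n._id === id);
  const ids = (node && node.ancestors) || [];
  return tree.filter((n) => ids.includes(n._id)).map((n) => n._id);
}

console.log(descendantsOf("b")); // c, d and g
console.log(ancestorsOf("f"));   // a and e
```

The price of cheap reads is on the write side: moving a subtree means rewriting the ancestors array of every node beneath it.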

Page 45: MongoDB schema design basics

findAndModify Queue example

// Example: find highest priority job and mark it as in progress

job = db.jobs.findAndModify({
    query: {inprogress: false},
    sort: {priority: -1},
    update: {$set: {inprogress: true, started: new Date()}},
    new: true})
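The claim-a-job pattern can be sketched in plain JavaScript as follows. claimNextJob is a hypothetical helper and the jobs are made up. It mirrors the query, sort, and update of the call above and returns the modified document, as new: true does.

```javascript
// Hypothetical job documents.
const jobs = [
  { _id: 1, priority: 1, inprogress: false },
  { _id: 2, priority: 5, inprogress: true },
  { _id: 3, priority: 3, inprogress: false },
];

// Pick the highest-priority job that is not in progress, mark it,
// and return the updated document (or null when nothing is waiting).
function claimNextJob(queue) {
  const candidates = queue
    .filter((j) => j.inprogress === false)
    .sort((a, b) => b.priority - a.priority);
  if (candidates.length === 0) return null;
  const job = candidates[0];
  job.inprogress = true;
  job.started = new Date();
  return job;
}

const first = claimNextJob(jobs);
console.log(first._id, first.inprogress);
```

Single-threaded JavaScript gets atomicity for free here. With several workers talking to a database, the point of findAndModify is that the find and the update happen as one atomic step on the server, so two workers cannot claim the same job.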

Page 46: MongoDB schema design basics

Cool Stuff
- Aggregation
- Capped collections
- GridFS
- Geo

Page 47: MongoDB schema design basics

Learn More

•  Kyle’s presentation + video:
   http://www.slideshare.net/kbanker/mongodb-schema-design
   http://www.blip.tv/file/3704083

•  Dwight’s presentation:
   http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight-merriman

•  Documentation
   Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
   Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command
   Aggregation: http://www.mongodb.org/display/DOCS/Aggregation
   Capped Col.: http://www.mongodb.org/display/DOCS/Capped+Collections
   Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing
   GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification

Page 48: MongoDB schema design basics

Thank You :-)

Page 49: MongoDB schema design basics

Download MongoDB

http://www.mongodb.org

and let us know what you think @mongodb

Page 50: MongoDB schema design basics

DBRef

DBRef {$ref: collection, $id: id_value}

- Think URL
- YDSMV: your driver support may vary

Sample Schema:
nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]}

Dereferencing:
nr.note_refs.forEach(function(r) {
    printjson(db[r.$ref].findOne({_id: r.$id}));
});
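Dereferencing amounts to "look in the collection named by $ref for the document whose _id equals $id". The plain JavaScript sketch below models that with an in-memory object standing in for the database. The note contents and the deref helper are hypothetical.

```javascript
// Hypothetical in-memory "database": collection name -> array of documents.
const db = {
  notes: [
    { _id: 5, text: "buy milk" },
    { _id: 6, text: "call home" },
  ],
};

// A document holding DBRef-style references, as in the sample schema.
const nr = { note_refs: [{ $ref: "notes", $id: 5 }, { $ref: "notes", $id: 6 }] };

// Resolve one reference: search the named collection for the matching _id.
// Returns null when the collection or the document does not exist.
function deref(ref) {
  const collection = db[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const notes = nr.note_refs.map(deref);
console.log(notes.map((n) => n.text));
```

Each reference costs a separate lookup, which is the usual reason to prefer embedding, or a plain id field with a known collection, when the target collection never varies.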

Page 51: MongoDB schema design basics

BSON

MongoDB stores data in BSON internally

Lightweight, Traversable, Efficient encoding

Typed: boolean, integer, float, date, string, binary, array...