MongoDB - chewbii.com · ESILV
MongoDB
Introduction
• Distributed NoSQL database
  ▫ Large-scale data management ("humongous")
  ▫ Document-oriented NoSQL
    - JSON documents
    - Serialized as BSON objects
• Key points:
  ▫ Simple for developers to integrate
  ▫ Rich query language
    - MQL: MongoDB Query Language
  ▫ Several optimization features
    - Attribute indexing (Shard/BTree/RTree)
  ▫ Smart data allocation
    - Sharding (GridFS) [+ tagging / clustering]
  ▫ Several deployment options
    - Private/public cloud, local (on-premises)
Applications with MongoDB
• MetLife: unified view
• Cisco: e-commerce
• Bosch: IoT
• HSBC: digital transformation
• The Weather Channel: mobility
• Expedia: trip recommendations
• eBay: products
• AstraZeneca: medics logging
• Comcast: Database-as-a-Service
• KPMG: analytics
• X.ai: AI
• EA: video games
Evolutions
• V3.0
  ▫ Compression, Ops Manager
• V3.2
  ▫ Schema validation, $lookup, BI/Spark connectors
• V3.4
  ▫ Load balancing++, $facet, zones++
• V3.6
  ▫ Causal consistency, sessions
• V4.0
  ▫ ACID transactions (local, within a replica set), type conversion
  ▫ MongoDB Stitch (lightweight local / mobile)
  ▫ Security: SHA-2 (TLS 1.1 encryption)
• V4.2
  ▫ ACID transactions on sharded clusters
Interactions: JavaScript
• JS objects
  ▫ Attributes + functions
  ▫ db: database
  ▫ sh: sharding
  ▫ rs: replica set
• MQL
  ▫ "JSON" = pattern
• MapReduce
  ▫ Functions (untyped)
• GUI clients: Studio 3T, Robo 3T, MongoDB Compass
Starting Commands
• Select database (shell)
  ▫ Command: > use myBD;
• Collections of documents
  ▫ Create: > db.createCollection('users');
  ▫ Manipulate: > db.users.<command>;
    - Commands: find(), save(), remove(), update(), aggregate(), distinct(), mapReduce()…
    - Meaning in SQL: FROM
• Documents
  ▫ JSON:
{
  "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
  "name": "James Bond", "login": "james", "age": 50,
  "address": {"street": "30 Wellington Square", "city": "London"},
  "job": ["agent", "MI6"]
}
  ▫ Insert: > db.users.save( <document> ); //the document is passed without quotes!
MQL: find queries¹ (1/2)
• Document-oriented queries
• Command: > db.users.find( <filter>, <projection> );
• Filter:
  ▫ "JSON" pattern that the document must match
  ▫ "Key/value" format
  ▫ Can combine exact matching, operators, nesting, arrays
  ▫ Operators: $op (no quotes)
  ▫ Meaning in SQL: WHERE
• Projection:
  ▫ "Key/value" pairs returned in the output
  ▫ Meaning in SQL: SELECT (no aggregates)
• Example:
> db.users.find( {"login" : "james"}, {"name" : 1, "age" : 1} );
1 - find queries are the simple queries for your report
MQL: find queries (2/2)
• Exact match:
> db.users.find( {"login" : "james"}, {"name" : 1, "age" : 1} );
• Nesting: use "." to build the path to the given key
> db.users.find( {"address.city" : "London"} );
• Operators
> db.users.find( {"age" : {$gt : 40}} );
//$gt, $gte, $lt, $lte, $ne, $in, $nin, $or, $and, $exists, $type, $size, $cond…
• Regular expressions¹
> db.users.find( {"name" : {$regex : "james", $options : "i"}} );
• Arrays
> db.users.find( {"job" : "MI6"} ); //the value is in the list
> db.users.find( {"job.1" : "MI6"} ); //the value is at the 2nd position in the list
> db.users.find( {"job" : ["MI6"]} ); //exact match on the whole list
1 - regex: https://docs.mongodb.com/manual/reference/operator/query/regex/
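The matching rules above (dotted paths, array membership, array position, exact list match) can be sketched in plain Python. This is only an illustration of the semantics for equality filters, not MongoDB's implementation; `resolve` and `matches` are hypothetical helper names.

```python
def resolve(doc, path):
    """Follow a dotted path such as "address.city" or "job.1" through dicts and lists."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return None  # the path does not exist in this document
    return current


def matches(doc, flt):
    """Return True when every key/value pair of the filter matches the document."""
    for path, expected in flt.items():
        actual = resolve(doc, path)
        if actual == expected:
            continue  # exact match (a scalar, or a whole list for a list filter)
        if isinstance(actual, list) and expected in actual:
            continue  # scalar filter against an array: membership test
        return False
    return True


bond = {
    "name": "James Bond", "login": "james", "age": 50,
    "address": {"street": "30 Wellington Square", "city": "London"},
    "job": ["agent", "MI6"],
}

print(matches(bond, {"address.city": "London"}))  # nested key
print(matches(bond, {"job": "MI6"}))              # value is in the list
print(matches(bond, {"job.1": "MI6"}))            # value at the 2nd position
print(matches(bond, {"job": ["MI6"]}))            # whole-list comparison
```

Only the last filter fails on this document, because a list in the filter is compared with the whole array rather than with one of its items.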
MQL: Distinct - Count
• List of distinct values for a key
> db.users.distinct( "name" );
> db.users.distinct( "address.city" );
• Number of documents
> db.users.count();
> db.users.find( {"age" : 50} ).count();
MQL: pipeline² (1/3)
• aggregate(): ordered sequence of operators (aggregation pipeline)
• Command:
> db.users.aggregate( [ {$op1 : {}}, {$op2 : {}}, … ] );
• Operators:
  ▫ $match: simple queries //meaning in SQL: WHERE
  ▫ $project: projections //meaning in SQL: SELECT
  ▫ $sort: sort the result set //meaning in SQL: ORDER BY
  ▫ $unwind: normalization in 1NF
  ▫ $group: group by value + aggregate function //meaning in SQL: GROUP BY + fn
  ▫ $lookup: left outer join (since 3.2; works locally, within the same database)
  ▫ $out: store the output in a collection (since 2.6)
  ▫ $geoNear: sort by nearest points (lat/long)
  ▫ …
2 - Pipeline queries (aggregate) are the complex queries. Long pipelines can count as hard queries
MQL: pipeline (2/3)
• Pipeline: each step (operator) passes its output to the next operator as input
> db.users.aggregate([
    {$match : {"address.city" : "London"}},
    {$project : {"login" : 1, "age" : 1}},
    {$sort : {"age" : 1, "login" : -1}}
  ]);
• $unwind
  ▫ Unnests an array and produces a new document for each item in the list
> db.users.aggregate([ {$unwind : "$job"} ]);
  ▫ Input document:
{
  "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
  "name": "James Bond", "login": "james", "age": 50,
  "address": {"street": "30 Wellington Square", "city": "London"},
  "job": ["agent", "MI6"]}
  ▫ Output documents:
{
  "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
  "name": "James Bond", "login": "james", "age": 50,
  "address": {"street": "30 Wellington Square", "city": "London"},
  "job": "agent"}
{
  "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
  "name": "James Bond", "login": "james", "age": 50,
  "address": {"street": "30 Wellington Square", "city": "London"},
  "job": "MI6"}
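As a rough illustration of what $unwind does to each document, here is a minimal pure-Python sketch. The function name `unwind` is hypothetical, and the handling of missing or empty arrays (the document is dropped) mirrors the operator's default behaviour only as an assumption of this sketch.

```python
import copy


def unwind(docs, field):
    """Yield one copy of each document per element of the array stored in `field`."""
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # no such field: the document produces no output
        items = value if isinstance(value, list) else [value]
        for item in items:  # an empty list produces no output either
            out = copy.deepcopy(doc)
            out[field] = item  # the array is replaced by a single item
            yield out


bond = {
    "name": "James Bond", "login": "james", "age": 50,
    "address": {"street": "30 Wellington Square", "city": "London"},
    "job": ["agent", "MI6"],
}

result = list(unwind([bond], "job"))
for doc in result:
    print(doc["login"], doc["job"])
```

Every other key is duplicated unchanged, which is why the slide describes the operation as a normalization in 1NF.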
MQL: pipeline (3/3)
• $group: key (_id) + aggregate function ($sum / $avg / …)
  ▫ No grouping key (constant _id): only one output
> db.users.aggregate([ {$group : {"_id" : "age", "res" : {$sum : 1}}} ]);
  ▫ Group by value: $key
> db.users.aggregate([ {$group : {"_id" : "$age", "res" : {$sum : 1}}} ]);
  ▫ Apply an aggregate function to a value: $key
> db.users.aggregate([ {$group : {"_id" : "$address.city", "avg" : {$avg : "$age"}}} ]);
• Sequence example
> db.users.aggregate([
    {$match : {"address.city" : "London"}},
    {$unwind : "$job"},
    {$group : {"_id" : "$job", "val" : {$avg : "$age"}}},
    {$match : {"val" : {$gt : 30}}},
    {$sort : {"val" : -1}}
  ]);
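To see how each stage of the sequence example feeds the next one, the same five steps can be replayed on an in-memory list. This is a sketch in plain Python, not a driver call; the three users are made-up sample data and the `stage1` to `stage5` names are only illustrative.

```python
from collections import defaultdict

# Made-up sample documents (not from the slides)
users = [
    {"login": "james", "age": 50, "address": {"city": "London"}, "job": ["agent", "MI6"]},
    {"login": "q", "age": 30, "address": {"city": "London"}, "job": ["engineer", "MI6"]},
    {"login": "felix", "age": 45, "address": {"city": "Washington"}, "job": ["agent", "CIA"]},
]

# $match: keep the documents whose address.city is London
stage1 = [u for u in users if u["address"]["city"] == "London"]

# $unwind: one row per item of the "job" array
stage2 = [{"job": job, "age": u["age"]} for u in stage1 for job in u["job"]]

# $group: _id = job, val = average age
ages = defaultdict(list)
for row in stage2:
    ages[row["job"]].append(row["age"])
stage3 = [{"_id": job, "val": sum(a) / len(a)} for job, a in ages.items()]

# $match: keep the groups whose average is greater than 30
stage4 = [g for g in stage3 if g["val"] > 30]

# $sort: descending on val
stage5 = sorted(stage4, key=lambda g: g["val"], reverse=True)
print(stage5)
```

The second $match runs on the output of $group, so it filters on the computed field "val" rather than on a field of the original documents.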
MQL: updates
> db.users.update(
    {"_id" : ObjectId("4efa8d2b7d284dad101e4bc7")},
    {"$inc" : {"age" : 1}} );
• Atomic updates (one document)
  ▫ $set: modify a value
  ▫ $unset: remove a key/value
  ▫ $inc: increment
  ▫ $push: add a value to an array
  ▫ $pushAll: add several values (removed in 3.6; use $push with $each)
  ▫ $pull: remove a value from an array
  ▫ $pullAll: remove several values
• Update all: updateMany(), or update() with {multi : true}, updates every matching document
Replica Set
• Data replication
  ▫ Asynchronous
  ▫ Primary server: writes
  ▫ Secondary servers: reads
• Fault tolerance
  ▫ Election of a new primary server
  ▫ May need an arbiter (to break ties when the number of members is even)
• Consistency vs availability
  ▫ Reads are served by the primary (by default)
  ▫ Updates are propagated via the oplog (log file)
Sharding
• Data distribution on a cluster
• Data balancing
  ▫ The sharding key is chosen at the beginning
• 1 shard = 1 replica set
• Minimal configuration:
  ▫ Config servers (x3)
  ▫ Mongos (router x3)
• Partitioning key (sharding)
  ▫ Range-based (GridFS: sort)
> sh.shardCollection( "myBD.users", {"login" : 1} );
  ▫ Hash-based (MD5 of the key)
> sh.shardCollection( "myBD.users", {"_id" : "hashed"} );
  ▫ By zones/tags (+ sharding)
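The difference between the two partitioning strategies can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the shard names, the range boundaries "h" and "p", and the modulo step are all invented for the illustration (a real cluster splits the key space into chunks and balances them, it does not take a modulo).

```python
import bisect
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def range_shard(login, boundaries=("h", "p")):
    """Range-based: keys stay sorted and each shard owns a contiguous interval.

    Here: [.., "h") -> shard0, ["h", "p") -> shard1, ["p", ..) -> shard2.
    """
    return SHARDS[bisect.bisect_right(boundaries, login)]


def hash_shard(key):
    """Hash-based: the MD5 digest of the key spreads neighbouring keys over the shards."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


for login in ["alice", "james", "q"]:
    print(login, range_shard(login), hash_shard(login))
```

Range-based placement keeps neighbouring keys together, which helps range queries but can concentrate inserts of increasing keys on one shard; hashing trades that locality for an even spread.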
Cluster management
• Ops Manager
  ▫ Shard & replica set management
    - Node templates (.yaml)
• Cloud Atlas
  ▫ Deployment templates
  ▫ Sharding zones
Indexing
• Create a B-tree index
> db.users.createIndex( {"age" : 1} );
  ▫ All queries on "age" become more efficient
    - Even for Map/Reduce, with "queryParam"
    - Execution plan: ".explain()"
  ▫ No combination of indexes
Indexing: 2DSphere
• Enables geolocated queries
> db.users.createIndex( {"address.location" : "2dsphere"} );
• Location format (GeoJSON: longitude first, then latitude)
"location" : {"type" : "Point", "coordinates" : [-0.162866, 51.489220]}
• Spherical queries
var near = {$near : {$geometry : {"type" : "Point", "coordinates" : [-0.162866, 51.489220]},
                     $maxDistance : 10000}}; //distance in meters
db.users.find( {"address.location" : near}, {"name" : 1, "_id" : 0} );
• $geoNear operator (aggregate)
var geoNear = {"near" : {"type" : "Point", "coordinates" : [-0.162866, 51.489220]},
               "maxDistance" : 10000, "distanceField" : "outputDistance", "spherical" : true};
db.users.aggregate([ {"$geoNear" : geoNear} ]);
• Polygon queries
var polygon = {$geoWithin : {$geometry : {"type" : "Polygon", "coordinates" : [ [P1, P2, P3, P1] ]}}};
//each Pi is a [longitude, latitude] pair; the ring is closed by repeating P1
http://docs.mongodb.org/manual/tutorial/query-a-2d-index/
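To give an idea of what a spherical "near" query with a maximum distance computes, here is a small haversine sketch in Python. It is an approximation under stated assumptions: the Earth radius constant, the helper names `haversine_m` and `near`, and the two extra sample points are not from the slides, and the server may use a slightly different radius.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (assumed constant)


def haversine_m(p, q):
    """Great-circle distance in metres between two [longitude, latitude] points."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near(points, center, max_distance):
    """Keep the points within max_distance metres of center, nearest first."""
    with_distance = sorted((haversine_m(center, p), p) for p in points)
    return [p for d, p in with_distance if d <= max_distance]


home = [-0.162866, 51.489220]   # Wellington Square, London (longitude, latitude)
big_ben = [-0.1246, 51.5007]    # approximate sample point
paris = [2.3522, 48.8566]       # approximate sample point

print(near([paris, big_ben, home], home, 10000))
```

The points are written longitude first, as GeoJSON expects; swapping the two values silently moves the point elsewhere on the globe.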
Storage Engine
• WiredTiger
  ▫ Default since 3.2
  ▫ Created by the architects of BerkeleyDB (the engine behind Oracle NoSQL)
  ▫ Concurrency: multi-version
• MMAPv1
  ▫ Original engine
  ▫ Mapping in main memory
  ▫ Lock on collections (long writes)
• Encrypted storage
  ▫ Since 3.2
  ▫ Based on WiredTiger
  ▫ KMIP (Key Management Interoperability Protocol)
• In-Memory
  ▫ Since 3.2.6
  ▫ Based on WiredTiger
  ▫ Data must fit in main memory (error: WT_CACHE_FULL)
ACID Transactions
• Data versioning
  ▫ Causal consistency
  ▫ "Snapshots" of documents (per transaction)
    - Updates are made on the snapshot, then the transaction tries to apply them
  ▫ Timeout: 60s, oplog max: 16MB
• ACID transactions in a replica set (v4.0)
  ▫ In sharding mode with v4.2
• Commands (Python, pymongo):
# client is a MongoClient; d1 and d2 are the documents to insert
with client.start_session() as s:
    s.start_transaction()
    collection_one.insert_one(d1, session=s)
    collection_one.insert_one(d2, session=s)
    s.commit_transaction()
Map/Reduce (1/2)
Language: JavaScript
var mapFunction = function () {
    if (this.age > 30 && this.job.indexOf("MI6") >= 0)
        emit(this.address.city, this.age);
}; //emit: key, value; "this" is the current document
var reduceFunction = function (key, values) {
    return Array.sum(values);
}; //returns a single value aggregated from the list of "values" (for a given key)
var queryParam = {query : {}, out : "result_set"};
//query: filter applied before the map (like $match)
//out: output collection
db.users.mapReduce(mapFunction, reduceFunction, queryParam);
db.result_set.find();
Map/Reduce (2/2)
• Map
  ▫ Several "emit" calls per map (key/value)
  ▫ Applied to every document
    - Unless: index + queryParam
• Reduce
  ▫ Optimization: local and global
  ▫ No reduce if there is only one "value"
    ⇒ Heterogeneous results/computations
  ▫ Beware of non-associative functions (averages, lists, etc.)
• Shuffle
  ▫ Groups values by key (from "emit")
  ▫ Optimized on the sharding key (_id by default)
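The three phases, and the "no reduce if only one value" pitfall, can be replayed with a small Python emulation. This is a single-process sketch with made-up users; `map_reduce`, `map_fn`, `sum_fn` and `count_fn` are hypothetical names, and the real engine also runs reduce locally on each shard.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    # Map: each document may emit zero, one or several (key, value) pairs
    emitted = [pair for doc in docs for pair in map_fn(doc)]

    # Shuffle: group the emitted values by key
    groups = defaultdict(list)
    for key, value in emitted:
        groups[key].append(value)

    # Reduce: called only for keys with several values;
    # a single value is returned exactly as it was emitted
    return {key: values[0] if len(values) == 1 else reduce_fn(key, values)
            for key, values in groups.items()}


def map_fn(doc):
    if doc["age"] > 30 and "MI6" in doc["job"]:
        yield doc["address"]["city"], doc["age"]


def sum_fn(key, values):
    return sum(values)


def count_fn(key, values):
    return len(values)


# Made-up sample documents (not from the slides)
users = [
    {"login": "james", "age": 50, "address": {"city": "London"}, "job": ["agent", "MI6"]},
    {"login": "m", "age": 65, "address": {"city": "London"}, "job": ["director", "MI6"]},
    {"login": "bill", "age": 42, "address": {"city": "Hong Kong"}, "job": ["agent", "MI6"]},
    {"login": "q", "age": 28, "address": {"city": "London"}, "job": ["engineer", "MI6"]},
]

print(map_reduce(users, map_fn, sum_fn))
print(map_reduce(users, map_fn, count_fn))
```

With `count_fn`, the key that received a single value keeps the emitted age instead of a count of 1, because reduce was never called for it. Emitting values that already have the shape of the reduce output avoids this kind of heterogeneous result.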
Reduce: Average function (1/2)
Chunks:          (a,3) (b,2) (a,5)  |  (c,2) (c,4) (a,2)  |  (a,5) (a,1) (c,1)
Grouped by key:  (a,[5,3]) (b,[2])  |  (c,[2,4]) (a,[2])  |  (a,[5,1]) (c,[1])
Local reduce:    (a,4) (b,2)        |  (c,3) (a,2)        |  (a,3) (c,1)
Shuffle:         (a,[4,2,3]) (b,[2]) (c,[3,1])
Global reduce:   (a,3) (b,2) (c,2)
Normally:        (a, 3.2), (b, 2), (c, 2.33)
Reduce: Average function (2/2)
• Computing a non-associative function
  ▫ Rewrite the Map & Reduce values in terms of associative operations
    - avg = sum / nb
    ⇒ Produce sum (sum of the partial sums) and nb (sum of the partial counts)
    ⇒ {"avg" : sum/nb, "sum" : sum, "nb" : nb}
• Map and reduce must produce values with the same structure
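The sum/nb rewrite can be checked on the pairs from the previous slide. The sketch below is plain Python, assuming the nine pairs are split into three chunks of three (one reading of the diagram); `run` and `combine` are hypothetical helper names.

```python
# Pairs from the average example, assumed to be split into three chunks
chunks = [
    [("a", 3), ("b", 2), ("a", 5)],
    [("c", 2), ("c", 4), ("a", 2)],
    [("a", 5), ("a", 1), ("c", 1)],
]


def run(chunks, map_value, reduce_values):
    """Local reduce inside each chunk, then a global reduce over the partial results."""
    partials = {}
    for chunk in chunks:
        local = {}
        for key, value in chunk:
            local.setdefault(key, []).append(map_value(value))
        for key, values in local.items():
            partials.setdefault(key, []).append(reduce_values(values))
    return {key: reduce_values(values) for key, values in partials.items()}


# Naive version: reduce averages its inputs, so averages get averaged again
naive = run(chunks, lambda v: v, lambda vs: sum(vs) / len(vs))


# Associative version: map and reduce both produce {"sum", "nb"}
def combine(values):
    return {"sum": sum(v["sum"] for v in values), "nb": sum(v["nb"] for v in values)}


totals = run(chunks, lambda v: {"sum": v, "nb": 1}, combine)
averages = {key: t["sum"] / t["nb"] for key, t in totals.items()}

print(naive)
print(averages)
```

In this sketch the division is done once at the end; storing "avg" inside each value, as on the slide, works equally well as long as map and reduce keep the same structure.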
MongoDB Software
• Cloud Atlas platform
  ▫ PaaS
• MongoDB Charts
Drivers / API
• Drivers: http://docs.mongodb.org/ecosystem/drivers/
  ▫ Python, Ruby, Java, JavaScript (Node.js), C++, C#, PHP, Perl, Scala…
  ▫ Syntax: http://docs.mongodb.org/ecosystem/drivers/syntax-table/
Driver Java
Connection
ServerAddress server = new ServerAddress("localhost", 27017);
MongoClient mongoClient = new MongoClient(server);
Environment (DB + collection)
DB db = mongoClient.getDB("maBD");
DBCollection users = db.getCollection("users");
Authentication
db.authenticate(login, passwd.toCharArray());
Document insertion (DBObject)
DBObject doc = new BasicDBObject("name", "MongoDB")
    .append("type", "database")
    .append("count", 1)
    .append("info", new BasicDBObject("x", 203).append("y", 102));
// or doc = (DBObject) JSON.parse(jsonText);
users.insert(doc);
Querying (pattern query + cursor)
BasicDBObject query = new BasicDBObject("name", "James Bond");
DBCursor cursor = users.find(query);
try {
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }
} finally {
    cursor.close();
}
Map/Reduce:
// map and reduce are Strings holding the JavaScript functions
MapReduceCommand cmd = new MapReduceCommand(users, map, reduce, "outputColl",
    MapReduceCommand.OutputType.REPLACE, query);
MapReduceOutput out = users.mapReduce(cmd);
for (DBObject o : out.results()) {
    System.out.println(o.toString());
}
//outputType: INLINE, REPLACE, MERGE, REDUCE
http://api.mongodb.org/java/current/index.html
Driver C#
References/Libraries
MongoDB.Bson.dll: using MongoDB.Bson;
MongoDB.Driver.dll: using MongoDB.Driver;
Connection
var connectionString = "mongodb://localhost";
var client = new MongoClient(connectionString);
var server = client.GetServer();
Database & collection
var db = server.GetDatabase("maBD");
var coll = db.GetCollection<User>("users");
"User" object to define
public class User
{
    public ObjectId Id { get; set; }
    public string name { get; set; }
    public string login { get; set; }
    public int age { get; set; }
    public Address address { get; set; }
}
Get a document:
var query = Query<User>.EQ(e => e.login, "james");
var entity = coll.FindOne(query);
Package for output results: LINQ
using MongoDB.Driver.Linq;
Queries:
var query = coll.AsQueryable<User>().Where(e => e.age > 40).OrderBy(c => c.name);
Result:
foreach (var user in query)
{
    // processing
}
MapReduce:
var mr = coll.MapReduce(map, reduce);
foreach (var document in mr.GetResults())
{
    Console.WriteLine(document.ToJson());
}
http://www.nuget.org/packages/mongocsharpdriver/
Driver Python
Import
>>> import pymongo
Connection
>>> from pymongo import MongoClient
>>> client = MongoClient()
>>> client = MongoClient('localhost', 27017)
Database & collection
>>> db = client.maBD
>>> coll = db.users
Get one document
>>> coll.find_one({"login": "james"})
Query
>>> import pprint
>>> for c in coll.find({"age": 40}):
...     pprint.pprint(c)
MapReduce
>>> from bson.code import Code
>>> map = Code("function () {"
...            "  this.tags.forEach(function(z) {"
...            "    emit(z, 1);"
...            "  });"
...            "}")
>>> reduce = Code("function (key, values) {"
...               "  var total = 0;"
...               "  for (var i = 0; i < values.length; i++) {"
...               "    total += values[i];"
...               "  }"
...               "  return total;"
...               "}")
>>> result = coll.map_reduce(map, reduce, "myresults")
>>> for doc in result.find():
...     print(doc)