inferring versioned schemas from nosql databases and its applications

23
Inferring Versioned Schemas from NoSQL Databases and its Applications ER’15 Stockholm, October 2015 [{ ”id” : ”90234af” , ”value” : { ”author” : ”Diego Sevilla Ruiz” , ”e-mail” : [email protected], ”institution” : ”U. of Murcia” }}, { ”id” : ”a243bb5” , ”value” : { ”author” : ”Severino Feliciano Morales” , ”e-mail” : [email protected], ”institution” : ”U. of Murcia” }}, { ”id” : ”096705d” , ”value” : { ”author” : ”Jesús García Molina” , ”e-mail” : [email protected], ”institution” : ”U. of Murcia” }}]

Upload: diego-sevilla-ruiz

Post on 12-Apr-2017

528 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: Inferring Versioned Schemas from NoSQL Databases and its Applications

Inferring VersionedSchemas from NoSQLDatabases and its

ApplicationsER’15

Stockholm, October 2015

[{ ”id”: ”90234 af”, ”value”: { ”author”: ”Diego Sevilla Ruiz”,”e-mail”: ”[email protected]”,”institution”: ”U. of Murcia”}},

{ ”id”: ”a243bb5”, ”value”: { ”author”: ”Severino Feliciano Morales”,”e-mail”: ”[email protected]”,”institution”: ”U. of Murcia”}},

{ ”id”: ”096705d”, ”value”: { ”author”: ”Jesús García Molina”,”e-mail”: ”[email protected]”,”institution”: ”U. of Murcia”}}]

Page 2: Inferring Versioned Schemas from NoSQL Databases and its Applications

Motivation

NoSQL Databases are Schemaless

Benefits▶ No need to previouslydefine an Schema

▶ Non-uniform data▶ Custom fields▶ Non-uniform types

▶ Easier evolution

Drawbacks▶ Harder to reason aboutthe DB

▶ Static checking is lost▶ Some of the data logic isin the application code(more error prone)

▶ Some utilities needSchema information towork

Page 3: Inferring Versioned Schemas from NoSQL Databases and its Applications

Schemas for NoSQL Databases

▶ How to alleviate the problems of schemalessdatabases? ⇒ Inferring a Schema

▶ The Schema Model contains information aboutEntities and Relationships

▶ Take into account the different Entity Versions inthe Database

▶ Heterogeneity usually because of slight variations onEntities

▶ We obtain a precise database model▶ The Schema allows us to automate the constructionof tools:

▶ migration, refactoring, visualization, …

Page 4: Inferring Versioned Schemas from NoSQL Databases and its Applications

Related Work▶ JSON Schema

▶ Object versions and relationships are not considered▶ Apache Spark SQL/Drill: SQL-like schemas

▶ Union of all fields, nullable ⇒ incorrect combinations▶ Over-generalization to String▶ Aggregations and Reference relations not considered

▶ MongoDB-Schema▶ Prototype to infer schemas from MongoDBcollections

▶ Same limitations than Spark SQL▶ JSON Discoverer

▶ A MDE solution to infer domain models from RESTweb services (i.e. JSON documents)

▶ Not database-oriented; Object versions notconsidered

Page 5: Inferring Versioned Schemas from NoSQL Databases and its Applications

Spark SQL Example

{”name”:”Michael”}{”name”:”Andy”, ”age”:30}{”name”:”Justin”, ”age”:19}{”name”:”Peter”, ”age”:”tiny”}{”name”:”Martina”, ”address”:”home!”}

> people.printSchemaroot|-- address: string (nullable = true)|-- age: string (nullable = true)|-- name: string (nullable = true)

▶ age promoted to string▶ age and address are never part of the same object

Page 6: Inferring Versioned Schemas from NoSQL Databases and its Applications

{”rows”:[{”content”:{”chapters”:33,”pages”:527

},”authors”:[{”company”:{”country”:”USA”,”name”:”IBM”},

”name”:”Grady Booch”,”_id”:”210”

},{”company”:{”country”:”USA”,”name”:”IBM”

},”name”:”James Rumbaugh”,”_id”:”310”

},{”country”:”USA”,”company”:”Ivar Jacobson Consulting”,”name”:”Ivar Jacobson”,”_id”:”410”

}],”type”:”book”,”year”:2013,”publisher_id”:”345679”,”title”:”The Unified Modeling Language”,”_id”:”1”

},{”discipline”:”software engineering”,”issn”:[”0098 -5589”,”1939 -3520”

],”name”:”IEEE Trans. on Software Engineering”,”type”:”journal”,”_id”:”11”

},

{”name”:”Automated Software Engineering”,”issn”:[”0928 -8910”,”1573 -7535”

],”discipline”:”software engineering”,”type”:”journal”,”_id”:”12”,”number”:10515

},{”city”:”Barcelona”,”name”:”Omega”,”type”:”publisher”,”_id”:”123451”

},{”type”:”publisher”,”city”:”Newton”,”name”:”O’Reilly Media”,”_id”:”928672”

},{”type”:”book”,”author”:{”_id”:”101”,”name”:”Bradley Holt”,”company”:{

”country”:”USA”,”name”:”IBM Cloudant”,

}},”title”:”Writing and Querying MapReduce Views in

CouchDB”,”publisher_id”:”928672”,”_id”:”2”

},{”name”:”Addison -Wesley”,”type”:”publisher”,”_id”:”345679”

},{”type”:”publisher”,”journals”:[”11”,”12”

],”name”:”IEEE Publications”,”_id”:”907863”

}]}

Page 7: Inferring Versioned Schemas from NoSQL Databases and its Applications

NoSQL Database Model▶ Objects (Entities) and Entity Versions

▶ Attributes▶ Relationships

▶ Aggregation▶ References

{”type”:”publisher”,”city”:”Newton”,”name”:”O’Reilly Media”,”_id”:”928672”

},{”type”:”book”,”author”:{”_id”:”101”,”name”:”Bradley Holt”,”company”:{

”country”:”USA”,”name”:”IBM Cloudant”,

}},”title”:”Writing and Querying MapReduce Views in CouchDB”,”publisher_id”:”928672”,”_id”:”2”

},

Page 8: Inferring Versioned Schemas from NoSQL Databases and its Applications

Schema & Entity Versions DescriptionEntity Publisher {

Version 1 {

name: String

city: String

}

Version 2 {

name: String

}

Version 3 {

name: String

journal[+]: [Ref]->[Journal] (opposite=False)

}

}

Entity Journal {

Version 1 {

issn: Tuple [String, String]

name: String

discipline: String

}

Version 2 {

issn: Tuple [String, String]

name: String

discipline: String

number: int

}

}

Entity Book {

Version 1 {

title: String

year: int

publisher[1]: [Ref]->[Publisher] (opossite=False)

content[1]: [Aggregate]Content1

author[+]: [Aggregate]Author1

}

Version 2 {

title: String

publisher[1]: [Ref]->[Publisher] (opossite=False)

author[1]: [Aggregate]Author1

}

}

Entity Author {

Version 1 {

name: String

company[1]: [Aggregate]Company

}

Version 2 {

country: String

company: String

name: String

}

}

Entity Company {

Version 1 {

name: String

country: String

}

}

Entity Content {

Version 1 {

chapters: int

pages: int

}

}

(a) (b)

[1..1] company

[1..1] publisher[1..1] content[1..*] authors

[1..*] journals

Page 9: Inferring Versioned Schemas from NoSQL Databases and its Applications

Solution Design Considerations

▶ We have to process all the objects in the Database⇒Map-Reduce

▶ Natural data processing on NoSQL databases▶ LeverageMDE technologies

▶ Reuse EMF/Ecore tooling to show entity diagrams▶ Automation & Code Generation byMetamodeling &Model Transformations

Page 10: Inferring Versioned Schemas from NoSQL Databases and its Applications

Proposed MDE Architecture

NoSQLDatabase

MapReduceObjectVersions(JSON)

JSONInjection

JSONModel

JSONMetamodel

SchemaReverseEng

SchemaModel

ApplicationGeneration

SchemaViewer/Data

Validator/MigrationAssistant

Applications SchemaMetamodel

instan

ce

instance

Page 11: Inferring Versioned Schemas from NoSQL Databases and its Applications

Reverse Engineering Process (i)

▶ Map-Reduce process▶ Map: obtains the Raw Schema for each object▶ Reduce: selects an archetype for each Entity Version▶ Entity Type

▶ Root objects ⇒ “type” field or collection name▶ Aggregated objects ⇒ key of the pair (e.g. “author”)

JSON object Raw Schema

{name:“Omega”, city:“Barcelona”} {name:String, city:String}

{title:“Writing and...”,publisher_id:“928672”,author:{name:“Bradley Holt”,

company:{country:“USA”,name:“IBM Cloudant”} } }

{title:String,publisher_id:String,author:{name:String,

company:{country:String,name:String} } }

Page 12: Inferring Versioned Schemas from NoSQL Databases and its Applications

Reverse Engineering Process (ii)

▶ Attributes: primitive or tuple▶ Aggregated Entities

▶ Value of the pair is an Object (or array of objects)▶ Entity type inferred from the key

▶ References▶ Heuristics/Conventions▶ Key: <entity_name>_id▶ Value: MongoDB’s DBRef abstraction:{”$ref”: ”<entity_name>”, ”$id”, <id_value>}

▶ Honor cardinalities (arrays)

Page 13: Inferring Versioned Schemas from NoSQL Databases and its Applications
Page 14: Inferring Versioned Schemas from NoSQL Databases and its Applications

Example NoSQL Applications

▶ From the DBSchema model, using ModelTransformations and Model-to-Text transformations(Code Generation), we can:

▶ Generate models that Characterize each EntityVersion

▶ That characterization can be used to Visualize theDatabase

▶ And also to generate code to Validate objectsentering the Database

▶ Generate models that allow Database Migration tothe desired Entity Versions

Page 15: Inferring Versioned Schemas from NoSQL Databases and its Applications

Type Discrimination/CharacterizationMetamodel

Page 16: Inferring Versioned Schemas from NoSQL Databases and its Applications

function isOfExactTypeBook_2(obj) {if (! (”type” in obj)) {

return false;}if (obj[type] !== ”Book”) {

return false;}if (! (”title” in obj)) {

return false;}if (! (”author” in obj)) {

return false;}if (”publisher” in obj) {

return false;}if (”content” in obj) {

return false;}if (”year” in obj) {

return false;}

return true;}

Generated using a Model-to-Text transformationfrom an instance of theprevious Type Discrimina-tion Metamodel

Page 17: Inferring Versioned Schemas from NoSQL Databases and its Applications
Page 18: Inferring Versioned Schemas from NoSQL Databases and its Applications
Page 19: Inferring Versioned Schemas from NoSQL Databases and its Applications

Entity Versions

Page 20: Inferring Versioned Schemas from NoSQL Databases and its Applications

Alternate: D3.js Treemap

Page 21: Inferring Versioned Schemas from NoSQL Databases and its Applications

Type Transformation Metamodel

db.<collection >. update(<query >,<update >,{

multi: true}

)

Obtained by Entity Type Characterization

Generate the correct updateMongoDB statement using $set,$push, etc., maybe via user assis-tance through a DSL.For example, for Journal_1 toJournal_2:

$set: { ”number”: 1 }

Page 22: Inferring Versioned Schemas from NoSQL Databases and its Applications

Conclusions & Future work

▶ A process for obtaining Conceptual Model Schemasfor NoSQL Databases is shown

▶ The process takes into account the different EntityVersions present in the Database

▶ AMDE process allows us to automate theproduction of several applications from the Schemas

▶ Example applications that allow DatabaseVisualization andMigration are shown

Page 23: Inferring Versioned Schemas from NoSQL Databases and its Applications

Conclusions & Future work (ii)

▶ Future work includes:▶ Building a NoSQL Database Tool Set (NoSQL DataEngineering)

▶ DSL for Entity Version migration▶ Refining the Schema to allow a richer Type System

▶ Allow value ranges or enumerated sets▶ Infer attribute dependencies (derived attributes,

i.e. the value of an attribute dictates the value ofanother attribute)

▶ etc.