Heavens! My data is no longer relational
BLEND WEB MIX, 1 October 2013
Xavier Gorse
@xgorse
Association Française des Utilisateurs de PHP
• Founded in 2001
• Forum PHP (21 & 22 November 2013 in Paris)
• AperoPHP and Rendez-Vous meetups
• Local chapters
• President in 2009
www.afup.org
Association Francophone des Utilisateurs de Symfony
• Started in 2010 by Hugo Hamon
• Not yet a formal association
• Monthly Sfpot: a talk followed by drinks
• Chapters in Marseille, Lyon?
www.afsy.fr
Elao
• Founder (2005)
• Lyon & Paris
• Technical web agency of 15 people
• Symfony since 2006
• Official SensioLabs partner
www.elao.com
Plan
• Trend
• Key-value databases
• Document databases
• Graph databases
• Column-oriented databases
RDBMS performance
[Figure: performance plotted against data complexity, comparing the relational database curve with the requirements of the application. Labelled application types: salary list, most web apps, social network, location-based services.]
Source: @ianSrobinson, @jimwebber (Neo Technology)
complexity = f(size, connectedness, uniformity)
Data Size
[Figure: data size per year, 2007 to 2013]
• 500 million page views a day
• ~3TB of new data to store a day
• Posts are about 50GB a day. Follower list updates are about 2.7TB a day.
Connectedness
[Figure: information connectivity over time, 1990 to 2020, from web 1.0 through web 2.0 to "web 3.0": text documents, hypertext, feeds, blogs, wikis, UGC, tagging, folksonomies, RDFa, ontologies, GGG.]
Source: @ianSrobinson, @jimwebber (Neo Technology)
Uniformity
• Semi-structured data
• Different data lifecycles
• Store more data about each entity
• Individualisation & decentralization of content generation
NoSQL: Not Only SQL
NoSQL
• Non-relational
• Cluster-friendly
• Schemaless
• Distributed architecture
ACID & CAP Theorem
ACID
• Atomicity
• Consistency
• Isolation
• Durability
CAP Theorem
• Consistency
• Availability
• Partition tolerance
[Figure: the four NoSQL data models. Key/Value: each key maps to a value. Document: a key maps to fields, which may contain nested fields. Column-oriented: a key maps to a set of columns. Graph: nodes connected to one another.]
Key-value databases
• Inspired by Amazon’s Dynamo (2007)
• A global collection of key-value pairs
• A big, scalable HashMap
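To make the "big scalable HashMap" image concrete, here is a minimal sketch in Python. It assumes that each shard stands in for one server and that a stable hash of the key picks the shard. The class name `ShardedKeyValueStore` and the sample keys are hypothetical, and real products use more elaborate schemes (replication, consistent hashing) than this.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store spreading keys over several in-memory shards."""

    def __init__(self, shard_count=4):
        # Each dict stands in for one server of the cluster (assumption).
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key selects the shard, so the same key
        # is always read from and written to the same place.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

    def delete(self, key):
        self._shard_for(key).pop(key, None)


store = ShardedKeyValueStore(shard_count=3)
store.put("session:42", {"user": "alice", "cart": ["book"]})
store.put("page:/home", "<html>cached</html>")
```

The store never looks inside the value: everything is addressed by key, which is what keeps the model simple and easy to partition.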
• Strengths
• Simple data model
• High performance
• Great at scaling out horizontally
• Weaknesses
• Simplistic data model
• Poor for complex data
Key-value databases - Redis
• Written in C, BSD license, 2009
• Very fast and lightweight
• All data in memory
• Persistence to disk
• Master/slave replication
• Used for caching, sessions, or work queues
http://redis.io/
Key-value databases - Products
• Riak
• Memcached (RAM)
• Voldemort
• Amazon DynamoDB (SaaS)
• IronCache (SaaS)
Document databases
• Inspired by IBM Lotus Notes/Domino
• Same as key/value, but the value is a document
• A document is a key-value collection
• Flexible schema
• Non-relational, data is de-normalized
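As an illustration of the points above, the following Python sketch models a collection as a mapping from key to document, where each document is itself a key-value collection. The `posts` data and the `find` helper are hypothetical and do not correspond to any particular product's API.

```python
# A collection maps a key to a document. A document is itself a
# key-value collection and may nest other documents or lists.
posts = {
    "post:1": {
        "title": "My data is no longer relational",
        # The author is embedded (denormalized) instead of joined.
        "author": {"name": "Alice", "twitter": "@alice"},
        "tags": ["nosql", "php"],
    },
    "post:2": {
        "title": "Graphs everywhere",
        "author": {"name": "Bob"},  # flexible schema: no twitter field
        "tags": ["graph"],
        "comments": [{"by": "Alice", "text": "Nice"}],  # extra field
    },
}


def find(collection, predicate):
    """Return the keys of all documents matching the predicate."""
    return [key for key, doc in collection.items() if predicate(doc)]


tagged_nosql = find(posts, lambda doc: "nosql" in doc.get("tags", []))
with_comments = find(posts, lambda doc: bool(doc.get("comments")))
```

Two documents in the same collection carry different fields, and reading a post needs no join because the author data travels with it.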
Document databases
• Strengths
• Simple, powerful data model
• Good scaling, Easy/Auto sharding
• Usually “ACID” compliant
• Weaknesses
• Unsuited for interconnected data
• Query model limited to keys (and indexes)
Document databases - MongoDB
• Written in C++, AGPL license, 2009
• JSON-style documents
• Full Index Support
• Fast In-Place Updates
• Auto-Sharding
• Replication & High Availability
• Many connectors
• Large community
• Commercial support
http://www.mongodb.org
Document databases - Products
• Lotus Notes / Domino
• CouchDB: written in Erlang, queried with JavaScript
• OrientDB: written in Java, relationships stored as a graph
Graph databases
• Nodes with properties
• Named relationships with properties
• Focus on the data structure
• Each element holds direct pointers to its adjacent elements, so no index lookups are necessary
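The model described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `Node` and `Relationship` classes, the `FRIEND` relationship name and the sample people are all hypothetical, and a real graph database adds storage, transactions and a query language on top.

```python
class Node:
    def __init__(self, **properties):
        self.properties = properties
        # Direct references to outgoing relationships: traversal follows
        # these pointers instead of looking anything up in an index.
        self.outgoing = []


class Relationship:
    def __init__(self, name, start, end, **properties):
        self.name = name
        self.start = start
        self.end = end
        self.properties = properties
        start.outgoing.append(self)


def neighbours(node, relationship_name):
    """Nodes reachable from `node` through one named relationship."""
    return [rel.end for rel in node.outgoing if rel.name == relationship_name]


def friends_of_friends(node):
    """Friends at distance two that are not already direct friends."""
    direct = neighbours(node, "FRIEND")
    result = []
    for friend in direct:
        for candidate in neighbours(friend, "FRIEND"):
            if candidate is not node and candidate not in direct \
                    and candidate not in result:
                result.append(candidate)
    return result


alice = Node(name="Alice")
bob = Node(name="Bob")
carol = Node(name="Carol")
Relationship("FRIEND", alice, bob, since=2010)
Relationship("FRIEND", bob, carol, since=2012)

names = [n.properties["name"] for n in friends_of_friends(alice)]
```

The cost of the traversal depends on how many relationships are followed, not on the total number of nodes stored.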
Graph databases
• Strengths
• Powerful data model
• Fast for connected data
• A new data architecture
• Weaknesses
• No sharding: all data lives in one instance
• Querying on node/relationship properties kills performance
• A new data architecture
Graph databases - Neo4j
• Java, GPL/commercial license, 2007
• Query languages: Cypher / Gremlin
• REST interface
• Embedded mode
• High availability (master/slave)
• Commercial Support
http://neo4j.org
GraphDB - Products
• Titan
• OrientDB
• InfiniteGraph
• AllegroGraph
Column-oriented database
• A big table, with column families
• Data stored by column instead of row
• Built for distributed architectures
• Map-reduce for querying/processing
• Flexible schema
• Easy sharding (partitioning)
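The list above can be illustrated with a toy layout in Python. It assumes a nested mapping of column family, column name and row key, and a single-process imitation of the map and reduce phases. The helper names (`put`, `get_row`, `map_phase`, `reduce_phase`) and the sample users are hypothetical, and real stores organise data on disk quite differently.

```python
from collections import defaultdict

# Layout: column family -> column name -> row key -> value.
# The values of one column sit together, and a row only occupies the
# columns it actually has (flexible schema).
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[family][column][row_key] = value


def get_row(row_key, family):
    """Reassemble one row from the columns that contain it."""
    return {column: cells[row_key]
            for column, cells in table[family].items()
            if row_key in cells}


put("user:1", "profile", "name", "Alice")
put("user:1", "profile", "city", "Lyon")
put("user:2", "profile", "name", "Bob")
put("user:2", "profile", "city", "Paris")
put("user:3", "profile", "name", "Carol")  # this row has no city column


def map_phase(cells):
    # Emit one (city, 1) pair per row that has a value in the column.
    for _row_key, city in cells.items():
        yield city, 1


def reduce_phase(pairs):
    # Sum the counts emitted for each key.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Counting users per city only reads the "city" column.
users_per_city = reduce_phase(map_phase(table["profile"]["city"]))
```

In a distributed setting the map phase would run on each node holding part of the column and the reduce phase would merge the partial results.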
Column-oriented database
• Strengths
• Data model supports semi-structured data
• Naturally indexed (columns)
• Horizontally scalable: read/write throughput increases linearly
• Fault tolerant – no single point of failure
• Weaknesses
• Unsuited for interconnected data
Column-oriented database - Cassandra
• Java, Apache License 2.0, 2008
• Originally developed at Facebook
• Decentralized
• Supports replication, including across multiple data centers
• Scalability
• Fault-tolerant
• MapReduce support
http://cassandra.apache.org/
Column-oriented database
• HBase (Apache)
• HyperTable
• BigTable (Google)
Conclusion
• Impact on application architecture
• Store your data in the way you want to query it
• Denormalize your data and try to keep it up to date!
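The last two points can be sketched as follows in Python. The example assumes a hypothetical "timeline" query and keeps a read model shaped exactly like that query. The names `users`, `posts`, `timeline_by_user`, `add_post` and `rename_user` are illustrative only.

```python
# Source data: users and posts.
users = {"u1": {"name": "Alice"}}
posts = {}

# Denormalized read model, shaped like the query it serves:
# "show the timeline of a user" becomes a single lookup by key.
timeline_by_user = {}


def add_post(post_id, user_id, title):
    posts[post_id] = {"user_id": user_id, "title": title}
    # Copy the author's name into the timeline entry at write time.
    timeline_by_user.setdefault(user_id, []).append({
        "post_id": post_id,
        "title": title,
        "author_name": users[user_id]["name"],
    })


def rename_user(user_id, new_name):
    users[user_id]["name"] = new_name
    # The price of denormalization: every copy has to be refreshed.
    for entry in timeline_by_user.get(user_id, []):
        entry["author_name"] = new_name


add_post("p1", "u1", "Hello NoSQL")
add_post("p2", "u1", "Graphs")
rename_user("u1", "Alice G.")
```

Reads stay trivial, while each write is responsible for updating every copy. That shift of work from reads to writes is the architectural impact on the application.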
Thank you