Good heavens, my data is no longer relational
DESCRIPTION
When the data management needs of our web applications go beyond simple persistence in a relational database (RDBMS), alternative technologies known as "NoSQL" become necessary. We will cover the four main NoSQL families (key-value, document, column-oriented and graph) and how to integrate them into PHP applications.
TRANSCRIPT

Good heavens! My data is no longer relational
Blend Web Mix, 1 October 2013

Xavier Gorse
@xgorse

Association Française des Utilisateurs de PHP (AFUP)
• Founded in 2001
• Forum PHP (21 & 22 November 2013 in Paris)
• AperoPHP and Rendez-vous meetups
• Local chapters
• President in 2009
www.afup.org

Association Francophone des Utilisateurs de Symfony (AFSY)
• Started in 2010 by Hugo Hamon
• Not yet a formal association
• Monthly sfPot: a talk followed by drinks
• Chapters in Marseille, Lyon (?)
www.afsy.fr

Elao
• Founder, 2005
• Lyon & Paris
• Technical web agency of 15 people
• Symfony since 2006
• Official SensioLabs partner
www.elao.com


Plan
• Trend
• Key-value databases
• Document databases
• Graph databases
• Column-oriented databases

RDBMS performance
[Figure: relational database performance versus data complexity, set against application requirements (salary list, most web apps, social network, location-based services). Source: @ianSrobinson and @jimwebber, Neo Technology]

complexity = f(size, connectedness, uniformity)

Data Size
[Figure: data size per year, 2007 to 2013]

Data Size
• 500 million page views a day
• ~3TB of new data to store a day
• Posts are about 50GB a day. Follower list updates are about 2.7TB a day.

Connectedness
[Figure: information connectivity over time, 1990 to 2020 (web 1.0, web 2.0, "web 3.0"): text documents, hypertext, feeds, blogs, wikis, UGC, tagging, folksonomies, RDFa, ontologies, GGG. Source: @ianSrobinson and @jimwebber, Neo Technology]

Uniformity
• Semi-structured data
• Different data lifecycle
• Store more data about each entity
• Individualisation & decentralization of content generation

NoSQL
Not Only SQL

NoSQL
• Non-relational
• Cluster-friendly
• Schema-less
• Distributed architecture

ACID & CAP Theorem
ACID
• Atomicity
• Consistency
• Isolation
• Durability
CAP Theorem
• Consistency
• Availability
• Partition tolerance
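
To make the ACID side concrete, here is a minimal sketch of atomicity and consistency using SQLite from Python's standard library. The `accounts` table, the `transfer` helper and the amounts are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import sqlite3

# In-memory relational database; the CHECK constraint is the consistency rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between two accounts as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


failed = transfer(conn, "alice", "bob", 150)  # would make alice's balance negative
ok = transfer(conn, "alice", "bob", 40)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(failed, ok, balances)
```

The rejected transfer leaves both rows untouched, which is the guarantee many NoSQL stores relax in exchange for availability and partition tolerance.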

[Figure: the four NoSQL families: Key/Value (keys mapped to values), Document (a key mapped to fields), Column-oriented (a key mapped to columns) and Graph (Nodes 1 to 5)]


Key-value databases
• Inspired by Amazon's Dynamo (2007)
• A global collection of key-value pairs
• A big, scalable hash map
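
The "big, scalable hash map" idea can be sketched in a few lines. This is a toy, in-memory illustration, not how Dynamo or any real product is implemented: the class name `ShardedKeyValueStore` is hypothetical, each "node" is a plain dictionary, and keys are routed with a simple modulo on a hash rather than the consistent hashing that Dynamo-style systems use.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store: each key is routed to one of several in-memory nodes."""

    def __init__(self, node_count=3):
        # Each "node" is a plain dict standing in for a separate server.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash (unlike Python's per-process randomized hash()) picks the node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        self._node_for(key).pop(key, None)


store = ShardedKeyValueStore(node_count=3)
store.put("session:42", {"user": "alice", "cart": [1, 2]})
store.put("page:/home", "<html>cached</html>")
print(store.get("session:42"))
print(store.get("missing", "no such key"))
```

The only operations are put, get and delete by key: the store never looks inside the value, which is why the model scales out so easily and why it struggles with complex queries.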

Key-value databases
• Strengths
  • Simple data model
  • High performance
  • Great at scaling out horizontally
• Weaknesses
  • Simplistic data model
  • Poor for complex data

Key-value databases: Redis
• Written in C, BSD license, 2009
• Very fast and lightweight
• All data in memory
• Persistence
• Master/slave replication
• Used for caching, sessions or work queues
http://redis.io/
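
The caching and session use case relies on keys that expire after a while. The sketch below is an in-memory stand-in that only mimics that behaviour; it does not talk to Redis, and the `ExpiringCache` class, its methods and the injectable `clock` parameter are hypothetical names introduced for illustration.

```python
import time


class ExpiringCache:
    """In-memory cache whose entries can carry a time-to-live, in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the example can fake the passage of time
        self._items = {}      # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lazily evict the expired entry
            return None
        return value


# A fake clock keeps the example deterministic.
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
cache.set("session:abc", {"user_id": 7}, ttl=30)
fresh = cache.get("session:abc")
now[0] = 31.0  # 31 "seconds" later the session has expired
stale = cache.get("session:abc")
print(fresh, stale)
```

In a real deployment the expiry would be handled by the server itself, so every application instance sees the same sessions.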

Key-value databases
• Riak
• Memcached (RAM)
• Voldemort
• Amazon DynamoDB (SaaS)
• IronCache (SaaS)


Document databases
• Inspired by IBM Lotus Notes/Domino
• Same as key-value, with a document as the value
• A document is a collection of key-value pairs
• Flexible schema
• Non-relational, data is denormalized
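
A minimal sketch of the document model, assuming nothing beyond the bullets above: documents are nested key-value collections, two documents in the same collection need not share fields, and related data (the author) is embedded rather than joined. `DocumentCollection`, `insert` and `find` are hypothetical names, not the API of any product.

```python
import itertools


class DocumentCollection:
    """Toy document store: each document is a dict stored under a generated id."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert(self, document):
        doc_id = next(self._ids)
        # Copy the document and record its id inside it.
        self._docs[doc_id] = dict(document, _id=doc_id)
        return doc_id

    def find(self, **criteria):
        # Linear scan over top-level fields; real stores would use indexes.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


posts = DocumentCollection()
# Flexible schema: the two documents do not have the same fields.
posts.insert({
    "title": "NoSQL families",
    "author": {"name": "Alice", "city": "Lyon"},  # embedded, denormalized
    "tags": ["nosql", "php"],
})
posts.insert({"title": "Symfony tips", "author": {"name": "Bob"}, "published": True})

published = posts.find(published=True)
print([doc["title"] for doc in published])
```

Reading a post returns its author in the same lookup, at the price of duplicating author data across documents.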

Document databases
• Strengths
  • Simple, powerful data model
  • Good scaling, easy/automatic sharding
  • Usually “ACID” compliant
• Weaknesses
  • Unsuited for interconnected data
  • Query model limited to keys (and indexes)

Document databases: MongoDB
• Written in C++, AGPL license, 2009
• JSON-style documents
• Full index support
• Fast in-place updates
• Auto-sharding
• Replication & high availability
• Many connectors
• Big community
• Commercial support
http://www.mongodb.org

Document databases
• Lotus Notes / Domino
• CouchDB: written in Erlang, JavaScript for queries
• OrientDB: written in Java, relationships stored as a graph


Graph databases
• Nodes with properties
• Named relationships with properties
• Focus on the data structure
• Each element holds a direct pointer to its adjacent elements, so no index lookups are needed
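
The property-graph model described above can be sketched with plain objects: nodes and relationships both carry properties, and each node holds direct references to its outgoing relationships, so a traversal follows pointers instead of consulting an index. `Node`, `Relationship` and `reachable` are hypothetical names for this illustration.

```python
from collections import deque


class Node:
    def __init__(self, **properties):
        self.properties = properties
        self.relationships = []  # outgoing relationships, held by direct reference


class Relationship:
    def __init__(self, name, start, end, **properties):
        self.name = name
        self.start = start
        self.end = end
        self.properties = properties
        start.relationships.append(self)


def reachable(start, name, max_depth):
    """Breadth-first traversal along relationships called `name`, up to max_depth hops."""
    seen = {id(start)}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for rel in node.relationships:
            if rel.name == name and id(rel.end) not in seen:
                seen.add(id(rel.end))
                found.append(rel.end)
                queue.append((rel.end, depth + 1))
    return found


alice, bob, carol, dave = (Node(name=n) for n in ("Alice", "Bob", "Carol", "Dave"))
acme = Node(name="Acme", kind="company")
Relationship("KNOWS", alice, bob, since=2010)
Relationship("KNOWS", bob, carol, since=2012)
Relationship("KNOWS", carol, dave, since=2013)
Relationship("WORKS_AT", alice, acme)

# Friends and friends of friends of Alice (at most two KNOWS hops).
names = [node.properties["name"] for node in reachable(alice, "KNOWS", 2)]
print(names)
```

The cost of the traversal depends on how many relationships are visited, not on the total number of nodes stored, which is why this model suits highly connected data.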

Graph databases
• Strengths
  • Powerful data model
  • Fast for connected data
  • A new data architecture
• Weaknesses
  • No sharding: all data in one instance
  • Querying on node or relationship properties kills performance
  • A new data architecture

Graph databases: Neo4j
• Java, GPL/commercial license, 2007
• Query languages: Cypher / Gremlin
• REST interface
• Embedded mode
• High availability (master/slave)
• Commercial support
http://neo4j.org

GraphDB - Products
• Titan
• OrientDB
• InfiniteGraph
• AllegroGraph


Column-oriented database
• A big table, with column families
• Data stored by column instead of by row
• Built for distributed architectures
• Map-reduce for querying/processing
• Flexible schema
• Easy sharding (partitioning)
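
The sketch below follows the bullets above literally: values are grouped by column family and by column, rows may have different columns, and a small map-reduce pass processes a single column. It is an in-memory toy; `ColumnFamilyTable` and `count_by_value` are hypothetical names, and real products differ in how they lay data out on disk.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy table that stores cells as family -> column -> {row_key: value}."""

    def __init__(self):
        self._families = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        self._families[family][column][row_key] = value

    def get_row(self, row_key, family):
        # Reassemble one row by visiting every column of the family.
        return {
            column: cells[row_key]
            for column, cells in self._families[family].items()
            if row_key in cells
        }

    def scan_column(self, family, column):
        # Reading a whole column touches only that column's cells.
        return dict(self._families[family][column])


def count_by_value(cells):
    """Tiny map-reduce: map each cell to (value, 1), then sum the counts per value."""
    mapped = [(value, 1) for value in cells.values()]  # map step
    totals = defaultdict(int)
    for key, count in mapped:                          # reduce step
        totals[key] += count
    return dict(totals)


users = ColumnFamilyTable()
users.put("u1", "profile", "name", "Alice")
users.put("u1", "profile", "city", "Lyon")
users.put("u2", "profile", "name", "Bob")
users.put("u2", "profile", "city", "Paris")
users.put("u3", "profile", "city", "Lyon")  # flexible schema: no "name" column

city_counts = count_by_value(users.scan_column("profile", "city"))
print(users.get_row("u3", "profile"), city_counts)
```

Aggregating over one column never reads the other columns, while rebuilding a full row has to visit each column in turn.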

Column-oriented database
• Strengths
  • Data model supports semi-structured data
  • Naturally indexed (columns)
  • Horizontally scalable: read/write throughput increases linearly
  • Fault tolerant: no single point of failure
• Weaknesses
  • Unsuited for interconnected data

Column-oriented database: Cassandra
• Java, Apache License 2.0, 2008
• Originally developed at Facebook
• Decentralized
• Supports replication, including across multiple data centers
• Scalable
• Fault-tolerant
• MapReduce support
http://cassandra.apache.org/

Column-oriented database
• HBase (Apache)
• HyperTable
• BigTable (Google)

Conclusion
• The choice of data store impacts the application architecture
• Store your data in the way you want to query it
• Denormalize your data and try to keep it up to date!
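
The last two points can be illustrated with a small sketch. Assuming a blog-like data set (the `users`, `posts` and `post_views` structures and both helper functions are hypothetical), the read model copies the author's name into each post so that a page can be served from a single lookup, and every rename must then be propagated to the copies.

```python
# Normalized source data: posts reference users by id.
users = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
posts = [
    {"id": 10, "author_id": 1, "title": "NoSQL families"},
    {"id": 11, "author_id": 2, "title": "Symfony tips"},
]


def build_post_views(users, posts):
    """Denormalize: store each post the way the page wants to read it."""
    return {
        post["id"]: {
            "title": post["title"],
            "author_name": users[post["author_id"]]["name"],  # duplicated on purpose
        }
        for post in posts
    }


def rename_user(users, posts, post_views, user_id, new_name):
    """Update the source of truth, then every denormalized copy of the name."""
    users[user_id]["name"] = new_name
    for post in posts:
        if post["author_id"] == user_id:
            post_views[post["id"]]["author_name"] = new_name


post_views = build_post_views(users, posts)
rename_user(users, posts, post_views, 1, "Alice Martin")
print(post_views[10])  # served without any join
```

Reads become a single key lookup; writes pay for it by touching every copy, which is the "keep it up to date" part of the advice.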

Thank you