Ciel ! Mes données ne sont plus relationnelles (Good heavens! My data is no longer relational)

Post on 21-Jun-2015

Category:

Engineering

DESCRIPTION

When managing the data of our web applications goes beyond simple persistence in a relational database (RDBMS), alternative technologies known as "NoSQL" become necessary. We will cover the four main NoSQL families (key/value, document, column-oriented and graph) and how to integrate them into PHP applications.

TRANSCRIPT

Ciel ! Mes données ne sont plus relationnelles

BLEND WEB MIX, 1 October 2013

1

Xavier Gorse

2

@xgorse

3

Association Française des Utilisateurs de PHP

• Founded in 2001

• Forum PHP (21 & 22 November 2013 in Paris)

• AperoPHP and Rendez-vous meetups

• Local chapters

• President in 2009

www.afup.org

Association Francophone des Utilisateurs de Symfony

• Started in 2010 by Hugo Hamon

• Not yet a formal association

• Monthly sfPot: a talk followed by drinks

• Chapters in Marseille, Lyon ??

www.afsy.fr

4

Elao

• Founder, 2005

• Lyon & Paris

• Technical web agency of 15 people

• Symfony since 2006

• Official SensioLabs partner

www.elao.com

5

Plan

• Trend

• Key-value databases

• Document databases

• Graph databases

• Column-oriented databases

6

RDBMS performance

7

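[Figure: performance plotted against data complexity, with one curve for the relational database and one for the requirement of the application. Examples along the complexity axis: salary list, most web apps, social network, location-based services.]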

Source @ianSrobinson - @jimwebber from NeoTechnology

complexity = f(size, connectedness, uniformity)

8

Data Size

9

[Figure: data size by year, 2007 to 2013.]

Data Size

• 500 million page views a day

• ~3TB of new data to store a day

• Posts are about 50GB a day. Follower list updates are about 2.7TB a day.

10

Connectedness

11

Source @ianSrobinson - @jimwebber from NeoTechnology

[Figure: information connectivity over time, 1990 to 2020, from web 1.0 through web 2.0 to "web 3.0". Milestones: text documents, hypertext, feeds, blogs, wikis, UGC, tagging, folksonomies, RDFa, ontologies, GGG.]

Uniformity

• Semi-structured data

• Different data lifecycle

• Store more data about each entity

• Individualization & decentralization of content generation

12

NoSQL: Not Only SQL

13

NoSQL

• Non-relational

• Cluster friendly

• Schema-less

• Distributed architecture

14

ACID & CAP Theorem

ACID

• Atomicity

• Consistency

• Isolation

• Durability

15

CAP Theorem

• Consistency

• Availability

• Partition Tolerance

[Figure: the four NoSQL families. Key/Value: each key maps to a value. Document: a key maps to a document of fields (Field 1, Field 2, Field A, Field B). Column-oriented: a key maps to columns (Column 1, Column 2, Column 3). Graph: linked nodes (Node 1 to Node 5).]

16

17

Key-value databases

• Inspired by Amazon’s Dynamo (2007)

• Global collection of key-value pairs

• Big scalable HashMap
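The "big scalable HashMap" idea can be sketched in a few lines. This is only an illustration, assuming a fixed number of nodes and simple hash-modulo placement; the `KeyValueCluster` class and its method names are hypothetical, and real stores in the Dynamo family use more elaborate schemes to place and replicate keys.

```python
import hashlib


class KeyValueCluster:
    """Toy key-value store: one dict per node, keys placed by hash."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so a given key always lands on the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        self._node_for(key).pop(key, None)


cluster = KeyValueCluster(node_count=3)
cluster.put("session:42", {"user": "xavier"})
cluster.put("cart:42", ["book", "pen"])
```

The store never looks inside the value: every operation goes through the key, which is what makes spreading the data over several nodes straightforward.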

18

• Strengths

• Simple data model

• High performance

• Great at scaling out horizontally

• Weaknesses

• Simplistic data model

• Poor for complex data

19

Key-value databases: Redis

• Written in C, BSD license, 2009

• Very fast and lightweight

• All data in memory

• Persistence

• Master/slave replication

• Used for caching, sessions or work queues
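The caching, session and queue use cases can be pictured with plain in-memory structures. The sketch below does not talk to a Redis server: `ExpiringCache` and `FakeClock` are hypothetical names, the cache only imitates the idea of a key stored with a time-to-live, and the deque imitates a list used as a queue (roughly what the Redis commands LPUSH and RPOP provide).

```python
import time
from collections import deque


class ExpiringCache:
    """In-memory cache whose entries disappear after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the demo can control time
        self._items = {}      # key -> (value, expiry timestamp)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value


class FakeClock:
    """Hypothetical manual clock, used only to make the demo deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
cache = ExpiringCache(clock=clock)
cache.set("session:42", {"user": "xavier"}, ttl_seconds=60)
fresh = cache.get("session:42")    # still within its TTL
clock.now = 61.0
expired = cache.get("session:42")  # TTL elapsed, so this is a miss

# Work queue: producers push on the left, a worker pops from the right (FIFO).
jobs = deque()
jobs.appendleft("resize-image:1")
jobs.appendleft("send-mail:2")
next_job = jobs.pop()
```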

20

Key-value databases

http://redis.io/

• Riak

• Memcached (RAM)

• Voldemort

• Amazon DynamoDB (SaaS)

• IronCache (SaaS)

21

Key-value databases

22

Document databases

• Inspired by IBM Lotus Notes/Domino

• Same as key/value, but the value is a document

• A document is a key-value collection

• Flexible schema

• Non-relational, data is de-normalized
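A rough picture of these points, using plain Python dicts instead of a real document database. The collection name, document ids and fields are invented for the example. Each document embeds its author and comments (de-normalized), and the two documents do not share the same fields (flexible schema).

```python
import json

posts = {}  # document id -> document (a nested key-value collection)

posts["post:1"] = {
    "title": "Hello NoSQL",
    "author": {"name": "Alice", "twitter": "@alice"},  # embedded, not joined
    "tags": ["nosql", "php"],
    "comments": [{"by": "Bob", "text": "Nice talk"}],
}

# A second document with a different shape: no comments, fewer author fields.
posts["post:2"] = {
    "title": "Schema-less",
    "author": {"name": "Bob"},
    "tags": ["nosql"],
}


def find_by_tag(collection, tag):
    """Return the ids of documents carrying the given tag."""
    return sorted(
        doc_id for doc_id, doc in collection.items()
        if tag in doc.get("tags", [])
    )


php_posts = find_by_tag(posts, "php")
as_json = json.dumps(posts["post:1"])  # documents serialize naturally to JSON
```

Reading a post with its author and comments is a single lookup by key. The price is that the author's details are copied into every post and have to be updated in each copy.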

23

Document databases

• Strengths

• Simple, powerful data model

• Good scaling, Easy/Auto sharding

• Usually “ACID” compliant

• Weaknesses

• Unsuited for interconnected data

• Query model limited to keys (and indexes)  

24

Document databases: MongoDB

• Written in C++, AGPL license, 2009

• JSON-style documents

• Full index support

• Fast in-place updates

• Auto-sharding

• Replication & high availability

• Many connectors

• Big Community

• Commercial Support

25

http://www.mongodb.org

Document databases

• Lotus Notes / Domino

• CouchDB: written in Erlang, queried with JavaScript

• OrientDB: written in Java, relationships stored as a graph

26

27

Graph databases

• Nodes with properties

• Named relationships with properties

• Focus on the data structure

• Each element holds a direct pointer to its adjacent elements, so no index lookups are necessary
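The model can be sketched as objects that reference each other directly. This is a simplified illustration rather than the way any particular graph database is implemented; `Node`, `Relationship` and the "KNOWS" relationship name are chosen for the example.

```python
class Node:
    """A node with arbitrary properties and direct links to its relationships."""

    def __init__(self, **properties):
        self.properties = properties
        self.outgoing = []  # direct references, no index lookup needed


class Relationship:
    """A named, directed relationship that carries its own properties."""

    def __init__(self, start, name, end, **properties):
        self.start, self.name, self.end = start, name, end
        self.properties = properties
        start.outgoing.append(self)


def neighbours(node, name):
    """Follow the outgoing relationships of the given name."""
    return [rel.end for rel in node.outgoing if rel.name == name]


def friends_of_friends(node):
    """Two-hop traversal that skips the start node and its direct friends."""
    direct = neighbours(node, "KNOWS")
    result = []
    for friend in direct:
        for candidate in neighbours(friend, "KNOWS"):
            if candidate is not node and candidate not in direct and candidate not in result:
                result.append(candidate)
    return result


alice = Node(name="Alice")
bob = Node(name="Bob")
carol = Node(name="Carol")
Relationship(alice, "KNOWS", bob, since=2010)
Relationship(bob, "KNOWS", carol, since=2012)

names = [n.properties["name"] for n in friends_of_friends(alice)]
```

The traversal only touches the nodes it walks through, so its cost depends on the size of the neighbourhood explored and not on the total number of nodes.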

28

Graph databases

• Strengths

• Powerful data model

• Fast for connected data

• A new data architecture

• Weaknesses

• No sharding: all data in one instance

• Querying on node or relationship properties kills performance

• A new data architecture

29

Graph databases: Neo4j

• Java, GPL/commercial license, 2007

• Query languages: Cypher / Gremlin

• REST interface

• Embedded mode

• High availability (master/slave)

• Commercial Support

30

http://neo4j.org

GraphDB - Products

• Titan

• OrientDB

• InfiniteGraph

• AllegroGraph

31

32

Column-oriented database

• A big table, with column families

• Data stored by column instead of row

• Built for distributed architectures

• Map-reduce for querying/processing

• Flexible schema

• Easy sharding (partitioning)
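As a rough illustration of column families and of map-reduce style processing, the sketch below keeps everything in nested dicts. The row keys, the family names ("profile", "stats") and the helper functions are assumptions made for the example, and the two phases run in a single process, whereas a real cluster would distribute them over its nodes.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


put("user:1", "profile", "name", "Alice")
put("user:1", "profile", "city", "Lyon")
put("user:1", "stats", "posts", 12)
put("user:2", "profile", "name", "Bob")  # no "city": rows need not match
put("user:2", "stats", "posts", 30)
put("user:2", "stats", "followers", 7)


def map_phase(rows, family, column):
    """Emit one (column, value) pair per row that has the column."""
    for families in rows.values():
        columns = families.get(family, {})
        if column in columns:
            yield column, columns[column]


def reduce_phase(pairs):
    """Sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


total_posts = reduce_phase(map_phase(table, "stats", "posts"))
```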

33

Column-oriented database

• Strengths

• Data model supports semi-structured data

• Naturally indexed (columns)

• Horizontally scalable: read/write throughput increases linearly

• Fault tolerant – no single point of failure

• Weaknesses

• Unsuited for interconnected data

34

Column-oriented database: Cassandra

• Java, Apache License 2.0, 2008

• Originally developed by Facebook

• Decentralized

• Supports replication and multi data center replication

• Scalability

• Fault-tolerant

• MapReduce support

35

http://cassandra.apache.org/

Column-oriented database

• HBase (Apache)

• HyperTable

• BigTable (Google)

36

Conclusion

• Impact on application architecture

• Store your data in the way you want to query it

• Denormalize your data and try to keep it up to date!
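One way to picture the last two points: keep a second, query-shaped copy of the data and refresh it on every write. The sketch is a minimal one, with hypothetical names (`posts`, `posts_by_author`, `save_post`) and plain dicts standing in for a datastore.

```python
posts = {}            # post id -> post, the primary copy
posts_by_author = {}  # author -> list of titles, shaped for the query we need


def save_post(post_id, author, title):
    """Write the post and keep the denormalized view in sync."""
    old = posts.get(post_id)
    if old is not None:
        # The post already existed: remove its stale entry from the view.
        posts_by_author[old["author"]].remove(old["title"])
    posts[post_id] = {"author": author, "title": title}
    posts_by_author.setdefault(author, []).append(title)


save_post(1, "alice", "Hello")
save_post(2, "alice", "NoSQL")
save_post(1, "alice", "Hello, world")  # an update must touch both copies

titles = posts_by_author["alice"]
```

Listing an author's titles becomes a single key lookup, but every write path has to know about every copy, which is the impact on application architecture mentioned above.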

37

38

Thank you
