Prof. Dr. Stefan Edlich http://nosql-database.org
2011
The NoSQL
Year!
1. HTML5
2. MongoDB
3. iOS
4. Android
5. Mobile app
6. Puppet
7. Hadoop
8. jQuery
9. PaaS
10. Social Media
CouchDB and
Membase
merger!
1 year ago!
CouchDB + Membase = ??
Roadmap?
No Apache → git trouble and more
Less Erlang → more code in C / C++
No CouchDB → CouchBase Server
Damien is leaving CouchDB
Community has to take care of it
no upward compatibility
“And I'm dead serious about
making it the easiest, fastest
and most reliable NoSQL
database. Easy for developers
to use, easy to deploy,
reliable on single machines or
large clusters, and fast as
hell.”
UnQL
a successful standard?
unstructured
Damien Katz & Richard Hipp (SQLite)
Richard Hipp (SQLite):
“Damien Katz intends to provide
an UnQL interface to CouchDB in
the near future, yes.”
Jaql!
Query Language for JavaScript Object Notation
source → query → dest
Oracle ?
MS-SQL ?
IBM DB2 ?
Sybase ?
Ted Neward: “Well, the buzz certainly grew, and it surprised
me that the big storage guys (Microsoft, IBM, Oracle)
didn't do more to address it; I was expecting features
to emerge in their database products to address some
of the features present in MongoDB or CouchDB or some
of the others, such as "schemaless" or map/reduce-style
queries. Even just incorporating JavaScript into the engine
somewhere would've generated a reaction.”
“The NoSQL databases are beginning
to feel like an ice cream store that
entices you with a new flavor of the
month,” the white paper read. “[But]
you shouldn’t get too attached to any
of the flavors because it may not be
around for too long.”
white paper:
“debunking the (NoSQL) hype”
summer 2011
Oracle NoSQL Database
Consistent hashing
Configurable ACID
No single point of failure
Data center replication
Top Admin
“BerkeleyDB reloaded”
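The consistent hashing keyword above can be illustrated with a small sketch. This is not Oracle NoSQL Database code: the class name `ConsistentHashRing`, the virtual-node parameter, and the use of MD5 as the hash function are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Illustrative consistent-hash ring (hypothetical class, not a vendor API).
final class ConsistentHashRing {
    // Ring position -> node name, kept sorted so we can walk "clockwise".
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes; // assumed tuning knob: points per node

    ConsistentHashRing(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) {
            addNode(node);
        }
    }

    // Each node is placed on the ring at several pseudo-random positions.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // A key belongs to the first node at or after its hash, wrapping around.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring is empty");
        }
        Long position = ring.ceilingKey(hash(key));
        return ring.get(position != null ? position : ring.firstKey());
    }

    // First 8 bytes of the MD5 digest, packed into a long.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xffL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing r = new ConsistentHashRing(Arrays.asList("n1", "n2", "n3"), 100);
        System.out.println("user:42 -> " + r.nodeFor("user:42"));
    }
}
```

Adding a node only inserts new points on the ring, so a key either keeps its owner or moves to the new node. That is the property that makes rebalancing cheap.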
Hadoop + Manager
+=
user defined functions in C++ & Java
→ 10x faster than SQL or stored procs
UDF connector for Hadoop → ☺
C++ APIs for Map Reduce → ☺
Greenplum, Pervasive
and 100 others too…
Storage configurable
• round robin automatic loadbalancing
• replicas
• gateway
- performance
SSD? RAM+DataCenter
+ scale + configure
Attacking:
Mongo & Riak & Cassandra
bad things too?
NoSQL = No Security?
less sensitive info?
Key Bruteforce
Array injection: /login.php?username=admin&password[$ne]=1
View injection
REST injection
JSON injection: db.foo.find({$or: [{a:1}, {b:2}, {c:/.*/}]})
http attacks for listeners
wrong cache proxy configs
thrift avro security
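One way to blunt the array injection shown above is to insist that login parameters are plain strings before they reach a query. The sketch below assumes a hypothetical web layer that parses `password[$ne]=1` into a nested map (as PHP does); `LoginParamGuard` and `requireString` are illustrative names, not part of any driver or framework.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard: accept only plain string parameters.
final class LoginParamGuard {

    // Returns the parameter if it is a plain string; rejects nested
    // structures such as {"$ne": "1"} that would act as query operators.
    static String requireString(Map<String, Object> params, String name) {
        Object value = params.get(name);
        if (!(value instanceof String)) {
            throw new IllegalArgumentException(name + " must be a plain string");
        }
        return (String) value;
    }

    public static void main(String[] args) {
        // Simulates /login.php?username=admin&password[$ne]=1 after parsing.
        Map<String, Object> operator = new HashMap<>();
        operator.put("$ne", "1");
        Map<String, Object> params = new HashMap<>();
        params.put("username", "admin");
        params.put("password", operator);

        try {
            requireString(params, "password");
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A type check like this does not replace authentication or transport security; it only keeps operator objects out of values that should be scalars.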
2007 SIGOPS
• 15 years of experience from Dynamo, SimpleDB and S3
• ultra scalable and reliable
• uses SSD (!)
• fully managed & no maintenance window!
• synchronous replication across multiple availability zones = durability
• provisioned throughput configurable per table
• no fixed schema, any number of attributes & multi value attributes
• consistency and performance tradeoffs possible
• conditional writes & atomic counters
• index: simple hash or composite hash + key/range
• define a table => make a rw capacity reservation
• backup & restore (tables) into S3
• Cloud Watch & Alarms
• 40 million requests per month free
2ms read 6-8ms
$1 per GB-month
$0.01 per hour per 10 writes/sec
$0.01 per hour per 50 reads/sec (items up to 1 KB)
Eventually Consistent = doubles the
read amount
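The pricing lines above can be turned into a rough hourly estimate. The sketch assumes the rates mean $0.01 per hour for every 10 writes/sec and every 50 reads/sec, bills proportionally rather than in fixed blocks, ignores storage and the free tier, and treats an eventually consistent read as half a read unit. The class and method names are hypothetical.

```java
// Back-of-the-envelope cost sketch based on the rates quoted on the slide.
final class ProvisionedCostSketch {
    // Assumed rates: $0.01/hour per 50 read units and per 10 write units.
    static final double DOLLARS_PER_HOUR_PER_READ_UNIT = 0.01 / 50;
    static final double DOLLARS_PER_HOUR_PER_WRITE_UNIT = 0.01 / 10;

    // Eventually consistent reads double the reads served per unit,
    // so only half as many units are needed (rounded up).
    static long readUnits(long readsPerSecond, boolean eventuallyConsistent) {
        return eventuallyConsistent ? (readsPerSecond + 1) / 2 : readsPerSecond;
    }

    static double hourlyCost(long readsPerSecond, long writesPerSecond,
                             boolean eventuallyConsistent) {
        return readUnits(readsPerSecond, eventuallyConsistent) * DOLLARS_PER_HOUR_PER_READ_UNIT
                + writesPerSecond * DOLLARS_PER_HOUR_PER_WRITE_UNIT;
    }

    public static void main(String[] args) {
        System.out.printf("consistent reads: $%.4f per hour%n", hourlyCost(100, 20, false));
        System.out.printf("eventually consistent reads: $%.4f per hour%n", hourlyCost(100, 20, true));
    }
}
```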
{
  Id = 101
  ProductName = "NoSQL Book"
  ISBN = "978-3446427532"
  Authors = [ "Author 1", "Author 2" ]
  Price = -42
  Dimensions = "8.5 x 11.0 x 0.5"
  PageCount = 500
  InPublication = 1
  ProductCategory = "Book"
}
db x tables x items x attributes
uses JSON as serialized transport format!
REST API
Table: create, describe, list, update
Data: put (create/update), get, update, delete, query, scan, batch
// This header is abbreviated.
// For a sample of a complete header, see link.
POST / HTTP/1.1
x-amz-target: DynamoDB_20111205.PutItem
content-type: application/x-amz-json-1.0

{"TableName":"Table1",
 "Item":{
   "AttributeName1":{"S":"AttributeValue1"},
   "AttributeName2":{"N":"AttributeValue2"}
 },
 "Expected":{"AttributeName3":{"Value":{"S":"AttributeValue"},"Exists":Boolean}},
 "ReturnValues":"ReturnValuesConstant"}
HTTP/1.1 200
x-amzn-RequestId: 8966d095-71e9-11e0-a498-71d736f27375
content-type: application/x-amz-json-1.0
content-length: 85

{"Attributes":{
   "AttributeName3":{"S":"AttributeValue3"},
   "AttributeName2":{"SS":"AttributeValue2"},
   "AttributeName1":{"SS":"AttributeValue1"}
 },
 "ConsumedCapacityUnits":1}
AWS SDK for Java, .NET, PHP
// Java get: fetch selected attributes of one item by its numeric hash key
private static void getBook(String id, String tableName) {
    GetItemRequest getItemRequest = new GetItemRequest()
        .withTableName(tableName)
        .withKey(new Key().withHashKeyElement(new AttributeValue().withN(id)))
        .withAttributesToGet(Arrays.asList("Id", "ISBN", "Title", "Authors"));
    GetItemResult result = client.getItem(getItemRequest);
    System.out.println("Printing item after retrieving it....");
    printItem(result.getItem());
}
64 KB Data Limit
string + int
multi value string-ints
multiKV, references,
schemachecks, …
API → DSLs ☺
Here are the six urban myths that Mr. Stonebraker
says NoSQL advocates incorrectly perpetuate:
• Myth #1: SQL is too slow,
so use a lower level interface
• Myth #2: I like a K-V interface, so SQL
is a non-starter
• Myth #3: SQL systems don’t scale
• Myth #4: There are no open source,
scalable SQL engines
• Myth #5: ACID is too slow, so avoid using it
• Myth #6: in CAP, choose AP over CA
strikes back
© 451 Group Report / 5.4.2011
Overview
Java Stored Procedures!
RAM with 100,000 ops/s per node
“VoltDB claims to be 100 times
faster than MySQL, up to 13 times
faster than Cassandra, and 45 times
faster than Oracle, with near-linear
scaling.” (highscalability blog)
ACID with partitioned tables
Nearly SQL 99 and ALTER & DROP
schema changes require Shutdown
static query parametrization
Source: Percona MySQL Performance Blog
SSD optimized and disk
C’t: 10-100 TB ok then weaker
10x faster
scaling across cores
random access read pattern
QPS on SSD: 84,426 vs. 14,763
5.5x faster
− memcached API more soon
− no structured data
− horizontal scaling for nodes
• "terabytes of data, billions of objects, and 200K plus transactions per second per node, with sub-millisecond latency."
• e.g. real-time bidding
• transactions / ACID
• linear & elastic horizontal scalability
• flash/SSD support
RTA™
• data expiration
• append list
• API: C, C#, Java, Ruby, Python & PHP
• no master node
• 200k ops/sec per node read, 50k ops/sec per node write
Check hybrid solutions!
easier & better than memcache + RDBMS
Problem: privilege checks, cache queries, connection pooling / thread creation,
parsing SQL, open, lock, exec plans, concurrency control, unlock, close, …
© fromdual.com
SOURCE: YOSHINORI MATSUNOBU
keep tables open & simple protocol
Performance
Transactions
Concurrent Access
No Cache / Crash-Safe
no SQL but more than
K/V: ranges, LIMIT, CRUD, multi_get,…
no Security
new API
© percona.com
Conclusion #1
Conclusion #2
There is no
“one perfect solution”
Check hybrid solutions
and NewSQL DBs too!
© geekandpoke.com