The Matrix and DataStax
By: Hayato Shimizu
What is Cassandra?
• Apache Cassandra™ is a massively scalable NoSQL database.
• Cassandra is designed to handle big data workloads across multiple data centers with no single point of failure, providing enterprises with continuous availability without compromising performance.
Why Cassandra
• Fast / linear scalability
• Elastic
• No single point of failure
• Very few moving parts
• Enterprise / multi-data center / cloud data distribution
• Location independence – read and write anywhere
• Dynamic / flexible data structure
• Tunable data consistency (per operation)
• Data compression
• Cloud ready
• Familiar SQL-like language – CQL
• Easy setup
• No special hardware needed
• No special caching layer needed
[Figure: four nodes numbered 1 to 4, with data1 and data2 each shown on two nodes.]
Network Topology
Data Consistency
Writes
• Any
• One
• Quorum
• Local_Quorum
• Each_Quorum
• All
Reads
• One
• Quorum
• Local_Quorum
• Each_Quorum
• All
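The consistency levels above reduce to simple arithmetic on the replication factor (RF). The following is a minimal sketch, assuming a single data center so that Quorum and Local_Quorum coincide; the function names are illustrative helpers, not part of any Cassandra driver.

```python
# Quorum arithmetic for tunable consistency.
# Assumption: one data center, so QUORUM and LOCAL_QUORUM are equivalent.
# These helper names are hypothetical and not a driver API.

def quorum(replication_factor: int) -> int:
    """Replicas that make up a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for one request at the given level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level in ("QUORUM", "LOCAL_QUORUM"):
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


def reads_see_latest_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


rf = 3
print("quorum for RF=3:", quorum(rf))
print("QUORUM write + QUORUM read overlap:", reads_see_latest_write("QUORUM", "QUORUM", rf))
print("ONE write + ONE read overlap:", reads_see_latest_write("ONE", "ONE", rf))
```

With RF = 3, a quorum is two replicas, so quorum writes combined with quorum reads always touch at least one common replica, while One/One trades that guarantee for lower latency.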
Durable Writes
[Figure: write path. INSERT INTO… is written to the commit log and the memtable, and later reaches an SSTable.]
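The write path on this slide can be illustrated with a toy in-memory model. This is only a sketch of the idea (append to a commit log, update a memtable, flush to an immutable sorted table); the class `ToyNode` and its `flush_threshold` parameter are hypothetical and do not reflect Cassandra's actual implementation or on-disk formats.

```python
# Toy model of a durable write path: commit log + memtable + SSTable flush.
# Everything here is illustrative; real Cassandra differs in many details.

class ToyNode:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of writes, used for recovery
        self.memtable = {}     # in-memory view holding the latest value per key
        self.sstables = []     # immutable, sorted snapshots standing in for disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        # 1. Record the write durably first, 2. then apply it in memory.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable table, then start fresh.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []   # flushed writes no longer need to be replayed

    def recover(self):
        # After a crash the memtable is lost; replay the commit log to rebuild it.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


node = ToyNode(flush_threshold=3)
node.write("neo", "coordinates-1")
node.write("trinity", "coordinates-2")
node.memtable = {}          # simulate losing memory in a crash
node.recover()
print(node.memtable)        # both writes are restored from the commit log
```

The point of the commit log in this model is that an acknowledged write survives the loss of the in-memory memtable.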
Data Structure
Keyspace: Matrix      replication_factor: 3
Column Family: character_locations
day1          morphius:<timeuuid>: coordinates   neo:<timeuuid>: coordinates
day1:neo      <timeuuid>: coordinates
day1:morph    <timeuuid>: coordinates   <timeuuid>: coordinates
              <timeuuid>: coordinates
Column Family: character_information
neo           DOB: 2600-06-27   Actor: Keanu Reeves   email1: Neo@matrix
              email2: [email protected]
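The wide-row layout above, where each row key such as day1:neo holds one column per time-based UUID, can be mimicked with nested dictionaries. This is a rough sketch only: the helper names and the coordinate values are hypothetical, and Python's `uuid.uuid1()` stands in for a timeuuid column name.

```python
import uuid
from collections import defaultdict

# row key -> {column name (time-based UUID): value}
# Mirrors the character_locations layout; all data below is made up.
character_locations = defaultdict(dict)


def record_location(day, character, coordinates):
    """Add one column to the row for this day and character."""
    row_key = f"{day}:{character}"
    column_name = uuid.uuid1()          # version 1 UUIDs embed a timestamp
    character_locations[row_key][column_name] = coordinates
    return column_name


def locations_by_time(day, character):
    """Return the row's values ordered by the timestamp inside each column name."""
    row = character_locations[f"{day}:{character}"]
    return [row[name] for name in sorted(row, key=lambda u: u.time)]


record_location("day1", "neo", (1.0, 2.0))
record_location("day1", "morph", (3.0, 4.0))
record_location("day1", "morph", (5.0, 6.0))

for row_key, columns in character_locations.items():
    print(row_key, len(columns), "column(s)")
```

Because the column name carries the time, a row grows sideways as new positions arrive, and reading a row back in column order yields a time series for that character and day.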
Overview of DataStax
• Founded in April 2010
• Commercial leader in Apache Cassandra™
• 300+ customers (including 20 of the Fortune 100)
• 100+ employees
• Home to Apache Cassandra Chair & most committers
• Headquartered in San Mateo
• Funded by prominent venture firms
DataStax Enterprise Architecture
DataStax Cassandra
• Kerberos authentication
• Encrypted data at rest
• Auditing
• iSEC Partners validated
<schema name="wikipedia" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField">
      <analyzer><tokenizer class="solr.WikipediaTokenizerFactory"/></analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="name" type="text" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="date" type="string" indexed="true" stored="true"/>
  </fields>
  <defaultSearchField>body</defaultSearchField>
  <uniqueKey>id</uniqueKey>
</schema>
Searching Data
HTTP
curl "http://localhost:8983/solr/wiki.solr/select?\
q=title%3Anatio%2A%20AND%20title%3A%5B2000%20TO%202010%5D"
CQL3
use wiki;
select title from solr where solr_query='title:natio* AND title:[2000 TO 2010]';
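The HTTP form of the query is the same Solr query string used in the CQL3 form, percent-encoded for use in a URL. A small sketch with Python's standard library shows the relationship; the host, port, and core name are taken from the slide, and nothing here contacts a server.

```python
from urllib.parse import quote, unquote

# The Solr query as it appears in the CQL3 solr_query literal.
solr_query = "title:natio* AND title:[2000 TO 2010]"

# Percent-encode every reserved character (safe="" also encodes "/").
encoded = quote(solr_query, safe="")

# Base URL as shown on the slide.
url = "http://localhost:8983/solr/wiki.solr/select?q=" + encoded

print(encoded)
print(url)
print(unquote(encoded) == solr_query)
```

Decoding the `q` parameter gives back the readable query, which makes it easier to compare what is sent over HTTP with what is written in CQL.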
Workload Isolation
[Figure: cluster diagram with four C* (Cassandra) nodes and four Solr nodes, with the labels "Solr Queries" and "Cassandra Queries".]
Hive
• {LEFT|RIGHT|FULL} [OUTER] JOIN
• GROUP BY
• {SORT|DISTRIBUTE|CLUSTER|ORDER} BY
• UNION
• Subqueries
Hive -> Cassandra Example
DROP TABLE IF EXISTS StockHist;
CREATE EXTERNAL TABLE StockHist(row_key string, column_name string, value double)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ("cassandra.ks.name" = "PortfolioDemo",
  "cassandra.cf.validatorType" = "UTF8Type,UTF8Type,DoubleType");
Pig
cassandra_data = LOAD 'cassandra://<keyspace>/<CF>'
  USING CassandraStorage() AS (name, columns: bag {T: tuple(score, value)});
total_scores = FOREACH cassandra_data GENERATE name, COUNT(columns.score), LongSum(columns.score) as total PARALLEL 3;
ordered_scores = ORDER total_scores BY total DESC PARALLEL 3;
STORE ordered_scores INTO 'cfs:///final_scores.txt' USING PigStorage();
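For readers less familiar with Pig, the script's logic (count and sum the scores per row, then order by total, descending) can be sketched in plain Python. The sample rows are invented for illustration and the function names simply echo the Pig aliases; this runs in memory and says nothing about how Pig or Hadoop would execute the job.

```python
# In-memory equivalent of the Pig script's aggregation.
# Each row maps a name to a bag of (score, value) tuples; the data is hypothetical.
cassandra_data = {
    "alice": [(10, "a"), (20, "b")],
    "bob": [(5, "c")],
    "carol": [(30, "d"), (1, "e"), (2, "f")],
}


def total_scores(data):
    """Per name: number of score columns and their sum (COUNT and LongSum in the script)."""
    return [
        (name, len(columns), sum(score for score, _value in columns))
        for name, columns in data.items()
    ]


def ordered_scores(totals):
    """Order rows by total, highest first (ORDER ... BY total DESC)."""
    return sorted(totals, key=lambda row: row[2], reverse=True)


for name, count, total in ordered_scores(total_scores(cassandra_data)):
    print(name, count, total)
```

In the Pig version, the final STORE writes the ordered result to CFS; the sketch simply prints it.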
Workload Isolation
[Figure: cluster diagram with two H* (Hadoop) nodes, four C* (Cassandra) nodes, and two Solr nodes, with the labels "Solr Queries", "Cassandra Queries", and "Hadoop Analytics".]
Cassandra Roadmap