Shard-Query, an MPP database for the cloud using the LAMP stack
DESCRIPTION
This combined #SFMySQL and #SFPHP meetup talked about Shard-Query. You can find the video to accompany this set of slides here: https://www.youtube.com/watch?v=vC3mL_5DfEM
TRANSCRIPT
Shard-Query AN MPP DATABASE FOR THE CLOUD USING THE LAMP STACK

Introduction
Presenter: Justin Swanhart
Principal Support Engineer at Percona
Previously a trainer and consultant at Percona too
Developer: Swanhart-tools
Shard-Query: MPP sharding middleware for MySQL
Flexviews: materialized views (fast refresh) for MySQL
bcmath UDF: arbitrary precision math for MySQL

Intended Audience
MySQL users with data too large to query efficiently using a single machine
Big Data
Analytics / OLAP
User generated content analysis
People interested in distributed database processing

Terms
MPP
Massively Parallel Processing
An MPP system is a system that can process a SQL statement in parallel on a single machine or even many machines
A collection of machines is often called a Grid
MPP is also sometimes called Grid Computing

MPP (cont)
Not many open source databases (none?) support MPP
Community editions of closed source offerings are limited
Some closed source databases include Vertica, Greenplum, Redshift

The Cloud
Managed collection of virtual servers
Easy to add servers on demand
Ideal for a federated, distributed database grid
Easy to scale up by moving to a VM with more cores
Easy to scale out by adding machines
Amazon is one of the most popular cloud environments

LAMP stack
Linux: Amazon Linux, RHEL, Ubuntu LTS, etc.
Apache Web Server: most popular web server on the planet
MySQL: the world's most popular open source database
PHP: high level language makes development easier

Database Middleware
A piece of software that sits between an end-user application and the database
Operates on the queries submitted by the application, then returns the results to the application
Usually a proxy of some sort
MySQL Proxy is the open source user configurable proxy for MySQL
Supports Lua scripts which intercept queries
Shard-Query can use MySQL Proxy out of the box

Message Queue / Job Server
Accepts jobs or messages and places them in a queue
A worker reads jobs/messages from the queue and acts on them
Offers support for asynchronous jobs

Gearman
My job server of choice for PHP
Has two different PHP interfaces (PEAR and PECL)
Shard-Query comes bundled with a modified version* of the PEAR interface
Excellent integration with MySQL as well (UDF)
* Removes warnings triggered by modern PHP strict mode

Sharding
It is short for Shared Nothing
Means splitting up your data onto more than one machine
Tables that are split up are called sharded tables
Lookup tables are not sharded. In other words, they must be duplicated on all nodes
Shard-Query supports directory based or hash based sharding

Shard mapper
Shard-Query supports DIRECTORY and HASH mapping out of the box
DIRECTORY based sharding allows you to add or remove shards from the system, but lookups may go over the network, reducing performance* compared to HASH mapping
HASH based sharding uses a hash algorithm to balance rows over the sharded database. However, since a HASH algorithm is used, the number of database shards can not change after initial data loading.
* But only for queries like select count(*) from table where customer_id = 50

What is big data
Most machine generated data
Line order information for a large organization like Wal-Mart
Any data so large that you can't effectively operate on it on one machine
For example, an important query that needs to run daily executes in greater than 24 hours. It is impossible to meet the daily goal unless you can find a way to make the query execute faster.
These kinds of problems can happen on relatively small amounts of data (tens of gigabytes)

Analytics (OLAP) versus OLTP
OLTP is focused on short lived small transactions that read or write small amounts of data
OLAP is focused on bulk loading and reading large amounts of data in a single query.
Aggregation queries are OLAP queries
Shard-Query is designed for analytics (OLAP), not OLTP
It must parse all commands sent to it (and make multiple round trips)
Minimum query time of around 20ms

PROBLEM: Single Threaded Queries
THE BIGGEST BOTTLENECK IN ANALYTICAL QUERIES IS THE SPEED OF A SINGLE CORE

Single thread queries in the database
MySQL, PostgreSQL, Firebird and all other major open source databases have single threaded queries
This means that a single query can only ever utilize the resources of a single core
As the data size grows, analytical queries get slower and slower
In memory, as the data grows the speed decreases because the data is accessed in a single query
As the number of rows to be examined increases, performance decreases

Why single threaded
MySQL is optimized for getting small amounts of data quickly (OLTP)
It was created at a time when having more than one CPU was not common
Adding parallelism now is a very complex task, particularly since MySQL supports multiple storage engines
So adding parallel query is not a high priority (not even on the roadmap)
Designed to run LOTS of small queries simultaneously, not one big query

Single Threading bad for IO
If the data set is significantly larger than memory, single threaded queries often cause the buffer pool to "churn"
For example, small lookup tables can easily be pushed out of the buffer pool, resulting in frequent IO to look up values
While SSD may help somewhat, one database thread can not read from an SSD at maximum device capacity
While the disk may be capable of 1000+ MB/sec, a
single thread is generally limited to

    $stime = microtime(true);           // start timer (reconstructed: $stime is used below but its assignment was lost)
    $stmt = $shard_query->query($sql);  // run the query (left-hand side reconstructed from the truncated "query($sql);")
    $endtime = microtime(true);

    // Errors are returned as a member of the object
    if(!empty($shard_query->errors)) {
        echo "ERRORS RETURNED BY OPERATION:\n";
        print_r($shard_query->errors);
    }

    if(is_resource($stmt) || is_object($stmt)) {
        // Fetch rows through the simple data access layer (DAL) that comes with Shard-Query
        $count = 0;
        while($row = $shard_query->DAL->my_fetch_assoc($stmt)) {
            print_r($row);
            ++$count;
        }
        echo "$count rows returned\n";
        $shard_query->DAL->my_free_result($stmt);
    } else {
        if(!empty($shard_query->info)) print_r($shard_query->info);
        echo "no query results\n";
    }
    echo "Exec time: " . ($endtime - $stime) . "\n";

Simple data access layer comes with Shard-Query
Errors are returned as a member of the object
Run the query

Shard-Query Architecture
Diagram labels: PHP OO, Apache Web Interface, MySQL Proxy; Gearman Message Queue; Worker (x4); MySQL database shards
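
The architecture above (front ends hand work to a message queue, workers pull jobs and talk to the shards) can be mimicked in a few lines. This is only a rough sketch under stated assumptions: Python's in-process `queue.Queue` and threads stand in for Gearman and its workers, the `SHARDS` dictionary with its made-up values stands in for the MySQL shards, and every function name is hypothetical rather than part of Shard-Query.

```python
import queue
import threading

# Hypothetical shard contents: each "shard" is just a list of row values here.
SHARDS = {
    "shard1": [3, 5, 8],
    "shard2": [1, 9],
    "shard3": [],
    "shard4": [4, 4, 7, 2],
}


def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        shard_name = jobs.get()
        if shard_name is None:
            break
        # Stand-in for running the rewritten SQL (a COUNT) on one shard.
        results.put((shard_name, len(SHARDS[shard_name])))


def run_count_on_all_shards(num_workers: int = 4) -> dict:
    """Queue one job per shard, let the workers drain the queue, collect results."""
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for name in SHARDS:      # one job per shard
        jobs.put(name)
    for _ in threads:        # one sentinel per worker so each one stops
        jobs.put(None)
    for t in threads:
        t.join()
    partial = {}
    while not results.empty():
        name, count = results.get()
        partial[name] = count
    return partial


partial_counts = run_count_on_all_shards()
total = sum(partial_counts.values())  # final aggregation: SUM of the per-shard COUNTs
print(partial_counts, total)
```

The point of the queue is that the submitter never talks to a shard directly, which is what allows several front ends to share the same pool of workers.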

Apache web interface
GUI
Easy to set up
Run queries and get results
Serves as an example of using Shard-Query in a web app with asynchronous queries
Submits queries via Gearman
Simple HTTP authentication

MySQL Proxy Interface
Lua script for MySQL Proxy
Supports most SHOW commands
Intercepts queries, and sends them to Shard-Query using the MySQL Gearman UDF
Serves as another example of using Gearman to execute queries.
Behaves slightly differently than MySQL for some commands

Shard-Query Data Flow
Map/reduce like workflow
Query submitted
SQL is parsed
Query rewrite for parallelism yields multiple queries
Gearman Jobs (map/combine)
Final Aggregation (reduce)
Return result

SQL Parser
Find it at
http://github.com/greenlion/php-sql-parser
Supports SELECT/INSERT/UPDATE/DELETE, REPLACE, RENAME, SHOW/SET, DROP/CREATE INDEX/CREATE TABLE, EXPLAIN/DESCRIBE
Used by SugarCRM too, as well as other open source projects.

Query Rewrite for parallelism
Shard-Query has to manipulate the SQL statement so that it can be executed over more than one partition or machine
COUNT() turns into SUM of COUNTs from each query
AVG turns into SUM and COUNT
SEMI-JOIN is turned into a materialized join
STDDEV/VARIANCE are rewritten as well, using the sum of squares method
Push down LIMIT when possible

Query Rewrite for parallelism (cont)
Because lookup tables are duplicated on all shards, the query executes in a shared-nothing way
All joins, filtering and aggregation are pushed down
Means very little data must flow between nodes in most cases
High performance
Meets or beats Amazon Redshift in testing at 200GB of data

Map/Combine
The store_resultset gearman worker runs SQL and stores the result in a table
To keep the number of rows in the table (and the time it takes to aggregate results in the end) small, an INSERT ON DUPLICATE KEY UPDATE (ODKU) statement is used when inserting the rows
There is a UNIQUE KEY over the GROUP BY attributes to facilitate the upsert

Final aggregation
Shard-Query has to return a proper result, combining the results in the result table together to return the correct answer
Again, for example, COUNT must be rewritten as SUM to combine all the counts (from each shard) in the result table
Aggregated result is returned to the client
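
The rewrite, map/combine and final aggregation steps described above can be illustrated end to end with a small sketch. It makes several assumptions that are not from the talk: four in-memory SQLite databases stand in for the MySQL shards, a Python dictionary keyed by the GROUP BY value stands in for the result table and its ON DUPLICATE KEY UPDATE upsert, and the table, column and function names are all hypothetical. Variance here is the population variance computed from the sum of squares.

```python
import sqlite3
from collections import defaultdict

# Made-up rows (group_key, value), spread over four hypothetical shards.
SHARD_ROWS = [
    [("a", 1.0), ("b", 2.0)],
    [("a", 3.0)],
    [("b", 4.0), ("b", 6.0)],
    [("a", 5.0), ("c", 7.0)],
]

# What each shard runs: AVG and VARIANCE are replaced by COUNT, SUM and a sum of squares.
SHARD_SQL = """
    SELECT k, COUNT(*) AS cnt, SUM(v) AS sum_v, SUM(v * v) AS sum_sq
    FROM t GROUP BY k
"""


def map_on_shard(rows):
    """Map step: run the rewritten query on one shard (an in-memory SQLite database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k TEXT, v REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(SHARD_SQL).fetchall()
    conn.close()
    return result


def combine(partials):
    """Combine step: add partial aggregates per group key, like an upsert on a unique key."""
    merged = defaultdict(lambda: [0, 0.0, 0.0])
    for partial in partials:
        for k, cnt, sum_v, sum_sq in partial:
            merged[k][0] += cnt
            merged[k][1] += sum_v
            merged[k][2] += sum_sq
    return merged


def reduce_final(merged):
    """Reduce step: derive COUNT, AVG and population variance from the merged sums."""
    out = {}
    for k, (cnt, sum_v, sum_sq) in merged.items():
        avg = sum_v / cnt
        out[k] = {"count": cnt, "avg": avg, "var_pop": sum_sq / cnt - avg * avg}
    return out


final = reduce_final(combine(map_on_shard(rows) for rows in SHARD_ROWS))
for key in sorted(final):
    print(key, final[key])
```

Only one row per group leaves each shard, which is why very little data has to flow between nodes. One caveat of the sum of squares formula used in this sketch is that it can lose precision when values are large relative to their spread.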

Shard-Query Flow as SQL
[[email protected] bin]$ ./run_query --verbose
select count(*) from lineorder;

Shard-Query optimizer messages:
SQL TO SEND TO SHARDS:
Array
(
    [0] => SELECT COUNT(*) AS expr_2913896658 FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1
    [1] => SELECT COUNT(*) AS expr_2913896658 FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1
    [2] => SELECT COUNT(*) AS expr_2913896658 FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1
    [3] => SELECT COUNT(*) AS expr_2913896658 FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1
)
SQL TO SEND TO COORDINATOR NODE:
SELECT SUM(expr_2913896658) AS ` count ` FROM `aggregation_tmp_58392079`
Array
(
    [count ] => 0
)
1 rows returned
Exec time: 0.03083610534668

Slide annotations: Initial query, Query rewrite / map, Final aggregation / reduce, Final result

Map/Combine example
select LO_OrderDateKey, count(*) from lineorder group by LO_OrderDateKey;

Shard-Query optimizer messages:
* The following projections may be selected for a UNIQUE CHECK on the storage node operation: expr$0
* storage node result set merge optimization enabled: ON DUPLICATE KEY UPDATE expr_2445085448=expr_2445085448 + VALUES(expr_2445085448)

SQL TO SEND TO SHARDS:
Array
(
    [0] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448 FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 GROUP BY expr$0
    [1] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448 FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 GROUP BY expr$0
    [2] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448 FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 GROUP BY expr$0
    [3] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448 FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 GROUP BY expr$0
)
SQL TO SEND TO COORDINATOR NODE:
SELECT expr$0 AS `LO_OrderDateKey`,SUM(expr_2445085448) AS ` count ` FROM `aggregation_tmp_12033903` GROUP BY expr$0

Slide annotations: combine, reduce

Use cases
Machine generated data
Sensor readings
Metrics
Logs
Any large table with short lookup tables
Star schemas are ideal

Call detail records
Shard-Query is used in the billing system of a large cellular provider
CDRs generate a lot of data
Shard-Query includes a fast PERCENTILE function

Green energy meter processing
High volume of data means sharding is necessary
With Shard-Query, reporting is possible over all the shards, making queries possible that would not work with Fabric or other sharding solutions
Used in India for reporting on a green power grid

Log analysis
Performance logs from a web application for example
Aggregate many different statistics and shard if log volumes are high enough
Search text logs with
regular expressions

Performance
Star Schema Benchmark SF 20
119 million rows of data (12GB)
Infobright Community Database
Only 1st query from each flight selected
Unsharded compared to four shards (box has 4 cpu - Amazon m1.xlarge)

COLD: MySQL 35.39s, Shard-Query 11.62s
HOT: MySQL 10.99s, Shard-Query 2.95s
Query 1
select sum(lo_extendedprice*lo_discount) as revenue
from lineorder join dim_date on lo_orderdatekey = d_datekey
where d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;

COLD: MySQL 34.24s, Shard-Query 12.74s
HOT: MySQL 12.74s, Shard-Query 3.26s
Query 2
select sum(lo_revenue), d_year, p_brand
from lineorder join dim_date on lo_orderdatekey = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_category = 'MFGR#12' and s_region = 'AMERICA'
group by d_year, p_brand order by d_year, p_brand;

COLD: MySQL 27.29s, Shard-Query 7.97s
HOT: MySQL 18.89s, Shard-Query 5.06s
Query 3
select c_nation, s_nation, d_year, sum(lo_revenue) as revenue
from customer join lineorder on lo_custkey = c_customerkey
join supplier on lo_suppkey = s_suppkey
join dim_date on lo_orderdatekey = d_datekey
where c_region = 'ASIA' and s_region = 'ASIA' and d_year >= 1992 and d_year