using q4m - a message queue for mysql #osdc.tw
Using Q4M: a message queue storage engine for MySQL
Kazuho Oku
Who am I?
Name: Kazuho Oku (奥 一穂)
Original developer of Palmscape / Xiino
  The oldest web browser for Palm OS
Worked at Cybozu Labs in 2005-2010
  Research subsidiary of Cybozu, Inc. in Japan
  Developed Japanize / Mylingual, Q4M, etc.
Now working at DeNA Co., Ltd.
  with the developers of the HandlerSocket plugin
Mar 26 2011 Using Q4M 2
What is Q4M?
A message queue
  runs as a storage plugin of MySQL 5.1
Why is it a MySQL plugin?
  accessible by using existing MySQL clients
    no need for a new client library
  administrable by using SQL
    friendly to DB admins
Design Goals of Q4M
Robust
  Does not lose data on OS crash or power failure
    necessary for Tokyo without nuclear power plants…
    orz
Fast
  Transfer thousands of messages per second
Easy to Use
  Use SQL for access / maintenance
  Integration into MySQL
    no more separate daemons to take care of
Users of Q4M
Many leading web services in Japan
  DeNA Co., Ltd.
  livedoor Co., Ltd.
  mixi, Inc.
  Zynga Japan (formerly Unoh, Inc.)
Agenda
What MQ is in General
Applications of Q4M
Brief Tutorial
What is a Message Queue?
Middleware for persistent asynchronous communication
  communicate between fixed pairs (parties)
  a.k.a. Message Oriented Middleware
MQ is intermediate storage
  RDBMS is persistent storage
Senders / receivers may go down
Minimal Configuration of a MQ
Senders and receivers access a single queue
[Diagram: Sender → Queue → Receiver]
MQ and Relays
Separate queue for sender and receiver
Messages relayed between queues
[Diagram: Sender → Queue → Relay → Queue → Receiver]
Merits of Message Relays
Destination can be changed easily
  Relays may transfer messages to different locations depending on their headers
Robustness against network failure
  no loss or duplicates when the relay fails
Logging and Multicasting, etc.
Message Brokers
Publish / subscribe model
Separation between components and their integration
  Components read / write to predefined queues
  Integration is the definition of routing rules between the message queues
Messages are often transformed (filtered) within the relay agent
What about Q4M?
Q4M itself is a message queue
Can connect Q4M instances to create a message relay
Provides API for creating message relays and brokers
Performance of Q4M
over 7,000 messages/sec.
  message size: avg. 512 bytes
  syncing to disk
Outperforms most needs
  if you need more, just scale out
Can coexist with other storage engines without sacrificing their performance
see http://labs.cybozu.co.jp/blog/kazuhoatwork/2008/06/q4m_06_release_and_benchmarks.php
Applications of Q4M
Asynchronous Updates
DeNA uses Q4M for sending notifications to users asynchronously
http://engineer.dena.jp/2010/03/dena-technical-seminar-1-2.html
Delay Peak Demands
mixi (one of Japan's largest SNSs) uses Q4M to buffer writes to the DB, to delay peak demands
from http://alpha.mixi.co.jp/blog/?p=272
Connecting Distant Servers
Pathtraq uses Q4M to create a relay between its database and content analysis processes
[Diagram: Pathtraq DB ⇄ Content Analysis Processes, linked by a MySQL connection over SSL with gzip]
  → Contents to be analyzed →
  ← Results of the analysis ←
Prefetch Data
livedoor Reader (web-based feed aggregator) uses Q4M to prefetch data from database to memcached
uses Q4M for scheduling web crawlers as well
from http://d.hatena.ne.jp/mala/20081212/1229074359
Scheduling Web Crawlers
Web crawlers with retry-on-error
Sample code included in Q4M dist.
[Diagram components: URL DB, Request Queue, Spiders, Retry Queue, Re-scheduler]
[Diagram arrow labels: Read URL; Store Result; If failed to fetch, store URL in retry queue]
Delayed Content Generation
Hatetter (RSS feed-to-twitter-API gateway) uses Q4M to delay content generation
Source code: github.com/yappo/website-hatetter
Installing Q4M
Compatible with MySQL 5.1
Download from q4m.github.com
  Binary releases available for some platforms
Installing from source:
  requires source code of MySQL
  ./configure && make && make install
  run support-files/install.sql
Configuration Options of Q4M
--with-sync=no|fsync|fdatasync|fcntl
  Controls synchronization to disk
  default: fdatasync on linux
--enable-mmap
  Mmap'ed reads lead to higher throughput
  default: yes
--with-delete=pwrite|msync
  msync recommended on linux>=2.6.20 if you need really high performance
Q4M Basics
The Model
[Diagram: several Publishers → Q4M table → Subscribers]
Various publishers write to the queue
A set of subscribers consumes the entries in the queue
Creating a Q4M Table
ENGINE=QUEUE creates a Q4M table
No primary keys or indexes
Sorted by insertion order (it’s a queue)
mysql> CREATE TABLE qt (
    ->   id int(10) unsigned NOT NULL,
    ->   message varchar(255) NOT NULL
    -> ) ENGINE=QUEUE;
Query OK, 0 rows affected (0.42 sec)
Modifying Data on a Q4M Table
No restrictions for INSERT and DELETE
No support for UPDATE
mysql> INSERT INTO qt (id,message)
    -> VALUES
    ->   (1,'Hello'),
    ->   (2,'Bonjour'),
    ->   (3,'Hola');
Query OK, 3 rows affected (0.02 sec)

mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
3 rows in set (0.00 sec)
SELECT from a Q4M Table
Works the same as other storage engines
SELECT COUNT(*) is cached
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
3 rows in set (0.00 sec)

mysql> SELECT COUNT(*) FROM qt;
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
How to subscribe to a queue?
Calling queue_wait()
After calling, only one row becomes visible from the connection
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
3 rows in set (0.00 sec)

mysql> SELECT queue_wait('qt');
+------------------+
| queue_wait('qt') |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
+----+---------+
1 row in set (0.00 sec)
OWNER Mode and NON-OWNER Mode
In OWNER mode, only the OWNED row is visible
OWNED row becomes invisible from other connections
rows of other storage engines are visible
[Diagram: mode transitions]
NON-OWNER Mode (visible rows: 1,'Hello'  2,'Bonjour'  3,'Hola')
  queue_wait() → OWNER Mode (visible row: 1,'Hello')
  queue_end() / queue_abort() → back to NON-OWNER Mode
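The two modes can be pictured with a small in-memory model. This is only a sketch for building intuition, not how the storage engine is implemented: the class name `ToyQueue`, the connection names, and the `select_all` helper are hypothetical, and the real queue_wait blocks until a row arrives or the timeout expires, while this model returns immediately.

```python
class ToyQueue:
    """In-memory toy model of OWNER / NON-OWNER modes (not the real engine)."""

    def __init__(self, rows):
        self.rows = list(rows)  # queue contents, in insertion order
        self.owned = {}         # connection -> owned row, or None (OWNER mode, no row)

    def _taken(self):
        return [r for r in self.owned.values() if r is not None]

    def queue_wait(self, conn):
        """Enter OWNER mode and own the oldest row no other connection owns."""
        self.queue_end(conn)    # a row still owned from an earlier call is deleted
        free = [r for r in self.rows if r not in self._taken()]
        self.owned[conn] = free[0] if free else None
        return 1 if free else 0

    def queue_end(self, conn):
        """Delete the owned row (if any) and return to NON-OWNER mode."""
        row = self.owned.pop(conn, None)
        if row is not None:
            self.rows.remove(row)

    def queue_abort(self, conn):
        """Release the owned row without deleting it."""
        self.owned.pop(conn, None)

    def select_all(self, conn):
        """What SELECT * would show to this connection."""
        if conn in self.owned:  # OWNER mode: only the owned row is visible
            row = self.owned[conn]
            return [row] if row is not None else []
        return [r for r in self.rows if r not in self._taken()]


q = ToyQueue([(1, "Hello"), (2, "Bonjour"), (3, "Hola")])
q.queue_wait("conn_a")
print(q.select_all("conn_a"))  # the row owned by conn_a
print(q.select_all("conn_b"))  # rows not owned by anyone
```

Rows are assumed to be unique tuples here; the model compares rows by value, which the real engine does not need to do.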
Returning to NON-OWNER mode
By calling queue_abort, the connection returns to NON-OWNER mode
mysql> SELECT QUEUE_ABORT();
+---------------+
| QUEUE_ABORT() |
+---------------+
|             1 |
+---------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
3 rows in set (0.01 sec)
Consuming a Row
By calling queue_end, the OWNED row is deleted, and connection returns to NON-OWNER mode
mysql> SELECT queue_wait('qt');
(snip)
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
+----+---------+
1 row in set (0.01 sec)

mysql> SELECT queue_end();
+-------------+
| queue_end() |
+-------------+
|           1 |
+-------------+
1 row in set (0.01 sec)

mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
2 rows in set (0.00 sec)
Writing a Subscriber
Call two functions: queue_wait, queue_end
Multiple subscribers can be run concurrently
  each row in the queue is consumed only once
while (true) {
  SELECT queue_wait('qt');      # switch to owner mode
  rows := SELECT * FROM qt;     # obtain data
  if (count(rows) != 0)         # if we have any data, then
    handle_row(rows[0]);        # consume the row
  SELECT queue_end();           # erase the row from queue
}
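The loop above might look like this in Python. It is a minimal sketch under assumptions that are not in the slides: `cursor` is a DB-API cursor from a MySQL driver that uses the `%s` parameter style, `handle_row` is a callback you supply, and the choice to call queue_abort when the handler raises is one possible error policy, not something the slides prescribe.

```python
def consume_one(cursor, table, handle_row):
    """Handle at most one row from a Q4M table; return True if a row was handled.

    Assumptions: `cursor` is a DB-API cursor using the %s parameter style,
    and `table` is a trusted identifier (it is interpolated into the SQL).
    """
    cursor.execute("SELECT queue_wait(%s)", (table,))  # switch to OWNER mode
    cursor.fetchall()
    cursor.execute(f"SELECT * FROM {table}")           # only the owned row is visible
    rows = cursor.fetchall()
    try:
        if rows:
            handle_row(rows[0])                        # consume the row
    except Exception:
        cursor.execute("SELECT queue_abort()")         # release the row so it can be retried
        cursor.fetchall()
        raise
    cursor.execute("SELECT queue_end()")               # erase the row from the queue
    cursor.fetchall()
    return bool(rows)


def run_subscriber(cursor, table, handle_row):
    """Loop forever, one row per iteration, as in the pseudocode."""
    while True:
        consume_one(cursor, table, handle_row)
```

Several processes can each run `run_subscriber` on their own connection; since an owned row is invisible to other connections, each row is handled by only one of them.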
Writing a Subscriber (cont'd)
Or call queue_wait as a condition
  Warning: conflicts with trigger-based insertions
while (true) {
  rows := SELECT * FROM qt WHERE queue_wait('qt');
  if (count(rows) != 0)
    handle_row(rows[0]);
  SELECT queue_end();
}
The Model – with code
[Diagram: Publishers → Q4M table → Subscribers]
Publishers:
  INSERT INTO queue ...
Subscribers:
  while (true) {
    rows := SELECT * FROM qt WHERE queue_wait('qt');
    if (count(rows) != 0)
      handle_row(rows[0]);
    SELECT queue_end();
  }
Three Functions in Detail
queue_wait(table)
Enters OWNER mode
  0 to 1 row becomes OWNED
Enters OWNER mode even if no rows were available
Default timeout: 60 seconds
Returns 1 if a row is OWNED (0 on timeout)
If called within OWNER mode, the owned row is deleted
Revisiting Subscriber Code
Calls to queue_end just before queue_wait can be omitted
while (true) {
  rows := SELECT * FROM qt WHERE queue_wait('qt');
  if (count(rows) != 0)
    handle_row(rows[0]);
  SELECT queue_end();   # can be omitted: the next queue_wait deletes the owned row
}
Conditional queue_wait()
Consume rows of certain condition
  Rows that do not match will be left untouched
  Only numeric columns can be checked
  Fast - condition tested once per each row
examples:
  SELECT queue_wait('table:(col_a*3)+col_b<col_c');
  SELECT queue_wait('table:retry_count<5');
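A conditional wait is just a `table:condition` string passed to queue_wait, so a client helper can stay very small. The sketch below assumes a DB-API cursor with the `%s` parameter style; the function name `queue_wait_where` and the table name `crawl_queue` in the usage note are hypothetical.

```python
def queue_wait_where(cursor, table, condition, timeout=None):
    """Wait for a row of `table` matching `condition`; return True if one is owned.

    `condition` is an expression over numeric columns, e.g. "retry_count<5".
    Assumes a DB-API cursor using the %s parameter style.
    """
    arg = f"{table}:{condition}"
    if timeout is None:
        cursor.execute("SELECT queue_wait(%s)", (arg,))           # default timeout
    else:
        cursor.execute("SELECT queue_wait(%s, %s)", (arg, timeout))
    (result,) = cursor.fetchone()
    return result == 1  # 1: a row is owned, 0: timed out
```

For example, `queue_wait_where(cursor, "crawl_queue", "retry_count<5", timeout=10)` would leave rows that have already been retried five times untouched in the queue.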
queue_wait(tbl_cond,[tbl_cond…,timeout])
Accepts multiple tables and timeout
Data searched from leftmost table to right
Returns table index (the leftmost table is 1) of the newly owned row
Returns zero if no rows are being owned
example:
  SELECT queue_wait('table_A','table_B',60);
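Because the return value is a 1-based table index, a client usually wants to map it back to a table name before reading the row. A minimal sketch, assuming a DB-API cursor with the `%s` parameter style; the helper name `wait_any` is hypothetical.

```python
def wait_any(cursor, tables, timeout=60):
    """Wait on several Q4M tables; return the name of the table whose row is now owned.

    Tables are searched left to right, so earlier entries take priority.
    Returns None when no row became owned before the timeout.
    """
    placeholders = ", ".join(["%s"] * (len(tables) + 1))  # one per table, plus the timeout
    cursor.execute(f"SELECT queue_wait({placeholders})", (*tables, timeout))
    (index,) = cursor.fetchone()
    return tables[index - 1] if index else None           # index 1 is the leftmost table
```

Listing a high-priority queue first, as in `wait_any(cursor, ["table_A", "table_B"])`, means `table_B` is only served when `table_A` has no available row.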
Functions for Exiting OWNER Mode
queue_end
  Deletes the owned row and exits OWNER mode
queue_abort
  Releases (instead of deleting) the owned row and exits OWNER mode
  Close of a MySQL connection does the same thing
Relaying and Routing Messages
The Model
Relay (or router) consists of more than 3 processes, 2 conns
No losses, no duplicates on crash or disconnection
[Diagram: Q4M Table (source) → Relay Program → Q4M Table (dest.)]
Internal Row ID
Every row has an internal row ID
  invisible from Q4M table definition
  monotonically increasing 64-bit integer
Used for detecting duplicates
  Use two functions to skip duplicates
  Data loss prevented by using queue_wait / queue_end
queue_rowid()
Returns row ID of the OWNED row (if any)
  Returns NULL if no row is OWNED
Call when retrieving data from source
queue_set_srcid(src_tbl_id, mode, src_row_id)
Call before inserting a row to destination table
  Checks if the row is already inserted into the table, and ignores next INSERT if true
Parameters:
  src_tbl_id - id to determine source table (0 to 63)
  mode - "a" to drop duplicates, "w" to reset
  src_row_id - row ID obtained from source table
Pseudo Code
Relays data from src_tbl to dest_tbl
while (true) {
  # wait for data
  SELECT queue_wait(src_tbl) => src_db;
  # read row and rowid
  row := (SELECT * FROM src_tbl => src_db);
  rowid := (SELECT queue_rowid() => src_db);
  # insert the row after setting srcid
  SELECT queue_set_srcid(src_tbl_id, 'a', rowid) => dest_db;
  INSERT INTO dest_tbl (row) => dest_db;
}
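One iteration of the relay pseudocode could be written as follows. This is a sketch under assumptions, not the q4m-forward implementation: `src` and `dest` are DB-API cursors on two separate connections using the `%s` parameter style, table names are trusted identifiers, both tables have the same columns, and commit handling is left to the caller.

```python
def relay_one(src, dest, src_table, dest_table, src_tbl_id=0):
    """Relay at most one row from src_table to dest_table; return True if a row was sent.

    The source row is not deleted here. As in the pseudocode, the next
    queue_wait on `src` deletes the row owned by the previous iteration.
    """
    src.execute("SELECT queue_wait(%s)", (src_table,))  # wait for data
    src.fetchall()
    src.execute(f"SELECT * FROM {src_table}")           # read the owned row
    rows = src.fetchall()
    if not rows:
        return False                                    # timed out, nothing to relay
    src.execute("SELECT queue_rowid()")                 # internal row ID of the owned row
    rowid = src.fetchall()[0][0]
    # Register the source row ID first, so a repeated INSERT of the same row is ignored.
    dest.execute("SELECT queue_set_srcid(%s, 'a', %s)", (src_tbl_id, rowid))
    dest.fetchall()
    placeholders = ", ".join(["%s"] * len(rows[0]))
    dest.execute(f"INSERT INTO {dest_table} VALUES ({placeholders})", tuple(rows[0]))
    return True
```

If the relay crashes between the INSERT and the next queue_wait, the source row is released and sent again after a restart, and the destination drops it because of the queue_set_srcid call.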
q4m-forward
Simple forwarder script
  installed into mysql-dir/bin
usage: q4m-forward [options] src_addr dest_addr
example:
  % support-files/q4m-forward \
    "dbi:mysql:database=db1;table=tbl1;user=foo;password=XXX" \
    "dbi:mysql:database=db2;table=tbl2;host=bar;user=foo"
options:
  --reset       reset duplicate check info.
  --sender=idx  slot no. used for checking duplicates (0..63, default: 0)
  --help
Limitations and the Future of Q4M
Things that Need to be Fixed
Table compaction is a blocking operation
  runs when live data becomes <25% of log file
  very bad, though not as bad as it seems
    it's fast since it's a sequential write operation
Relays are slow
  since transfer is done row-by-row
Binlog does not work
  since MQ replication should be synchronous
Future of Q4M (maybe)
Support for MySQL 5.5
  not requested yet by current users :-p
2-phase commit with other storage engines
  queue consumption and InnoDB updates can become an atomic operation