Using Q4M - a message queue storage engine for MySQL
DESCRIPTION
Explains how to use Q4M. Slides used at MySQL Conference & Expo 2009.
TRANSCRIPT
Using Q4M
a message queue storage engine for MySQL
Cybozu Labs, Inc.
Kazuho Oku
Background
Apr 22 2009 Using Q4M 2
Who am I?
Name: Kazuho Oku ( 奥 一穂 )
Original developer of Palmscape / Xiino
  The oldest web browser for Palm OS
Working at Cybozu Labs since 2005
  Research subsidiary of Cybozu, Inc. in Japan
About Cybozu, Inc.
Japan’s largest groupware vendor
  Mostly provides software products, not services
  Some of our apps bundle MySQL as storage
About Pathtraq
Started in Aug. 2007
Web ranking service
  One of Japan’s largest
  like Alexa, but semi-realtime, and per-page
  running on MySQL
Need for a fast and reliable message relay
  for communication between the main server and content analysis server(s)
Design Goals of Q4M
Robust
  Do not lose data on OS crash or power failure
Fast
  Transfer thousands of messages per second
Easy to Use
  Use SQL for access / maintenance
  Integration into MySQL
    no more separate daemons to take care of
What is a Message Queue?
What is a Message Queue?
Middleware for persistent asynchronous communication
  communicate between fixed pairs (parties)
  a.k.a. Message Oriented Middleware
MQ is intermediate storage
  RDBMS is persistent storage
Senders / receivers may go down
Minimal Configuration of a MQ
Senders and receivers access a single queue
[Diagram: Sender → Queue → Receiver]
MQ and Relays
Separate queue for sender and receiver
Messages relayed between queues
[Diagram: Sender → Queue → Relay → Queue → Receiver]
Merits of Message Relays
Destination can be changed easily
  Relays may transfer messages to different locations depending on their headers
Robustness against network failure
  no loss or duplicates when the relay fails
Logging and Multicasting, etc.
Message Brokers
Publish / subscribe model
Separation between components and their integration
  Components read / write to predefined queues
  Integration is definition of routing rules between the message queues
Messages are often transformed (filtered) within the relay agent
What about Q4M?
Q4M itself is a message queue
Can connect Q4M instances to create a message relay
Provides API for creating message relays and brokers
Performance of Q4M
over 7,000 messages/sec.
  message size: avg. 512 bytes
  syncing to disk
Outperforming most needs
  if you need more, just scale out
Can coexist with other storage engines without sacrificing their performance
see http://labs.cybozu.co.jp/blog/kazuhoatwork/2008/06/q4m_06_release_and_benchmarks.php
Applications of Q4M
Asynchronous Updates
Mixi (one of Japan's largest SNSs) uses Q4M to buffer writes to the DB, offloading peak demand
from http://alpha.mixi.co.jp/blog/?p=272
Connecting Distant Servers
Pathtraq uses Q4M to create a relay between its database and content analysis processes
[Diagram: Pathtraq DB and Content Analysis Processes, linked by a MySQL conn. over SSL, gzip]
  → Contents to be analyzed →
  ← Results of the analysis ←
To Prefetch Data
livedoor Reader (web-based feed aggregator) uses Q4M to prefetch data from database to memcached
uses Q4M for scheduling web crawlers as well
from http://d.hatena.ne.jp/mala/20081212/1229074359
Scheduling Web Crawlers
Web crawlers with retry-on-error
  Sample code included in Q4M dist.
[Diagram: URL DB, Request Queue, Spiders, Retry Queue, Re-scheduler; arrows labeled "Read URL", "Store Result", and "If failed to fetch, store URL in retry queue"]
Delayed Content Generation
Hatetter (RSS feed-to-twitter-API gateway) uses Q4M to delay content generation
Source code: github.com/yappo/website-hatetter
User Notifications
For sending notifications from web services
[Diagram: App. Logic, DB, Queue(s), SMTP Agent, IM Agent]
Installing Q4M
Installing Q4M
Compatible with MySQL 5.1
Download from q4m.31tools.com
  Binary releases available for some platforms
Installing from source:
  requires source code of MySQL
  ./configure && make && make install
  run support-files/install.sql
Configuration Options of Q4M
--with-sync=no|fsync|fdatasync|fcntl
  Controls synchronization to disk
  default: fdatasync on linux
--enable-mmap
  Mmap’ed reads lead to higher throughput
  default: yes
--with-delete=pwrite|msync
  msync recommended on linux>=2.6.20 if you need really high performance
Q4M Basics
The Model
[Diagram: three Publishers → Q4M table → Subscribers]
Various publishers write to queue
Set of subscribers consume the entries in queue
Creating a Q4M Table
ENGINE=QUEUE creates a Q4M table
No primary keys or indexes
Sorted by insertion order (it’s a queue)
mysql> CREATE TABLE qt (
    ->   id int(10) unsigned NOT NULL,
    ->   message varchar(255) NOT NULL
    -> ) ENGINE=QUEUE;
Query OK, 0 rows affected (0.42 sec)
Modifying Data on a Q4M Table
No restrictions for INSERT and DELETE
No support for UPDATE
mysql> INSERT INTO qt (id,message)
    -> VALUES
    ->   (1,'Hello'),
    ->   (2,'Bonjour'),
    ->   (3,'Hola');
Query OK, 3 rows affected (0.02 sec)

mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
3 rows in set (0.00 sec)
SELECT from a Q4M Table
Works the same as other storage engines
SELECT COUNT(*) is cached
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
3 rows in set (0.00 sec)

mysql> SELECT COUNT(*) FROM qt;
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)
How to subscribe to a queue?
Calling queue_wait()
After calling, only one row becomes visible from the connection
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
3 rows in set (0.00 sec)

mysql> SELECT queue_wait('qt');
+------------------+
| queue_wait('qt') |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
+----+---------+
1 row in set (0.00 sec)
OWNER Mode and NON-OWNER Mode
In OWNER mode, only the OWNED row is visible
OWNED row becomes invisible from other connections
rows of other storage engines are visible
[Diagram: mode transitions]
NON-OWNER Mode (visible rows: 1,'Hello'; 2,'Bonjour'; 3,'Hola')
  queue_wait() → OWNER Mode
OWNER Mode (visible row: 1,'Hello')
  queue_end() / queue_abort() → NON-OWNER Mode
Returning to NON-OWNER mode
By calling queue_abort, the connection returns to NON-OWNER mode
mysql> SELECT QUEUE_ABORT();
+---------------+
| QUEUE_ABORT() |
+---------------+
|             1 |
+---------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
3 rows in set (0.01 sec)
Consuming a Row
By calling queue_end, the OWNED row is deleted, and connection returns to NON-OWNER mode
mysql> SELECT queue_wait('qt');
(snip)
mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  1 | Hello   |
+----+---------+
1 row in set (0.01 sec)

mysql> SELECT queue_end();
+-------------+
| queue_end() |
+-------------+
|           1 |
+-------------+
1 row in set (0.01 sec)

mysql> SELECT * FROM qt;
+----+---------+
| id | message |
+----+---------+
|  2 | Bonjour |
|  3 | Hola    |
+----+---------+
2 rows in set (0.00 sec)
Writing a Subscriber
Call two functions: queue_wait, queue_end
Multiple subscribers can be run concurrently
  each row in the queue is consumed only once

while (true) {
  SELECT queue_wait('qt');      # switch to owner mode
  rows := SELECT * FROM qt;     # obtain data
  if (count(rows) != 0)         # if we have any data, then
    handle_row(rows[0]);        # consume the row
  SELECT queue_end();           # erase the row from queue
}
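The subscriber loop might look roughly like this in Python. This is a minimal sketch, not code from the Q4M distribution: it assumes `cursor` is a DB-API 2.0 cursor with the `%s` parameter style (as in PyMySQL or mysqlclient) on a connection to a server with Q4M installed. The names `consume`, `handle_row`, and `max_rows` are illustrative only.

```python
def consume(cursor, handle_row, table="qt", max_rows=None):
    """Consume rows from a Q4M table, one at a time.

    cursor     -- DB-API 2.0 cursor (assumption: '%s' paramstyle)
    handle_row -- caller-supplied callback (hypothetical name)
    max_rows   -- stop after this many rows; None means loop forever
    """
    handled = 0
    while max_rows is None or handled < max_rows:
        # Enter OWNER mode; 1 means a row is now owned, 0 means timeout.
        cursor.execute("SELECT queue_wait(%s)", (table,))
        (owned,) = cursor.fetchone()
        if not owned:
            continue  # nothing arrived before the timeout; wait again
        # In OWNER mode only the owned row is visible to this connection.
        # The table name is assumed to come from trusted code.
        cursor.execute("SELECT * FROM " + table)
        handle_row(cursor.fetchone())
        # Delete the owned row and return to NON-OWNER mode.
        cursor.execute("SELECT queue_end()")
        cursor.fetchone()
        handled += 1
    return handled
```

Several such consumers can run on separate connections; as the slide notes, each row is handed to only one of them.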
Writing a Subscriber (cont'd)
Or call queue_wait as a condition
  Warning: conflicts with trigger-based insertions

while (true) {
  rows := SELECT * FROM qt WHERE queue_wait('qt');
  if (count(rows) != 0)
    handle_row(rows[0]);
  SELECT queue_end();
}
The Model – with code
[Diagram: three Publishers → Q4M table → Subscribers]
Publishers:
  INSERT INTO queue ...
Subscribers:
  while (true) {
    rows := SELECT * FROM qt WHERE queue_wait('qt');
    if (count(rows) != 0)
      handle_row(rows[0]);
    SELECT queue_end();
  }
Three Functions in Detail
queue_wait(table)
Enters OWNER mode
  0 or 1 row becomes OWNED
Enters OWNER mode even if no rows were available
Default timeout: 60 seconds
Returns 1 if a row is OWNED (0 on timeout)
If called within OWNER mode, the owned row is deleted
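The rules on this slide can be modeled with a small in-memory class. This is a toy illustration only, under loud assumptions: `ToyQueue` is a hypothetical name, it models a single connection, and it returns 0 immediately instead of blocking until the timeout. It is not how Q4M is implemented.

```python
class ToyQueue:
    """Toy model of one connection's view of a Q4M table."""

    def __init__(self, rows):
        self.rows = list(rows)   # rows in insertion order
        self.owner_mode = False
        self.owned = None        # the OWNED row, if any

    def queue_wait(self):
        # Calling queue_wait while a row is owned deletes that row first.
        if self.owned is not None:
            self.rows.pop(0)
        # OWNER mode is entered even when no row is available.
        self.owner_mode = True
        self.owned = self.rows[0] if self.rows else None
        return 1 if self.owned is not None else 0

    def queue_end(self):
        # Delete the owned row (if any) and leave OWNER mode.
        if self.owned is not None:
            self.rows.pop(0)
        self.owner_mode, self.owned = False, None
        return 1

    def queue_abort(self):
        # Release the owned row without deleting it.
        self.owner_mode, self.owned = False, None
        return 1

    def select_all(self):
        # In OWNER mode only the owned row is visible.
        if self.owner_mode:
            return [] if self.owned is None else [self.owned]
        return list(self.rows)
```

For example, calling `queue_wait()` twice in a row on a three-row queue leaves the second row owned and the first one deleted, which is why an explicit `queue_end` between iterations is optional.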
Revisiting Subscriber Code
Calls to queue_end just before queue_wait can be omitted
while (true) {
  rows := SELECT * FROM qt WHERE queue_wait('qt');
  if (count(rows) != 0)
    handle_row(rows[0]);
  SELECT queue_end();   # can be omitted: the next queue_wait deletes the owned row
}
Conditional queue_wait()
Consume only rows that match a condition
  Rows that do not match will be left untouched
  Only numeric columns can be checked
  Fast - condition tested once per row
examples:
  SELECT queue_wait('table:(col_a*3)+col_b<col_c');
  SELECT queue_wait('table:retry_count<5');
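From application code, the `table:condition` argument can be assembled as an ordinary string. The helper below is a hedged sketch: `wait_for_matching_row` is a hypothetical name, `cursor` is assumed to be a DB-API 2.0 cursor using the `%s` parameter style, and the condition text is assumed to come from trusted code rather than user input.

```python
def wait_for_matching_row(cursor, table, condition, timeout=None):
    """Own the next row of `table` that satisfies `condition`.

    Returns True if a row is now owned, False on timeout.
    """
    # Q4M takes the table name and the condition as one string.
    target = f"{table}:{condition}"
    if timeout is None:
        # Rely on the default timeout (60 seconds per the slides).
        cursor.execute("SELECT queue_wait(%s)", (target,))
    else:
        cursor.execute("SELECT queue_wait(%s, %s)", (target, timeout))
    (owned,) = cursor.fetchone()
    return bool(owned)
```

A call such as `wait_for_matching_row(cursor, "table", "retry_count<5")` issues the second example query from the slide.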
queue_wait(tbl_cond,[tbl_cond…,timeout])
Accepts multiple tables and timeout
  Data searched from leftmost table to right
  Returns table index (the leftmost table is 1) of the newly owned row
  Returns zero if no rows are being owned
example:
  SELECT queue_wait('table_A','table_B',60);
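Because the return value is a 1-based table index, a client needs to map it back to a table name before reading the owned row. A minimal sketch, assuming a DB-API 2.0 cursor with the `%s` parameter style; `wait_any` is an illustrative name, not a Q4M function.

```python
def wait_any(cursor, tables, timeout=60):
    """Wait on several Q4M tables; return the name of the table whose
    row is now owned, or None if the call timed out."""
    # One placeholder per table plus one for the timeout.
    placeholders = ", ".join(["%s"] * (len(tables) + 1))
    cursor.execute(f"SELECT queue_wait({placeholders})", (*tables, timeout))
    (index,) = cursor.fetchone()
    if not index:
        return None           # zero: no row is owned
    return tables[index - 1]  # the leftmost table is index 1
```

Since tables are searched from left to right, listing a more urgent queue first, as in `wait_any(cursor, ["table_A", "table_B"])`, should give it precedence whenever both hold rows.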
Functions for Exiting OWNER Mode
queue_end
  Deletes the owned row and exits OWNER mode
queue_abort
  Releases (instead of deleting) the owned row and exits OWNER mode
  Closing the MySQL connection does the same thing
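One way to use the two exit functions together is to call queue_end only after the row was handled successfully, and queue_abort when handling fails, so the row stays in the queue for another attempt. The sketch below assumes a DB-API 2.0 cursor with the `%s` parameter style; `process_one` and `handle_row` are hypothetical names.

```python
def process_one(cursor, handle_row, table="qt"):
    """Handle a single row; keep it in the queue if handling fails.

    Returns True if a row was consumed, False if queue_wait timed out.
    """
    cursor.execute("SELECT queue_wait(%s)", (table,))
    (owned,) = cursor.fetchone()
    if not owned:
        return False
    try:
        # Table name is assumed to come from trusted code.
        cursor.execute("SELECT * FROM " + table)
        handle_row(cursor.fetchone())
    except Exception:
        # Release the row instead of deleting it, then re-raise.
        cursor.execute("SELECT queue_abort()")
        cursor.fetchone()
        raise
    # Success: delete the owned row and leave OWNER mode.
    cursor.execute("SELECT queue_end()")
    cursor.fetchone()
    return True
```

If the process dies before either call, the closed connection has the same effect as queue_abort, so the row is not lost.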
Relaying and Routing Messages
The Problem
Relay (or router) consists of more than 3 processes, 2 conns
No losses, no duplicates on crash or disconnection
[Diagram: Q4M Table (source) → Relay Program → Q4M Table (dest.)]
Internal Row ID
Every row has an internal row ID
  invisible from Q4M table definition
  monotonically increasing 64-bit integer
Used for detecting duplicates
  Use two functions to skip duplicates
  Data loss prevented by using queue_wait / queue_end
queue_rowid()
Returns row ID of the OWNED row (if any)
  Returns NULL if no row is OWNED
Call when retrieving data from source
queue_set_srcid(src_tbl_id, mode, src_row_id)
Call before inserting a row to destination table
  Checks if the row is already inserted into the table, and ignores next INSERT if true
Parameters:
  src_tbl_id - id to determine source table (0 to 63)
  mode - "a" to drop duplicates, "w" to reset
  src_row_id - row ID obtained from source table
Pseudo Code
Relays data from src_tbl to dest_tbl
while (true) {
  # wait for data
  SELECT queue_wait(src_tbl) => src_db;
  # read row and rowid
  row := (SELECT * FROM src_tbl => src_db);
  rowid := (SELECT queue_rowid() => src_db);
  # insert the row after setting srcid
  SELECT queue_set_srcid(src_tbl_id, 'a', rowid) => dest_db;
  INSERT INTO dest_tbl (row) => dest_db;
}
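The pseudo code could be written along these lines in Python. This is a sketch under stated assumptions, not the q4m-forward implementation: `src_cursor` and `dest_cursor` are DB-API 2.0 cursors (`%s` parameter style) on two separate connections, both tables have the same columns, and `relay` and `max_rows` are illustrative names. Unlike the pseudo code, it calls queue_end explicitly after each insert. Reconnection and error handling are left out.

```python
def relay(src_cursor, dest_cursor, src_tbl, dest_tbl,
          src_tbl_id=0, max_rows=None):
    """Copy rows from src_tbl to dest_tbl without loss or duplicates."""
    relayed = 0
    while max_rows is None or relayed < max_rows:
        # Wait for data on the source side.
        src_cursor.execute("SELECT queue_wait(%s)", (src_tbl,))
        (owned,) = src_cursor.fetchone()
        if not owned:
            continue  # timed out; wait again
        # Read the owned row and its internal row ID.
        src_cursor.execute("SELECT * FROM " + src_tbl)
        row = src_cursor.fetchone()
        src_cursor.execute("SELECT queue_rowid()")
        (rowid,) = src_cursor.fetchone()
        # Tell the destination which source row comes next; mode 'a'
        # makes it ignore the INSERT if that row was already stored.
        dest_cursor.execute(
            "SELECT queue_set_srcid(%s, 'a', %s)", (src_tbl_id, rowid))
        dest_cursor.fetchone()
        placeholders = ", ".join(["%s"] * len(row))
        dest_cursor.execute(
            f"INSERT INTO {dest_tbl} VALUES ({placeholders})", row)
        # Only now remove the row from the source queue.
        src_cursor.execute("SELECT queue_end()")
        src_cursor.fetchone()
        relayed += 1
    return relayed
```

If the relay crashes between the INSERT and queue_end, the source row is released and delivered again, and the row ID check on the destination drops the second copy.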
q4m-forward
Simple forwarder script
  installed into mysql-dir/bin
usage: q4m-forward [options] src_addr dest_addr
example:
  % support-files/q4m-forward \
      "dbi:mysql:database=db1;table=tbl1;user=foo;password=XXX" \
      "dbi:mysql:database=db2;table=tbl2;host=bar;user=foo"
options:
  --reset       reset duplicate check info.
  --sender=idx  slot no. used for checking duplicates (0..63, default: 0)
  --help
Limitations and the Future of Q4M
Things that Need to be Fixed
Table compaction is a blocking operation
  runs when live data becomes <25% of log file
  very bad, though not as bad as it seems
    it's fast since it's a sequential write operation
Relays are slow
  since transfer is done row-by-row
Binlog does not work
  since MQ replication should be synchronous
Future of Q4M
2-phase commit with other storage engines (maybe)
  queue consumption and InnoDB updates can become an atomic operation
Thank you
http://q4m.31tools.com/