Building responsive Symbology & Suggest web service with MongoDB
Andrei Palchys, @apalchys
Alex Kosau, @alexkosau
Introduction
• Customer: Thomson Reuters
• Business domain: Financial markets
• Goal: Implement Next-Gen financial web services
• The project started: July 2011
• The project finished: (Dec 2011)
• Team: 1 team lead, 5+1 developers, 2 QA
Web services
• Symbology Web Service
  Provides reference data about financial instruments, looked up by symbol, code or instrument name
• Suggest Web Service
Architecture
[Diagram] Old: Sources, ETL, Search Engine, Web services, Front End, Desktop
[Diagram] New: Sources, ETL, The New Web Services, Desktop
Reasons to write the new web services
• Poor performance
• Expensive to scale or extend
• Some types of data are not easy to manage
Requirements for the new web services
• Performance
  • 95% of Symbology requests should complete within 50 ms.
  • 95% of Suggest requests should complete within 25 ms.
• Use normalized data
• Use as little memory as possible
• Fast data loading into DB
• Windows environment and .NET platform
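The latency targets above are percentile targets, so they can be checked against measured response times. A minimal sketch, assuming a nearest-rank percentile and made-up latency samples (the function names and the numbers are illustrative, not from the project):

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def meets_target(samples_ms, target_ms, pct=95):
    """True if pct percent of the requests finished within target_ms."""
    return percentile(samples_ms, pct) <= target_ms

# Hypothetical Symbology latencies in milliseconds.
symbology_latencies = [12, 18, 22, 9, 31, 44, 16, 27, 38, 120]
print(meets_target(symbology_latencies, target_ms=50))
```

With only ten samples a single slow request already breaks the 95% target, which is why such checks are usually run over a large number of requests.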
What we considered from commercial databases
• Microsoft SQL Server
  • 13 ms, too slow
• Oracle TimesTen
  • Relational
  • Completely in-memory: guaranteed latency but slow startup
  • Expensive
• McObject’s eXtremeDB
  • Object DB
  • Native C interface: designed for performance
  • Ultra reliability
  • Still expensive
What we considered from free databases
• Redis
• HBase
• Cassandra
• RavenDB
Each of these databases fails to meet one of the requirements
MongoDB
• Document-oriented
• Simple to use (a decent interface for .NET is available)
• Simple maintenance (monitoring, replication, sharding)
• Data is kept in memory once it has been used.
• 1 ms average response time
• Cross-platform (native Windows support)
Databases
• Symbology DB – about 30 GB of data
• Suggest DB – more than 22 GB of data
[Diagram: Symbology WS, Suggest WS, Symbology DB, Suggest DB]
Deployment (planned)
• 6 “clusters” around the world (in TR data centers), in a replica set.
• Each “cluster”: 3 servers (replica set + sharding) + 1 arbiter
• 2 of them are also used to load data.
• 128 GB of memory per server
Symbology DB: challenge
• Fast search by full key
• Minimize the space taken by the data, since it needs to fit into RAM
• Data is text only (no pictures, etc.)
• The full document is always required
• Only some fields are used to query data, and these fields are short (3 to 10 characters)
• New fields should be easy to add to the “queryable” list
• Composite queries are sometimes needed
  • AB and CD and not EF or GH
• Fast data loading
Symbology DB: solution
Map the names of the document fields to ints
RIC -> 1
Name -> 2
{"1": "GOOG.O", "2": "Google"}
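The field-name mapping can be sketched as a pair of lookup tables applied before writing and after reading. This is only an illustration: the table contains just the two mappings shown on the slide, and the helper names `shorten` and `expand` are hypothetical.

```python
# Only RIC -> 1 and Name -> 2 come from the slides; a real table would be larger.
FIELD_IDS = {"RIC": 1, "Name": 2}
FIELD_NAMES = {field_id: name for name, field_id in FIELD_IDS.items()}

def shorten(doc):
    """Replace field names with their integer ids.
    The ids are written as strings because document keys are strings."""
    return {str(FIELD_IDS[name]): value for name, value in doc.items()}

def expand(stored):
    """Restore the original field names when reading a document back."""
    return {FIELD_NAMES[int(key)]: value for key, value in stored.items()}

stored = shorten({"RIC": "GOOG.O", "Name": "Google"})
print(stored)          # {'1': 'GOOG.O', '2': 'Google'}
print(expand(stored))  # {'RIC': 'GOOG.O', 'Name': 'Google'}
```

The saving comes from the fact that every document repeats its field names, so shorter names shrink every stored document.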
Symbology DB: solution
Unite all queryable fields into arrays
• Query syntax is the same
• Single index – less space occupied
• Easy to add new searchable data
"s": [
  { "k": 1, "v": "MSFT.O" },
  { "k": 2, "v": "Microsoft Inc." }
]
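A minimal sketch of how a document could be turned into the key/value array form, assuming a hypothetical `QUERYABLE` table of field ids. The `matches` helper only imitates locally what a MongoDB `$elemMatch` filter on `s` would select; it is not the project's code.

```python
# Hypothetical list of queryable fields and their ids.
QUERYABLE = {"RIC": 1, "Name": 2}

def to_searchable(doc):
    """Collect every queryable field of doc into one array of {k, v} pairs."""
    return {"s": [{"k": QUERYABLE[f], "v": doc[f]} for f in QUERYABLE if f in doc]}

def matches(stored, key, value):
    """Local equivalent of the filter {"s": {"$elemMatch": {"k": key, "v": value}}}."""
    return any(e["k"] == key and e["v"] == value for e in stored["s"])

stored = to_searchable({"RIC": "MSFT.O", "Name": "Microsoft Inc.", "Exchange": "NASDAQ"})
print(stored["s"])
print(matches(stored, 1, "MSFT.O"))  # True
```

Making another field searchable then only means adding an entry to the table, since all searchable values live in the same array.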
Symbology DB: solution
Combine key and value properties
• Takes less space
• Query with a prefix regex such as /^a../
• No performance decrease: MongoDB uses the index for a regex that starts with /^
"s": [
  "MSFT.O|1",
  "Microsoft Inc.|2"
]
Query: { s: { $regex: "^MSFT\\.O\\|" } }
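The combined "value|key" form and the anchored prefix query can be sketched as follows. The helper names are hypothetical, and the matching is done locally with Python's `re` module as a stand-in for the database; the point is that the search value should be escaped so that characters such as `.` are matched literally.

```python
import re

def combine(value, key_id):
    """Store a queryable value and its field id as one string."""
    return f"{value}|{key_id}"

def prefix_query(value):
    """Build a MongoDB-style filter: anchored with ^, metacharacters escaped,
    and terminated by the literal | separator."""
    return {"s": {"$regex": "^" + re.escape(value) + r"\|"}}

def matches(stored, query):
    """Local stand-in for the database: test each array item against the regex."""
    pattern = re.compile(query["s"]["$regex"])
    return any(pattern.search(item) for item in stored["s"])

doc = {"s": [combine("MSFT.O", 1), combine("Microsoft Inc.", 2)]}
print(prefix_query("MSFT.O"))
print(matches(doc, prefix_query("MSFT.O")))  # True
print(matches(doc, prefix_query("MSFTXO")))  # False, the dot is literal
```

Including the `|` separator in the pattern keeps a search for a full key from also matching longer keys that merely start with the same characters.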
Symbology DB: solution
Compress non-queryable data and store it as a single field (binary data)
• Encode with Protocol Buffers or MsgPack
  – In our case, MsgPack was 2x faster than Protobuf
• Compress with Snappy, which is optimized for speed rather than compression ratio
{
"b" : BinData(0,"CgcxMDkwMzcwEgZ1cztJQk0xAAAAAAAA8D86A05ZU0IXTmV3IFlvcmsgU3RvY2sgRXhjaGFuZ2VZAAAAAAAA8D9gAXABeAGJAQAAAAAAAPA/ogEFNDc0MU6qAQU0NzQxTrI…")
}
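The encode-then-compress step can be sketched with the Python standard library. Note the substitutions: `json` stands in for MsgPack and `zlib` stands in for Snappy, purely so the example runs without extra packages; the field names in `details` are made up.

```python
import json
import zlib

def pack(non_queryable):
    """Serialize the non-queryable fields and compress them into one blob.
    json and zlib are stand-ins for MsgPack and Snappy."""
    encoded = json.dumps(non_queryable, separators=(",", ":")).encode("utf-8")
    return zlib.compress(encoded)

def unpack(blob):
    """Reverse of pack: decompress, then deserialize."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical non-queryable fields of one instrument.
details = {"exchange": "New York Stock Exchange", "currency": "USD", "lot_size": 1.0}

# Queryable values stay readable in "s"; everything else goes into "b".
stored = {"s": ["IBM.N|1"], "b": pack(details)}
print(type(stored["b"]).__name__, unpack(stored["b"]) == details)
```

Because the full document is always needed and the blob is never queried, the whole document is decoded once per read and no per-field access is lost.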
Symbology DB: solution
Change the ETL output format to JSON and insert directly into MongoDB
This reduced the loading time from 9 h to 1 h.
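A minimal sketch of loading JSON output in batches. It assumes the ETL writes one JSON document per line, which the slides do not state, and `insert_batch` is a hypothetical callback; with the Python driver it could be a collection's `insert_many` method.

```python
import io
import json
from itertools import islice

def read_documents(lines):
    """Parse one JSON document per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def load_in_batches(lines, insert_batch, batch_size=1000):
    """Feed documents to insert_batch in lists of at most batch_size items.
    Returns the number of documents passed on."""
    docs = read_documents(lines)
    total = 0
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            return total
        insert_batch(batch)
        total += len(batch)

# Stand-in for an ETL output file and for the database.
etl_output = io.StringIO(
    '{"1": "GOOG.O", "2": "Google"}\n'
    '{"1": "MSFT.O", "2": "Microsoft Inc."}\n'
)
received = []
print(load_in_batches(etl_output, received.append, batch_size=1), len(received))  # 2 2
```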
Suggest DB: challenge
• Fast search by partial text
• Keep only top 50 entities per term
• Generate Suggest DB from existing Symbology DB
Suggest DB: solution
Use an “inverted” index for fast search by partial text
{"term": "g", "references": […]},
{"term": "go", "references": […]},
{"term": "goo", "references": […]},
{"term": "goog", "references": […]},
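The prefix-based inverted index can be sketched in a few lines. This is an illustration under assumptions: the input is taken to be (name, reference) pairs already ordered by relevance, terms are lowercased, and the function names are hypothetical.

```python
from collections import defaultdict

MAX_REFS = 50  # the deck asks for only the top 50 entities per term

def prefixes(text):
    """All prefixes of text, e.g. 'goog' -> ['g', 'go', 'goo', 'goog']."""
    text = text.lower()
    return [text[:n] for n in range(1, len(text) + 1)]

def build_index(entities):
    """entities: iterable of (name, reference) pairs, best matches first.
    Returns one {"term", "references"} document per distinct prefix."""
    index = defaultdict(list)
    for name, ref in entities:
        for term in prefixes(name):
            refs = index[term]
            if len(refs) < MAX_REFS and ref not in refs:
                refs.append(ref)
    return [{"term": term, "references": refs} for term, refs in index.items()]

for entry in build_index([("Goog", "GOOG.O"), ("Gold", "XAU=")]):
    print(entry)
```

A suggestion lookup then becomes a single exact-match read on `term`, at the cost of storing one document per prefix.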
Suggest DB: solution
Generate Suggest DB from existing Symbology DB
• About 750 million temporary documents
• MongoDB Map Reduce is too slow
• All MongoDB-based algorithms take a lot of time
Use Amazon Elastic MapReduce!
10 h -> 40 min
Practical usage of Amazon Elastic MapReduce (Viktar Basharymau)
http://bit.ly/usage_mapreduce
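The slides do not show the actual job, so the following is only a sketch of how the generation step might be phrased as map and reduce functions. The record layout (`name`, `score`, `ref`) and the local `run_local` driver are assumptions made for illustration; on a cluster the sort-and-group step is done by the framework.

```python
from itertools import groupby
from operator import itemgetter

TOP_N = 50

def mapper(record):
    """Emit (prefix, (score, reference)) for every prefix of the name.
    These emitted pairs play the role of temporary documents."""
    name = record["name"].lower()
    for n in range(1, len(name) + 1):
        yield name[:n], (record["score"], record["ref"])

def reducer(term, values):
    """Keep the TOP_N highest-scored references for one term."""
    best = sorted(values, reverse=True)[:TOP_N]
    return {"term": term, "references": [ref for _, ref in best]}

def run_local(records):
    """Simulate the shuffle phase locally: sort by key, then group by key."""
    emitted = sorted((pair for r in records for pair in mapper(r)), key=itemgetter(0))
    return [reducer(term, [value for _, value in group])
            for term, group in groupby(emitted, key=itemgetter(0))]

records = [
    {"name": "Go", "score": 2, "ref": "A"},
    {"name": "Gold", "score": 5, "ref": "B"},
]
for doc in run_local(records):
    print(doc)
```

Every prefix of every name produces a pair, which suggests why the intermediate volume grows so quickly and why the work benefits from being spread over many machines.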
.NET MongoDB driver
- Use the IBsonSerializer interface instead of BsonElement attributes
- The driver has good performance – we have not found any bottlenecks.