The architecture of search engines in Booking.com

Kang-min Liu | 2017-03-09

Upload: kang-min-liu

Post on 16-Apr-2017


Page 1: The architecture of search engines in Booking.com

The architecture of search engines in Booking.com

Kang-min Liu | 2017-03-09

Page 2: The architecture of search engines in Booking.com

Amsterdam

Page 3: The architecture of search engines in Booking.com

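About Booking.com

Booking.com B.V., part of the Priceline Group (NASDAQ: PCLN), owns and operates Booking.com™, the world leader in online accommodation reservations. On average, more than 1,200,000 room nights are reserved on Booking.com each day. The Booking.com website and apps attract visitors from around the world, from both the leisure and business travel markets.

Established in 1996, Booking.com B.V. offers the best prices for every type of accommodation, from small family-run B&Bs and executive apartments to five-star luxury suites. With an international outlook, Booking.com is available in more than 40 languages and offers 1,160,281 properties in 226 countries and territories.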

https://www.booking.com/content/about.zh-tw.html

Page 4: The architecture of search engines in Booking.com

Problems (Tech.)

Page 5: The architecture of search engines in Booking.com

Data Volume

● Location

○ Cities + POIs: 3M

○ Hotels: 1.2M

● Reservation

○ 1.2M per day

● Hotel Reviews

○ 100M

● Availability

○ 52B

Page 6: The architecture of search engines in Booking.com

Location

Page 7: The architecture of search engines in Booking.com

Search

● Input

○ Free Text

● Result

○ Hotel ID

○ City ID

○ Lat/Lon

Page 8: The architecture of search engines in Booking.com

Difficulties

● Names are short

○ Stopwords do not apply

● Multi-language

● High Ambiguity

● Multi-meaning Words

○ Park Hotel

○ Park City

○ City Hotel

● Local names

○ USJ = 環球影城 (Universal Studios Japan)

Page 9: The architecture of search engines in Booking.com

Solution (pre-2011)

● MySQL

○ SELECT id FROM City WHERE name LIKE '%London%'

● Pros

○ Easy to implement

● Cons

○ Sensitive to token order

○ No scoring

○ No partial matching
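The weaknesses of the LIKE approach can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only to keep the example self-contained), and the rows in the City table are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; a %...% LIKE pattern is a plain
# substring test in both, which is all this sketch relies on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO City (id, name) VALUES (?, ?)",
    [(1, "London"), (2, "New London"), (3, "Park City"), (4, "City Park")],
)

def search(text):
    """Return ids of rows whose name contains `text` as a substring."""
    rows = conn.execute(
        "SELECT id FROM City WHERE name LIKE ? ORDER BY id", (f"%{text}%",)
    )
    return [row[0] for row in rows]

london = search("London")        # two hits, and nothing ranks one above the other
park_city = search("Park City")  # matches only this exact token order
city_park = search("City Park")  # same tokens, other order, different row
```

Both "London" rows come back with no score to order them, and swapping the two tokens of "Park City" returns a different row, which is the token-order sensitivity listed under Cons.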

Page 10: The architecture of search engines in Booking.com

Solution (2011-2013)

● Elasticsearch

○ English-biased tokenization rule

○ One index for everything, for all purposes (term suggestion + search)

● Pros

○ Tokenization / Partial matching

○ Fast Scoring + TopK

● Cons

○ Scoring is optimized for long documents. Difficult to tweak.

○ Machine downtime management

Page 11: The architecture of search engines in Booking.com

Solution (2013-now)

● Brick

○ In-house search engine. Simply a TCP server on top of Lucene.

○ One document per translation.

○ 8 shards / 5 replicas.

○ Term suggestion + auto-correction + classification

● Pros

○ Control the scoring for each token

○ Control the system deployment

● Cons

○ Tightly tailored to our specific problem
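To make "control the scoring for each token" concrete, here is a minimal sketch of per-token weighting over short names. It is not Brick's actual scoring, which the slides do not describe; the weights, the half-credit prefix rule, and the function names are all hypothetical.

```python
# Hypothetical per-token weights: generic words count for less than distinctive ones.
TOKEN_WEIGHT = {"hotel": 0.2, "city": 0.3, "park": 0.5}
DEFAULT_WEIGHT = 1.0

def tokenize(text):
    return text.lower().split()

def score(query, name):
    """Sum the weights of query tokens that match a name token exactly or as a prefix."""
    name_tokens = tokenize(name)
    total = 0.0
    for q in tokenize(query):
        weight = TOKEN_WEIGHT.get(q, DEFAULT_WEIGHT)
        if q in name_tokens:
            total += weight
        elif any(t.startswith(q) for t in name_tokens):
            total += 0.5 * weight  # partial (prefix) match earns half credit
    return total

def top_k(query, names, k=3):
    """Return up to k names with a positive score, best first (ties broken by name)."""
    scored = [(score(query, n), n) for n in names]
    scored = [(s, n) for s, n in scored if s > 0]
    return [n for s, n in sorted(scored, key=lambda p: (-p[0], p[1]))[:k]]

NAMES = ["Park Hotel", "Park City", "City Hotel", "London"]
ranked = top_k("park city", NAMES)
```

Unlike the LIKE query, this ranking does not depend on token order, and a partial query such as "lond" still finds "London".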

Page 12: The architecture of search engines in Booking.com

[Diagram: Web → grid of search nodes, grouped into Replica 0, Replica 1, …, Replica M ← Materialized Location x Translation ← Location + Translation]

Page 13: The architecture of search engines in Booking.com

Availability (AV)

Page 14: The architecture of search engines in Booking.com

Search

● Input

○ Where – city, country, region

○ When – check-in date

○ How long – check-out date

○ What – search options (stars, price range, etc.)

● Result

○ Available hotels

Page 15: The architecture of search engines in Booking.com

Inverted index #pre-2011

● LAMP stack (P = Perl)

● Normalized, optimized dataset

● Search ~ MySQL filter + Perl sort

● Single search worker per query

● High time complexity

● Large cities are unsearchable

[Diagram: Inventory, Search]

Page 16: The architecture of search engines in Booking.com

Pre-computed AV #2011+

● Materialized dataset

● Read-optimized databases (AV)

○ Aim for constant-time fetch

● Single search worker

● Failed with inventory growth

● Failed on big searches

[Diagram: Inventory, Materialization, AV (x3), Search]

Page 17: The architecture of search engines in Booking.com

Volume of AV

“The brand’s global dominance cannot be overstated: It works with approximately 800,000 partners, offering an average of 3 room types, 2+ rates, 30 different length of stays across 365 arrival days, which yields something north of 52 billion price points at any given time.”

https://www.forbes.com/sites/jonathansalembaskin/2015/09/24/booking-com-channels-its-inner-geek-toward-engagement/
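The 52 billion figure follows directly from multiplying the factors in the quote. A quick check, taking "2+ rates" as exactly 2 (so the product is a lower bound):

```python
# Factors taken from the quote; "2+ rates" is treated as exactly 2.
partners = 800_000
room_types = 3
rates = 2
lengths_of_stay = 30
arrival_days = 365

price_points = partners * room_types * rates * lengths_of_stay * arrival_days
print(f"{price_points:,}")  # 52,560,000,000
```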

Page 18: The architecture of search engines in Booking.com

Map-Reduce #2014+

● Parallelized search

○ Multiple workers per query

● Multiple MR phases

● Search-as-a-service

○ Plus all the pros and cons of services

● World search: 20s

● Overheads: IPC, serialization

[Diagram: inv, Materialization, AV (x3), MR (x3), Web server]

Page 19: The architecture of search engines in Booking.com

MR + LocalAV #2015+

● Data in RAM

○ Bring code to data

● Java

○ Reduce constant factor

■ Distance for 100K hotels: Perl 0.4s, Java 0.04s

○ Multi-threading

■ Smaller overhead than IPC

[Diagram: inv, Materialization, Web server (scatter-gather), SmartAV nodes (x2), each containing MR and AV]
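The slides do not say how the distance benchmark was computed. As an illustration of the kind of per-hotel arithmetic involved, here is a minimal sketch that assumes great-circle (haversine) distance on a sphere with a mean Earth radius of 6371 km; the function name and coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Distance from a search centre (Amsterdam) to each candidate location.
centre = (52.3676, 4.9041)
candidates = {"London": (51.5074, -0.1278), "Amsterdam": centre}
distances = {name: haversine_km(*centre, *pos) for name, pos in candidates.items()}
```

Running this over 100K hotels is a tight numeric loop, which is the sort of work where a lower constant factor per iteration matters.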

Page 20: The architecture of search engines in Booking.com

[Diagram: Web service → Coordinator → grid of AVsearch nodes (rows shard0, shard1, …, shardN; columns Replica 0, Replica 1, …, Replica M); inv → Materialization → queues for materializing availability → AVsearch nodes]

Annotations on the diagram:

● Coordinator

● Static sharding: hotel_id mod N; replicas are equivalent

● Scatter-gather: random choice of replica; retry if necessary; ping nodes

● Updates from the last few hours

● In-memory indices; AV persisted

Page 21: The architecture of search engines in Booking.com

Local AV

● Statically sharded (hotel_id mod k)

● Hotel data

○ Updated hourly

○ Kept in RAM. Not persisted, but easy to fetch and rebuild from MySQL.

● Availability data

○ Persisted

○ Real-time updates

○ RocksDB
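Static sharding by hotel_id mod k, combined with the coordinator's scatter-gather over equivalent replicas, can be sketched as follows. This is a toy in-process model, not the production code: the shard and replica counts, the `down` set that simulates failed nodes, and all function names are hypothetical.

```python
import random

NUM_SHARDS = 4     # hypothetical k
NUM_REPLICAS = 3   # hypothetical number of equivalent replicas per shard

def shard_of(hotel_id, num_shards=NUM_SHARDS):
    """Static sharding: a hotel always lives on shard hotel_id mod k."""
    return hotel_id % num_shards

def query_shard(shard, replica, hotel_ids, down):
    """Stand-in for a network call to one node; raises if that node is down."""
    if (shard, replica) in down:
        raise ConnectionError(f"shard {shard} replica {replica} unavailable")
    return [h for h in hotel_ids if shard_of(h) == shard]

def scatter_gather(hotel_ids, down=frozenset(), rng=random):
    """Query one randomly chosen replica per shard, retrying on another replica on failure."""
    results = []
    for shard in range(NUM_SHARDS):
        replicas = list(range(NUM_REPLICAS))
        rng.shuffle(replicas)          # random replica choice
        for replica in replicas:       # fall through to the next replica if one fails
            try:
                results.extend(query_shard(shard, replica, hotel_ids, down))
                break
            except ConnectionError:
                continue
        else:
            raise RuntimeError(f"no live replica for shard {shard}")
    return sorted(results)

hotels = list(range(10, 30))
gathered = scatter_gather(hotels, down={(0, 0), (0, 1), (2, 2)})
```

Because replicas of a shard hold the same data, any live one can answer, so the result is the same whichever replicas happen to be down.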

Page 22: The architecture of search engines in Booking.com

Application

● Filter

○ Search criteria: Stars / WiFi / parking etc.

○ Group matching: Rooms wanted, persons per room

○ Availability: check-in and check-out dates

● Sort

○ By price, distance, review score

● Top-K

● Merge
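The filter, sort, top-K, and merge steps can be sketched as a per-shard function plus a merge function. The `Hotel` record, its fields, and the sort-by-price choice are hypothetical simplifications; in particular, the `available` flag stands in for the real check-in/check-out availability lookup.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Hotel:          # hypothetical record; field names are illustrative
    hotel_id: int
    stars: int
    wifi: bool
    price: float
    available: bool   # stands in for the check-in/check-out availability lookup

def sort_key(h):
    return (h.price, h.hotel_id)  # sort by price, ties broken by id

def search_shard(hotels, min_stars, need_wifi, k):
    """Per-shard step: filter, then keep only the k cheapest matches, sorted."""
    matches = (
        h for h in hotels
        if h.available and h.stars >= min_stars and (h.wifi or not need_wifi)
    )
    return heapq.nsmallest(k, matches, key=sort_key)

def merge_top_k(per_shard_results, k):
    """Merge step: combine the sorted per-shard lists into one global top-k."""
    merged = heapq.merge(*per_shard_results, key=sort_key)
    return [h for _, h in zip(range(k), merged)]

shard0 = [Hotel(4, 3, True, 120.0, True), Hotel(8, 5, True, 300.0, True),
          Hotel(12, 4, False, 90.0, True)]
shard1 = [Hotel(1, 4, True, 80.0, True), Hotel(5, 3, True, 60.0, False),
          Hotel(9, 2, True, 40.0, True)]

partials = [search_shard(s, min_stars=3, need_wifi=True, k=2) for s in (shard0, shard1)]
top = merge_top_k(partials, k=2)
top_ids = [h.hotel_id for h in top]
```

Each shard returns at most k results, so the merge step only ever handles k times the number of shards, regardless of how many hotels match.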

Page 23: The architecture of search engines in Booking.com

Result

● MR search vs. MR search + local AV + new tech stack

● Adriatic coast (~30K hotels)

○ before: 13s, after: 30ms

● Rome (~6K hotels)

○ before: 5s, after: 20ms

● Sofia (~0.3K hotels)

○ before: 200ms, after: 10ms

Page 24: The architecture of search engines in Booking.com

Conclusion

Page 25: The architecture of search engines in Booking.com
Page 27: The architecture of search engines in Booking.com

Thank you :)