introduction to project
DESCRIPTION
An introductory talk for the Hacker News Kansai meetup on the Ruby rewrite of Jim Breen's wwwjdic
TRANSCRIPT
1
About me
Mark Burns (マーク・バーンズ)
about.me/mark.burns
Japanese-speaking Ruby developer
On holiday from England
I love Ruby and startups
2
Introduction
Jim Breen’s (Monash University)
Japanese-English online dictionary
wwwjdic.com
Data freely available
Accepts user contributions
3
wwwjdic (rewrite)
https://github.com/markburns/wwwjdic
4
Current interaction
GET http://wwwjdic.com
301 -> http://www.edrdg.org/cgi-bin/wwwjdic/wwjdic?1C
POST http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1E
BODY: dsrchkey=%CD%F1&dicsel=1
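A minimal sketch of how the request body above could be built or decoded in Ruby. It assumes the `%CD%F1` search key is percent-encoded EUC-JP rather than UTF-8 (the slide does not name the encoding), and the method names `legacy_form_body` and `decode_search_key` are hypothetical, not taken from the wwwjdic codebase. No request is actually sent.

```ruby
require "uri"

# Build the form body in the shape shown on the slide.
# Assumption: the legacy CGI expects the key as percent-encoded EUC-JP bytes.
def legacy_form_body(search_key, dictionary: 1)
  euc_key = search_key.encode(Encoding::EUC_JP)
  "dsrchkey=#{URI.encode_www_form_component(euc_key)}&dicsel=#{dictionary}"
end

# Recover a UTF-8 search key from a percent-encoded EUC-JP value.
def decode_search_key(encoded)
  URI.decode_www_form_component(encoded, Encoding::EUC_JP)
     .encode(Encoding::UTF_8)
end

key = decode_search_key("%CD%F1")
puts legacy_form_body(key)
```

Round-tripping the key this way is one reason a JSON API that speaks UTF-8 throughout would be easier for clients to use.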
5
Response
6
Aims
JSON API
Cleaner UI
Nice features: e.g. autocomplete
Easily extensible open source codebase
9
Autocomplete
10
Trie index
http://oldblog.antirez.com/post/autocomplete-with-redis.html
Autocomplete
11
Trie index
Time: O(log(N)), N =~ 150,000
Space: N*(Ma+1) entries, where Ma is the average entry length
=~ 51MB
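The prefix index from the linked antirez post can be sketched in plain Ruby. This is an in-memory stand-in, not the project's `auto_complete.rb`: a sorted array plays the role of the Redis sorted set, `bsearch_index` stands in for ZRANK, and reading in slices stands in for ZRANGE. The class name `PrefixIndex` and its parameters are hypothetical.

```ruby
# Every prefix of every word is stored, plus the full word with a
# trailing "*" marking a complete entry, which gives roughly
# N * (Ma + 1) entries in total.
class PrefixIndex
  TERMINATOR = "*"

  def initialize(words)
    entries = words.flat_map do |word|
      prefixes = (1..word.length).map { |len| word[0, len] }
      prefixes << word + TERMINATOR
    end
    # Byte-wise lexicographic order, like members with equal scores in Redis.
    @entries = entries.uniq.sort
  end

  # Return up to `limit` complete words that start with `prefix`.
  def complete(prefix, limit: 10, batch: 15)
    # O(log N) search for the first entry at or after the prefix.
    start = @entries.bsearch_index { |entry| entry >= prefix }
    return [] if start.nil?

    results = []
    @entries.drop(start).each_slice(batch) do |chunk|
      chunk.each do |entry|
        # Entries are sorted, so the first non-match ends the scan.
        return results unless entry.start_with?(prefix)
        results << entry.chomp(TERMINATOR) if entry.end_with?(TERMINATOR)
        return results if results.size >= limit
      end
    end
    results
  end
end

index = PrefixIndex.new(["egalitarian", "egg", "egg dishes", "walrus", "walruses", "waltz"])
p index.complete("walr")
```

The batch size of 15 only mirrors the chunk size visible in the sample output on the later slides; any small fixed range would do.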
12
TRIE
13
https://github.com/markburns/wwwjdic/blob/master/app/data_access/auto_complete.rb
14
["eg", "ega", "egal", "egali", "egalit", "egalita", "egalitar", "egalitari", "egalitaria", "egalitarian", "egalitarian*", "egg", "egg ",
"egg (", "egg (e"]
15
["egg dish", "egg dishe", "egg dishes", "egg dishes*", "egg l", "egg la", "egg lai", "egg laid", "egg laid ", "egg laid i", "egg laid in", "egg laid in ", "egg laid in w",
"egg laid in wi", "egg laid in win"]
16
["egg laid in wint", "egg laid in winte", "egg laid in winter", "egg laid in winter*", "egg m",
"egg me", "egg mem", "egg memb", "egg membr", "egg membra", "egg membran",
"egg membrane", "egg membrane*", "egg s", "egg sa"]
17
"walr" "walt"
"walrus"
["walr", "walru", "walrus", "walrus*", "walruse", "walruses", "walruses*", "walt", "waltz", "waltz ", "waltz (",
"waltz (c", "waltz (co", "waltz (com", "waltz (comp"]
18
shutl.com & graphs
19
Isomorphism?
20
N-grams
安心 リフォーム へ の 近道 [TAB]29 (Anshin reform he no chikamichi, "a shortcut to worry-free renovation")
安心 + リフォーム + へ + の + 近道
安心 [TAB]41,322,178 (Anshin, "peace of mind")
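A small sketch of reading n-gram records like the ones above. It assumes each record is space-separated tokens, a tab, then an occurrence count (the commas in the slide's count are treated as display formatting). `NGram` and `parse_ngram_line` are hypothetical names.

```ruby
# One n-gram record: its tokens and how often it occurs.
NGram = Struct.new(:tokens, :count)

# Parse a "token token ...<TAB>count" line into an NGram.
def parse_ngram_line(line)
  text, count = line.chomp.split("\t", 2)
  NGram.new(text.split(" "), Integer(count.delete(","), 10))
end

five_gram = parse_ngram_line("安心 リフォーム へ の 近道\t29")
unigram   = parse_ngram_line("安心\t41,322,178")

puts "#{five_gram.tokens.size}-gram seen #{five_gram.count} times"
puts "#{unigram.tokens.first} seen #{unigram.count} times"
```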
21
Present/State of Play
Data import to Redis
Indexed word lookup
Autocomplete
Begun work on text glossing
22
Noticeably Missing
Not yet released to production
No test/staging server
However, should be easy enough to run locally
23
Future
WordNet plus a graph DB => mapping between languages
Analysis of kanji
User experience/Design/Polish
N-grams
Other ideas/collaboration?
24
https://github.com/markburns/wwwjdic
http://www.slideshare.net/_mark_burns/slides-24568551
about.me/mark.burns
Questions?