introduction to project

Upload: mark-burns

Post on 15-Jan-2015

Category:

Technology


DESCRIPTION

An introductory talk for the Hacker News Kansai meetup on the Ruby rewrite of Jim Breen's wwwjdic

TRANSCRIPT

Page 1: Introduction to  project

About me

Mark Burns (マーク・バーンズ), about.me/mark.burns

Japanese-speaking Ruby developer

On holiday from England

I love Ruby and startups

Page 2: Introduction to  project

Introduction

Jim Breen’s (Monash University)

Japanese-English online dictionary

wwwjdic.com

Data freely available

Accepts user contributions

Page 4: Introduction to  project

Current interaction

GET http://wwwjdic.com

301 -> http://www.edrdg.org/cgi-bin/wwwjdic/wwjdic?1C

POST http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1E

BODY: dsrchkey=%CD%F1&dicsel=1
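The `%CD%F1` in the POST body is not UTF-8. It decodes cleanly as EUC-JP, where it gives 卵 (egg), the word used in the later slides. A minimal Ruby sketch, assuming the legacy form really is EUC-JP encoded (the slide does not say so):

```ruby
require "uri"

# Form body copied from the slide.
body = "dsrchkey=%CD%F1&dicsel=1"

# Assumption: the legacy CGI expects EUC-JP, so the percent-escaped bytes
# are interpreted as EUC-JP and then converted to UTF-8 for display.
params = URI.decode_www_form(body, Encoding::EUC_JP).to_h
search_key = params["dsrchkey"].encode(Encoding::UTF_8)

puts search_key        # the search term as UTF-8 text
puts params["dicsel"]  # the dictionary selector from the form
```

A client of the current site has to know about this encoding step, which is part of what a JSON API would hide.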

Page 5: Introduction to  project

Response

Page 6: Introduction to  project

Aims

JSON API

Cleaner UI

Nice features: e.g. autocomplete

Easily extensible open source codebase

Page 7: Introduction to  project

JSON API

GET http://localhost:4000/卵.json
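A client would normally percent-encode the kanji before sending this request. The sketch below only builds the URL. The helper name is hypothetical, the host and port are the local development values from the slide, and the shape of the JSON response is not shown in the talk, so it is left out here.

```ruby
require "erb"
require "uri"

# Hypothetical helper (not from the project): builds the lookup URL for a word.
# The ".json" suffix selects the JSON API, as on the slide.
def lookup_url(word, base: "http://localhost:4000", json: true)
  escaped = ERB::Util.url_encode(word) # percent-encodes the UTF-8 bytes
  URI.parse("#{base}/#{escaped}#{json ? '.json' : ''}")
end

url = lookup_url("卵")
puts url # the request URL with the kanji percent-encoded

# With a local server running, the response could then be fetched with:
#   require "net/http"
#   require "json"
#   JSON.parse(Net::HTTP.get(url))
```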

Page 8: Introduction to  project

Simpler UI (Example)

GET http://localhost:4000/卵

Page 9: Introduction to  project

Autocomplete

Page 10: Introduction to  project

Trie index

http://oldblog.antirez.com/post/autocomplete-with-redis.html

Autocomplete
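The linked antirez post builds its index by storing every prefix of a word in a Redis sorted set, plus the full word with a trailing `*` so complete words can be told apart from bare prefixes. A minimal sketch of that entry generation in plain Ruby, with a sorted array standing in for the Redis sorted set (the method name is hypothetical and not taken from the project):

```ruby
# Returns every prefix of +word+, followed by the word itself marked with "*".
def index_entries(word)
  prefixes = (1..word.length).map { |len| word[0, len] }
  prefixes << "#{word}*"
end

# Stand-in for the Redis sorted set: unique members kept in byte order.
index = %w[egg egalitarian].flat_map { |word| index_entries(word) }.uniq.sort

p index_entries("egg") # prefixes of "egg" plus the starred full word
p index.first(3)       # the start of the combined, sorted index
```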

Page 11: Introduction to  project

Trie index

Time: O(log N), N ≈ 150,000

Space: N × (Ma + 1) ≈ 51 MB
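Reading Ma as the average word length (an assumption based on the linked post, since the slide does not define it), each word contributes one index member per prefix plus one starred member, which is where the Ma + 1 factor comes from. A small Ruby check on a toy word list; because shared prefixes are stored only once, the formula behaves as an upper bound on the number of members:

```ruby
words = %w[egg egalitarian walrus waltz]

n  = words.size
ma = words.sum(&:length).to_f / n # average word length

upper_bound = n * (ma + 1) # N * (Ma + 1)

# Actual unique members: all prefixes plus the "*"-terminated word.
all_members = words.flat_map do |word|
  prefixes = (1..word.length).map { |len| word[0, len] }
  prefixes << "#{word}*"
end
unique_members = all_members.uniq

puts upper_bound.round   # members if no prefix were shared
puts unique_members.size # members actually stored
```

This counts members only. Turning that into megabytes depends on Redis's per-member overhead, which the slide does not break down.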

Page 12: Introduction to  project

TRIE

Page 14: Introduction to  project

https://github.com/markburns/wwwjdic/blob/master/app/data_access/auto_complete.rb

["eg", "ega", "egal", "egali", "egalit", "egalita", "egalitar", "egalitari", "egalitaria", "egalitarian", "egalitarian*", "egg", "egg ", "egg (", "egg (e"]

Page 15: Introduction to  project

["egg dish", "egg dishe", "egg dishes", "egg dishes*", "egg l", "egg la", "egg lai", "egg laid", "egg laid ", "egg laid i", "egg laid in", "egg laid in ", "egg laid in w", "egg laid in wi", "egg laid in win"]

Page 16: Introduction to  project

["egg laid in wint", "egg laid in winte", "egg laid in winter", "egg laid in winter*", "egg m", "egg me", "egg mem", "egg memb", "egg membr", "egg membra", "egg membran", "egg membrane", "egg membrane*", "egg s", "egg sa"]

Page 17: Introduction to  project

"walr" "walt"

"walrus"

["walr", "walru", "walrus", "walrus*", "walruse", "walruses", "walruses*", "walt", "waltz", "waltz ", "waltz (", "waltz (c", "waltz (co", "waltz (com", "waltz (comp"]
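The lists on these slides look like fixed-size slices read from the sorted index. A hedged sketch of how completions could be collected from such slices, following the approach in the linked antirez post rather than the project's own auto_complete.rb (which is not reproduced here): find the position of the typed prefix, read a batch of entries at a time, keep the ones ending in `*`, and stop once an entry no longer starts with the prefix. The class name, the batch size of 15 (chosen to match the length of the lists on the slides), and the in-memory array standing in for Redis `ZRANK`/`ZRANGE` are all assumptions.

```ruby
class PrefixIndex
  BATCH_SIZE = 15 # assumption: matches the 15-entry lists on the slides

  def initialize(words)
    # Every prefix of every word, plus "word*" to mark a complete word.
    entries = words.flat_map do |word|
      prefixes = (1..word.length).map { |len| word[0, len] }
      prefixes << "#{word}*"
    end
    @entries = entries.uniq.sort # stand-in for a Redis sorted set
  end

  # Returns up to +limit+ complete words that start with +prefix+.
  def complete(prefix, limit: 10)
    # Binary search for the first entry >= prefix (the role ZRANK plays in Redis).
    start = @entries.bsearch_index { |entry| entry >= prefix }
    return [] if start.nil?

    results = []
    loop do
      batch = @entries[start, BATCH_SIZE] # the role ZRANGE plays in Redis
      break if batch.nil? || batch.empty?

      batch.each do |entry|
        return results unless entry.start_with?(prefix) # left the prefix range
        results << entry.chomp("*") if entry.end_with?("*")
        return results if results.size >= limit
      end
      start += BATCH_SIZE
    end
    results
  end
end

index = PrefixIndex.new(["walrus", "walruses", "waltz"])
p index.complete("walr")          # words under the "walr" prefix
p index.complete("wal", limit: 2) # stops after two matches
```

Reading in batches keeps each round trip small, at the cost of scanning many prefix entries that are not complete words.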

Page 18: Introduction to  project

shutl.com & graphs

Page 19: Introduction to  project

Isomorphism?

Page 20: Introduction to  project

N-grams

安心 リフォーム へ の 近道 [TAB]29 (Anshin reform he no chikamichi, roughly "a shortcut to worry-free renovation")

安心 + リフォーム + へ + の + 近道

安心 [TAB]41,322,178
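The `[TAB]` markers suggest a simple line format: space-separated tokens, a tab, then an occurrence count. A minimal parsing sketch in Ruby under that assumption (the struct and method names are hypothetical, and thousands separators such as those in `41,322,178` are stripped in case they appear in the data):

```ruby
# One parsed line: the tokens of the n-gram and how often it occurred.
NGram = Struct.new(:tokens, :count)

def parse_ngram_line(line)
  text, count = line.chomp.split("\t", 2)
  raise ArgumentError, "expected tokens, a tab, then a count" if count.nil?

  NGram.new(text.split(" "), Integer(count.delete(","), 10))
end

five_gram = parse_ngram_line("安心 リフォーム へ の 近道\t29")
unigram   = parse_ngram_line("安心\t41,322,178")

puts five_gram.tokens.size # number of tokens in the n-gram
puts unigram.count         # occurrence count as an Integer
```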

Page 21: Introduction to  project

Present/State of Play

Data import to redis

Indexed word lookup

Autocomplete

Begun work on text glossing

Page 22: Introduction to  project

Noticeably Missing

Not yet released to production

No test/staging server

However, should be easy enough to run locally

Page 23: Introduction to  project

Future

Wordnet plus graph db => mapping of languages

Analysis of kanji

User experience/Design/Polish

N-grams

Other ideas/collaboration?

Page 24: Introduction to  project

https://github.com/markburns/wwwjdic

http://www.slideshare.net/_mark_burns/slides-24568551

about.me/mark.burns

Questions?