Upload: aist

Post on 24-Jul-2015

TRANSCRIPT

Page 1: Bulat Fatkulin - How To Extract Terminology Equivalents From Wikipedia-like Corpora Dumps In Eastern Languages By Means Of Unix AND Python Tools

How To Extract Terminology Equivalents From

Wikipedia-like Corpora Dumps In Eastern Languages

By Means Of Unix AND Python Tools.

B. G. Fatkulin

South Ural State University (National Research University), Department of General Linguistics, Associate Professor
http://susu.ac.ru

8 April 2015

B. G. Fatkulin (South Ural State University (NRU), Department of General Linguistics, Associate Professor, http://susu.ac.ru). How To Extract Terminology Equivalents From Wikipedia-like Corpora Dumps In Eastern Languages By Means Of Unix AND Python Tools. 8 April 2015. 1 / 5

Page 2

Fig.: The Triangle of Orientalism

Page 3

Fig.: Use Python package

Page 4

Fig.: Download Baidu pages

Page 5

At the first stage of our study we used the simplest UNIX command-line tools. We used zcat and less to read the compressed text quickly.

To find the terminology related to Iran, we used the grep command to find all lines containing the word "Iran", spelled in the language of the respective text.
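The zcat and grep steps can also be reproduced in Python. The following is a minimal sketch under stated assumptions, not the author's script: the function name grep_dump, the file name in the usage note, and the list of spellings of "Iran" are all chosen here for illustration.

```python
import gzip

# Illustrative spellings of "Iran" (English, Persian, Arabic, Chinese).
# This list is an assumption, not taken from the slides.
IRAN_SPELLINGS = ("Iran", "ایران", "إيران", "伊朗")


def grep_dump(path, needles=IRAN_SPELLINGS):
    """Yield every line of a gzip-compressed text dump that contains a needle.

    Roughly comparable to: zcat path | grep -e needle1 -e needle2 ...
    """
    # Mode "rt" decompresses on the fly and decodes to text, so the dump
    # never has to be unpacked to disk.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            if any(needle in line for needle in needles):
                yield line.rstrip("\n")
```

A call such as list(grep_dump("dump.txt.gz")) would collect the matching lines, where dump.txt.gz stands for a hypothetical compressed dump. Because the function is a generator, a large dump is processed line by line instead of being loaded into memory.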

sort < result | uniq -c | sort -n > result1
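A frequency list of the matched lines can also be built in Python with collections.Counter. This sketch assumes that the purpose of the pipeline is to count duplicate lines and order them by count, comparable to sort | uniq -c | sort -n. The function name frequency_list is hypothetical.

```python
from collections import Counter


def frequency_list(lines):
    """Return (count, line) pairs ordered from rarest to most frequent."""
    # Strip surrounding whitespace and skip empty lines before counting.
    counts = Counter(line.strip() for line in lines if line.strip())
    # Sort by count first, then by text, so the order is deterministic.
    return sorted((n, text) for text, n in counts.items())
```

For example, frequency_list(["Iran", "Tehran", "Iran"]) returns [(1, "Tehran"), (2, "Iran")].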

Page 6

Fig.: The process of work
