bulat fatkulin - how to extract terminology equivalents from wikipedia-like corpora dumps in eastern...
TRANSCRIPT
How To Extract Terminology Equivalents From
Wikipedia-like Corpora Dumps In Eastern Languages
By Means Of Unix And Python Tools.
B. G. Fatkulin
FSBEI HPE "South Ural State University" (National Research University), Department of General Linguistics, Associate Professor, http://susu.ac.ru
April 8, 2015
Fig.: The Triangle of Orientalism
Fig.: Use Python package
Fig.: Download Baidu pages
At the first stage of our study we used the simplest
UNIX bash tools. We then used zcat and less to read
the text quickly.
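The quick-look step can also be sketched in Python. This is only an illustration, assuming the dump is a gzip-compressed UTF-8 text file; the function name `preview_dump` and its parameters are hypothetical and do not come from the slides.

```python
import gzip
from itertools import islice


def preview_dump(path, n_lines=20, encoding="utf-8"):
    """Return the first n_lines of a gzip-compressed text dump.

    The file is streamed, so nothing is unpacked to disk
    (roughly what `zcat dump.gz | head` shows).
    """
    # errors="replace" keeps the preview going if a byte sequence is invalid.
    with gzip.open(path, mode="rt", encoding=encoding, errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, n_lines)]
```

Because the file object is read lazily, only the requested lines are decompressed, which matters for multi-gigabyte dumps.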
To find the terminology about Iran we used the grep command to
find all strings containing the word "Iran", written in the
language of each text.
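A minimal Python sketch of this filtering step follows. The spellings in `IRAN_VARIANTS` are assumptions chosen for illustration (the slides do not list the exact search strings), and `grep_lines` and `grep_dump` are hypothetical helper names.

```python
import gzip

# Assumed spellings of "Iran" per language code; not taken from the slides.
IRAN_VARIANTS = {
    "en": "Iran",
    "fa": "ایران",
    "ar": "إيران",
    "zh": "伊朗",
}


def grep_lines(lines, keyword):
    """Yield every line that contains keyword, like a plain `grep keyword`."""
    for line in lines:
        if keyword in line:
            yield line.rstrip("\n")


def grep_dump(path, lang):
    """Stream a gzip-compressed UTF-8 dump and yield lines mentioning Iran."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from grep_lines(fh, IRAN_VARIANTS[lang])
```

Substring matching is deliberately simple here; it does not handle inflected forms or spelling variants, which would need language-specific patterns.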
sort < result | uniq -c | sort -n > result1
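The de-duplication step can be approximated in Python as well. The sketch below assumes the goal is to collapse repeated matches and rank them by how often they occur; `count_unique_lines` is a hypothetical name, not part of the original workflow.

```python
from collections import Counter


def count_unique_lines(lines):
    """Collapse duplicate lines and return (count, line) pairs.

    Pairs are sorted by ascending count, with ties broken by line text.
    """
    counts = Counter(line.rstrip("\n") for line in lines)
    return sorted((n, text) for text, n in counts.items())
```

Unlike the shell pipeline, this keeps all distinct lines in memory, so it suits the filtered results rather than the full dump.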
Fig.: The process of work