o )057, . 4 xml - 東京外国語大学 tokyo university of … sketch 1=]j2z da f£ 9)a8c kw4p hgon...


Upload: trankhanh

Post on 19-May-2018


Page 1: O )057, . 4 xml - 東京外国語大学 Tokyo University of … Sketch 1=]j2z da f£ 9)A8C kw4p hgOn Word Sketch & .D70S $ Q & ' & Word Sketch b|vJ 1. .D70'¢t 2. Lemma Z Part of speech

O��)057,��.��4� xml&!� ��

xml � ?<�GJ����&!

Reference: w3schools.com (http://www.w3schools.com/xml/xml_tree.asp)

(i)1)'7ML=&!N

<xml>

<doc Level="…" Title="…">

<text Types="…">

(ii) .)&7MD8&!N

</text>

</doc>

</xml>
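The template above nests one text element inside one doc element, with metadata carried in attributes. As a rough check that a file follows this shape, it can be read with Python's standard library. This is only a sketch: the attribute values and body text in SAMPLE are invented, and the helper name read_docs is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the same shape as the template above;
# the attribute values and the body text are invented for illustration.
SAMPLE = """<xml>
<doc Level="beginner" Title="Sample text">
<text Types="essay">
This is the body of the document.
</text>
</doc>
</xml>"""


def read_docs(xml_string):
    """Return one dict per <doc>, holding its attributes and body text."""
    root = ET.fromstring(xml_string)
    docs = []
    for doc in root.iter("doc"):
        text = doc.find("text")
        docs.append({
            "Level": doc.get("Level"),
            "Title": doc.get("Title"),
            "Types": text.get("Types") if text is not None else None,
            "body": (text.text or "").strip() if text is not None else "",
        })
    return docs


docs = read_docs(SAMPLE)
for d in docs:
    print(d)
```

A parse error at this stage usually points to an unclosed tag or to curly quotes around an attribute value.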

P�%")(�6$6���)057,

(i) CE�#7-%�9@�

���� 3)


(ii) #7-%�;:�A�FI�H>

Q�.��4��)057,

(i)

�2�/23��.��4�KB�Next� 3) ��)057,

3)


(ii)

(iii) ���.��4��)057,D���#7-%� compile �

3)

3)


Simple query: lemma
Lemma
Phrase
Word
Character

Context
    Lemma Filter: lemma
    PoS Filter
    all, any, none

Text types

Sort
    Left: L1
    Right: R1
    Node
    References
    L1, node, R1


Sample

Filter
    positive, negative

Frequency
    Frequency
    Node tags
    Node forms
    Doc IDs
    Text Types

P/N (Positive/Negative)

Collocation
    Attribute: word/tag/lempos/lemma…
    Range
    Minimum Frequency in corpus
    Minimum Frequency in given range

P/N (Positive/Negative)

Visualize


Sketch Engine

E

Concordance -CQL -

CQL (= Corpus Query Language)

CQL was developed in the 1990s at the IMS, University of Stuttgart.
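A CQL query describes each token with attribute-value constraints in square brackets, and each value is a regular expression that has to match the whole attribute. The sketch below mimics that idea on a small hand-tagged token list. The function names matches and find, the sample sentence, and its tags are assumptions made for illustration; this is not how Sketch Engine itself evaluates queries.

```python
import re

# Each token is a dict of attributes. The sentence and tags are invented.
TOKENS = [
    {"word": "She", "lemma": "she", "tag": "PNP"},
    {"word": "makes", "lemma": "make", "tag": "VVZ"},
    {"word": "tea", "lemma": "tea", "tag": "NN1"},
]


def matches(token, constraints):
    """True if every attribute fully matches its regular expression."""
    return all(
        re.fullmatch(pattern, token.get(attr, "")) is not None
        for attr, pattern in constraints.items()
    )


def find(tokens, query):
    """Return the start index of every run of tokens that satisfies the query."""
    hits = []
    for start in range(len(tokens) - len(query) + 1):
        if all(matches(tokens[start + i], c) for i, c in enumerate(query)):
            hits.append(start)
    return hits


# Rough equivalent of the CQL query [lemma="make" & tag="V.*"] [tag="N.*"]
query = [{"lemma": "make", "tag": "V.*"}, {"tag": "N.*"}]
hits = find(TOKENS, query)
print(hits)
```

Each dict in query plays the role of one bracketed token description, and the keys inside one dict are combined as with the & operator.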

� w s|{o h ” 3 �

[attribute="value"]

attribute: word, lemma, tag, lempos

b P J w o g } yg

W R p x

D S I1aW RaE

!DW E RO!

W R p x

DPJQQ 1aW RaE

W R x p x

DPJQQ 1aW Ra" W L1a ) aE { y DPJQTSV1aW R(RaE

b x w※osy > xpm x LVJW V QQ

b� ) x


b� * & * { y *

W L1aa r x v

!HSR VJ) ! DW L1! ! W L1!==!E

!HSR VJ) ! DW L1! !E DW L1!==!E

!HSR VJ) ! DW L1! ==!E

W R & & x

DPJQQ 1aW Ra"W L1aB) aEDW L1a ) aE], - DW L1a= =aE

bD EuD Ex wy frs vls SO

b] t ], - y i ugh

nx t > os| u

t


� v �

JPT WS ISa V) JPT ISax “ c p

DPJQQ 1! JPT!"W L1!B) !ED S I1!WS!E DW L1!B) !E

b x y B3 J cB5 IS cB J cBB PJ[NH P J V

& J & (JI t x

DW L1a ) aEDPJQQ 1a JaEDW L1aB) a" S I1a) JIaE

PSSO* NRL & T*IS R x

DPJQQ 1aPSSO NRLa"W L1aB) aEDW L  1aB) aE]+ / a TIS Ra

< x x x

DW L1! ) !E DW L1! ) !E ! RI S ! DW L1! ) !E

� NW NR r �

o

D S I1aHSR V) aEDW L  1aB) aE D S I1a aE NW NR0V*2

t { c t x wf p sx p

DW L1a ) aE& NW NRDW L1aB3) aEDE DW L1aB3) aE

wursnvg x

b�> J NW NR > J ugh tf

b� SRW NRNRL f H ) OJWH RLNRJ

-+,+ d ./ CJ

OJWH RLNRJ x u e


x�>G� >Gfo 2015

Sketch Engine\�FBI6K

{s:Uq��

3. Word List

;PC=§<D;PC=¨`�1�t�5za"�3

:J?:�3#�


�'�¢�a!�+�

R�1£&

• Subcorpus: select a subcorpus to build the list from.

• Search attribute: choose word, lemma, tag (POS), and so on.

• use n-grams: choose n to build a list of n-word sequences.

��+""���!J=@5[w"�+� +�WS' options5\��#."�+�

For example, with the corpus BNC, the subcorpus Written_Medium_Book, and the search attribute lemma, the word list comes out as follows.

§¤r�¥�£& lemma5T*��t�¨

Filter options

• Regular expression: regular expressions can be used. ".*" is a wildcard (any string, including an empty one), so entering "th.*" produces a list of the, that, this, and so on.

§�'V�©�«�¦%$��2+�¨

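• Minimum frequency: sets the lowest frequency an item may have.

• Maximum frequency: sets the highest frequency an item may have.

• Whitelist: upload a file listing the items to be included in the list.

• Blacklist: upload a file listing the items to be left out of the list.

• Include non-words: also include items such as numbers and symbols.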
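The filter options can be pictured as a chain of tests applied to each counted item. The following is a minimal sketch, not Sketch Engine's implementation: the function word_list and its parameter names are hypothetical, the regular expression is assumed to match the whole item, and a maximum frequency of 0 is assumed to mean "no upper limit", as the examples in this handout suggest.

```python
import re
from collections import Counter


def word_list(tokens, pattern=".*", min_freq=1, max_freq=0,
              whitelist=None, blacklist=None):
    """Count tokens and keep the items that pass every filter.

    max_freq=0 is treated as "no upper limit" (an assumption).
    """
    regex = re.compile(pattern)
    kept = {}
    for item, freq in Counter(tokens).items():
        if regex.fullmatch(item) is None:
            continue  # regular expression filter
        if freq < min_freq:
            continue  # minimum frequency
        if max_freq and freq > max_freq:
            continue  # maximum frequency
        if whitelist is not None and item not in whitelist:
            continue  # whitelist: only listed items survive
        if blacklist is not None and item in blacklist:
            continue  # blacklist: listed items are dropped
        kept[item] = freq
    # Most frequent first, ties in alphabetical order.
    return sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))


tokens = "the cat saw that the dog saw this and the bird".split()
result = word_list(tokens, pattern="th.*")
print(result)
```

Passing blacklist={"the"} to the same call would leave only that and this.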


Output options

• Frequency figures:
    Hit counts = raw frequency
    Document counts
    ARF (Average Reduced Frequency)

• Output type:
    Simple
    Keywords
        Reference (sub)corpus
        Prefer: rare/common words
    Change output attribute(s)

e.g. BNC, search attribute pos, minimum frequency 0
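The three frequency figures answer different questions: how often an item occurs, in how many documents it occurs, and how evenly it is spread. The sketch below computes all three for a toy corpus. It assumes the usual definition of ARF, in which the corpus of N tokens is treated as a circle, v = N / f for an item with f hits, and ARF is the sum of min(distance, v) over the distances between consecutive hits, divided by v. The function name frequency_figures and the toy documents are invented.

```python
def frequency_figures(docs, item):
    """Return (hit count, document count, ARF) for item.

    docs is a list of documents, each a list of tokens.
    """
    tokens = [t for doc in docs for t in doc]
    positions = [i for i, t in enumerate(tokens) if t == item]
    hits = len(positions)
    doc_count = sum(1 for doc in docs if item in doc)
    if hits == 0:
        return 0, 0, 0.0
    n = len(tokens)
    v = n / hits  # average distance between hits if evenly spread
    # Distances between consecutive hits, wrapping from the last to the first.
    distances = [positions[i] - positions[i - 1] for i in range(1, hits)]
    distances.append(positions[0] + n - positions[-1])
    arf = sum(min(d, v) for d in distances) / v
    return hits, doc_count, arf


even = [["a", "b", "a", "b"], ["a", "b", "a", "b"]]
clustered = [["a", "a", "a", "a"], ["b", "b", "b", "b"]]
print(frequency_figures(even, "a"))
print(frequency_figures(clustered, "a"))
```

Both toy corpora contain four hits of "a", but the clustered one yields a lower ARF and a lower document count, which is the difference these figures are meant to expose.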


• BNC, subcorpus Written_Domain_Imaginative; search attribute: lemma; regular expression: wh.*; minimum frequency: 1; maximum frequency: 0


• BNC, subcorpus Written_Domain_Informative; search attribute: word; regular expression: .*ing; Frequency figures: Document counts


• BNC; search attribute: word; use n-grams with n=4
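With use n-grams switched on, the list is built from runs of n consecutive items instead of single items. A minimal sketch of that counting step for n=4 follows; the function name ngram_counts and the sample sentence are invented, and real corpus tools also respect sentence and document boundaries, which this sketch ignores.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


tokens = "at the end of the day at the end of the week".split()
counts = ngram_counts(tokens, 4)
for gram, freq in counts.most_common(3):
    print(freq, gram)
```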


• BNC, subcorpus Written_Medium_Book; search attribute: word; Output type: Keywords, with the BNC subcorpus Written_Medium_To-be-spoken as the Reference subcorpus
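A keyword list compares a focus (sub)corpus with a reference (sub)corpus. One common way to score the comparison is the "simple maths" ratio of normalised frequencies, (fpm_focus + n) / (fpm_ref + n), where a small n favours rare words and a large n favours common ones. The sketch below assumes that score; whether the Prefer: rare/common words setting maps onto n in exactly this way is an assumption, and the function names and sample sentences are invented.

```python
from collections import Counter


def keyness(focus_count, focus_size, ref_count, ref_size, n=1.0):
    """Simple-maths keyness from frequencies per million tokens."""
    fpm_focus = focus_count * 1_000_000 / focus_size
    fpm_ref = ref_count * 1_000_000 / ref_size
    return (fpm_focus + n) / (fpm_ref + n)


def keywords(focus_tokens, ref_tokens, n=1.0, top=10):
    """Rank the items of the focus corpus by keyness against the reference."""
    focus, ref = Counter(focus_tokens), Counter(ref_tokens)
    scored = [
        (word, keyness(count, len(focus_tokens), ref[word], len(ref_tokens), n))
        for word, count in focus.items()
    ]
    # Highest score first, ties in alphabetical order.
    return sorted(scored, key=lambda ws: (-ws[1], ws[0]))[:top]


focus = "the novel opens as the heroine reads the letter".split()
ref = "the speaker reads the news as the audience listens".split()
top = keywords(focus, ref, top=3)
print([word for word, _ in top])
```

Words that occur only in the focus text rise to the top, while words shared at the same rate, such as the, score close to 1.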


• BNC; search attribute: word; Regular expression: .*ing; Output type: Change output attributes, with lemma, pos, and lempos selected

��'Q �#vl��u word&��2


SketchEngine 4. Word Sketch 2

4

Word Sketch

Word Sketch
1. Choose a corpus.
2. Enter the Lemma and the Part of speech.
3. Click Show Word Sketch (Advanced options are also available).

Figure 1: make


4. (Figure 1):
• Change options: Word Sketch
• Cluster
• Sort by freq / Sort by score
• Hide gramrels
• More Data: 1 column
• Less Data: 1 column

Word Sketch

make
1. British National Corpus
2. Lemma "make", Part of speech "verb"
3. (Figure 1)
4. "np adj comp" (50.90) (Figure 2)

make: "make+O+C"

Figure 2: make

* “Less Data”


g Sketch Engine

E

Thesaurus/sketch-Diff

2 S

�� 1 c BNC ���

"love” 2POS


2 love 2

W 3 2

W W 2 W

3

2 desire

Word sketch e 2Thesaurus 2 k

Sketch-Diff W 3

2 love desire c b 3

! 2 and/or 2 2

love desire

WE


! love 2 desire

love satisfied

    c hn c

i t 0 5


SketchEngine mx 7. WebBootCat

)6Q\ 2gr

VT�W� 5!;4?�]f 4`

YXI_

What is WebBootCaT?

WebBootCaT crawls the Internet, collects text, and builds a corpus from it.

Starting WebBootCaT (see Figure 1)

1. On the Home page, click WebBootCaT in the menu on the left.

2. Give the corpus a name.

3. Set Input type to Seed words or URLs.

Figure 1: WebBootCaT start screen

WebBootCaT input types (Seed words/URLs)

Building a web corpus from "Seed words"

1. After steps 1 and 2 above, tick Seed words under Input type.

2. Enter keywords (3 to 20) in the Seed words field.


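3. The seed words are combined at random in groups of three and used to search the Internet; the search results are then displayed (see Figure 2).

4. Click Next to download the collected text.

5. Click OK to build the corpus.

Figure 2: List of URLs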
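WebBootCaT turns the seed words into search queries by combining a few of them at random. The sketch below shows one way to produce such combinations. The group size of three follows the handout; the limit of ten queries, the function name seed_tuples, and the seed words themselves are assumptions, and the actual search and download steps are left out.

```python
import random
from itertools import combinations


def seed_tuples(seed_words, tuple_size=3, max_tuples=10, rng=None):
    """Return up to max_tuples distinct random combinations of seed words."""
    rng = rng or random.Random()
    all_tuples = list(combinations(seed_words, tuple_size))
    rng.shuffle(all_tuples)
    return [" ".join(words) for words in all_tuples[:max_tuples]]


seeds = ["corpus", "concordance", "lemma", "collocation", "keyword"]
queries = seed_tuples(seeds, rng=random.Random(0))
for query in queries:
    print(query)
```

Each printed line stands for one query that would be sent to a search engine, which is why a handful of seed words is enough to reach many different pages.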

Building a web corpus from "URLs"

1. After steps 1 and 2 above, tick URLs under Input type.

2. Enter the URLs to use in the URLs field (see Figure 3).

3. Click Next, then OK, to build the corpus (see Figure 4).


Figure 3: Entering URLs

Figure 4: The finished corpus