Nguyen Huu Vinh
Chapter 3
MODEL AND SOLUTION
Chapter 3 presents how the ontology is built at the conceptual level, how it is stored, and a proposed way of organizing storage that preserves semantics, with particular attention to semantics-based search.
3.1. The SPN-KOS Ontology Model
The SPN-KOS model (Software-Programming-Network Keyphrase Ontology System) consists of three components (E, C, R):
E: the set of elements, each of which is a keyphrase.
C: the set of classes in the computing domain.
R: the set of relations over the elements.
Keyphrases, and why SPN-KOS is built on them:
A keyphrase consists of one or more words (for example: artificial intelligence, programming language, compiler, network administration), whereas a keyword is a single word (for example: computer). In this thesis both are treated as keyphrases.
Keyphrases describe a document's content at a high level, so a reader can easily tell whether the document is relevant to what they are looking for.
Keyphrases give a condensed summary of the document.
3.1.1. Keyphrases in the set E
Let E be the set of all keyphrases describing the domains SOFTWARE, PROGRAMMING, and COMPUTER NETWORKS. These keyphrases were extracted manually from Vietnamese-language IT documents, guided by the websites www.dmoz.org and dir.yahoo.com and by the computer dictionary at www.webopedia.com; more than 7600 keyphrases were extracted for the three domains. We decompose E as follows:
Classes: each class is a subset of E, and the set of classes is contained in P(E). Each class contains a set of keyphrases.
Relations: relations exist between the keyphrases in E.
3.1.2. Classification
Classifying the classes of a specialized field requires a domain expert. The classification was therefore done manually, based on www.dmoz.org/Computers/, dir.yahoo.com/Computers_and_Internet/, and the computer dictionary site www.webopedia.com. This yields 66 classes, not yet arranged in a hierarchy.
Table 3.1: The classes before hierarchical arrangement
From these 66 unarranged classes we build the directory hierarchy in two steps.
Step 1: we build the three main classes of SPN-KOS:
P(E) has the subsets C1 = SOFTWARE (PHẦN MỀM), C2 = PROGRAMMING (LẬP TRÌNH), and C3 = COMPUTER NETWORKS (MẠNG MÁY TÍNH). The SPN-KOS hierarchy is built on them.
Figure 3.1: Overview of SPN-KOS and its classes
At the base there are thus three classes (sets) contained in P(E). In mathematical notation:
$\forall i \in \{1, 2, 3\}: C_i \subset E$, and $P(E) \supset \{C_1, C_2, C_3\}$
Step 2: from the three initial classes we build 44 further classes from the parent-child data, and then 19 more classes that are children of those 44.
Figure 3.2: The SPN-KOS hierarchy tree
Example: building the class PROGRAMMING and its related classes.
This class has 20 subclasses: (GAME PROGRAMMING, OBJECT-ORIENTED PROGRAMMING, COMPILATION-PACKAGING AND LINKING, ELECTRONIC TECHNOLOGY, EXPRESSIONS, PROGRAMMING LANGUAGES, OPERATORS AND OPERATIONS, DATA STRUCTURES, ADDRESSING, APPLICATION PROGRAMMING INTERFACES, CAREERS, WEB DEVELOPMENT, ASPECT-ORIENTED PROGRAMMING METHODS, PROCEDURES-FUNCTIONS-ROUTINES, OSRC PROGRAMMING LANGUAGES, HTML, XML, BROWSERS, BASE CLASSES) => 20 subclasses are to be created, except that a class coinciding with an existing class is not created again. We therefore examine the subclasses of PROGRAMMING further: some of them are subclasses of PROGRAMMING and, at the same time, of another class.
XML $\subset$ PROGRAMMING, WEB DEVELOPMENT => XML is a child of both PROGRAMMING and WEB DEVELOPMENT
BROWSERS $\subset$ HTML, SOFTWARE => BROWSERS is a child of both HTML and SOFTWARE
...
=> Examining the three base classes in this way creates 20 more classes. We continue with the subclasses of the 44 classes and create the next 19 classes on the same principle:
NETWORK STANDARDS $\supset$ ETHERNET => the class NETWORK STANDARDS has the subclass ETHERNET, but this subclass was already created among the 19 classes, so it is not created again.
DATABASES $\supset$ (OSRC DATABASES, QUERIES) => the class DATABASES has the subclasses OSRC DATABASES and QUERIES. OSRC DATABASES was already created, so it is not created again; QUERIES is not yet among the 41 classes, so it is created.
HAPTICS $\supset$ USER INTERFACES => the class HAPTICS has the subclass USER INTERFACES, which is not among the 169, so it is created.
INTERNET PROTOCOLS $\subset$ NETWORK PROTOCOLS => the class NETWORK PROTOCOLS has the subclass INTERNET PROTOCOLS, which has not yet been created among the 19 classes, so it must be created.
Note: each class formed contains a set of keyphrases that models a specific scope, narrower than that of its parent set.
We thus have:
The set C consists of 66 classes, the total counted above. Each class contains some of the keyphrases in E, written $C_i = \{k_{i1}, k_{i2}, \ldots\}$, and the count gives 76 is-a relations among the 66 classes. Writing $C = \{C_1, C_2, \ldots, C_n\}$, we have n = 66 classes.
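The construction rule above (create a class only if it does not exist yet, but always record the parent-child link) can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function name `build_hierarchy` and the sample pairs are assumptions, with the pairs taken from the XML and BROWSERS examples in the text.

```python
# Parent-child pairs from the examples above (English glosses of the class names).
PAIRS = [
    ("PROGRAMMING", "WEB DEVELOPMENT"),
    ("PROGRAMMING", "HTML"),
    ("PROGRAMMING", "XML"),
    ("WEB DEVELOPMENT", "XML"),   # XML has two parents
    ("HTML", "BROWSERS"),
    ("SOFTWARE", "BROWSERS"),     # BROWSERS has two parents
]

def build_hierarchy(pairs):
    """Return (classes, children): classes in creation order, child sets per class."""
    classes = []
    children = {}
    for parent, child in pairs:
        for name in (parent, child):
            if name not in children:      # a class that already exists is not created again
                children[name] = set()
                classes.append(name)
        children[parent].add(child)       # the is-a link is always recorded
    return classes, children

classes, children = build_hierarchy(PAIRS)
edge_count = sum(len(kids) for kids in children.values())
parents_of_xml = sorted(p for p, kids in children.items() if "XML" in kids)
print(len(classes), edge_count, parents_of_xml)
```

Because a class may have more than one parent, the number of is-a links can exceed the number of classes, which is consistent with 76 relations over 66 classes.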
3.1.3. Semantic relations in the set E
SPN-KOS requires a set E of elements (each element is a keyphrase) together with the semantic relations between keyphrases. These relations are derived from the natural-language definitions provided at www.webopedia.com and must be created manually. SPN-KOS currently covers the following semantic relations: same class, synonym, near-synonym, acronym, broader, narrower, extension, and related. E and C are also linked: each subset of E can belong to a subset of C, so that it models a sub-scope within C.
Example: a list of keyphrases belonging to the subset PROGRAMMING indicates that this subset covers programming, and PROGRAMMING in turn belongs to the larger scope of computing.
3.1.4. Relations between keyphrases in E
Let $E \neq \emptyset$. A binary relation on E is a subset R of $E^2$. For two elements x and y of E, we say that x is R-related to y if and only if $(x, y) \in R$, and we write x R y. Thus $x\,R\,y \Leftrightarrow (x, y) \in R$. When x is not R-related to y, we write $x \not R\, y$. The relations between keyphrases in E are:
Same-class relation: keyphrase a is in the same-class relation (r1) with keyphrase b if there is a $C_i$ such that $a \in C_i$ and $b \in C_i$.
Broader-narrower relation: keyphrase a is in the broader relation (r2) with keyphrase b if the meaning of a subsumes b, that is, a has a wider meaning than b in the context under consideration; conversely, b is in the narrower relation (the inverse of r2) with a. The context is determined by analyzing the definition given at WEBOPEDIA.
Example: consider the definition of the keyphrase AppleTalk Address Resolution Protocol:
Short for AppleTalk Address Resolution Protocol, a protocol for mapping a device's physical hardware address to a temporary Appletalk network-assigned address in Macintosh computer LANs. When a protocol stack sends a data packet, the protocol address specifies the destination. The data link layer relies on AARP to translate the protocol address into the hardware address of the destination node.
Reading and reviewing definitions manually in this way lets us create the broader-narrower relations between keyphrases.
Table 3.2: Keyphrases whose broader keyphrase is "giao thức" (protocol)
According to the table, the keyphrases in the KeyphraseName column are narrower than the keyphrase "giao thức".
The is-a relation between classes can be expressed as a broader-narrower relation between keyphrases when a keyphrase has the same name as a class and is contained in that class.
Example: the class PROGRAMMING LANGUAGES takes part in the is-a relations between classes, and we also have the keyphrase "ngôn ngữ lập trình" (programming language) $\in$ PROGRAMMING LANGUAGES. A keyphrase that shares its name with a class therefore carries the broader-narrower relation of that class. The natural-language definition is also used to establish that the keyphrase "ngôn ngữ lập trình" has the narrower keyphrases BASIC, C, C++, COBOL, FORTRAN, Ada, and Pascal, following the webopedia definition of programming language:
A vocabulary and set of grammatical rules for instructing a computer to perform specific tasks. The term programming language usually refers to high-level languages, such as BASIC, C, C++, COBOL, FORTRAN, Ada, and Pascal. Each language is a unique set of keywords (words that it understands) and a special syntax for organizing program instructions ..
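The broader-narrower relation between keyphrases thus originates from the relation between classes, which we write as:
$a \text{ broader } b \Leftrightarrow class(a) \supset class(b)$
$a \text{ narrower } b \Leftrightarrow class(a) \subset class(b)$
where class(a) is the class containing keyphrase a and class(b) is the class containing keyphrase b. By convention, class(x) denotes the class containing keyphrase x.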
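The class-based rule can be checked mechanically once each keyphrase has a class and the is-a links are known. The sketch below is a minimal illustration under assumptions: `CLASS_OF`, `CHILDREN`, and the sample entries are hypothetical, and "contains" is read as "is an ancestor in the is-a hierarchy".

```python
# Hypothetical sample data: the class of each keyphrase, and is-a links (parent -> children).
CLASS_OF = {
    "programming": "PROGRAMMING",
    "programming language": "PROGRAMMING LANGUAGES",
    "C++": "PROGRAMMING LANGUAGES",
}
CHILDREN = {
    "PROGRAMMING": {"PROGRAMMING LANGUAGES"},
    "PROGRAMMING LANGUAGES": set(),
}

def is_subclass(child, ancestor, children=CHILDREN):
    """True if `child` can be reached from `ancestor` by following is-a links downward."""
    stack, seen = list(children.get(ancestor, ())), set()
    while stack:
        current = stack.pop()
        if current == child:
            return True
        if current not in seen:           # guard against revisiting shared subclasses
            seen.add(current)
            stack.extend(children.get(current, ()))
    return False

def broader(a, b):
    """a broader b  <=>  class(a) properly contains class(b)."""
    return is_subclass(CLASS_OF[b], CLASS_OF[a])

def narrower(a, b):
    """narrower is the inverse of broader."""
    return broader(b, a)

print(broader("programming", "programming language"))   # keyphrases of parent and child class
print(broader("C++", "programming language"))           # same class, so not broader
```

Two keyphrases of the same class are neither broader nor narrower under this rule; they fall under the same-class relation r1 instead.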
Synonym relation: keyphrase a is in the synonym (or shortened-form) relation (r3) with keyphrase b if, in the context under consideration, the two keyphrases have the same meaning.
Example: consider the definition of the keyphrase packet:
A piece of a message transmitted over a packet-switching network. See under packet switching. One of the key features of a packet is that it contains the destination address in addition to the data. In IP networks, packets are often called datagrams.
Thus the keyphrase packet is in the synonym relation r3 with the keyphrase datagram in the context of computer networks.
Near-synonym relation: keyphrase a is a near-synonym (r4) of keyphrase b if, in the context under consideration, the two keyphrases have the same meaning or nearly the same meaning.
Example: consider the definition of the keyphrase "mã truy cập" (access code), which contains the sentence:
Same as password, a series of characters and numbers that enables a user to access a computer.
Thus the keyphrase "mã truy cập" (access code) has the near-synonym "mật khẩu" (password).
Or consider the definition of the keyphrase ActiveX:
ActiveX is a loosely defined set of technologies developed by Microsoft in 1996 for sharing information among different applications. ActiveX is an outgrowth of two other Microsoft technologies called OLE (Object Linking and Embedding) and COM (Component Object Model). As a moniker, ActiveX can be very confusing because it applies to a whole set of COM-based technologies. Most people, however, think only of ActiveX controls, which represent a specific way of implementing ActiveX technologies. Many Microsoft Windows applications use ActiveX controls.
Thus the keyphrase ActiveX has the near-synonyms COM and OLE.
Extension relation: keyphrase a is in the extension relation (r5) with keyphrase b if a carries the meaning of b plus additional information; or the meaning of a is an upgrade of b; or the meaning of a replaces that of b; or a is derived from b with meaning added by the context; or a is based on b. Establishing this relation relies on the natural-language definitions provided by WEBOPEDIA.
Example: consider the definition of C++ given at WEBOPEDIA:
A high-level programming language developed by Bjarne Stroustrup at Bell Labs. C++ adds object-oriented features to its predecessor, C. C++ is one of the most popular programming language for graphical applications, such as those that run in Windows and Macintosh environments.
From part of this definition we can establish the relation between the keyphrases C and C++: C++ is an extension of C.
Related relation: keyphrase a is related (r6) to keyphrase b if b is listed for a on the website. This relation is general: it may subsume the relations above, or a pair may hold under this relation and under another at once, since WEBOPEDIA supplies the related keyphrases for each keyphrase under consideration.
Example: the keyphrase database has the following related keyphrases listed at WEBOPEDIA: active archiving, attribute, connection pool, DAM, data mining, data warehouse, database management system, distributed database, DML, drill down, DSN, dynaset, EDGAR, ETL, field, file, hypertext, metadata, ODS, OLAP, RDBMS, record, replication.
Note: the relations r2, r3, r4, and r5 are all determined from the natural-language definitions provided at WEBOPEDIA. When creating them manually, the definitions connected with a keyphrase must be considered as well, not only the definition of the keyphrase itself. The relation r6 is supplied by WEBOPEDIA as links accompanying the definitions.
We now consider and analyze a few more definitions.
Consider the definition of analog television:
Preceding digital television (DTV), all televisions encoded pictures as an analog signal by varying signal voltage and radio frequencies. DTV is fast replacing analog TVs as digital broadcasting enables broadcasters to offer television with movie-quality picture and sound. Analog systems are more commonly known as NTSC systems.
A U.S. Senate panel is set an April 7, 2009, as the deadline for television stations to switch entirely from analog to digital broadcasts. Analog televisions will work until all analog broadcasting ceases. Once the transition to complete DTV is taken place, a converter will be required to receive DTV signals and change them to the analog format of these older types of televisions. However, these DTV-to-analog converters will not produce true DTV quality.
Analog televisions are now commonly referred to conventional televisions.
The definition of analog television begins with "Preceding digital television" => analog television has digital television as its extension; equivalently, digital television is extendFrom analog television. This gives an extension relation between analog television and digital television. The end of the definition also states "Analog televisions are now commonly referred to conventional televisions" => from this we can infer that analog television is synonymous with conventional television, which gives a synonym relation between analog television and conventional television.
Consider the definition of crimeware:
A type of malicious software that is designed to commit crimes on the Internet. Crimeware may be a virus, spyware or other deceptive piece of software that can be used to commit identity theft and fraud.
From the definition and its analysis, crimeware has the broader keyphrase malicious software and the near-synonyms virus and spyware.
Consider the definition of DAML:
Short for DARPA Agent Markup Language, DAML is a semantic markup language that is specifically an extension to XML and the Resource Description Framework (RDF). DAML is used for the U.S. Defense Advanced Research Project Agency (DARPA) and compared to the XML standard it offers a better capacity to express semantics (describing objects and the relationships between objects), which means a much higher level of interoperability between Web sites.
The broader keyphrase of DAML is semantic markup language, and DAML is an extension of XML and RDF.
Note: depending on the context and the given definition, the broader relation and the near-synonym relation coincide in some cases and differ in others.
Consider the definition of the keyphrase multidimensional DBMS:
A database management system (DBMS) organized around groups of records that share a common field value. Multidimensional databases are often generated from relational databases. Whereas relational databases make it easy to work with individual records, multidimensional databases are designed for analyzing large groups of records.
The term OLAP (On-Line Analytical Processing) is become almost synonymous with multidimensional databases, whereas OLTP (On-Line Transaction Processing) generally refers to relational DBMSs.
From the definition we obtain:
multidimensional DBMS is a near-synonym of OLAP
multidimensional DBMS is an extension of relational databases
multidimensional DBMS has the broader keyphrase database management system
Note: in some cases, after analysis, the relation between two keyphrases can be regarded as either synonym or near-synonym, because the near-synonym relation subsumes the synonym relation. Some relations are also supplied directly by WEBOPEDIA.
For instance, consider the definition of tool:
A program that performs a very specific task.
Synonymous with utility.
Similar to application.
From this definition, the relation between tool and utility is synonym, and the relation between tool and application is near-synonym.
Notation conventions for classes and keyphrases: following the categories on the WEBOPEDIA site and the delimitation of the computing domain, the corresponding classes in SPN-KOS are written entirely in uppercase (for example: PROGRAMMING LANGUAGES). A keyphrase is written entirely in uppercase only if it is an acronym of another keyphrase (for example, the keyphrase "Ngôn ngữ đánh dấu siêu văn bản" (HyperText Markup Language) has the acronym HTML); otherwise at most its first character is capitalized.
3.2. Storing the SPN-KOS Ontology
Storage for SPN-KOS concerns how the classes, the keyphrases, and the relations between them are kept on disk. The raw-data storage of the ontology is as follows:
The is-a relations between classes are stored in an ACCESS (or SQL Server) database. Among the 66 main classes, every pair of classes in an is-a relation is stored as one row of a table with two columns, ClassParentName and ClassChildName, whose two cells hold the names of the related classes. Inserting a row into the table corresponds to creating the relation between two classes; the table has 76 rows in total, one per is-a relation.
The semantic relations of keyphrases (Acronym, Synonym, Near-Synonym, Broader, Narrower, Extension, and Related) are likewise stored as two-column tables. Acronym and Synonym share one table, because their meanings can be regarded as similar. The other keyphrase relations are stored as follows:
Broader: a two-column table (KeyphraseName, Broader).
Narrower: a two-column table (KeyphraseName, Narrower).
Extension: a two-column table (KeyphraseName, Extension).
Related: a two-column table (KeyphraseName, Related).
Note: Related expresses that something is connected with the keyphrase, so its data may cover most of the other relations.
We thus have a table of 76 rows and 2 columns, each row corresponding to one relation between a pair of classes, for a total of 76 relations among all classes.
3.2.1. Storing classes
Table 3.3: Parent classes and child classes
The table has two columns, ClassParentName (the parent class) and ClassChildName (the child class). We say that ClassParentName has ClassChildName as its is-a; we do not say that ClassChildName has ClassParentName as its is-a.
Example: PROGRAMMING has DATA STRUCTURES as its is-a => PROGRAMMING is the parent containing DATA STRUCTURES, and if PROGRAMMING did not exist, neither would DATA STRUCTURES. If another class also has DATA STRUCTURES as its is-a, however, DATA STRUCTURES still exists even without PROGRAMMING.
Flowchart of the algorithm that creates the is-a relations between classes from the table
Input: a table with two columns, ClassParentName and ClassChildName
Output: the list of classes with the is-a relations created between them
Flowchart:
Figure 3.3: Flowchart for creating is-a relations between classes
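The two-column storage and the loading step can be sketched as follows. This is a minimal illustration under assumptions: SQLite (from the Python standard library) stands in for the ACCESS or SQL Server database named in the text, the table name `ClassRelation` is hypothetical, and only three sample rows are inserted; the column names are those given above.

```python
import sqlite3

# In-memory SQLite database standing in for ACCESS / SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ClassRelation (ClassParentName TEXT, ClassChildName TEXT)"
)
conn.executemany(
    "INSERT INTO ClassRelation VALUES (?, ?)",
    [
        ("PROGRAMMING", "DATA STRUCTURES"),
        ("PROGRAMMING", "XML"),
        ("WEB DEVELOPMENT", "XML"),
    ],
)

def load_is_a(connection):
    """Read every row once and build a parent -> set-of-children mapping."""
    children = {}
    query = "SELECT ClassParentName, ClassChildName FROM ClassRelation"
    for parent, child in connection.execute(query):
        children.setdefault(parent, set()).add(child)
        children.setdefault(child, set())     # make sure the child exists as a class
    return children

children = load_is_a(conn)
row_count = conn.execute("SELECT COUNT(*) FROM ClassRelation").fetchone()[0]
print(row_count, sorted(children["PROGRAMMING"]))
```

Each inserted row creates exactly one is-a relation, so the row count of the table equals the number of relations (76 in the thesis data, 3 in this sample).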
3.2.2. Keyphrases and the classes that contain them
This is the same-class relation between keyphrases. Once each keyphrase has been assigned to its class, looking at a class tells us which keyphrases it contains, and hence which keyphrases share a class; this is the relation (r1) described for SPN-KOS at the conceptual level.
Table 3.4: Keyphrases and their containing classes
A keyphrase and its same-class relation (r1) are stored as a two-column table, in the manner of a binary relation: the keyphrase is in the KeyphraseName column, and the inClass column gives the class containing it.
Flowchart of the algorithm that creates relation (r1) from this table
Input: a table of more than 7600 rows with two columns, KeyphraseName and inClass
Output: the list of keyphrases assigned to their classes
Flowchart:
Figure 3.4: Flowchart for assigning keyphrases to classes
3.2.3. Keyphrases with Broader and Narrower relations
Table 3.5: Keyphrases and their Broader relations
A keyphrase and a keyphrase in the Broader relation (r2) with it are stored as a two-column table: the keyphrase is in the KeyphraseName column, and the keyphrase in the Broader relation is in the Broader column.
Flowchart of the algorithm that creates relation (r2)
Input: a table of more than 2700 rows with two columns, KeyphraseName and Broader
Output: the list of keyphrases together with the keyphrases in the Broader relation with them
Flowchart:
Figure 3.5: Flowchart for creating Broader relations for keyphrases
Table 3.6: Keyphrases and their Narrower relations
A keyphrase and a keyphrase in the Narrower relation with it are stored as a two-column table: the keyphrase is in the KeyphraseName column, and the keyphrase in the Narrower relation is in the Narrower column.
Flowchart of the algorithm that creates the Narrower relation
Input: a table of more than 450 rows with two columns, KeyphraseName and Narrower
Output: the list of keyphrases together with the keyphrases in the Narrower relation with them
Flowchart:
Figure 3.6: Flowchart for creating Narrower relations for keyphrases
3.2.4. Keyphrases with Acronym and Synonym relations
Table 3.7: Keyphrases and their Acronym and Synonym relations
For a keyphrase and another keyphrase in the synonym or acronym relation (r3), synonym and acronym are treated as the same. In the adjacent table, the keyphrases in the KeyphraseName column have their synonyms and acronyms in the Acro_Synonym column.
Algorithm that creates relation (r3) from this table
Input: a table of more than 2700 rows with two columns, KeyphraseName and Acro_Synonym
Output: the list of keyphrases together with their Acronym and Synonym keyphrases
Flowchart:
Figure 3.7: Flowchart for creating Acronym and Synonym relations for keyphrases
Table 3.8: Keyphrases and their Near-Synonym relations
For a keyphrase and another keyphrase in the near-synonym relation (r4), the keyphrase is stored in the KeyphraseName column and its near-synonyms in the NearSynonym column.
Algorithm that creates relation (r4) from this table
Input: a table of more than 1500 rows with two columns, KeyphraseName and NearSynonym
Output: the list of keyphrases together with their Near-Synonym keyphrases
Flowchart:
Figure 3.8: Flowchart for creating Near-Synonym relations for keyphrases
3.2.5. Keyphrases with the Extension relation
Table 3.9: Keyphrases and their Extension relations
A keyphrase and a keyphrase in the Extension relation (r5) are also stored as a two-column table: the keyphrase in the KeyphraseName column has as its extension the keyphrase in the Extension column.
Algorithm that creates relation (r5) from this table
Input: a table of 77 rows with two columns, KeyphraseName and Extension
Output: the list of keyphrases together with their Extension keyphrases
Flowchart:
Figure 3.9: Flowchart for creating Extension relations for keyphrases
3.2.6. Related keyphrases
Table 3.10: Keyphrases and the keyphrases related to them
A keyphrase can be related to many other keyphrases, and all keyphrases related to the keyphrase under consideration are listed on the website. The data for the Related relation (r6) is stored in a two-column table, as in the adjacent figure: the keyphrase in the KeyphraseName column is related to the keyphrase in the Related column.
Algorithm that creates relation (r6) from this table
Input: a table of more than 21000 rows with two columns, KeyphraseName and Related
Output: the list of keyphrases together with their related keyphrases
Flowchart:
Figure 3.10: Flowchart for attaching related keyphrases to a keyphrase
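All of the relation tables in sections 3.2.3 to 3.2.6 share the same two-column shape, so one loader can serve them all. The sketch below is a minimal illustration, not the thesis code: the rows are a few examples taken from the definitions analyzed in section 3.1.4, and treating the shared Acro_Synonym table as symmetric is an assumption.

```python
from collections import defaultdict

# Sample rows (KeyphraseName, second column) for three of the tables described above.
TABLES = {
    "Broader": [
        ("crimeware", "malicious software"),
        ("DAML", "semantic markup language"),
    ],
    "Acro_Synonym": [("packet", "datagram")],
    "Extension": [("analog television", "digital television")],
}

# Assumption: synonym/acronym pairs are looked up in both directions.
SYMMETRIC = {"Acro_Synonym"}

def load_relations(tables):
    """Return relations[table_name][keyphrase] -> set of related keyphrases."""
    relations = defaultdict(lambda: defaultdict(set))
    for name, rows in tables.items():
        for keyphrase, target in rows:        # one pass over the rows of each table
            relations[name][keyphrase].add(target)
            if name in SYMMETRIC:
                relations[name][target].add(keyphrase)
    return relations

relations = load_relations(TABLES)
print(sorted(relations["Acro_Synonym"]["datagram"]))
print(sorted(relations["Broader"]["crimeware"]))
```

The Broader and Narrower tables are kept separate here, as in the text, rather than deriving one from the other.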
As described for the conceptual SPN-KOS model, the relations are built from the definitions given for each keyphrase on the website and from Vietnamese-language IT documents. While the given definitions are analyzed for semantic relations between keyphrases, additional keyphrases arise beyond the more than 6000 keyphrases above; these are extracted manually from the definitions. After review, therefore, new keyphrases come from the broader, narrower, extension, and related relations, and each new keyphrase is treated as belonging to the same class as the keyphrase at hand. For the acronym, synonym, and near-synonym relations we do not check whether new keyphrases arise, because under these relations the keyphrases are regarded as equivalent.
Building the model and the method of semantic storage and search
From the collected data we analyze the structure and determine the content of the documents, as the basis for building the model and for organizing storage and search in a combined way.
3.2.7. Building the conceptual storage model
To meet the goal of searching documents under a practical model, we build the following conceptual model, which combines document collection, storage organization, and knowledge inference:
Figure 3.11: The conceptual model
From the conceptual storage model we analyze the document system and build a general model for managing and storing documents and searching them semantically: collected document repository => standardized directory-tree model => semantic management model.
First, Vietnamese-language IT documents are collected from many sources and stored in the standardized directory tree; each document is stored under the domain directory that corresponds to it. Finally, the SPN-KOS knowledge base is used to infer the semantics of each document.
3.2.8. Organizing storage by directory tree
The directories that store the files are organized according to the SPN-KOS model: there are 66 main directories, linked to one another as follows.
Figure 3.12: Organization of the directories
In the figure, bucket D has parents A and C, and bucket B has parent C. Buckets A and C have no parent, which tells us that each of them is one of the three initial classes of SPN-KOS (see Figure 3.1).
3.2.9. Organizing storage and search by semantics
From the conceptual model (Figure 3.11) we build the database model that supports semantic storage and search.
Figure 3.13: Base model for semantic storage and search
A semantic information-retrieval system has two main functions: indexing and lookup. The work is divided into two stages: establishing the solution and organizing the semantic document repository, and designing the search processing.
3.2.9.1. Designing the solution and organizing the semantic document repository
This stage consists of four steps:
Step 1: Extracting keyphrases
Working with the relational-database document management model, the keyphrases of each document are extracted manually. The result is the list of keyphrases for each document.
Step 2: Generating additional keyphrases
Next, SPN-KOS is used to generate additional keyphrases from the semantic relations (Acronym, Synonym, Near-Synonym, Broader, Narrower, Extension, Related) of the keyphrases already extracted. The number of keyphrases grows, which means that the file they were extracted from is represented with richer semantics.
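Step 2 can be pictured as a lookup over the relation tables. The sketch below is a minimal illustration under assumptions: the name `expand`, the sample `RELATIONS` data, and the choice to follow each relation only one step from the extracted keyphrases are all hypothetical; the text does not say how far the inference goes.

```python
# Hypothetical sample data: relation name -> {keyphrase -> related keyphrases}.
RELATIONS = {
    "Broader": {"C": {"programming language"}},
    "Extension": {"C": {"C++"}},
    "Acro_Synonym": {"packet": {"datagram"}},
}

def expand(keyphrases, relations=RELATIONS):
    """Return the extracted keyphrases plus every keyphrase one relation step away."""
    expanded = set(keyphrases)
    for targets_by_keyphrase in relations.values():
        for keyphrase in keyphrases:
            # Keyphrases without an entry in this relation contribute nothing.
            expanded |= targets_by_keyphrase.get(keyphrase, set())
    return expanded

print(sorted(expand(["C"])))
print(sorted(expand(["packet"])))
```

The expanded set always contains the original keyphrases, so the keyphrase count of a document can only grow in this step.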
Step 3: Storing the file in a directory
We also know the class that contains each keyphrase, and the parent of a class can be found through the directory structure described above. Finally, we find the class whose keyphrases cover most of the keyphrases of the file at hand and store the file in the corresponding directory.
Example: document A has the keyphrases "C" and "hợp ngữ" (assembly language). Using the SPN-KOS ontology to infer the semantic relations of these keyphrases yields the following additional keyphrases: ngôn ngữ máy, ngôn ngữ kết hợp, ngôn ngữ assembly, KLOC, APL, C cộng cộng, ngôn ngữ lập trình, C sharp, Eiffel, Ngôn ngữ cấp cao, Ngôn ngữ mức cao, Rexx, Visual C cộng cộng. In total there are 15 keyphrases, comprising the keyphrases found in document A and those inferred from SPN-KOS.
Keyphrase | Class containing the keyphrase
C | PROGRAMMING LANGUAGES
hợp ngữ | PROGRAMMING LANGUAGES, PROGRAMMING TOOLS
KLOC | PROGRAMMING LANGUAGES, PROGRAMMING
ngôn ngữ assembly | PROGRAMMING LANGUAGES
ngôn ngữ kết hợp | PROGRAMMING LANGUAGES
ngôn ngữ máy | PROGRAMMING LANGUAGES
APL | PROGRAMMING LANGUAGES
C cộng cộng | PROGRAMMING LANGUAGES
C sharp | PROGRAMMING LANGUAGES
Eiffel | PROGRAMMING LANGUAGES
Ngôn ngữ cấp cao | PROGRAMMING LANGUAGES
Ngôn ngữ mức cao | PROGRAMMING LANGUAGES
Ngôn ngữ lập trình | PROGRAMMING LANGUAGES
Rexx | PROGRAMMING LANGUAGES
Visual C cộng cộng | PROGRAMMING LANGUAGES
The 15 keyphrases involve 3 classes in total:
PROGRAMMING LANGUAGES, PROGRAMMING TOOLS, PROGRAMMING
The class PROGRAMMING LANGUAGES contains all 15 keyphrases, the class PROGRAMMING contains 1 of the 15, and the class PROGRAMMING TOOLS contains 1 of the 15 => PROGRAMMING LANGUAGES contains most of the keyphrases, so the original file (the file before preprocessing) is stored in the directory PROGRAMMING LANGUAGES.
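The directory choice in the worked example amounts to counting, for each class, how many of the document's keyphrases it contains. The sketch below is a minimal illustration: `choose_directory` and the four-keyphrase sample are assumptions (a subset of the table above, with "assembly language" standing for "hợp ngữ"), and ties are resolved by first occurrence, which the text does not specify.

```python
from collections import Counter

# Subset of the example table: keyphrase -> classes containing it.
CLASSES_OF = {
    "C": ["PROGRAMMING LANGUAGES"],
    "assembly language": ["PROGRAMMING LANGUAGES", "PROGRAMMING TOOLS"],
    "KLOC": ["PROGRAMMING LANGUAGES", "PROGRAMMING"],
    "Eiffel": ["PROGRAMMING LANGUAGES"],
}

def class_coverage(keyphrases, classes_of):
    """Count how many of the given keyphrases each class contains."""
    return Counter(
        cls for keyphrase in keyphrases for cls in classes_of.get(keyphrase, ())
    )

def choose_directory(keyphrases, classes_of):
    """Return the class covering the most keyphrases, or None if no class is known."""
    counts = class_coverage(keyphrases, classes_of)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

coverage = class_coverage(list(CLASSES_OF), CLASSES_OF)
target = choose_directory(list(CLASSES_OF), CLASSES_OF)
print(dict(coverage), target)
```

With the full 15-keyphrase table the counts would be 15, 1, and 1, giving the same directory as in the text.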
Flowchart of the algorithm for storing a file in a directory
Input: a data file in PDF, HTML, DOC, or TXT format
Output: the file stored in the corresponding directory. The SPN-KOS ontology is used to infer the semantic relations of the keyphrases.
Figure 3.14: Flowchart for keyphrase extraction and semantic inference
=> The result is that the documents are stored in their corresponding directories together with their semantically related keyphrases.
Step 4: Building the index
Input: the set of keyphrases and the Acronym, Synonym, Near-Synonym, Broader, Narrower, Extension, and Related relations corresponding to each document
Output: the list of files for each keyphrase together with a priority level (SCORE) (based on Lucene, see Appendix C).
Components of the index:
- Dictionary: stores the distinct keyphrases produced by semantic inference for each document.
- DSTapTin: the list of document identifiers (MA_TAI_LIEU)
- DSVitri: (keyphrases; DSTapTin)
=> Together these form the index (the ontology for the domain under consideration)
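The index described above maps each keyphrase to the documents that carry it, with a score per document. The sketch below is a minimal illustration, not the Lucene-based implementation the text refers to: `build_inverted_index` and the sample rows are assumptions, and the scores are given as inputs because the text does not define how SCORE is computed.

```python
def build_inverted_index(rows):
    """rows: iterable of (keyphrase, doc_id, score).

    Returns {keyphrase: {doc_id: score}}; a (keyphrase, doc_id) pair that is
    already present is skipped rather than stored twice.
    """
    index = {}
    for keyphrase, doc_id, score in rows:
        postings = index.setdefault(keyphrase, {})
        if doc_id not in postings:
            postings[doc_id] = score
    return index

# Hypothetical sample rows; document identifiers play the role of MA_TAI_LIEU.
ROWS = [
    ("C", "D1", 0.9),
    ("programming language", "D1", 0.4),
    ("C", "D2", 0.7),
    ("C", "D1", 0.9),     # duplicate of the first row, ignored
]

index = build_inverted_index(ROWS)
dictionary = sorted(index)                 # the distinct keyphrases
print(dictionary, index["C"])
```

In terms of the components listed above, the keys play the role of Dictionary, and each key with its document list corresponds to a DSVitri entry (keyphrase; DSTapTin).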
Flowchart:
Figure 3.15: Flowchart for indexing documents
3.2.9.2. The semantic search process
This stage consists of the following three steps:
Step 1: Extracting keyphrases from the user's input
In this step the meaningful keyphrases are extracted from what the user entered, using the dictionary store; the result is a list of keyphrases without relations yet.
Input: the query entered by the user
Output: the list of keyphrases extracted from the user's query.
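Step 1 can be sketched as a scan of the dictionary against the query. This is a minimal illustration under assumptions: `extract_keyphrases` and the three-entry `DICTIONARY` are hypothetical, and matching is done on whole whitespace-separated words, case-insensitively, so that "C" does not match inside "C++" or inside another word.

```python
DICTIONARY = ["C", "C++", "Pascal"]   # hypothetical dictionary entries

def extract_keyphrases(query, dictionary):
    """Return the dictionary entries that occur in the query as whole-word sequences."""
    words = query.lower().split()
    found = []
    for entry in dictionary:                      # visit every dictionary row
        entry_words = entry.lower().split()
        n = len(entry_words)
        # Slide a window of n words over the query and compare word by word.
        if n and any(
            words[i:i + n] == entry_words for i in range(len(words) - n + 1)
        ):
            found.append(entry)
    return found

print(extract_keyphrases("Tìm kiếm tài liệu lập trình C", DICTIONARY))
```

For the query used as the example in Step 2 below, only the entry "C" is returned, which matches the keyphrase the text extracts.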
Figure 3.16: Flowchart for extracting keyphrases from the user's query
Step 2: Finding semantic relations
In this step a semantic network is built from the SPN-KOS ontology (Acronym, Synonym, Near-Synonym, Broader, Narrower, Extension, Related) for the keyphrases extracted in Step 1.
Example: in Step 1 the user enters the query "Tìm kiếm tài liệu lập trình C" (find documents on C programming). From the dictionary we extract the meaningful keyphrase "C"; then, from the ontology, we infer the meaning of the keyphrase "C". This gives the following semantic network:
Figure 3.17: Semantic graph of the keyphrase C
Step 3: Searching for documents that contain the keyphrases
Using the list of keyphrases entered by the user and semantically expanded in Step 2, together with INVERTED_INDEX (the result of Step 4 in section 3.2.9.1), we search for documents whose content contains the keyphrases and return them ranked by priority (SCORE), highest first.
=> The result returned is the list of documents containing the keyphrases entered by the user.
3.2.10. Overall model for organizing storage and searching documents by semantics
The overall model for managing and searching the document system semantically works in two stages: building the documents with semantics, and processing semantic searches.
1. Flowchart for storing documents semantically
Figure 3.18: Document management model
2. Flowchart for processing semantic document searches
Figure 3.19: Flowchart for semantic search
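The search stage ties the earlier pieces together: expand the query keyphrases through the ontology, look them up in the inverted index, and rank the documents by score. The sketch below is a minimal illustration under assumptions: `search`, the sample `RELATED` and `INDEX` data, and the rule of keeping the highest score when a document matches several keyphrases are all hypothetical; the text only says results are ranked by SCORE.

```python
# Hypothetical sample data.
RELATED = {"C": {"C++"}}                       # keyphrase -> semantically related keyphrases
INDEX = {                                      # keyphrase -> {document id: score}
    "C": {"D1": 0.9, "D2": 0.7},
    "C++": {"D3": 0.8},
}

def search(query_keyphrases, related=RELATED, index=INDEX):
    """Return (doc_id, score) pairs, highest score first; an empty list means not found."""
    expanded = set(query_keyphrases)
    for keyphrase in query_keyphrases:
        expanded |= related.get(keyphrase, set())
    best = {}
    for keyphrase in expanded:
        for doc_id, score in index.get(keyphrase, {}).items():
            # Assumption: a document keeps its highest score across keyphrases.
            best[doc_id] = max(best.get(doc_id, 0.0), score)
    # Sort by descending score, then by document id for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))

results = search(["C"])
print(results if results else "not found")
```

Document D3 is returned even though it is indexed only under "C++", which is the effect the semantic expansion is meant to have.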
[Labels of Figure 3.1: SPN-KOS; COMPUTER NETWORKS; PROGRAMMING; SOFTWARE.]
If two classes are related to each other, the pair is stored as one row in the database; the table therefore has 76 rows, matching the 76 is-a relations between the classes.
[Figure 3.3 flowchart: Start. Iterate over each row of the ClassChildName column in the table and identify the value as a class. For each ClassChildName value, iterate over the rows of the ClassParentName column and create the has-a relation between ClassChildName and ClassParentName. Continue while rows remain in either column; stop when both are exhausted.]
[Figure 3.4 flowchart: Start. Iterate over each row of the KeyphraseName column in the table; for each keyphrase, determine from inClass which class it belongs to. Continue while rows remain; stop when the KeyphraseName column is exhausted.]
[Figure 3.5 flowchart: Start. Iterate over each row of the KeyphraseName column and create the Broader relation for KeyphraseName from the Broader column. Continue while rows remain; stop when the column is exhausted.]
[Figure 3.6 flowchart: Start. Iterate over each row of the KeyphraseName column and create the Narrower relation for KeyphraseName from the Narrower column. Continue while rows remain; stop when the column is exhausted.]
[Figure 3.7 flowchart: Start. Iterate over each row of the KeyphraseName column and create the Acronym and Synonym relations for KeyphraseName from the Acro_Synonym column. Continue while rows remain; stop when the column is exhausted.]
[Figure 3.8 flowchart: Start. Iterate over each row of the KeyphraseName column and create the Near-Synonym relation for KeyphraseName from the NearSynonym column. Continue while rows remain; stop when the column is exhausted.]
[Figure 3.9 flowchart: Start. Iterate over each row of the KeyphraseName column and create the Extension relation for KeyphraseName from the Extension column. Continue while rows remain; stop when the column is exhausted.]
[Figure 3.10 flowchart: Start. Iterate over each row of the KeyphraseName column and attach the related keyphrases to KeyphraseName from the Related column. Continue while rows remain; stop when the column is exhausted.]
[Labels of Figure 3.11: collected document repository to be stored; SPN-KOS (ontology knowledge base).]
[Labels of Figure 3.13, in extraction order: INVERTED_INDEX, MA_TAI_LIEU, KEYPHRASE, SCORE; TAI_LIEU, MA_TAI_LIEU, KEYPHRASES, MA_NHOM_LINH_VUC, TUA_DE, MA_LOAI_TAI_LIEU, MA_TAC_GIA, MO_TA, MA_DINH_DANG, BAN_QUYEN, MA_NGON_NGU, TAP_TIN.]
[Labels of Figure 3.12: buckets A, B, C, and D, each with a Parent pointer.]
The hash table has 66 buckets in total, matching the number of classes in the ontology.
Each directory corresponds to one class in SPN-KOS and to one bucket of a hash table that uses direct chaining. Each bucket holds the following fields:
Directory name,
Parent directory (Parent)
where:
The Parent field is a pointer to a linked list that tells which nodes are the parents of the current bucket; from it the corresponding parent directory is obtained.
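The bucket layout can be sketched with a small linked list per bucket. This is a minimal illustration under assumptions: the class names `ParentNode` and `Bucket` are hypothetical, a Python dict stands in for the 66-bucket hash table, and the four buckets reproduce the relationships described for Figure 3.12 (D has parents A and C, B has parent C).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParentNode:
    """One node of the linked list of parents."""
    name: str
    next: Optional["ParentNode"] = None

@dataclass
class Bucket:
    """One directory: its name plus a pointer to the head of its parent list."""
    directory: str
    parent: Optional[ParentNode] = None

    def add_parent(self, name):
        # Insert at the head of the linked list.
        self.parent = ParentNode(name, self.parent)

    def parents(self):
        names, node = [], self.parent
        while node is not None:       # walk the list to collect every parent
            names.append(node.name)
            node = node.next
        return names

# A dict stands in for the hash table (assumption); keys are directory names.
table = {name: Bucket(name) for name in "ABCD"}
table["D"].add_parent("C")
table["D"].add_parent("A")
table["B"].add_parent("C")

roots = sorted(name for name, bucket in table.items() if bucket.parent is None)
print(table["D"].parents(), table["B"].parents(), roots)
```

Buckets whose Parent pointer is empty are the top-level classes, which is how A and C are recognized as initial classes in the text.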
[Figure 3.14 flowchart: Start. Take a document in PDF, HTML, DOC, or TXT format and extract its keyphrases. Use the SPN-KOS ontology to infer the semantic relations of the keyphrases and the classes containing them. If a class covers most of the keyphrases, store the original file in the corresponding directory. Continue while data remains to be processed; stop when none is left.]
[Figure 3.15 flowchart: Start. Iterate over the rows of the KEYPHRASE and MA_TAI_LIEU columns in the KEYPHRASES table and compare each with the KEYPHRASE and MA_TAI_LIEU columns of the INVERTED_INDEX table. If there is no match, save the row to INVERTED_INDEX; if there is a match, move to the next row. Stop when no data remains to be processed.]
[Figure 3.16 flowchart: Start. Iterate over each row of the dictionary store and check whether the entry occurs in the query entered by the user. If it does, store it as a meaningful keyphrase; if not, move to the next row. Stop when no data remains to be processed.]
[Labels of Figure 3.17: the keyphrase C with Broader, Related, and Extension edges; linked keyphrases Pascal, RLaB, LISP, Prolog, C++, Visual C cộng cộng, Ngôn ngữ lập trình cấp cao (high-level programming language).]
[Labels of Figure 3.18: Start; collected document repository; standardized directory tree; organizing document storage in the standardized directories; SPN-KOS (ontology knowledge base); list of each document's keyphrases and their semantically related keyphrases; index set; End.]
[Labels of Figure 3.19: user; search interface; dictionary; list of meaningful keyphrases extracted; SPN-KOS ontology knowledge base; build query; execute query; existing index; D_KEYPHRASES (MA_TAI_LIEU, KEYPHRASE); rank and return results; "not found" message.]