
Upload: vinh-nguyen-huu

Post on 18-Nov-2015


DESCRIPTION

Nguyen huu vinh


Chapter 3

MODEL AND SOLUTION

Chapter 3 presents how the ontology is built at the conceptual level and how it is stored, proposes a semantics-aware way of organizing stored documents, and in particular describes how semantic search is carried out.

3.1. The SPN-KOS ontology model

The SPN-KOS model (Software-Programming-Network Keyphrase Ontology System) consists of the components (E, C, R):

E is the set of elements, each of which is a keyphrase.

C is the set of classes covering the computing domain.

R is the set of relations on the elements.
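The triple (E, C, R) can be pictured as a small in-memory structure. The following is only a sketch under assumptions: the class name `SpnKos`, its field names and its two helper methods are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class SpnKos:
    """Hypothetical container for the (E, C, R) triple."""
    keyphrases: set = field(default_factory=set)   # E: all keyphrases
    classes: dict = field(default_factory=dict)    # C: class name -> set of keyphrases
    relations: dict = field(default_factory=dict)  # R: relation name -> set of (a, b) pairs

    def add_keyphrase(self, keyphrase, class_name):
        # Every keyphrase belongs to E and to at least one class of C.
        self.keyphrases.add(keyphrase)
        self.classes.setdefault(class_name, set()).add(keyphrase)

    def relate(self, relation, a, b):
        # A relation is a set of ordered pairs over E.
        if a not in self.keyphrases or b not in self.keyphrases:
            raise KeyError("both keyphrases must belong to E")
        self.relations.setdefault(relation, set()).add((a, b))


kos = SpnKos()
kos.add_keyphrase("C", "CÁC NGÔN NGỮ LẬP TRÌNH")
kos.add_keyphrase("C++", "CÁC NGÔN NGỮ LẬP TRÌNH")
kos.relate("extension", "C++", "C")  # read as: C++ is an extension of C
```

Keeping R as named sets of pairs mirrors the definition of a binary relation as a subset of E x E.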

Keyphrases and the reasons for using them in the SPN-KOS model:

A keyphrase consists of one or more words (for example "trí tuệ nhân tạo" (artificial intelligence), "ngôn ngữ lập trình" (programming language), "trình biên dịch" (compiler), "quản trị mạng" (network administration)), whereas a keyword is a single word (for example "computer"). In this thesis both are treated as keyphrases.

Keyphrases give a high-level description of a document's content, so readers can easily tell whether the document is relevant to what they want to learn.

Keyphrases give a concise summary of the document.

3.1.1. Keyphrases in the set E

Let E be the set of all keyphrases describing the domains PHẦN MỀM (SOFTWARE), LẬP TRÌNH (PROGRAMMING) and MẠNG MÁY TÍNH (COMPUTER NETWORKS). The keyphrases were extracted manually from Vietnamese IT documents, guided by www.dmoz.org, dir.yahoo.com and the computer dictionary at www.webopedia.com; more than 7600 keyphrases were extracted for the 3 domains. E is analysed into the following parts:

Classes: each class is a subset of E, and the set of classes is contained in P(E). Each class holds a set of keyphrases.

Relations among the keyphrases in E.

3.1.2. Classification

Classifying a specialized field requires experts in that field. The classes were therefore assigned manually, based on www.dmoz.org/Computers/, dir.yahoo.com/Computers_and_Internet/ and the computer dictionary at www.webopedia.com. This gives 66 classes in flat (non-hierarchical) form.

Table 3.1: The classes in flat form

From these 66 flat classes the hierarchical folder tree is built in 2 steps.

Step 1: the 3 main classes of SPN-KOS are built as follows:

P(E) has the subsets C1 = PHẦN MỀM, C2 = LẬP TRÌNH, C3 = MẠNG MÁY TÍNH, from which the hierarchy tree of SPN-KOS is built.

Figure 3.1: Overview of SPN-KOS and its classes

So there are basically 3 classes (sets) contained in P(E). In mathematical notation:

$\forall i \in \{1, 2, 3\}:\ C_i \subseteq E$, and $\mathcal{P}(E) \supseteq \{C_1, C_2, C_3\}$

Step 2: Starting from the 3 initial classes, 44 further classes are built from the parent-child data; then 19 new classes are built as children of those 44 classes.

Figure 3.2: The SPN-KOS hierarchy tree

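An example of how the class LẬP TRÌNH and its related classes are built:

This class has 20 subclasses: (LẬP TRÌNH GAME, LẬP TRÌNH HƯỚNG ĐỐI TƯỢNG (object-oriented programming), BIÊN DỊCH-ĐÓNG GÓI VÀ LIÊN KẾT (compiling, packaging and linking), ELECTRONIC TECHNOLOGY, BIỂU THỨC (expressions), CÁC NGÔN NGỮ LẬP TRÌNH (programming languages), CÁC TOÁN TỬ VÀ CÁC PHÉP TOÁN (operators and operations), CẤU TRÚC DỮ LIỆU (data structures), ĐỊNH ĐỊA CHỈ (addressing), GIAO DIỆN LẬP TRÌNH ỨNG DỤNG (application programming interfaces), NGHỀ NGHIỆP (careers), PHÁT TRIỂN WEB (web development), PHƯƠNG PHÁP LẬP TRÌNH HƯỚNG KHÍA CẠNH (aspect-oriented programming), THỦ TỤC-HÀM-ĐOẠN CHƯƠNG TRÌNH (procedures, functions and routines), CÁC NGÔN NGỮ LẬP TRÌNH OSRC, HTML, XML, TRÌNH DUYỆT (browsers), CÁC LỚP CƠ SỞ (base classes)). So 20 subclasses are to be created, except that a class coinciding with an existing class is not created again. Looking further at the subclasses of LẬP TRÌNH, some of them are subclasses of LẬP TRÌNH and of another class at the same time:

XML, parents LẬP TRÌNH and PHÁT TRIỂN WEB => XML is a child of both LẬP TRÌNH and PHÁT TRIỂN WEB.

TRÌNH DUYỆT, parents HTML and PHẦN MỀM => TRÌNH DUYỆT is a child of both HTML and PHẦN MỀM.

...

=> In this way, while reviewing the 3 basic classes, 20 more classes are created. The subclasses of the 44 classes are then examined and 19 further classes are created on the same principle:

CHUẨN MẠNG (NETWORK STANDARDS), child ETHERNET => the class CHUẨN MẠNG has the subclass ETHERNET, but this subclass has already been created among the 19 classes, so it is not created again.

CƠ SỞ DỮ LIỆU (DATABASES), children CƠ SỞ DỮ LIỆU OSRC and CÁC CÂU TRUY VẤN (QUERIES) => the class OSRC DATABASES has already been created and is skipped, whereas CÁC CÂU TRUY VẤN is not yet among the 41 classes, so it is created.

HAPTICS, child GIAO DIỆN NGƯỜI DÙNG (USER INTERFACES) => this subclass is not among the 169, so it is created.

GIAO THỨC MẠNG (NETWORK PROTOCOLS), child GIAO THỨC INTERNET (INTERNET PROTOCOLS) => this subclass has not yet been created among the 19 classes, so it must be created.

Note: each class that is formed contains a set of keyphrases modelling a more specific scope than its parent set.

This gives:

The set C consists of 66 classes, the total counted above. Each class contains a number of keyphrases of E, written $C_i = \{i_1, i_2, \ldots, i_n\}$, and the count gives 76 is-a relations among the 66 classes. We also have $C = \{C_1, C_2, \ldots, C_n\}$ with n = 66 classes.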
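The rule "create a subclass only if it does not exist yet, but always record the is-a link" can be sketched as follows. This is an illustration under assumptions: the function `build_hierarchy` is hypothetical, and the sample pairs are just the few parent-child links named in the examples above, not the full set of 76.

```python
def build_hierarchy(pairs):
    """Create each class once and record every (parent, child) is-a link."""
    classes = set()   # a set silently ignores a class that already exists
    parents = {}      # child class -> set of parent classes
    for parent, child in pairs:
        classes.add(parent)
        classes.add(child)
        parents.setdefault(child, set()).add(parent)
    return classes, parents


# Sample links taken from the examples in this section.
pairs = [
    ("LẬP TRÌNH", "PHÁT TRIỂN WEB"),
    ("LẬP TRÌNH", "HTML"),
    ("LẬP TRÌNH", "XML"),
    ("PHÁT TRIỂN WEB", "XML"),   # XML already exists: only the link is added
    ("HTML", "TRÌNH DUYỆT"),
    ("PHẦN MỀM", "TRÌNH DUYỆT"),
]
classes, parents = build_hierarchy(pairs)
print(len(classes), "classes,", len(pairs), "is-a links")
```

Because one class may have several parents, the result is a directed graph rather than a strict tree, which is why the number of is-a relations (76) can exceed the number of classes (66).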

3.1.3. Semantic relations in the set E

SPN-KOS needs a set E of elements (each element is a keyphrase) together with the semantic relations among the keyphrases. These relations have to be derived from the natural-language definitions provided at www.webopedia.com and are created manually. SPN-KOS currently covers the following semantic relations: same class, synonym, near-synonym, acronym, broader, narrower, extension and related. E and C are also connected: each subset of E can be contained in a subset of C, so that it models a sub-domain within C.

Example: a list of keyphrases belonging to the subset LẬP TRÌNH indicates that this subset covers the programming domain, and LẬP TRÌNH in turn belongs to the larger domain of computing.

3.1.4. Relations among the keyphrases in E

Let $E \neq \emptyset$. A binary relation on E is a subset R of $E^2$. For two elements x and y of E, x is R-related to y if and only if $(x, y) \in R$, written x R y. Thus: $x\,R\,y \Leftrightarrow (x, y) \in R$. When x is not R-related to y we write $x \not R\, y$. The relations among the keyphrases in E are the following:

Same-class relation: keyphrase a has the same-class relation (r1) with keyphrase b if there is a class $C_i$ such that $a \in C_i$ and $b \in C_i$.

Broader-narrower relation: keyphrase a is broader (r2) than keyphrase b if the meaning of a includes b, that is, a has a wider meaning than b in the context under consideration; conversely, b is narrower than a (the inverse of r2). The context is established by analysing the definitions given at WEBOPEDIA.

For example, consider the definition of the keyphrase AppleTalk Address Resolution Protocol:

Short for AppleTalk Address Resolution Protocol, a protocol for mapping a device's physical hardware address to a temporary Appletalk network-assigned address in Macintosh computer LANs. When a protocol stack sends a data packet, the protocol address specifies the destination. The data link layer relies on AARP to translate the protocol address into the hardware address of the destination node.

Thus, by reading and reviewing the definitions manually, we can create the broader-narrower relations among keyphrases.

Table 3.2: Keyphrases whose broader keyphrase is "giao thức" (protocol)

According to the table, the keyphrases in the KeyphraseName column are narrower than the keyphrase "giao thức".

The is-a relation between classes can be expressed as a broader-narrower relation between keyphrases when a keyphrase has the same name as a class and is contained in that class.

Example: there is a relation on the class NGÔN NGỮ LẬP TRÌNH (PROGRAMMING LANGUAGES), and there is also the keyphrase "ngôn ngữ lập trình" (programming language) ∈ NGÔN NGỮ LẬP TRÌNH. A keyphrase that has the same name as a class therefore takes part in the broader-narrower relation. In addition, the natural-language definition shows that the keyphrases BASIC, C, C++, COBOL, FORTRAN, Ada and Pascal are narrower than "ngôn ngữ lập trình", based on the definition of programming language given at Webopedia:

A vocabulary and set of grammatical rules for instructing a computer to perform specific tasks. The term programming language usually refers to high-level languages, such as BASIC, C, C++, COBOL, FORTRAN, Ada, and Pascal. Each language is a unique set of keywords (words that it understands) and a special syntax for organizing program instructions ..

So the broader-narrower relation between keyphrases can originate from the relation between classes, which we denote as follows:

$a \text{ broader } b \Leftrightarrow \mathrm{class}(a) \supset \mathrm{class}(b)$

$a \text{ narrower } b \Leftrightarrow \mathrm{class}(a) \subset \mathrm{class}(b)$

where, by convention, class(x) is the class containing keyphrase x.

Synonym relation: keyphrase a is a synonym or short form (r3) of keyphrase b if the two keyphrases have the same meaning in the context under consideration.

For example, consider the definition of the keyphrase packet:

A piece of a message transmitted over a packet-switching network. See under packet switching. One of the key features of a packet is that it contains the destination address in addition to the data. In IP networks, packets are often called datagrams.

Thus the keyphrase packet has the synonym relation r3 with the keyphrase datagram in the context of computer networks.

Near-synonym relation: keyphrase a is a near-synonym (r4) of keyphrase b if, in the context under consideration, the two keyphrases have the same meaning or nearly the same meaning.

For example, the definition of the keyphrase "mã truy cập" (access code) contains the sentence:

Same as password, a series of characters and numbers that enables a user to access a computer.

Thus the keyphrase "mã truy cập" (access code) has the near-synonym "mật khẩu" (password).

Or consider the definition of the keyphrase ActiveX:

ActiveX is a loosely defined set of technologies developed by Microsoft in 1996 for sharing information among different applications. ActiveX is an outgrowth of two other Microsoft technologies called OLE (Object Linking and Embedding) and COM (Component Object Model). As a moniker, ActiveX can be very confusing because it applies to a whole set of COM-based technologies. Most people, however, think only of ActiveX controls, which represent a specific way of implementing ActiveX technologies. Many Microsoft Windows applications use ActiveX controls.

Thus the keyphrase ActiveX has the near-synonyms COM and OLE.

Extension relation: keyphrase a has the extension relation (r5) with keyphrase b if a carries the meaning of b plus additional information, or a is an upgrade of b, or a replaces b, or a is derived from b with added context, or a is based on b. This relation is established from the natural-language definitions provided by WEBOPEDIA.

For example, consider the definition of C++ given at WEBOPEDIA:

A high-level programming language developed by Bjarne Stroustrup at Bell Labs. C++ adds object-oriented features to its predecessor, C. C++ is one of the most popular programming language for graphical applications, such as those that run in Windows and Macintosh environments.

From part of this definition we can establish that C++ is an extension of C.

Related relation: keyphrase a is related (r6) to keyphrase b if b is listed for a on the website. This link is generic: it may include the relations above, or a pair may hold in this relation and in another one, because WEBOPEDIA already supplies the keyphrases linked to the keyphrase under consideration.

For example, the keyphrase database has the following related keyphrases listed at WEBOPEDIA: active archiving, attribute, connection pool, DAM, data mining, data warehouse, database management system, distributed database, DML, drill down, DSN, dynaset, EDGAR, ETL, field, file, hypertext, metadata, ODS, OLAP, RDBMS, record, replication.

Note: the relations r2, r3, r4 and r5 are all determined from the natural-language definitions provided at WEBOPEDIA. When they are created manually, the definitions related to a keyphrase must be examined as well, not only the definition of the keyphrase itself. The relation r6 is given by the links that WEBOPEDIA attaches to each definition.

Some further definitions and their analysis follow.

Consider the definition of analog television:

Preceding digital television (DTV), all televisions encoded pictures as an analog signal by varying signal voltage and radio frequencies. DTV is fast replacing analog TVs as digital broadcasting enables broadcasters to offer television with movie-quality picture and sound. Analog systems are more commonly known as NTSC systems.

A U.S. Senate panel is set an April 7, 2009, as the deadline for television stations to switch entirely from analog to digital broadcasts. Analog televisions will work until all analog broadcasting ceases. Once the transition to complete DTV is taken place, a converter will be required to receive DTV signals and change them to the analog format of these older types of televisions. However, these DTV-to-analog converters will not produce true DTV quality.

Analog televisions are now commonly referred to conventional televisions.

The definition says "Preceding digital television", which tells us that digital television is the extension of analog television, in other words digital television is extendFrom analog television; this gives an extension relation between analog television and digital television. In addition, the end of the definition states "Analog televisions are now commonly referred to conventional televisions", from which we can infer that analog television is synonymous with conventional television, giving a synonym relation between the two.

Consider the definition of crimeware:

A type of malicious software that is designed to commit crimes on the Internet. Crimeware may be a virus, spyware or other deceptive piece of software that can be used to commit identity theft and fraud.

From this definition and its analysis we see that crimeware has the broader keyphrase malicious software and the near-synonyms virus and spyware.

Consider the definition of DAML:

Short for DARPA Agent Markup Language, DAML is a semantic markup language that is specifically an extension to XML and the Resource Description Framework (RDF). DAML is used for the U.S. Defense Advanced Research Project Agency (DARPA) and compared to the XML standard it offers a better capacity to express semantics (describing objects and the relationships between objects), which means a much higher level of interoperability between Web sites.

So the broader keyphrase of DAML is semantic markup language, and DAML is an extension of XML and RDF.

Note: whether a pair counts as broader or as near-synonym depends on the context and on the given definition; in some cases the two coincide and in others they differ.

Consider the definition of the keyphrase multidimensional DBMS:

A database management system (DBMS) organized around groups of records that share a common field value. Multidimensional databases are often generated from relational databases. Whereas relational databases make it easy to work with individual records, multidimensional databases are designed for analyzing large groups of records.

The term OLAP (On-Line Analytical Processing) is become almost synonymous with multidimensional databases, whereas OLTP (On-Line Transaction Processing) generally refers to relational DBMSs.

From the definition we obtain:

multidimensional DBMS is a near-synonym of OLAP

multidimensional DBMS is an extension of relational databases

multidimensional DBMS has the broader keyphrase database management system

Note: in some cases the relation found between two keyphrases can be regarded as either synonym or near-synonym, because near-synonymy subsumes synonymy. Some relations are also supplied directly by WEBOPEDIA.

For instance, consider the definition of tool:

A program that performs a very specific task.

Synonymous with utility. Similar to application.

So, from the definition, the relation between tool and utility is synonym and the relation between tool and application is near-synonym.
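The relations read off the definitions in this section can be written down as typed pairs. The sketch below is an assumption about representation only: the `Rel` enum, the `facts` list and the helper `narrower_of` are hypothetical names, and a triple (a, rel, b) is read as "a has rel b" (for EXTENSION: "a is an extension of b").

```python
from collections import defaultdict
from enum import Enum


class Rel(Enum):
    BROADER = "broader"            # r2
    SYNONYM = "synonym"            # r3
    NEAR_SYNONYM = "near-synonym"  # r4
    EXTENSION = "extension"        # r5


# Pairs taken from the worked examples above.
facts = [
    ("crimeware", Rel.BROADER, "malicious software"),
    ("DAML", Rel.BROADER, "semantic markup language"),
    ("DAML", Rel.EXTENSION, "XML"),
    ("DAML", Rel.EXTENSION, "RDF"),
    ("C++", Rel.EXTENSION, "C"),
    ("packet", Rel.SYNONYM, "datagram"),
    ("multidimensional DBMS", Rel.NEAR_SYNONYM, "OLAP"),
]

index = defaultdict(set)  # (keyphrase, relation) -> related keyphrases
for subject, rel, obj in facts:
    index[(subject, rel)].add(obj)


def narrower_of(keyphrase):
    """Narrower is the inverse of broader: collect every a with 'a broader-is keyphrase'."""
    return {s for s, rel, o in facts if rel is Rel.BROADER and o == keyphrase}


print(index[("DAML", Rel.EXTENSION)])
print(narrower_of("malicious software"))
```

Deriving narrower by inverting broader is a design choice of this sketch; the thesis stores Broader and Narrower in separate tables.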

Notation for classes and keyphrases: following the categories at WEBOPEDIA and the delimitation of the computing domain, the classes in SPN-KOS are written entirely in upper case (for example NGÔN NGỮ LẬP TRÌNH). A keyphrase may be written entirely in upper case if it is the acronym of another keyphrase (for example the keyphrase "Ngôn ngữ đánh dấu siêu văn bản" (HyperText Markup Language) has the acronym HTML), or it may have its first letter capitalized.

3.2. Storing the SPN-KOS ontology

Storing SPN-KOS means deciding how the information about the classes, the keyphrases and the relations among them is kept on disk. The raw-data storage of the ontology is described as follows:

The is-a relations between classes are stored in an ACCESS (or SQL Server) database. Over the 66 classes, every pair of classes that has an is-a relation is stored as one row of a table with the two columns ClassParentName and ClassChildName, holding the names of the two related classes. Inserting a row into the table corresponds to creating the relation between the 2 classes; the table has 76 rows, one for each of the 76 is-a relations.

Similarly, the semantic relations of keyphrases (Acronym, Synonym, Near-Synonym, Broader, Narrower, Extension and Related) are stored as two-column tables. Acronym and Synonym share one table, because their meanings can be regarded as similar. The other relations are stored as follows:

Broader: a table with the two columns KeyphraseName, Broader.

Narrower: a table with the two columns KeyphraseName, Narrower.

Extension: a table with the two columns KeyphraseName, Extension.

Related: a table with the two columns KeyphraseName, Related.

Note: Related expresses that something is linked to a keyphrase, so its data may cover most of the other relations.

In summary: the class table has 76 rows and 2 columns, each row corresponding to one relation between a pair of classes, for a total of 76 relations among all classes.

3.2.1. Storing classes

Table 3.3: Parent classes and child classes

The table has the 2 columns ClassParentName (parent class) and ClassChildName (child class). We say that ClassParentName has the is-a relation to ClassChildName; we do not say that ClassChildName has the is-a relation to ClassParentName.

Example: LẬP TRÌNH has is-a CẤU TRÚC DỮ LIỆU (DATA STRUCTURES). This means that LẬP TRÌNH contains CẤU TRÚC DỮ LIỆU, and that if LẬP TRÌNH did not exist, CẤU TRÚC DỮ LIỆU would not exist either. However, if another class also has is-a CẤU TRÚC DỮ LIỆU, then CẤU TRÚC DỮ LIỆU still exists even without LẬP TRÌNH.

Flowchart of the algorithm that creates the is-a relations between classes from the data table

Input: a data table with the 2 columns ClassParentName, ClassChildName

Output: the list of classes with the is-a relations created between them

Flowchart:

Figure 3.3: Flowchart for creating the is-a relations between classes
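The procedure of reading the two-column table and creating the is-a links can be sketched as below. Note the assumptions: the thesis stores the table in ACCESS or SQL Server, whereas this sketch swaps in Python's built-in `sqlite3` so that it runs anywhere; the table name `ClassIsA` is hypothetical, only the two column names come from the text, and the four rows are sample data.

```python
import sqlite3

# In-memory database standing in for the ACCESS / SQL Server store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ClassIsA (ClassParentName TEXT, ClassChildName TEXT)")
conn.executemany(
    "INSERT INTO ClassIsA VALUES (?, ?)",
    [
        ("LẬP TRÌNH", "CẤU TRÚC DỮ LIỆU"),
        ("LẬP TRÌNH", "PHÁT TRIỂN WEB"),
        ("LẬP TRÌNH", "XML"),
        ("PHÁT TRIỂN WEB", "XML"),
    ],
)

children = {}  # class name -> set of direct child classes
for parent, child in conn.execute(
    "SELECT ClassParentName, ClassChildName FROM ClassIsA"
):
    children.setdefault(parent, set()).add(child)
    children.setdefault(child, set())  # register the child as a class too

# A class that never appears as a child is a top-level class.
all_children = {c for kids in children.values() for c in kids}
roots = set(children) - all_children
print(roots)
```

One row per pair keeps the schema trivial, and a class with several parents simply appears in several rows.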

3.2.2. Keyphrases and the classes containing them

This is the same-class relation between keyphrases. Once each keyphrase has been assigned to its class, looking at a class tells us which keyphrases it contains and therefore which keyphrases share a class; this is the relation (r1) described for SPN-KOS at the conceptual level.

Table 3.4: Keyphrases and the classes containing them

A keyphrase and the keyphrases that share its class (r1) are stored in a two-column table, as a binary relation: the keyphrase is in the KeyphraseName column and the inClass column gives the class containing it.

Flowchart of the algorithm that creates relation (r1) from this table

Input: a data table of more than 7600 rows with the 2 columns KeyphraseName, inClass

Output: the list of keyphrases assigned to their classes

Flowchart:

Figure 3.4: Flowchart for assigning keyphrases to classes

3.2.3. Keyphrases with Broader and Narrower relations

Table 3.5: Keyphrases and their Broader relations

A keyphrase and a keyphrase that has the Broader relation (r2) with it are stored in a two-column table: the keyphrase is in the KeyphraseName column and the keyphrase with the Broader relation is in the Broader column.

Flowchart of the algorithm that creates relation (r2)

Input: a data table of more than 2700 rows with the 2 columns KeyphraseName, Broader

Output: the list of keyphrases and the keyphrases that have the Broader relation with them

Flowchart:

Figure 3.5: Flowchart for creating the Broader relation for keyphrases

Table 3.6: Keyphrases and their Narrower relations

A keyphrase and a keyphrase that has the Narrower relation (the inverse of r2) with it are stored in a two-column table: the keyphrase is in the KeyphraseName column and the keyphrase with the Narrower relation is in the Narrower column.

Flowchart of the algorithm that creates the Narrower relation

Input: a data table of more than 450 rows with the 2 columns KeyphraseName, Narrower

Output: the list of keyphrases and the keyphrases that have the Narrower relation with them

Flowchart:

Figure 3.6: Flowchart for creating the Narrower relation for keyphrases

3.2.4. Keyphrases with Acronym and Synonym relations

Table 3.7: Keyphrases and their Acronym and Synonym relations

Between a keyphrase and another keyphrase that is its synonym or acronym (r3): synonym and acronym are treated as the same relation, so in the adjacent table the keyphrases in the KeyphraseName column have their synonyms and acronyms in the Acro_Synonym column.

Algorithm that creates relation (r3) from this table

Input: a data table of more than 2700 rows with the 2 columns KeyphraseName, Acro_Synonym

Output: the list of keyphrases together with the keyphrases that are their acronyms or synonyms

Flowchart:

Figure 3.7: Flowchart for creating the Acronym and Synonym relations for keyphrases

Table 3.8: Keyphrases and their Near-Synonym relations

Between a keyphrase and another keyphrase that is its near-synonym (r4): the keyphrase is stored in the KeyphraseName column and its near-synonyms are stored in the NearSynonym column.

Algorithm that creates relation (r4) from this table

Input: a data table of more than 1500 rows with the 2 columns KeyphraseName, NearSynonym

Output: the list of keyphrases together with the keyphrases that are their near-synonyms

Flowchart:

Figure 3.8: Flowchart for creating the Near-Synonym relation for keyphrases

3.2.5. Keyphrases with the Extension relation

Table 3.9: Keyphrases and their Extension relations

A keyphrase and a keyphrase that has the Extension relation (r5) with it are also stored in a two-column table: the keyphrase in the KeyphraseName column has as its Extension the keyphrase in the Extension column.

Algorithm that creates relation (r5) from this table

Input: a data table of 77 rows with the 2 columns KeyphraseName, Extension

Output: the list of keyphrases together with the keyphrases that have the Extension relation with them

Flowchart:

Figure 3.9: Flowchart for creating the Extension relation for keyphrases

3.2.6. Related keyphrases

Table 3.10: Keyphrases and the keyphrases related to them

A keyphrase can be related to many other keyphrases, and all keyphrases related to the keyphrase under consideration are listed on the website. The data expressing the Related link (r6) is stored in a two-column table, as in the adjacent figure: the keyphrase in the KeyphraseName column is related to the keyphrase in the Related column.

Algorithm that creates relation (r6) from this table

Input: a data table of more than 21000 rows with the 2 columns KeyphraseName, Related

Output: the list of keyphrases together with their related keyphrases

Flowchart:

Figure 3.10: Flowchart for attaching the related keyphrases to a keyphrase
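All the relation tables above share one shape (KeyphraseName plus one value column), so a single loader can serve them. The following is a minimal sketch under assumptions: `load_relation` and `lookup` are hypothetical helpers, the rows are a few pairs quoted earlier in the chapter, and the real tables hold hundreds to thousands of rows.

```python
from collections import defaultdict


def load_relation(rows):
    """Group a two-column table (KeyphraseName, value) by keyphrase, dropping repeats."""
    table = defaultdict(list)
    for keyphrase, value in rows:
        if value not in table[keyphrase]:
            table[keyphrase].append(value)
    return dict(table)


# Keys are the value-column names used in the text; rows are sample data.
tables = {
    "Acro_Synonym": load_relation([("packet", "datagram")]),
    "Extension": load_relation([("C++", "C")]),
    "Related": load_relation([
        ("database", "data mining"),
        ("database", "data warehouse"),
        ("database", "data mining"),  # repeated row, kept only once
    ]),
}


def lookup(keyphrase):
    """Return every relation table in which the keyphrase has entries."""
    return {name: t[keyphrase] for name, t in tables.items() if keyphrase in t}


print(lookup("database"))
```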

As mentioned for the SPN-KOS model at the conceptual level, the relations are built from the definitions of each keyphrase given on the website and from Vietnamese information technology documents. Besides the relations built as above, analysing the definitions for semantic relations between keyphrases produces further keyphrases beyond the total of more than 6000 keyphrases above; these keyphrases are extracted manually from the definitions provided. After review, new keyphrases therefore arise from the broader, narrower, extension and related relations, and they are regarded as belonging to the same class as the current keyphrase. For the acronym, synonym and near-synonym relations we do not check whether new keyphrases arise, because for these relations the keyphrases are regarded as equivalent.

Building the model and the method of storage and semantic search

From the collected data store, we analyse the structure and determine the content of the documents as the basis for building the model and for organizing storage and search in a combined way.

3.2.7. Building the storage model at the conceptual level

To achieve document search according to the practical model, we build the following conceptual-level model, which combines document collection, storage organization and knowledge inference:

Figure 3.11: The conceptual-level model

From the conceptual storage model, we analyse the document system and build a general model for managing and storing documents and for semantic search: collected document repository → standardized folder-tree model → semantic management model.

First, Vietnamese information technology documents are collected from many different sources and then stored according to the standardized folder tree. Each document is stored in the folder of the domain it belongs to. Finally, the SPN-KOS knowledge base is used to infer the semantics of each document.

3.2.8. Organizing storage by folder tree

The folders used to store the files are organized according to the SPN-KOS model, that is, 66 main folders, with the relations between folders represented as links as follows.

Figure 3.12: Organization of the folders

In the figure, bucket D has the parents A and C, and bucket B has the parent C. Buckets A and C have no parent, which tells us that A and C are among the 3 initial classes of SPN-KOS (see Figure 3.1).

3.2.9. Organizing storage and semantic search

From the conceptual-level model (Figure 3.11), we build the database model used for storage and semantic search.

Figure 3.13: Base model for storage and semantic search

A semantic information retrieval system has two main functions: indexing and lookup. Indexing and lookup are divided into 2 stages: designing the solution and organizing the semantic document repository, and designing the search processing.

3.2.9.1. Designing the solution and organizing the semantic document repository

In this stage the process has 4 steps:

Step 1: Extract keyphrases. In combination with the relational-database document management model, the keyphrases of each document are extracted manually. The result is the list of keyphrases of each specific document.

Step 2: Generate additional keyphrases. Next, SPN-KOS is used to generate further keyphrases from the semantic relations (Acronym, Synonym, Near-Synonym, Broader, Narrower, Extension, Related) of the existing keyphrases. The number of keyphrases therefore grows, which also means that the file from which the keyphrases were extracted is represented with more semantics.

Step 3: Store the file in a folder. We also know which class contains each keyphrase, and the parent of a class can be found through the folder structure described above. Finally, we determine which class covers most of the keyphrases of the file under consideration and store the file in the corresponding folder.

Example: document A has the keyphrases C and "hợp ngữ" (assembly language). Using the SPN-KOS ontology to infer the semantic relations of these keyphrases gives the additional keyphrases: ngôn ngữ máy, ngôn ngữ kết hợp, ngôn ngữ assembly, KLOC, APL, C cộng cộng, ngôn ngữ lập trình, C sharp, Eiffel, Ngôn ngữ cấp cao, Ngôn ngữ mức cao, Rexx, Visual C cộng cộng. In total there are 15 keyphrases, including those inside document A and those inferred from SPN-KOS.

Keyphrase              Class containing the keyphrase
C                      CÁC NGÔN NGỮ LẬP TRÌNH
hợp ngữ                CÁC NGÔN NGỮ LẬP TRÌNH, CÔNG CỤ LẬP TRÌNH
KLOC                   CÁC NGÔN NGỮ LẬP TRÌNH, LẬP TRÌNH
ngôn ngữ assembly      CÁC NGÔN NGỮ LẬP TRÌNH
ngôn ngữ kết hợp       CÁC NGÔN NGỮ LẬP TRÌNH
ngôn ngữ máy           CÁC NGÔN NGỮ LẬP TRÌNH
APL                    CÁC NGÔN NGỮ LẬP TRÌNH
C cộng cộng            CÁC NGÔN NGỮ LẬP TRÌNH
C sharp                CÁC NGÔN NGỮ LẬP TRÌNH
Eiffel                 CÁC NGÔN NGỮ LẬP TRÌNH
Ngôn ngữ cấp cao       CÁC NGÔN NGỮ LẬP TRÌNH
Ngôn ngữ mức cao       CÁC NGÔN NGỮ LẬP TRÌNH
Ngôn ngữ lập trình     CÁC NGÔN NGỮ LẬP TRÌNH
Rexx                   CÁC NGÔN NGỮ LẬP TRÌNH
Visual C cộng cộng     CÁC NGÔN NGỮ LẬP TRÌNH

The 15 keyphrases involve 3 classes in total:

CÁC NGÔN NGỮ LẬP TRÌNH, CÔNG CỤ LẬP TRÌNH, LẬP TRÌNH

The class CÁC NGÔN NGỮ LẬP TRÌNH contains all 15 keyphrases, the class LẬP TRÌNH contains 1 of the 15 and the class CÔNG CỤ LẬP TRÌNH (PROGRAMMING TOOLS) contains 1 of the 15. CÁC NGÔN NGỮ LẬP TRÌNH therefore covers most of the keyphrases, so the original file (the file before preprocessing) is stored in the folder CÁC NGÔN NGỮ LẬP TRÌNH.
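The folder choice in this example amounts to counting, for each class, how many of the document's keyphrases it contains and taking the maximum. A minimal sketch of that count, using the 15 keyphrases of the example; the function name `choose_folder` is hypothetical, and how ties are broken is not specified in the text.

```python
from collections import Counter

LANG = "CÁC NGÔN NGỮ LẬP TRÌNH"

names = [
    "C", "hợp ngữ", "KLOC", "ngôn ngữ assembly", "ngôn ngữ kết hợp",
    "ngôn ngữ máy", "APL", "C cộng cộng", "C sharp", "Eiffel",
    "Ngôn ngữ cấp cao", "Ngôn ngữ mức cao", "Ngôn ngữ lập trình",
    "Rexx", "Visual C cộng cộng",
]
# keyphrase -> classes containing it, as in the example table
keyphrase_classes = {name: [LANG] for name in names}
keyphrase_classes["hợp ngữ"].append("CÔNG CỤ LẬP TRÌNH")
keyphrase_classes["KLOC"].append("LẬP TRÌNH")


def choose_folder(keyphrase_classes):
    """Return the class covering the most keyphrases, plus the per-class counts."""
    counts = Counter(
        cls for classes in keyphrase_classes.values() for cls in classes
    )
    folder, _ = counts.most_common(1)[0]
    return folder, counts


folder, counts = choose_folder(keyphrase_classes)
print(folder, dict(counts))
```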

Flowchart of the algorithm that stores a file in a folder

Input: a data file in PDF, HTML, DOC or TXT format

Output: the file stored in the corresponding folder. The SPN-KOS ontology is used to infer the semantic relations of the keyphrases.

Figure 3.14: Flowchart for extracting keyphrases and inferring their semantics

→ The result is that the documents are stored in the corresponding folders together with their semantically related keyphrases.

Step 4: Build the index

Input: the set of keyphrases and the Acronym, Synonym, Near-Synonym, Broader, Narrower, Extension and Related relations of each document

Output: the list of files for each keyphrase together with a priority level (SCORE) (based on Lucene, see section C).

Components of the index:

- Dictionary: stores the distinct keyphrases produced by semantic inference for each document.

- DSTapTin: the list of identifiers of the i-th documents (MA_TAI_LIEU)

- DSVitri: (keyphrases; DSTapTin)

→ Together these form the index (the ontology for the domain under consideration)
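An inverted index of this kind maps each keyphrase to the documents that carry it, with a score per document. The sketch below is an assumption-laden illustration: the thesis takes its SCORE from Lucene, whereas here the score is simply how many times the keyphrase occurs in a document's keyphrase list, and `build_index` and the document identifiers are hypothetical.

```python
def build_index(doc_keyphrases):
    """Map keyphrase -> {document id: score}.

    The score is a placeholder (occurrence count), not Lucene's scoring.
    """
    index = {}
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            postings = index.setdefault(phrase.lower(), {})
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return index


# Hypothetical documents with extracted plus inferred keyphrases.
doc_keyphrases = {
    "TL001": ["C", "hợp ngữ", "ngôn ngữ lập trình", "C"],
    "TL002": ["ngôn ngữ lập trình", "Pascal"],
}
index = build_index(doc_keyphrases)
print(index["ngôn ngữ lập trình"])
```

Each (keyphrase, document) pair appears once in the index, which matches the duplicate check described for the INVERTED_INDEX table.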

Flowchart:

Figure 3.15: Flowchart for indexing documents

3.2.9.2. The semantic search process

In this stage the process has the following 3 steps:

Step 1: Extract keyphrases from the user's input. The meaningful keyphrases entered by the user are extracted with the help of the dictionary store (the result is a list of keyphrases without relations yet).

Input: the query entered by the user

Output: the list of keyphrases extracted from the user's query.
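Step 1 can be sketched as a scan over the dictionary that keeps every entry occurring in the query as a whole-word sequence. This is an assumption about the matching rule (the text only says the dictionary is consulted); `extract_keyphrases` and the sample dictionary are hypothetical.

```python
def extract_keyphrases(query, dictionary):
    """Return dictionary entries that occur in the query as whole-word sequences."""
    tokens = query.lower().split()
    found = []
    for entry in dictionary:
        words = entry.lower().split()
        n = len(words)
        # Slide a window of n tokens over the query and compare word by word,
        # so that "C" does not match inside a longer word.
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            found.append(entry)
    return found


dictionary = ["C", "C++", "mật khẩu", "ngôn ngữ lập trình"]
print(extract_keyphrases("Tìm kiếm tài liệu lập trình C", dictionary))
```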

Figure 3.16: Flowchart for extracting keyphrases from the user's query

Step 2: Find the semantic relations.

In this stage a semantic network is built from the SPN-KOS ontology (Acronym, Synonym, Near-Synonym, Broader, Narrower, Extension, Related) for the keyphrases extracted in step 1.

Example: the user enters the query "Tìm kiếm tài liệu lập trình C" (search for documents on C programming). Using the dictionary we extract the meaningful keyphrase C, and then the ontology is used to infer the meaning of the keyphrase C. This gives the following semantic network:

Figure 3.17: Semantic graph of the keyphrase C

Step 3: Search for documents containing the keyphrases. From the list of keyphrases entered by the user and semantically expanded in step 2, and using INVERTED_INDEX (the result of step 4 in section 3.2.9.1), we search for the documents whose content contains those keyphrases, rank them and return those with the highest priority (SCORE).

→ The result returned is the list of documents containing the keyphrases entered by the user.

3.2.10. Overall model for storage organization and semantic document search

The overall model for managing and searching the document system semantically is carried out in 2 stages: building the documents with semantics, and processing semantic search.

1. Flowchart for storing documents semantically.

Figure 3.18: Document management model

2. Flowchart for processing semantic document search

Figure 3.19: Flowchart for semantic search
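The search stage as a whole (expand the query keyphrases through the ontology, look them up in the index, rank by score) can be sketched as follows. Everything here is illustrative: `search`, the `neighbours` map and the index contents are hypothetical, and summing the scores of all matching keyphrases is an assumed ranking rule, not the thesis's Lucene-based SCORE.

```python
def search(query_keyphrases, neighbours, index):
    """Expand the query with ontology neighbours, then rank documents by total score."""
    expanded = set(query_keyphrases)
    for keyphrase in query_keyphrases:
        expanded |= neighbours.get(keyphrase, set())

    scores = {}
    for keyphrase in expanded:
        for doc_id, score in index.get(keyphrase, {}).items():
            scores[doc_id] = scores.get(doc_id, 0) + score

    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ontology neighbours of "C" and a tiny inverted index.
neighbours = {"C": {"C++", "ngôn ngữ lập trình"}}
index = {
    "C": {"TL001": 2},
    "C++": {"TL001": 1, "TL002": 1},
    "mạng": {"TL003": 5},
}
results = search(["C"], neighbours, index)
print(results if results else "no documents found")
```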


Text recovered from the figures:

Figure 3.1 (overview): SPN-KOS at the root, with the classes MẠNG MÁY TÍNH, LẬP TRÌNH and PHẦN MỀM.

Figure 3.2 (hierarchy tree), node labels: CS, a, b, c, d, e.

Table 3.3 (annotation): if two classes are related, the pair is stored as one row in the database, so the table has 76 rows, corresponding to the 76 is-a relations between classes.

Figure 3.3 (is-a flowchart): Start; iterate over each row of the ClassChildName column in the table; identify the value as a class; for each ClassChildName value, iterate over the rows of the ClassParentName column; create the has-a relation between ClassChildName and ClassParentName; repeat while rows remain in ClassParentName and in ClassChildName; Stop.

Figure 3.4 (class assignment flowchart): Start; iterate over each row of the KeyphraseName column in the table; for each keyphrase, determine which class in inClass it belongs to; repeat while rows remain; Stop.

Figures 3.5 to 3.10 (relation flowcharts) share one shape: Start; iterate over each row of the KeyphraseName column; create the relation for KeyphraseName from the data in the second column (Broader, Narrower, Acro_Synonym, NearSynom, Extension, Related); repeat while rows remain; Stop.

Figure 3.11 (conceptual model), labels: collected document repository to be stored; SPN-KOS (ontology knowledge base).

Figure 3.13 (database model), labels in the order they appear:

INVERTED_INDEX: MA_TAI_LIEU, KEYPHRASE, SCORE

TAI_LIEU: MA_TAI_LIEU, KEYPHRASES, MA_NHOM_LINH_VUC, TUA_DE, MA_LOAI_TAI_LIEU, MA_TAC_GIA, MO_TA, MA_DINH_DANG, BAN_QUYEN, MA_NGON_NGU, TAP_TIN

Figure 3.12 (folder buckets): buckets A, B, C and D, each with a Parent link.

The hash table has 66 buckets, matching the number of classes in the ontology.

Each folder corresponds to one class of SPN-KOS and to one bucket of a hash table that uses direct chaining. Each bucket holds the following fields:

Folder name,

Parent folder (Parent)

where the Parent field is a pointer to a linked list that records which nodes are the parents of the current bucket, from which the corresponding parent folders are obtained.
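The bucket layout can be sketched with a dictionary standing in for the hash table and a list standing in for the linked list behind the Parent pointer. These substitutions, the `Bucket` class and `folder_paths` are assumptions for illustration; the four buckets follow the description of Figure 3.12 (D has parents A and C, B has parent C, A and C have none).

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    name: str                                     # folder name
    parents: list = field(default_factory=list)   # stands in for the Parent linked list


# A Python dict plays the role of the hash table of buckets.
table = {name: Bucket(name) for name in "ABCD"}
table["D"].parents += ["A", "C"]
table["B"].parents.append("C")


def folder_paths(name):
    """List every path from a top-level folder down to the given folder."""
    bucket = table[name]
    if not bucket.parents:  # no parent: this is a top-level class
        return [name]
    return [
        f"{path}/{name}"
        for parent in bucket.parents
        for path in folder_paths(parent)
    ]


print(folder_paths("D"))
```

Storing only parent links keeps each bucket small; a folder with several parents yields several paths.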

Figure 3.17 (semantic graph of the keyphrase C), node labels: C, C++, Visual C cộng cộng, Pascal, RLaB, LISP, Prolog, Ngôn ngữ lập trình cấp cao; edge labels: Extension, Related, Broader.

Figure 3.14 (storage flowchart): Start; documents in PDF, HTML, DOC, TXT format; extract the keyphrases of the document; use the SPN-KOS ontology to infer the semantic relations of the keyphrases; use the SPN-KOS ontology to infer the classes containing the keyphrases; if a class covers most of the keyphrases, store the original file in the corresponding folder; repeat until no data remains to be processed; Stop.

Figure 3.15 (indexing flowchart): Start; iterate over each row of the 2 columns KEYPHRASE and MA_TAI_LIEU in the table KEYPHRASES; compare with the columns KEYPHRASE and MA_TAI_LIEU in the table INVERTED_INDEX; if there is no duplicate, save the row into INVERTED_INDEX; if there is a duplicate, move to the next row; repeat until no data remains to be processed; Stop.

Figure 3.16 (query flowchart): Start; iterate over each row of the dictionary store; check whether the dictionary entry occurs in the query entered by the user; if yes, save the meaningful keyphrase; if no, move to the next row; repeat until no data remains to be processed; Stop.

Figure 3.18 (document management model), labels: collected document repository; organize document storage in the standardized folders; standardized folder tree; SPN-KOS (ontology knowledge base); list of the keyphrases of each document with their semantically related keyphrases; index set; End.

Figure 3.19 (search flowchart), labels: Start; user; search interface; dictionary; list of meaningful keyphrases extracted; SPN-KOS ontology knowledge base; build query; execute query; existing index; D_KEYPHRASES (MA_TAI_LIEU, KEYPHRASE); rank and return results; "not found" message.
