Text Mining : Experience


Upload: boonlert-aroonpiboon

Post on 18-Nov-2014



Page 1: Text Mining : Experience

Experience in Data Analysis Using Text Mining Methods

Dr. Alisa Kongthon

Researcher, Human Language Technology Laboratory

National Electronics and Computer Technology Center (NECTEC)

Page 2: Text Mining : Experience

Text Mining is about…

“Sifting through vast collections of unstructured or semistructured data beyond the reach of data mining tools, text mining tracks information sources, links isolated concepts in distant documents, maps relationships between activities, and helps answer questions.”

Source: “Tapping the Power of Text Mining,” Communications of the ACM, Sept. 2006

Page 3: Text Mining : Experience

Humans VS. Computers

• Humans: Ability to distinguish and apply linguistic patterns to text

– Can overcome language difficulties such as slang, spelling variations, and contextual meaning

• Computers: Ability to process text in large volumes at high speed

– Can sift through a large collection of texts to find simple statistics and relationships among terms almost instantly

• Text mining requires a combination of both

Humans' linguistic capability (NLP) + computers' speed and accuracy (data mining)

Page 4: Text Mining : Experience

Text Mining Tasks

• Information extraction:

– Analyze unstructured text and identify key words or phrases and the relationships within the text

• Topic detection and tracking:

– Filter and present only documents relevant to the user profile

• Summarization:

– Reduce the content of a text by retaining only its main points and overall meaning
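The summarization task above can be illustrated with a simple frequency-based extractive heuristic. This is only a sketch under stated assumptions, not the method used in the talk: the stop-word list, the function names tokenize and summarize, and the scoring rule (average word frequency per sentence) are all chosen here for illustration.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list (illustrative only).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "for", "on", "with"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        # Average corpus frequency of the sentence's words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]

text = ("Text mining finds patterns in text. "
        "The weather was nice. "
        "Text mining tools process large text collections.")
print(summarize(text, n_sentences=1))
```

Sentences whose words recur across the whole text score highest, so the off-topic sentence about the weather is dropped.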

Page 5: Text Mining : Experience

Text Mining Tasks

• Categorization:

– Automatically classify documents into predefined categories

• Clustering:

– Group documents based on their similarity

• Concept linkage:

– Connect related documents by identifying their shared concepts, helping users find information they perhaps wouldn't have found through traditional search methods
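Categorization and clustering both rest on a measure of document similarity. A minimal sketch, assuming a bag-of-words representation and cosine similarity (one common choice, not necessarily the one used in the talk); the names vectorize, cosine, and categorize and the example categories are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def categorize(doc, labeled_examples):
    """Assign the label whose example text is most similar to the document."""
    vec = vectorize(doc)
    return max(labeled_examples, key=lambda label: cosine(vec, vectorize(labeled_examples[label])))

# Made-up example texts for two predefined categories.
examples = {
    "sports": "football match goal team player",
    "finance": "stock price market bank interest",
}
print(categorize("the team scored a late goal", examples))
```

The same cosine function could drive clustering by grouping documents whose pairwise similarity exceeds a chosen threshold.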

Page 6: Text Mining : Experience

Text Mining Tasks

• Information visualization:

– Represent documents or information in graphical formats for easy browsing, viewing, or searching

• Question answering (Q&A):

– Search for and extract the best answer to a given question
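As a rough illustration of the Q&A task, the sketch below picks the candidate passage that shares the most content words with the question. Word overlap is a stand-in chosen here for simplicity; the stop-word list, the function names, and the passages are assumptions, and a real system would need far more linguistic analysis.

```python
import re

# Assumed stop-word list (illustrative only).
STOPWORDS = {"what", "is", "the", "of", "a", "an", "and", "in"}

def content_words(text):
    """Set of lowercase alphabetic tokens that are not stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def best_answer(question, passages):
    """Return the passage with the largest word overlap with the question."""
    q = content_words(question)
    return max(passages, key=lambda p: len(q & content_words(p)))

# Made-up candidate passages.
passages = [
    "Bangkok is the capital of Thailand.",
    "Text mining combines NLP and data mining.",
]
print(best_answer("What is the capital of Thailand?", passages))
```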

Page 7: Text Mining : Experience

Applications: Tech Mining

• Tech Mining is the application of text mining tools to science and technology (S&T) information, particularly bibliographic abstracts

• It exploits the S&T databases to see patterns, detect associations, and foresee opportunities

Page 8: Text Mining : Experience

Tech Mining Process


Page 9: Text Mining : Experience

Technical Intelligence: Who, What, When, Where?

• Digest multiple S&T information resources

• Profile research domains:

– Who?

– What?

– When?

– Where?

• Map relationships: topics and teams

• Analyze trends: what's hot and what's coming

• And do so quickly
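The who/what/when/where profile above amounts to counting fields across bibliographic records. A minimal sketch, assuming each record is a dictionary with authors, keywords, year, and country fields; the record layout, the sample data, and the function name profile are hypothetical and do not reflect the format of any particular S&T database or tech mining tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records; real ones would come from an S&T database export.
RECORDS = [
    {"authors": ["Author A", "Author B"], "keywords": ["text mining", "nlp"],
     "year": 2005, "country": "Thailand"},
    {"authors": ["Author A", "Author C"], "keywords": ["text mining", "opinion mining"],
     "year": 2006, "country": "Thailand"},
    {"authors": ["Author D"], "keywords": ["text mining"],
     "year": 2006, "country": "USA"},
]

def profile(records, top_n=3):
    """Count who publishes, on what, when, and where, plus co-author pairs."""
    who = Counter(a for r in records for a in r["authors"])
    what = Counter(k for r in records for k in r["keywords"])
    when = Counter(r["year"] for r in records)
    where = Counter(r["country"] for r in records)
    # Co-authorship pairs approximate the "teams" relationship map.
    teams = Counter(pair for r in records
                    for pair in combinations(sorted(r["authors"]), 2))
    return {
        "who": who.most_common(top_n),
        "what": what.most_common(top_n),
        "when": sorted(when.items()),  # chronological, for trend analysis
        "where": where.most_common(top_n),
        "teams": teams.most_common(top_n),
    }

for field, counts in profile(RECORDS).items():
    print(field, counts)
```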

Page 10: Text Mining : Experience

What if I don’t have Tech Mining Software?

Page 11: Text Mining : Experience

What if I don’t have Tech Mining Software?

Page 12: Text Mining : Experience

Output example from Tech Mining Software

Source: A.L. Porter, QTIP: quick technology intelligence processes, Technol. Forecast. Soc. Change 72 (2005)

Page 13: Text Mining : Experience

Applications: Expert Finder


Page 14: Text Mining : Experience

Applications: Expert Finder


Page 15: Text Mining : Experience

Applications: Expert Finder


Page 16: Text Mining : Experience

Applications: ABDUL (Artificial BudDy U Love)

• An online information service that currently provides access to Thai linguistic resources (e.g., dictionary and sentence translation) and information resources (e.g., weather conditions, stock prices, gas prices, and traffic conditions)

• Users can interact with ABDUL in natural language via instant messaging (IM) protocols, Web browsers, and mobile devices

Page 17: Text Mining : Experience

Applications: ABDUL (Artificial BudDy U Love)

Page 18: Text Mining : Experience

Applications: ABDUL (Artificial BudDy U Love)

Page 19: Text Mining : Experience

Web 1.0 VS. Web 2.0


Page 20: Text Mining : Experience

User-Generated Content

• With Web 2.0 and social networking websites, the amount of user-generated content has increased exponentially

• User-generated content often contains opinions and/or sentiments

• An in-depth analysis of these opinionated texts could reveal potentially useful information, e.g.,

– People's preferences on many different topics, including news events, social issues, and commercial products

Page 21: Text Mining : Experience

Online Opinion Resources

Page 22: Text Mining : Experience

Characteristics of Online Reviews

• Written in natural language, in unstructured text format

• Some reviews are long yet contain only a few sentences expressing opinions on the product

• It can be difficult for a potential reader to understand and analyze every review that may be relevant to his or her decision making

Page 23: Text Mining : Experience

Opinion Mining

• Opinion mining (sentiment analysis) is the task of analyzing and summarizing what people think about a certain topic

• Opinion mining has attracted considerable interest in the text mining and NLP communities

• Three granularities of opinion mining:

– Document level

– Sentence level

– Feature level

Page 24: Text Mining : Experience

Feature-Based Opinion Mining

• This approach typically consists of the following two steps:

1. Identifying and extracting features of an object, topic, or event from each sentence

2. Determining whether the opinions regarding the features are positive or negative
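The two steps above can be sketched with a small lexicon-based approach. This is an illustrative assumption, not the system presented in the talk: the feature list, the positive and negative word lists, and the function name mine_opinions are hypothetical, and the sketch handles English text only, with no negation handling.

```python
import re

# Hypothetical hotel-review lexicons (illustrative only).
FEATURES = {"room", "staff", "location", "breakfast"}
POSITIVE = {"clean", "friendly", "great", "good", "helpful"}
NEGATIVE = {"dirty", "rude", "noisy", "bad", "slow"}

def mine_opinions(review):
    """Return (feature, polarity) pairs found in each sentence of a review."""
    results = []
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = re.findall(r"[a-z]+", sentence)
        # Step 1: extract the features mentioned in the sentence.
        features = [w for w in words if w in FEATURES]
        # Step 2: decide polarity from the opinion words in the same sentence.
        score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
        if score == 0:
            continue  # no opinion words, or a mixed sentence: leave undecided
        polarity = "positive" if score > 0 else "negative"
        results.extend((feature, polarity) for feature in features)
    return results

print(mine_opinions("The room was clean. The staff were rude!"))
```

Aggregating these pairs over many reviews gives per-feature positive and negative counts, which is the kind of summary a graphical or textual display can then present.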

Page 25: Text Mining : Experience

Opinion Mining on Hotel Reviews in Thailand (Graphical Display)

Page 26: Text Mining : Experience

Opinion Mining on Hotel Reviews in Thailand (Textual Display)

Page 27: Text Mining : Experience

Comparison among Hotels


Page 28: Text Mining : Experience

Opinion Mining on Mobile Network Operators in Thailand

Page 29: Text Mining : Experience

Opinion Mining on Mobile Network Operators in Thailand

Page 30: Text Mining : Experience

Challenges in Text Mining

• Text Mining = NLP + Data Mining

• Statistical NLP

– Ambiguity

– Context

– Tokenization / sentence detection

– POS tagging

• Data Mining

– Ability to process the data

– Massive amounts of data

– Determining and extracting information of interest

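To see why sentence detection appears among the challenges, the sketch below compares a naive splitter with one that consults an abbreviation list. Both functions and the abbreviation list are assumptions made for illustration; they handle English only and are far from a complete solution.

```python
import re

# Assumed, incomplete abbreviation list (illustrative only).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

def naive_split(text):
    """Split after every '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_sentences(text):
    """Split on sentence-final punctuation, except after known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

text = "Dr. Smith reviewed the hotel. The room was clean."
print(naive_split(text))      # breaks after the abbreviation "Dr."
print(split_sentences(text))  # keeps "Dr. Smith reviewed the hotel." whole
```

Even the improved version fails when an abbreviation really does end a sentence, which is one reason statistical NLP methods are used for this step.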

Page 31: Text Mining : Experience

Conclusions

• As the amount of data increases, text mining tools that sift through it will become increasingly valuable

• Text mining has various applications in both academia and industry

Page 32: Text Mining : Experience

Thank you for your attention

Q&A
