text mining - data mining


Upload: boonlert-aroonpiboon

Post on 16-Apr-2017


Text Mining / Data Mining

STKS Applied ICT for Executive Librarians, 30 ... 2553 (B.E.)

Outline

Definition
Text Mining Techniques
Applications
Text Mining Tools
STKS Text Mining

Text Mining

Text mining is the process of analyzing and structuring large sets of documents, applying statistical and/or computational linguistics technology, in order to extract previously unknown knowledge that is useful for making crucial business decisions.

Information extraction

Text Mining

Text mining is a new and exciting research area in computer science that tries to solve the information overload problem by using techniques from data mining, machine learning, natural language processing (NLP), information retrieval, and knowledge management.

A key element of text mining is its focus on the document collection. At its simplest, a document collection can be any grouping of text-based documents, such as business reports, legal memoranda, e-mails, research papers, manuscripts, articles, and press releases.

Text Mining

Searching

Text Mining

Text Mining Data Mining

Scientometrics

Webometrics

Bibliometrics etc.

Information Extraction (IE)

Natural language processing community: MUC conferences
MUC conference (1987): US DARPA (naval tactical operations)
MUC-2 conference (1989)
MUC-3 conference (1991): Latin American terrorism
MUC-4 (1992)
MUC-5 (1993): Japanese documents (joint ventures + microelectronics)
MUC-6 (1995): financial domain
MUC-7 (1998): airline crashes domain (Chinese, Japanese, English)
European Commission: LRE (Linguistic Research & Engineering)

IE projects: NAMIC, CROSSMARC, MOSES

Figure 1: The evolution of database system technology.

Example of output from an industry analyzer's term extraction process

Biogen Idec Inc. ended its third quarter with $543 million in revenues, slightly lower than analyst estimates, as it nears the one-year anniversary of a merger that made it the world's largest biotech company.

The Cambridge, Mass.-based company reported non-GAAP earnings per share of 37 cents and net income of $132 million, compared with 35 cents and $123 million for the same quarter last year. The analysts' consensus estimate for the quarter was 35 cents.
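A term extraction step of this kind can be sketched with a few regular expressions. This is only a rough illustration, not the industry analyzer's actual method: the patterns `MONEY` and `COMPANY` and the function `extract_terms` are hypothetical names made up for this example, and the sample text is a shortened version of the passage above.

```python
import re

# Shortened sample text based on the passage above.
TEXT = (
    "Biogen Idec Inc. ended its third quarter with $543 million in revenues. "
    "The company reported non-GAAP earnings per share of 37 cents and "
    "net income of $132 million."
)

# Hypothetical patterns, chosen only for this illustration.
# MONEY: dollar amounts with an optional scale word, or amounts in cents.
MONEY = re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?|\d+ cents")
# COMPANY: one or more capitalized words followed by a legal suffix.
COMPANY = re.compile(r"(?:[A-Z][a-z]+ )+(?:Inc|Corp|Ltd)\.")


def extract_terms(text):
    """Return the company names and monetary amounts found in the text."""
    return {
        "companies": COMPANY.findall(text),
        "amounts": MONEY.findall(text),
    }


terms = extract_terms(TEXT)
print(terms["companies"])
print(terms["amounts"])
```

Real term extraction tools rely on dictionaries and linguistic analysis rather than two hand-written patterns; the sketch only shows the idea of turning free text into typed terms.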

Text Mining

TM

Consumer purchasing patterns

Bioscience: Don Swanson, hypothesizing causes of rare diseases; TM has great impact

Genomics TM

2 Co-Occurrence

TM

Security Application (CIA analyze terrorist events)

Software Application IBM , Microsoft

Academic Application

Nature / NIH / Univ. Manchester / Univ. California

Customer service: quick response

1000 /

Text Mining Techniques

TM Text Extraction

Summarized Extraction

Feature Selection

Cluster Generation

Topic Identification

Information Mapping, Visualization

Text Categorization

TM: Data Mining / Information Retrieval / Linguistics / Machine pattern / Statistics / Pattern recognition / Database / Visualization
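Feature selection, one of the techniques listed above, is often done by weighting terms so that words common to every document count for little. The sketch below uses TF-IDF weighting, which the slides do not name; it is one common choice, shown here as an assumption. The function names and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def top_terms(doc_weights, k=2):
    """Pick the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical mini-collection for illustration only.
docs = [
    "police report robbery downtown",
    "police report accident highway",
    "patent report chitosan polymer",
]
weights = tf_idf(docs)
print(top_terms(weights[0]))
```

The word "report" appears in every document, so its weight is zero and it is never selected as a feature; the selected features can then feed cluster generation or text categorization.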

Text Mining

TM (4 areas)

Customer Relationship Management (CRM): $15.2 bn
Intelligence (security / corporate / research): $12 bn
Knowledge & content management: $1.9 bn
Information retrieval technology: $3.5 bn

TM

Customer Transaction Analysis

Competitive Intelligence (CI)

R&D support

Crime Pattern Detection

Virginia, USA

Police Information Report (PIR) TM

1 Data pre-processing

Date        District   Event type   Description
1/05/2003   Reston     Robbery      .
5/05/2003   Lake       Accident     .
6/05/2003   South      Narcotics

2 Extract important terms & concepts

3 Analyze patterns (co-occurrence)

Software: PolyAnalyst for text mining (German / Spanish / French / Russian / Italian / Portuguese / Dutch / Swedish / Greek)
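The co-occurrence analysis in step 3 can be sketched as counting how often two extracted concepts appear in the same report. This is a minimal illustration, not how PolyAnalyst works internally; the `reports` data and the `cooccurrence` function are hypothetical, and the concepts are assumed to come from step 2.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of step 2: the set of concepts found in each report.
reports = [
    {"Reston", "robbery", "night"},
    {"Reston", "robbery", "bank"},
    {"Lake", "accident", "night"},
    {"South", "narcotics"},
]


def cooccurrence(records):
    """Count how many records contain each unordered pair of concepts."""
    counts = Counter()
    for concepts in records:
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts


pairs = cooccurrence(reports)
print(pairs.most_common(1))
```

Pairs with unusually high counts, here a district and an event type, are the candidate crime patterns an analyst would inspect further.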

Text Mining Tools / Software

Megaputer Intelligence

SAS

SPSS

Synthema

TEMIS

Autonomy

Clearforest

Fast

IBM

Inxight

Vantage Point

etc.

Text Mining Tools: Open Source Software

GATE: natural language processing & language engineering tool

YALE: data and text mining software, with its Word Vector Tool plugin

Pimiento: a text-mining application framework written in Java (http://ee.usyd.edu.au/~jjga/pimiento)

Text Mining Applications
(have proven particularly fertile ground for TM)

Corporate finance / business intelligence

Patent research

Life sciences: identify complex patterns of interactions between proteins

Text Mining

Issue identification
Selection of information sources
Search refinement and data retrieval
Data cleaning
Basic analyses
Advanced analyses
Representation

Text Mining Tasks

Information search & retrieval

Mine various databases (internal, external publications / patents), retrieve search results, analyze with text mining software

Profile (statistical analyses): R&D activities / technology application emphases

Represent: text, tables, graphs; activities by time / player / technology map

Interpret: perform competitive analyses; describe & project technology by nation / company; anticipate / forecast technology trends
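The profile and represent tasks above amount to counting retrieved records by time and by player, then showing the counts. A minimal sketch follows, assuming each record has already been reduced to a (year, organization) pair; the records, organization names, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical retrieved records: (publication year, organization).
records = [
    (2007, "Org A"),
    (2008, "Org A"),
    (2008, "Org B"),
    (2009, "Org A"),
]


def profile(recs):
    """Count activity by time (year) and by player (organization)."""
    by_year = Counter(year for year, _ in recs)
    by_player = Counter(org for _, org in recs)
    return by_year, by_player


def text_bar_chart(counts):
    """Represent counts as simple text bars, one line per key."""
    return [f"{key}: {'#' * counts[key]}" for key in sorted(counts)]


by_year, by_player = profile(records)
for line in text_bar_chart(by_year):
    print(line)
```

Tools such as the one named on the next slide produce richer tables and maps, but the underlying profile is the same kind of grouped count.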

STKS TM

TM tool: Vantage Point (VP); ISI / Scopus / Delphion

Data mining features: ISI WOS / SCOPUS / Delphion / Aureka, etc.

Thomson : ISI Web of Science

PT J
AU Yoksan, R
   Akashi, M
AF Yoksan, Rangrong
   Akashi, Mitsuru
TI Low molecular weight chitosan-g-L-phenylalanine: Preparation, characterization, and complex formation with DNA
SO CARBOHYDRATE POLYMERS
LA English
DT Article
DE Chitosan; Phenylalanine; DNA; Nanoparticle; Complex coacervation; DNA release
ID HUMAN ENDOTHELIAL-CELLS; GENE DELIVERY; PLASMID DNA; TRANSFECTION EFFICIENCY; IN-VITRO; NANOPARTICLES; OLIGOSACCHARIDE; SCAFFOLDS; VECTORS; REMOVAL
AB The grafting of L-phenylalanine onto low molecular weight chitosan is ...
C1 [Akashi, Mitsuru] Osaka Univ, Grad Sch Engn, Dept Appl Chem, Suita, Osaka 5650871, Japan.
   [Yoksan, Rangrong] Kasetsart Univ, Fac Agroind, Dept Packaging Technol & Mat, Bangkok 10900, Thailand.
RP Akashi, M, Osaka Univ, Grad Sch Engn, Dept Appl Chem, 2-2 Yamadaoka, Suita, Osaka 5650871, Japan.
EM [email protected]
FU Japan Society for the Promotion of Science (JSPS), Japan [P05133]
FX This work was financially supported by the Japan Society for the Promotion of Science (JSPS), Japan (P05133). One of the authors (R.Y.) thanks Assist. Prof. Michiya Matsusaki (Osaka University, Japan) for the technique and discussion on cell culture.
NR 36
TC 5
PU ELSEVIER SCI LTD
PI OXFORD
PA THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB, OXON, ENGLAND
SN 0144-8617
J9 CARBOHYD POLYM
JI Carbohydr. Polym.
PD JAN 5
PY 2009
VL 75
IS 1
BP 95
EP 103
DI 10.1016/j.carbpol.2008.07.001
PG 9
SC Chemistry, Applied; Chemistry, Organic; Polymer Science
GA 361SY
UT ISI:000260148600015
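Before a record like this can be mined, its two-letter field tags have to be split into fields. The sketch below assumes the usual plain-text export layout, with one tag at the start of a line and continuation lines indented by spaces; `parse_tagged_record` is a hypothetical helper written for this illustration, not part of any ISI or Vantage Point software, and `SAMPLE` is an abridged copy of the record above.

```python
# Abridged copy of the record above, one field tag per line.
SAMPLE = """\
PT J
AU Yoksan, R
   Akashi, M
TI Low molecular weight chitosan-g-L-phenylalanine: Preparation, characterization, and complex formation with DNA
SO CARBOHYDRATE POLYMERS
PY 2009
VL 75
BP 95
EP 103
"""


def parse_tagged_record(text):
    """Map each two-letter tag to the list of values that follow it."""
    fields = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[:2].strip():
            # A new tag starts in the first two columns.
            tag, value = line[:2], line[3:].strip()
            fields.setdefault(tag, []).append(value)
        elif tag is not None:
            # Indented line: another value for the current tag.
            fields[tag].append(line.strip())
    return fields


record = parse_tagged_record(SAMPLE)
print(record["AU"], record["PY"])
```

Once parsed, fields such as AU (authors), PY (year), and C1 (addresses) can be counted or cross-tabulated across many records.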

Thomson : Delphion

Text / Data Mining

Intelligence Market /
Technology Intelligence

(hidden content)

(relationship)

(sorting/ranking)

4 W (Who / What / When / Where)

Mining

Metadata / Controlled Vocabulary / Taxonomy / Ontology

STKS Mining (Owned raw data) STKS ... . / . . ............... ...... / .................... 2545 / 1997


END

Thank you for your attention
