
Extension of disease-related pathway using text mining

2006.04.24 Molecular Genomic Medicine

정희준

2006 B4GM Term Project


Introduction

• The disease process is important for understanding and treating a disease
• MeSH
  • MeSH is NLM's controlled vocabulary used for indexing articles for MEDLINE/PubMed
  • MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts
• OMIM
  • OMIM archives mature, high-quality data of high significance, the standard in rare Mendelian disorders
• ArrayXPath
  • ArrayXPath has 3,088 genes or gene products
  • It has a repository of meta-information for the public pathway databases GenMAPP, KEGG, BioCarta and PharmGKB
  • If the input disease name matches the corresponding MeSH heading or entry term, PathMeSH outputs the list of pathways containing the disease-related gene product
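
The PathMeSH lookup described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the actual PathMeSH implementation: the function names, the toy MeSH entry, and the toy pathway identifiers are all assumptions made for the example, and the data is not taken from any of the databases named above.

```python
def normalize(term):
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(term.lower().split())

def build_term_index(mesh):
    """Map every MeSH heading and entry term (normalized) to its heading."""
    index = {}
    for heading, entry_terms in mesh.items():
        for term in [heading, *entry_terms]:
            index[normalize(term)] = heading
    return index

def pathways_for_disease(name, term_index, disease_genes, pathway_genes):
    """Return pathways that contain at least one gene linked to the disease."""
    heading = term_index.get(normalize(name))
    if heading is None:
        return []  # the input matched neither a heading nor an entry term
    genes = disease_genes.get(heading, set())
    return sorted(p for p, members in pathway_genes.items() if members & genes)

# Toy data, purely illustrative (hypothetical pathway names).
mesh = {"Breast Neoplasms": ["Breast Cancer", "Breast Tumors"]}
disease_genes = {"Breast Neoplasms": {"BRCA1", "TP53"}}
pathway_genes = {
    "toy_pathway_A": {"TP53", "MDM2"},
    "toy_pathway_B": {"BRCA1", "RAD51"},
    "toy_pathway_C": {"INS", "INSR"},
}

term_index = build_term_index(mesh)
result = pathways_for_disease("breast  cancer", term_index, disease_genes, pathway_genes)
print(result)
```

Matching on a normalized string is the simplest possible choice here; a real system would likely need to handle plurals, word order, and synonyms beyond the listed entry terms.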


Problem

• OMIM compiles the genes that act as genetic factors in disease


Concept diagram

[Diagram labels: Disease, Gene, Pathway; MeSH hierarchies; OMIM MorbidMap, New GRIP; PubMed]


Method

• Step 1. Collect PubMed abstracts
  • Collect the PubMed abstracts retrieved by searching with a MeSH heading and a gene symbol
• Step 2. Build gene/gene product dictionary
  • Build a dictionary of the symbols and gene names provided by Entrez Gene, HGNC, and SWISSPROT


• Step 3. Extract gene/gene product in abstract
  • Extract gene/gene products from the abstracts collected for each disease in Step 1
• Step 4. Apply filtering
  • Test the significance of each gene/gene product extracted in Step 3
  • Include the gene/gene products that pass the filter in the disease-gene relationship
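
The four steps can be sketched end to end as below. This is a rough sketch under stated assumptions, not the project's actual code: the slides do not say how abstracts are fetched or which significance test is used, so the NCBI E-utilities `esearch` URL, the `[MeSH Terms]` and `[Title/Abstract]` field tags, the hypergeometric test, the `alpha` threshold, and all function names and toy abstracts are choices made only for illustration.

```python
import re
from math import comb
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_query_url(mesh_heading, gene_symbol, retmax=100):
    """Step 1 (assumed form): esearch URL for a MeSH heading plus a gene symbol."""
    term = f'"{mesh_heading}"[MeSH Terms] AND {gene_symbol}[Title/Abstract]'
    return EUTILS_ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})

def build_dictionary(records):
    """Step 2: map every symbol or gene name (lowercased) to its official symbol."""
    lookup = {}
    for symbol, names in records.items():
        for name in [symbol, *names]:
            lookup[name.lower()] = symbol
    return lookup

def extract_genes(abstract, lookup):
    """Step 3: official symbols whose symbol or name occurs as a whole token."""
    text = abstract.lower()
    found = set()
    for name, symbol in lookup.items():
        if re.search(r"(?<![a-z0-9])" + re.escape(name) + r"(?![a-z0-9])", text):
            found.add(symbol)
    return found

def hypergeom_tail(k, n, K, N):
    """P(X >= k): n disease abstracts drawn from N, of which K mention the gene."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)

def filter_genes(disease_abstracts, background_abstracts, lookup, alpha=0.05):
    """Step 4 (assumed test): keep genes enriched in the disease abstracts."""
    disease_hits = [extract_genes(a, lookup) for a in disease_abstracts]
    all_hits = disease_hits + [extract_genes(a, lookup) for a in background_abstracts]
    N, n = len(all_hits), len(disease_hits)
    kept = {}
    for symbol in set().union(*disease_hits):
        k = sum(symbol in hits for hits in disease_hits)
        K = sum(symbol in hits for hits in all_hits)
        p = hypergeom_tail(k, n, K, N)
        if p < alpha:
            kept[symbol] = p
    return kept

# Toy dictionary and abstracts, invented for the example.
lookup = build_dictionary({"BRCA1": ["breast cancer 1"], "TP53": ["tumor protein p53"]})
disease = ["BRCA1 mutations were found in patients."] * 4 + [
    "Both breast cancer 1 and TP53 were altered."
]
background = ["TP53 is frequently mutated."] * 5 + ["No gene is named here."] * 5

url = build_query_url("Breast Neoplasms", "BRCA1")
kept = filter_genes(disease, background, lookup)
print(url)
print(kept)  # BRCA1 passes the filter; TP53 does not
```

Plain dictionary matching of this kind is known to produce false positives for short or ambiguous gene symbols, which is one reason a filtering step such as Step 4 is needed before a gene is added to the disease-gene relationship.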