Gene functional annotation tool: DAVID. Database for Annotation, Visualization and Integrated Discovery...

Post on 19-Dec-2015

Gene Functional Annotation Tool: DAVID

Database for Annotation, Visualization and Integrated Discovery (DAVID)

Functional Annotation Tool: Gene Ontology, Protein interaction, Protein domain, Pathway, Disease

Gene ID Conversion, Gene Functional Classification

Upload the gene list to the website

DAVID workflow

Gene Name Batch Viewer, Gene Functional Classification

Functional Annotation Tool

Select the categories to analyze

Get the results

Upload the gene list

Identifier types: AFFYMETRIX_3PRIME_IVT_ID, AFFYMETRIX_EXON_GENE_ID, AFFYMETRIX_SNP_ID, AGILENT_CHIP_ID, AGILENT_ID, AGILENT_OLIGO_ID, ENSEMBL_GENE_ID, ENSEMBL_TRANSCRIPT_ID, ENTREZ_GENE_ID, FLYBASE_GENE_ID, FLYBASE_TRANSCRIPT_ID, GENBANK_ACCESSION, GENOMIC_GI_ACCESSION, GENPEPT_ACCESSION, ILLUMINA_ID, IPI_ID, MGI_ID, OFFICIAL_GENE_SYMBOL, PFAM_ID, PIR_ID, PROTEIN_GI_ACCESSION, REFSEQ_GENOMIC, REFSEQ_MRNA, REFSEQ_PROTEIN, REFSEQ_RNA, RGD_ID, SGD_ID, TAIR_ID, UCSC_GENE_ID, UNIGENE, UNIPROT_ACCESSION, UNIPROT_ID, UNIREF100_ID, WORMBASE_GENE_ID, WORMPEP_ID, ZFIN_ID, Not Sure

1. Confirm the species

2. After selecting it, click Use

3.

Functional Annotation Tool

DAVID Gene ID: An internal ID generated from the "DAVID Gene Concept" in the DAVID system. One DAVID gene ID represents one unique gene cluster belonging to a single gene entry.

Input gene list: 817
Mapped to DAVID database: 754
DAVID IDs: 734

1.

Genes from your list involved in this annotation category

2.

4. Single chart report for this annotation category only.

3. 99 / 734

Functional Annotation Chart

The Chart Report is an annotation-term-focused view that lists annotation terms and their associated genes under study. To avoid overcounting duplicated genes, the Fisher Exact statistics are calculated on the corresponding DAVID gene IDs, which removes all redundancy in the original IDs. Every result in the Chart Report has to pass the thresholds in the Chart Option section (by default, Max.Prob. <= 0.1 and Min.Count >= 2) so that only statistically significant terms are displayed.
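The default thresholds can be pictured as a simple filter over chart rows. The following is a minimal sketch, not DAVID's own code: the `ChartRow` type, the field names, and the example rows are hypothetical and only illustrate the Max.Prob. and Min.Count cutoffs described above.

```python
from typing import List, NamedTuple


class ChartRow(NamedTuple):
    """One hypothetical chart row: an annotation term with its gene count and EASE score."""
    term: str
    count: int
    ease: float


def apply_chart_thresholds(rows: List[ChartRow],
                           max_prob: float = 0.1,
                           min_count: int = 2) -> List[ChartRow]:
    """Keep only rows with EASE score <= max_prob and gene count >= min_count."""
    return [r for r in rows if r.ease <= max_prob and r.count >= min_count]


# Made-up rows for illustration only.
example_rows = [
    ChartRow("term A", count=12, ease=0.003),
    ChartRow("term B", count=1, ease=0.020),   # dropped: fewer than 2 genes
    ChartRow("term C", count=5, ease=0.400),   # dropped: EASE score above 0.1
]
kept = apply_chart_thresholds(example_rows)
```

Setting `max_prob=1.0` in this sketch keeps every term that meets the count cutoff, whatever its p-value.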

Functional Annotation Chart

a modified Fisher Exact P-Value (EASE Score)

Number of results shown per page

List Total (LT) - number of genes in the gene list mapping to the category of which the term is a member
Population Hits (PH) - number of genes in the background gene list mapping to a specific term
Population Total (PT) - number of genes in the background gene list mapping to the category

RT (Related Term)
Related Term Search can identify other similar terms
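To make these counts concrete, here is a minimal Python sketch of how a modified Fisher Exact p-value could be computed from them. It rests on assumptions not stated on the slides: a fourth count, List Hits (the number of genes in the list mapping to the term), the common reading of the EASE Score as a one-sided Fisher exact test with one gene removed from List Hits, and a background list that contains the gene list. The function names are illustrative.

```python
from math import comb


def fisher_right_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value P(X >= a) for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b          # genes in the list
    col1 = a + c          # genes annotated with the term
    n = a + b + c + d     # all genes in the table
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


def fisher_p(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """Plain one-sided Fisher exact p-value for over-representation of a term."""
    c = pop_hits - list_hits                       # background-only genes with the term
    d = pop_total - list_total - c                 # background-only genes without it
    return fisher_right_tail(list_hits, list_total - list_hits, c, d)


def ease_score(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """Assumed EASE variant: the same test after removing one gene from List Hits."""
    c = pop_hits - list_hits
    d = pop_total - list_total - c
    return fisher_right_tail(list_hits - 1, list_total - list_hits, c, d)


# Illustrative numbers: 3 of 300 list genes hit a term that 40 of 30000 background genes carry.
plain = fisher_p(3, 300, 40, 30000)
ease = ease_score(3, 300, 40, 30000)
```

In this example the EASE variant gives a larger, more conservative p-value than the plain test, because a term supported by only a few genes loses a large share of its evidence when one gene is removed.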

RT (Related Term)

Any given gene is associated with a set of annotation terms. If genes share a similar set of those terms, they are most likely involved in similar biological mechanisms. The algorithm adopts kappa statistics to quantitatively measure the degree of agreement in how genes share similar annotation terms. Kappa result ranges from 0 to 1. The higher the value of kappa, the stronger the agreement. A term can be any biological process/term coming from any of the functional categories listed in DAVID.
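The agreement measure can be sketched with Cohen's kappa on two binary annotation profiles (1 = the gene has the term, 0 = it does not). This is a minimal illustration, not DAVID's implementation; the function name and the toy profiles are made up. Note that the general kappa formula can also return values below 0 when agreement is worse than chance.

```python
from typing import Sequence


def kappa(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """Cohen's kappa for two equal-length binary (0/1) annotation profiles."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    # Observed agreement: positions where both profiles say 1 or both say 0.
    observed = sum(1 for x, y in zip(profile_a, profile_b) if bool(x) == bool(y)) / n
    # Chance agreement from the marginal frequencies of 1s in each profile.
    freq_a = sum(1 for x in profile_a if x) / n
    freq_b = sum(1 for y in profile_b if y) / n
    expected = freq_a * freq_b + (1 - freq_a) * (1 - freq_b)
    if expected == 1.0:
        return 1.0  # both profiles are constant and identical; treat as full agreement
    return (observed - expected) / (1 - expected)


# Toy profiles over four annotation terms.
same = kappa([1, 1, 0, 0], [1, 1, 0, 0])      # identical term sets
chance = kappa([1, 1, 0, 0], [1, 0, 1, 0])    # agreement no better than chance
```

The same function applies when the roles are swapped, that is, when two annotation terms are compared by the genes they share.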

COG_ONTOLOGY refers to an ontology from NCBI's COG database.
The database of Clusters of Orthologous Groups of proteins (COGs): a tool for genome-scale analysis of protein functions and evolution

SP_PIR_KEYWORDS are keywords defined by SwissProt/UniProt and PIR (Protein Information Resource).

UP_SEQ_FEATURE refers to the annotation category UniProt Sequence Feature, found in the reports on the UniProt site.

Annotation Category - Functional Categories

Annotation Category – Protein domain & Protein Interaction

Protein structure

GO terms are categorized into 3 groups:
BP - Biological Process
MF - Molecular Function
CC - Cellular Component

GOTERM_BP_1 -> GO terms under Biological Process (BP) at Level 1.

GOTERM_BP_ALL -> GO terms under Biological Process (BP) at all possible levels.

GOTERM_BP_FAT - Basically, this test examines the significance of enriched annotation

(GO FAT) filters out very broad GO terms based on a measured specificity of each term (not level-specificity)

Annotation Category - Gene Ontology

Annotation Category-Pathways

Biocarta

KEGG

Select 11 categories

11 categories in total

Combined View Annotation

Functional Annotation Cluster

Functional Annotation Clustering

Due to the redundant nature of annotations, the Functional Annotation Chart presents similar/relevant annotations repeatedly, which dilutes the biological focus of the report. To reduce the redundancy, the newly developed Functional Annotation Clustering report groups and displays similar annotations together, which makes the biology clearer and more focused than in the traditional chart report.

• The Functional Annotation Clustering integrates the same techniques: kappa statistics to measure the degree of common genes between two annotations, and fuzzy heuristic clustering to classify the groups of similar annotations according to their kappa values.

All genes involved in this annotation cluster

EASE score (modified Fisher exact test)

Heat map

Adjust the kappa statistics parameters

Adjust the fuzzy heuristic clustering parameters

P_value

Enrichment Score = [ -log(P_value_1) + -log(P_value_2) + ... + -log(P_value_n) ] / n
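As a worked example of this formula, the sketch below averages the negative logarithms of the member p-values of one cluster. The slide does not state the base of the logarithm; base 10 is assumed here, and the function name is illustrative.

```python
import math
from typing import Sequence


def enrichment_score(p_values: Sequence[float]) -> float:
    """Mean of -log10(p) over the member terms of one annotation cluster (base 10 assumed)."""
    if not p_values:
        raise ValueError("a cluster needs at least one p-value")
    return sum(-math.log10(p) for p in p_values) / len(p_values)


# Two member terms with p-values 0.01 and 0.0001: (2 + 4) / 2 = 3.
score = enrichment_score([0.01, 0.0001])
```

Under the base-10 assumption, a cluster whose members all have a p-value of 0.05 scores about 1.3, so higher scores mean stronger overall enrichment of the cluster.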

Initial Group Members (any value >=2; default = 4): the minimum gene number in a seeding group, which affects the minimum size of each functional group in the final result. In general, a lower value attempts to include more genes in functional groups and, in particular, generates many small groups.

Final Group Members (any value >=2; default = 4): the minimum gene number in one final group after the "cleanup" procedure. In general, a lower value attempts to include more genes in functional groups and, in particular, generates many small groups. It works together with the previous parameters to control the minimum size of functional groups. For the final clusters, it is the number of terms that a cluster must have to be presented in the output.

Multi-linkage Threshold (any value between 0% and 100%; default = 50%): It controls how seeding groups merge with each other, i.e. two groups sharing more than this percentage of gene members will become one group. A higher percentage, in general, gives sharper separation, i.e. it generates more final functional groups with more tightly associated genes in each group. In addition, changing the parameter does not contribute extra genes to the unclustered group.
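The interplay of the multi-linkage threshold and the final group size can be illustrated with a simplified merge loop. This is a rough sketch under stated assumptions, not DAVID's fuzzy heuristic clustering: the overlap is measured against the smaller of the two groups, merging is greedy, and the function name and toy groups are hypothetical.

```python
from typing import Iterable, List, Set


def merge_seed_groups(seed_groups: Iterable[Iterable[str]],
                      linkage_threshold: float = 0.5,
                      final_group_members: int = 4) -> List[Set[str]]:
    """Greedily merge seed groups whose shared fraction exceeds the threshold."""
    groups = [set(g) for g in seed_groups]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                shared = len(groups[i] & groups[j])
                smaller = min(len(groups[i]), len(groups[j]))
                # Assumption: overlap is judged relative to the smaller group.
                if smaller and shared / smaller > linkage_threshold:
                    groups[i] |= groups[j]
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    # "Cleanup": drop groups below the minimum final size.
    return [g for g in groups if len(g) >= final_group_members]


# Toy seed groups: the first two share 3 of 4 members (75%).
seeds = [{"a", "b", "c", "d"}, {"b", "c", "d", "e"}, {"w", "x", "y", "z"}]
loose = merge_seed_groups(seeds, linkage_threshold=0.5)   # first two merge
strict = merge_seed_groups(seeds, linkage_threshold=0.8)  # nothing merges
```

With the 50% threshold the toy input yields two groups, while the 80% threshold keeps all three separate, which matches the statement that a higher percentage produces more, tighter final groups.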

• If you run both functions with the default settings, their results will not overlap completely. In general, the clustering result may contain more terms than the chart. In clustering, some 'non-significant' terms can be included because of their link to 'significant' neighbors (co-members in one cluster).

• If you want to completely cross-link the two reports, you should run the chart report with the p-value cutoff set to "1" (ground level). You will then have all possible terms, with significant or insignificant p-values.

Chart vs Cluster

Upload the gene list to the website

Gene Name Batch Viewer, Gene Functional Classification

Functional Annotation Tool

Select the categories to analyze

Get the results

Other Tools in DAVID

Gene Name Batch Viewer

Gene Functional Classification Tool

Term report

Gene Functional Classification Tool - Create sublist

Gene ID Conversion Tool

Thank you for your attention
