Databases
Storage
Retrieval
• DNA: DNA sequences (individual genes or complete genomes)
• RNA: cDNA, ESTs, non-coding RNA
• Protein: protein sequences, translated nucleotide sequences, protein domains, protein structure
• Phenotype: diseases, polymorphism, gene expression, protein-protein interactions
Different kinds of DBs dealing with biological information retrieved by various means
Common to all databases
• A database is a structured collection of information.
• A database is composed of basic objects called records or entries.
• Each record is composed of fields, which hold defined data related to that record.
Let's consider the following database of students learning bioinformatics at HUJI.
A database can be thought of as a large table, where the rows represent records and the columns represent fields.
ID        | First Name | Last Name | Gender | Comments
0775523/7 | Sharon     | Asulin    | female | Likes scuba diving
020304/4  | Nurit      | Niv       | female | Comes from Cuba
03321/3   | Nurit      | Sharon    | female | -
88924/5   | Yossi      | Yarkon    | male   | Father of sharon - must go home earlier
ID (Accession Numbers): Unique identifiers of the database records.
Each record has a unique identifier.
For some records there is only partial information: some fields contain no data (this reflects the quality of the DB).
Some records contain similar data in some of the fields.
Data Retrieval
• The purpose of databases is not merely to collect and organize data, but mainly to allow advanced data retrieval.
• A query is a method to retrieve information from the database.
• The organization of each record into predetermined fields allows us to run queries on specific fields.
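The record/field structure and a query on a single field can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and the `query` helper are hypothetical names, not part of any real database system, and the data is the student table above.

```python
# Each record is a dict; each key is a field. Field names are illustrative.
students = [
    {"id": "0775523/7", "first_name": "Sharon", "last_name": "Asulin",
     "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "first_name": "Nurit", "last_name": "Niv",
     "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "first_name": "Nurit", "last_name": "Sharon",
     "gender": "female", "comments": ""},  # a field with no data
    {"id": "88924/5", "first_name": "Yossi", "last_name": "Yarkon",
     "gender": "male", "comments": "Father of sharon - must go home earlier"},
]


def query(records, field, value):
    """Return the records whose `field` equals `value` (case-insensitive)."""
    return [r for r in records if r[field].lower() == value.lower()]


# Query on the "gender" field only.
female_ids = [r["id"] for r in query(students, "gender", "female")]
print(female_ids)  # ['0775523/7', '020304/4', '03321/3']
```

Because every record has the same predetermined fields, the same helper works for any field, for example `query(students, "first_name", "Nurit")`.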
The best search strategy…
1. Think: phrase your scientific question.
2. Choose an appropriate database.
3. Phrase your query: keywords, fields, Boolean operators, syntax.
4. Access additional entries discussing the same or similar entities through links to additional databases.
5. Think, evaluate. The computer is just a machine. You are (hopefully) a thinking organism.
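Phrasing a query…
Terms/words for search [field] + (BOOLEAN OPERATOR) + terms/words for search [field]

Boolean Operators
• 1 AND 2 (e.g. cell AND cycle): records that contain both terms.
• 1 OR 2 (e.g. cell OR cycle): records that contain at least one of the terms.
• 1 NOT 2 (e.g. cell NOT cycle): records that contain the first term but not the second.
• "cell cycle": records that contain the exact phrase.
• cell*: wildcard that matches cell, cells, cellular, etc.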
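The Boolean operators correspond directly to set operations on the records each term retrieves. The sketch below assumes a made-up collection of five one-line records and a hypothetical `matching` helper; real search engines index and parse queries quite differently.

```python
from fnmatch import fnmatchcase

# Hypothetical mini-database: record number -> text of the record.
records = {
    1: "the cell cycle is tightly regulated",
    2: "a cell divides by mitosis",
    3: "the krebs cycle produces energy",
    4: "each cell has a life cycle",
    5: "cellular respiration in cells",
}


def matching(pattern):
    """Record numbers with at least one word matching a shell-style pattern."""
    return {rid for rid, text in records.items()
            if any(fnmatchcase(word, pattern) for word in text.split())}


cell, cycle = matching("cell"), matching("cycle")

print(cell & cycle)       # cell AND cycle -> {1, 4}
print(cell | cycle)       # cell OR cycle  -> {1, 2, 3, 4}
print(cell - cycle)       # cell NOT cycle -> {2}
print(matching("cell*"))  # wildcard       -> {1, 2, 4, 5}

# Exact phrase: the two words must appear next to each other.
phrase = {rid for rid, text in records.items() if "cell cycle" in text}
print(phrase)             # "cell cycle"   -> {1}
```

Record 4 contains both words but not the phrase, which is why the quoted search is narrower than cell AND cycle.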
The secretary wants to locate the record of the student Sharon Asulin but does not remember the last name. Search: Sharon
The search was not limited to a certain field: Sharon[all fields]

ID        | First Name | Last Name | Gender | Comments
0775523/7 | Sharon     | Asulin    | female | Likes scuba diving
020304/4  | Nurit      | Niv       | female | Comes from Cuba
03321/3   | Nurit      | Sharon    | female | Receives scholarship
88924/5   | Yossi      | Yarkon    | male   | Proud father of sharon

OOPS!!
Retrieved too many records that don't match the required data: too much noise.
Evaluating Search Results

"Scientific truth" \ Search results | Found (+)      | Not found (-)
Related                             | True positive  | False negative
Unrelated                           | False positive | True negative
ID        | First Name | Last Name | Gender | Comments               | Search result
0775523/7 | Sharon     | Asulin    | female | Likes scuba diving     | True positive
020304/4  | Nurit      | Niv       | female | Comes from Cuba        | Not retrieved
03321/3   | Nurit      | Sharon    | female | Receives scholarship   | False positive
88924/5   | Yossi      | Yarkon    | male   | Proud father of sharon | False positive

What can we do to reduce or eliminate false positives without reducing true positives?
Sensitivity
Ability of a method to detect positives, irrespective of how many false positives are reported.
Selectivity
Ability of a method to reject negatives, irrespective of how many positives are wrongly rejected along the way (false negatives).
Sensitivity and selectivity have to be balanced against each other.
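The slides give no formulas for these two measures. A minimal sketch, assuming the common definitions (sensitivity as TP / (TP + FN), and selectivity taken here to mean TN / (TN + FP), often called specificity), applied to the Sharon[all fields] search on the four-student table:

```python
def sensitivity(tp, fn):
    """Fraction of the related records that the search found."""
    return tp / (tp + fn)


def selectivity(tn, fp):
    """Fraction of the unrelated records that the search rejected.
    Assumption: selectivity is read as TN / (TN + FP)."""
    return tn / (tn + fp)


# Sharon[all fields]: 1 true positive (Sharon Asulin), 2 false positives
# (Nurit Sharon, Yossi Yarkon), 0 false negatives, 1 true negative (Nurit Niv).
tp, fp, fn, tn = 1, 2, 0, 1

print(sensitivity(tp, fn))            # 1.0
print(round(selectivity(tn, fp), 2))  # 0.33
```

The unrestricted search is fully sensitive (it found the wanted record) but poorly selective (it let in two of the three unrelated records).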
Let's refine our search
Find all students whose first name is Sharon: Sharon[first name]
NCBI syntax: keyword[field definition]

ID        | First Name | Last Name | Gender | Comments
0775523/7 | Sharon     | Asulin    | female | Likes scuba diving
020304/4  | Nurit      | Niv       | female | Comes from Cuba
03321/3   | Nurit      | Sharon    | female | Receives scholarship
88924/5   | Yossi      | Yarkon    | male   | Father of sharon - must go home earlier
Now suppose the first name in the first record was mistyped as Sharom:

ID        | First Name | Last Name | Gender | Comments
0775523/7 | Sharom     | Asulin    | female | Likes scuba diving
020304/4  | Nurit      | Niv       | female | Comes from Cuba
03321/3   | Nurit      | Sharon    | female | Receives scholarship
88924/5   | Yossi      | Yarkon    | male   | Father of sharon - must go home earlier

Now the search Sharon[first name] retrieves nothing (a false negative), though we are still not distracted by noise.
The original search phrase sharon[all fields] would have retrieved all the noise but not the required information.
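The difference between the two searches on the mistyped table can be reproduced with a small sketch. The `search` helper and field names are hypothetical, and the matching rule (case-insensitive whole words) is an assumption about how such a search behaves, not a description of NCBI's engine.

```python
FIELDS = ("id", "first_name", "last_name", "gender", "comments")

rows = [
    ("0775523/7", "Sharom", "Asulin", "female", "Likes scuba diving"),
    ("020304/4", "Nurit", "Niv", "female", "Comes from Cuba"),
    ("03321/3", "Nurit", "Sharon", "female", "Receives scholarship"),
    ("88924/5", "Yossi", "Yarkon", "male",
     "Father of sharon - must go home earlier"),
]
records = [dict(zip(FIELDS, row)) for row in rows]


def search(records, term, field=None):
    """Case-insensitive whole-word search.

    field=None mimics term[all fields]; otherwise only that field is searched.
    """
    term = term.lower()
    hits = []
    for rec in records:
        values = rec.values() if field is None else [rec[field]]
        words = " ".join(values).lower().split()
        if term in words:
            hits.append(rec["id"])
    return hits


print(search(records, "sharon"))                # ['03321/3', '88924/5']
print(search(records, "sharon", "first_name"))  # []
```

The all-fields search returns only noise, and the field-restricted search returns nothing: neither finds the mistyped record.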
The secretary wants to locate the record of the female student who comes from Cuba but does not remember her name.
Search: female[gender] AND *cuba*[comments]
NCBI syntax: keyword[field definition], combined with a Boolean operator

ID        | First Name | Last Name | Gender | Comments               | Search result
0775523/7 | Sharon     | Asulin    | female | Likes scuba diving     | False positive
020304/4  | Nurit      | Niv       | female | Comes from Cuba        | True positive
03321/3   | Nurit      | Sharon    | female | Receives scholarship   | Not retrieved
88924/5   | Yossi      | Yarkon    | male   | Proud father of sharon | Not retrieved
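The false positive comes from the wildcard: *cuba* also matches "scuba". A minimal sketch of the query, using Python's shell-style pattern matching as a stand-in for the search engine (the field names and helper are hypothetical):

```python
from fnmatch import fnmatchcase

students = [
    {"id": "0775523/7", "gender": "female", "comments": "Likes scuba diving"},
    {"id": "020304/4", "gender": "female", "comments": "Comes from Cuba"},
    {"id": "03321/3", "gender": "female", "comments": "Receives scholarship"},
    {"id": "88924/5", "gender": "male", "comments": "Proud father of sharon"},
]


def field_match(record, field, pattern):
    """Case-insensitive wildcard match of a pattern against one field."""
    return fnmatchcase(record[field].lower(), pattern.lower())


# female[gender] AND *cuba*[comments]
hits = [s["id"] for s in students
        if field_match(s, "gender", "female")
        and field_match(s, "comments", "*cuba*")]
print(hits)  # ['0775523/7', '020304/4']

# One possible refinement (not from the slides): require "cuba" as a whole word.
strict = [s["id"] for s in students
          if field_match(s, "gender", "female")
          and "cuba" in s["comments"].lower().split()]
print(strict)  # ['020304/4']
```

Dropping the wildcards removes the "scuba" hit while keeping the true positive, at the cost of missing variants such as "Cuban".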
And the main thing, the main thing: have no fear at all.