introduction for clinical database 陳勁辰2003/06/02
TRANSCRIPT
Introduction for Clinical Introduction for Clinical DatabaseDatabase
陳勁辰陳勁辰2003/06/022003/06/02
IntroductionIntroduction
Database management system (DBMS)Database management system (DBMS) Actual study informationActual study information Administrative informationAdministrative information Relational database model Relational database model
Relational database modelRelational database model
Multiple tablesMultiple tables Entities: rowsEntities: rows Attributes: columnsAttributes: columns Primary keyPrimary key Foreign keyForeign key Principle of table normalizationPrinciple of table normalization
Relational database modelRelational database model
Example: Infant Jaundice Study: whetheExample: Infant Jaundice Study: whether neonatal jaundice affects neuropsychiar neonatal jaundice affects neuropsychiatric scores at five years of agetric scores at five years of age
Table relations: one-to-many, many-to-Table relations: one-to-many, many-to-many, and one-to-onemany, and one-to-one
Need for a multi-table: repeated measurNeed for a multi-table: repeated measurements on individual subjects.ements on individual subjects.
Jaundice vs non-jaundice: t-test p=0.46
One-to-manyOne-to-many
Normal Form ViolationNormal Form Violation
Repeating columns: violate the First NorRepeating columns: violate the First Normal Form (1NF)mal Form (1NF)
Redundancy in column values: violates tRedundancy in column values: violates the Second Normal Form (2NF)he Second Normal Form (2NF)
Many-to-manyMany-to-many
One-to-one One-to-one
Referential integrityReferential integrity Referential Integrity in a normalized, relational Referential Integrity in a normalized, relational
databasedatabase Not allow creation of an exam record for a subjNot allow creation of an exam record for a subj
ect who does not already exist in the “Baby” ect who does not already exist in the “Baby” table, and it will not allow assigning an exam ttable, and it will not allow assigning an exam to a doctor who does not already exist in the “o a doctor who does not already exist in the “Doctor” table. Doctor” table.
A subject may not be deleted unless and until A subject may not be deleted unless and until all that subject’s examinations have also beeall that subject’s examinations have also been deleted.n deleted.
One-to-many: parent-childrenOne-to-many: parent-children Forbids the creation of “orphans”Forbids the creation of “orphans”
Undesirability of Storing Calculated Undesirability of Storing Calculated Values Values
Inconsistencies result if one of the “raInconsistencies result if one of the “raw-data” fields is updated without updaw-data” fields is updated without updating the calculated fieldting the calculated field
Solution: recalculating the value in a quSolution: recalculating the value in a queryery
Storing calculated values such as “AgeIStoring calculated values such as “AgeInMonths” : violate the Third Normal FonMonths” : violate the Third Normal Form (3NF) rm (3NF)
Data Dictionaries, Data Types, and Data Dictionaries, Data Types, and DomainsDomains
Data dictionary: a table of information aData dictionary: a table of information about the database itselfbout the database itself
““Metadata”Metadata” Data types: text, number, dates, etc.Data types: text, number, dates, etc. Domain: range of allowed valuesDomain: range of allowed values Object data type: BLOB (Binary Large ObObject data type: BLOB (Binary Large Ob
ject), eg. photo (*.jpeg); cannot be sorteject), eg. photo (*.jpeg); cannot be sorted or soughtd or sought
Extracting Data from the Database Extracting Data from the Database (Queries)(Queries)
Structured Query Language or SQLStructured Query Language or SQL A query can join data from two or more tA query can join data from two or more t
ables, display only selected fields, and filables, display only selected fields, and filter for records that meet certain criteriater for records that meet certain criteria
SQL has 3 sublanguages: DDL – Data DefiSQL has 3 sublanguages: DDL – Data Definition Language, DML – Data Manipulatinition Language, DML – Data Manipulation Language, and DCL – Data Control Laon Language, and DCL – Data Control Languagenguage
SQLSQL
SELECT Baby.SubjectID, Baby.DOB, Exam.SELECT Baby.SubjectID, Baby.DOB, Exam.ExDateExDate
FROM Baby INNER JOIN Exam ON Baby.SuFROM Baby INNER JOIN Exam ON Baby.SubjectID = Exam.SubjectIDbjectID = Exam.SubjectID
WHERE Exam.ExDate Between #1/1/2010# WHERE Exam.ExDate Between #1/1/2010# And #2/28/2010#And #2/28/2010#
ORDER BY Exam.ExDate;ORDER BY Exam.ExDate;
Guidelines for Database ManagemGuidelines for Database Management for Clinical Researchent for Clinical Research
1.1. Establish the database tables, their rows and columnEstablish the database tables, their rows and columns, and their relationships correctly at the outset.s, and their relationships correctly at the outset.
2.2. Establish and follow naming conventions for columnEstablish and follow naming conventions for columns and tables.s and tables.
3.3. Obtain baseline demographic and clinical informatioObtain baseline demographic and clinical information about members of the study population from existin about members of the study population from existing computer databases.ng computer databases.
4.4. Minimize the extent to which study measurements arMinimize the extent to which study measurements are recorded on paper forms.e recorded on paper forms.
5.5. Follow standard data entry conventions.Follow standard data entry conventions.6.6. Back up the database regularly.Back up the database regularly.
Guidelines for Database ManagemGuidelines for Database Management for Clinical Researchent for Clinical Research
Establish the database tables, their roEstablish the database tables, their rows and columns, and their relationshipws and columns, and their relationships correctly at the outset. s correctly at the outset.
A poorly organized database makes daA poorly organized database makes data maintenance and retrieval nearly imta maintenance and retrieval nearly impossible. Make sure the data are normpossible. Make sure the data are normalized. Avoid data structures that requialized. Avoid data structures that require duplicate data entry or redundant stre duplicate data entry or redundant storage.orage.
Guidelines for Database ManagemGuidelines for Database Management for Clinical Researchent for Clinical Research
Establish and follow naming conventions for Establish and follow naming conventions for columns and tables.columns and tables.
Short field names without spaces or underscShort field names without spaces or underscores are convenient for programming, queryiores are convenient for programming, querying, and other manipulations. Instead of spang, and other manipulations. Instead of spaces or underscores, use “IntraCaps” (upper ces or underscores, use “IntraCaps” (upper case letters within the variable name) to disticase letters within the variable name) to distinguish words, e.g. “StudyID”, “FName”, nguish words, e.g. “StudyID”, “FName”, or “ExamDate”. Table names should be sior “ExamDate”. Table names should be singular, e.g. “Baby” instead of “Babies”, ngular, e.g. “Baby” instead of “Babies”, “Exam” instead of “Exams”.“Exam” instead of “Exams”.
Guidelines for Database ManagemGuidelines for Database Management for Clinical Researchent for Clinical Research
Obtain baseline demographic and clinical informatiObtain baseline demographic and clinical information about members of the study population from exion about members of the study population from existing computer databases.sting computer databases.
Avoid re-entering data which are already available (iAvoid re-entering data which are already available (in digital format) from other sources. In the Infant Jan digital format) from other sources. In the Infant Jaundice Study, the patient demographic data and coundice Study, the patient demographic data and contact information are obtained from the hospital dantact information are obtained from the hospital database. Computer systems can almost always prodtabase. Computer systems can almost always produce character-delimited or fixed-column-width text fuce character-delimited or fixed-column-width text files that the database management system can impiles that the database management system can import.ort.
Guidelines for Database ManagemGuidelines for Database Management for Clinical Researchent for Clinical Research
Minimize the extent to which study measurements aMinimize the extent to which study measurements are recorded on paper forms.re recorded on paper forms.
Enter data directly into the computer database or mEnter data directly into the computer database or move data from paper forms into the computer databove data from paper forms into the computer database as close to the data collection time as possible. ase as close to the data collection time as possible. When you define a variable in a computer database, When you define a variable in a computer database, you specify both its format and its you specify both its format and its domaindomain or range or range of allowed values. Using these format and domain sof allowed values. Using these format and domain specifications, computer data entry forms give immepecifications, computer data entry forms give immediate feedback about improper formats and values tdiate feedback about improper formats and values that are out of range. The best time to receive this fehat are out of range. The best time to receive this feedback is when the study subject is still on site.edback is when the study subject is still on site.
Guidelines for Database ManagemGuidelines for Database Management for Clinical Researchent for Clinical Research
Follow standard data entry conventions.Follow standard data entry conventions. Several conventions for data entry and display have Several conventions for data entry and display have
developed over time. Although most users of screen fdeveloped over time. Although most users of screen forms are not aware of these conventions, they have corms are not aware of these conventions, they have come to expect them subconsciously. For example, a ome to expect them subconsciously. For example, a series of mutually exclusive, collectively exhaustive cseries of mutually exclusive, collectively exhaustive choices is usually displayed as an “option group” cohoices is usually displayed as an “option group” consisting of several different “radio buttons”, wherensisting of several different “radio buttons”, whereas choices which are not mutually exclusive are displas choices which are not mutually exclusive are displayed as check boxesayed as check boxes
Guidelines for Database ManagemGuidelines for Database Management for Clinical Researchent for Clinical Research
Back up the database regularly and chBack up the database regularly and check the adequacy of the back up proceeck the adequacy of the back up procedure by periodically restoring a file frodure by periodically restoring a file from the back up medium.m the back up medium.