Digitization Challenges for [Jewish] Genealogy
Jean-Pierre Stroweis
EVA Minerva
Jerusalem, November 2006
Genealogy, a Cultural Heritage?
Culture: Ethnographic view (Edward Tylor)
Culture is that complex whole which includes knowledge, belief, art, morals, law, custom, and any other capabilities and habits acquired by man as a member of society.
Genealogy, a Cultural Heritage?
• Monotheist Religions
– Judaism
– Christianity
– Islam
Genealogy in the Torah
1. This is the book of the generations of Adam. In the day that G-d created man, in the likeness of G-d made He him;
2. male and female created He them, and blessed them, and called their name Adam, in the day when they were created.
3. And Adam lived a hundred and thirty years, and begot a son in his own likeness, after his image; and called his name Seth.
4. And the days of Adam after he begot Seth were eight hundred years; and he begot sons and daughters.
5. And all the days that Adam lived were nine hundred and thirty years; and he died.
6. And Seth lived a hundred and five years, and begot Enosh.
7. And Seth lived after he begot Enosh eight hundred and seven years, and begot sons and daughters.
Genesis, 5:1-7
Genealogy in the New Testament
12 After the exile to Babylon: Jeconiah was the father of Shealtiel, Shealtiel the father of Zerubbabel,
13 Zerubbabel the father of Abiud, Abiud the father of Eliakim, Eliakim the father of Azor,
14 Azor the father of Zadok, Zadok the father of Akim, Akim the father of Eliud,
15 Eliud the father of Eleazar, Eleazar the father of Matthan, Matthan the father of Jacob,
16 and Jacob the father of Joseph, the husband of Mary, of whom was born Jesus, who is called Christ
Matthew 1:12–16
See also Luke 3:21-38
Genealogy in Islam
Extracanonical traditions
Muhammad bin ‘Abdullah bin ‘Abdul-Muttalib (who was called Shaiba) bin Hashim, (named ‘Amr) bin ‘Abd Munaf (called Al-Mugheera) bin Qusai (also called Zaid) bin Kilab bin Murra bin Ka‘b bin Lo’i bin Ghalib bin Fahr (who was called Quraish and whose tribe was called after him) bin Malik bin An-Nadr (so called Qais) bin Kinana bin Khuzaiman bin Mudrikah (who was called ‘Amir) bin Elias bin Mudar bin Nizar bin Ma‘ad bin ‘Adnan.
References:
Ibn Hisham 1/1,2
Talqeeh Fuhoom Ahl Al-Athar, p. 5-6
Rahmat-ul-lil'alameen 2/11-14,52
Genealogy, a Cultural Heritage?
• Monotheist Religions
– Judaism
– Christianity
– Islam
• Governments
– Ontario Ministry of Culture
– French Ministry of Culture
Ontario Ministry of Culture
French Ministry of Culture
Genealogy Heritage: Who's in Charge of Conservation?
• Families? Usually no
• Administrations? Their own records
• Towns? Cemetery, not tombs
• Genealogical Societies? Nothing systematic
• Historical Museums and Archives? Out of their scope
No systematic preservation of the genealogical heritage!
Preserving Art versus Genealogy
ART
• Created/tangible items
• Selected items are collected
• Public sphere
• Value to society: museums, maintenance, academic research, intellectual rights, cost-value
GENEALOGY
• Re-construction
• Each individual is a subject of study
• Private sphere
• Value to family: no government support, little academic research, privacy rights
Genealogy Heritage Conservation
• No systematic preservation of the genealogical heritage
• No conservation of genealogy per se; Preservation of the sources that will enable future genealogical re-construction
• Genealogical sources are usually preserved by institutions for which genealogy is not the primary purpose
Genealogy Heritage: Players
• LDS Church (Mormons): accessible records
• Archives of former Administrations: their own records
– Ellis Island
– Hamburg Staatsarchiv
– Red Cross ITS Arolsen
• Holocaust Memorials: collected records
– Yad Vashem
– USHMM
– Mémorial de la Shoah
Genealogy Heritage: More Players
• Private Companies
– Ancestry.com
– FamilyDNA.com
• National Archives
• Genealogical Societies and SIGs
• Genealogical Libraries
– Genealogical Library, Germantown, Tennessee
– DNA Library, Glasgow, Scotland
Genealogy Heritage: [Jewish World] Players
• Individual Initiatives
– Jewish Genealogical Family Finder, JewishGen, Jewish Records Indexing-Poland, Routes-to-Roots Foundation, Istanbul Rabbinate Records, One-Step Web Site…
• IAJGS
• Center for Jewish History (NYC)
• Hevrot Kaddisha (Burial Societies)
Preservation Life Cycle for Genealogical Sources
1. Acquisition
2. Authentication
3. Translation/Transliteration
4. Accuracy Assessment
5. Soundexing
6. Storage
7. Cataloguing
8. Access rights
9. Query & Retrieval Tool
10. Publication/Distribution
Data Acquisition
• Interviews (Text - Audio - Video)
• Scanning Documents, Family Tree Charts, Pictures
• On-site visits to Archives & Cemeteries
• Manual Data Entry
• Optical Character Recognition
Digital Formats of Genealogical Data
• Family lore, artifacts and biographies
Format: TEXT / IMAGE / AUDIO / VIDEO / 3D
• Documented events
Format: TEXT / IMAGE / SPREADSHEET / DATABASE
• Physical traits
Format: IMAGE / TAGGED FORMAT TBD
• Genetic profile
Format: TEXT / TAGGED FORMAT TBD
• Family Trees
Format: GEDCOM
GEDCOM
• GEDCOM: GEnealogical Data COMmunication
– Neutral format for exchange of genealogical data
– Specification written by the LDS Church (www.familysearch.org)
• GEDCOM Version 5.5 (1996)
– Text-based
– ANSEL character encoding
– Widely used
– www.familysearch.org/GEDCOM/GEDCOM55.exe
• GEDCOM Version 6.0 (Draft 2002)
– XML-based
– Unicode characters
– Not implemented
– www.familysearch.org/GEDCOM/GedXML60.pdf
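To make the text-based GEDCOM 5.5 format concrete, here is a minimal Python sketch that reads GEDCOM lines (level, optional @xref@, tag, optional value) and groups names and birth dates by individual. It is not a full parser: it ignores ANSEL decoding and CONT/CONC continuation lines. The sample record, including its names and date, is invented for illustration.

```python
import re
from typing import NamedTuple, Optional

# One GEDCOM 5.5 line: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_gedcom(text):
    """Split GEDCOM text into structured lines, skipping blank lines."""
    lines = []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        match = LINE_RE.match(raw)
        if match is None:
            raise ValueError(f"not a GEDCOM line: {raw!r}")
        level, xref, tag, value = match.groups()
        lines.append(GedcomLine(int(level), xref, tag, value or ""))
    return lines


def individuals(lines):
    """Collect NAME and BIRT/DATE values under each INDI record."""
    people = {}
    current = None   # xref of the INDI record being read, if any
    context = None   # tag of the enclosing level-1 line
    for line in lines:
        if line.level == 0:
            current = line.xref if line.tag == "INDI" else None
            if current:
                people[current] = {}
            context = None
        elif current and line.level == 1:
            context = line.tag
            if line.tag == "NAME":
                people[current]["name"] = line.value
        elif current and line.level == 2 and context == "BIRT" and line.tag == "DATE":
            people[current]["birth"] = line.value
    return people


# Invented sample data, for illustration only.
SAMPLE = """\
0 HEAD
1 CHAR ANSEL
0 @I1@ INDI
1 NAME Schimon /DAYAN/
1 BIRT
2 DATE 1820
0 @I2@ INDI
1 NAME Zadok /DAYAN/
0 TRLR
"""

people = individuals(parse_gedcom(SAMPLE))
```

The level number at the start of each line is what carries the hierarchy, which is why the sketch only needs to remember the enclosing level-1 tag to attach a DATE to its BIRT event.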
Standard for cataloguing
Body: Dublin Core Metadata Initiative
Goal: Development of interoperable online metadata standards
Standard: The Dublin Core Element Set
Web: www.dublincore.org
Dublin Core Element Set
• Version 1.1, 2004
• Standard for cross-domain information resource description
• Metadata Elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights
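As an illustration of cataloguing with the element set, the following Python sketch describes one genealogical source using the fifteen element names of Dublin Core 1.1 and serializes it as XML in the standard `http://purl.org/dc/elements/1.1/` namespace. The wrapper element name `record` and every value in the sample record are assumptions made for the example, not a catalogue entry from the presentation.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_xml(record):
    """Serialize a {element: value or list of values} mapping as XML."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in record.items():
        # Every element is optional and repeatable, so accept str or list.
        for value in ([values] if isinstance(values, str) else values):
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
RECORD = {
    "title": "1875 Census of the Jewish Population of Eretz Israel",
    "type": "Dataset",
    "format": "text/csv",
    "language": ["en", "he"],
    "coverage": "1875",
}

xml_text = dublin_core_xml(RECORD)
```

Because all elements are optional and repeatable, a record with two `language` values and no `creator` is still valid, which suits sources whose authorship is unknown.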
Standard for retrieval
Body: Open Archives Initiative
Goal: Promotes interoperability standards that aim to facilitate the efficient dissemination of content
Standard: The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Web: www.openarchives.org
Open Archives Initiative Protocol for Metadata Harvesting
• An application-independent interoperability framework based on metadata harvesting
– Data Providers: administer systems that support the OAI-PMH as a means of exposing metadata
– Service Providers: use metadata harvested via the OAI-PMH as a basis for building value-added services
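From the service provider's side, harvesting comes down to HTTP requests carrying one of the six OAI-PMH verbs. The Python sketch below only builds such request URLs and performs no network access. The base URL `https://archive.example.org/oai` is a placeholder, and the helper name `oai_request` is an assumption for this example.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request(base_url, verb, **params):
    """Build the URL a harvester would fetch from a data provider."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # "from" is a Python keyword, so callers pass from_ instead.
    if "from_" in params:
        params["from"] = params.pop("from_")
    # A resumption token is an exclusive argument: nothing else may join it.
    if "resumptionToken" in params and len(params) > 1:
        raise ValueError("resumptionToken cannot be combined with other arguments")
    return f"{base_url}?{urlencode({'verb': verb, **params})}"


# Hypothetical data provider endpoint.
BASE = "https://archive.example.org/oai"

first_page = oai_request(BASE, "ListRecords",
                         metadataPrefix="oai_dc", from_="2006-01-01")
```

Here `oai_dc` is the unqualified Dublin Core format that every OAI-PMH data provider is required to support, so a harvester can rely on it without first calling `ListMetadataFormats`.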
OAI - Architecture
Source: www.culture.gouv.fr/culture/dll/OAI-PMH.htm
OAI Example: Nomina
http://nomina.france-genealogie.fr
Helkat Mehokek Index of Gravestone Hebrew Inscriptions on Mount of Olives Cemetery
1875 Census of the Jewish Population of Eretz Israel, Ordered by Sir Moses Montefiore
Paul Jacobi’s Index of the Names (listed in monographs)
Name Changes in the Palestine Gazette
Different types of records
Similar Verification Process
Our Experience
Helkat Mehokek
Montefiore Census 1875
Paul Jacobi’s Index
Name Changes in the Palestine Gazette
What is Quality?
• Accuracy
• Integrity with Original Source
• Internal Consistency
• Completeness
• Simplicity / Ease of Use
The Process
Source → Excel Table → Searchable database
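The last step of the process, turning a table into a searchable database, can be sketched in Python with the standard `csv` and `sqlite3` modules. This is one possible approach, not necessarily the tooling used for the projects described here. The CSV text stands in for a table exported from Excel, and its column names and rows are invented.

```python
import csv
import io
import sqlite3

# Stand-in for a table exported from Excel as CSV (invented rows).
CSV_EXPORT = """\
surname,first_name,country
DAYAN,Schimon,Austria
DAYAN,Zadok,Poland
BROD,Zacharia,Poland
"""


def build_database(csv_text):
    """Load the exported table into an in-memory SQLite database."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE person (surname TEXT, first_name TEXT, country TEXT)")
    db.executemany(
        "INSERT INTO person VALUES (:surname, :first_name, :country)", rows)
    # Index the column that researchers search most often.
    db.execute("CREATE INDEX idx_surname ON person (surname)")
    return db


def search_surname(db, prefix):
    """Return rows whose surname starts with the given prefix."""
    cursor = db.execute(
        "SELECT surname, first_name, country FROM person "
        "WHERE surname LIKE ? ORDER BY surname, first_name",
        (prefix + "%",))
    return cursor.fetchall()


db = build_database(CSV_EXPORT)
matches = search_surname(db, "day")
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, so a search for `day` also finds `DAYAN`.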
Quality during Design
• Goals: Index or Full Extract?
• Team Policy
• Conventions
– Reference to source
– Structure
– Fields
– Transliteration
Structure
Fields Semantics
Rabbi Schimon III "DAYAN"-"BROD"-"KARA" MI-WINA
Searchable Fields
Title: Rabbi
First Name: Schimon
Surname: DAYAN-BROD-KARA
Known as: from Wien
Non-Searchable Field
Full Name: Schimon III DAYAN WIENER-BROD-KARA
Transliteration Issues
Z like Zacharia (ז)
or
Z like Zadok (צ), also written Tzadok
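The ambiguity can be made explicit in code. The Python sketch below lists which Hebrew letters a Latin spelling may stand for, and derives a search key that folds a leading "tz" or "ts" into "z" so that Zadok and Tzadok are found together. The spelling table is illustrative and far from exhaustive, and the folding rule is an assumption for this example rather than an established transliteration standard.

```python
# Latin spellings seen for two Hebrew letters (illustrative, not exhaustive).
LATIN_SPELLINGS = {
    "ז": ("z",),              # zayin, as in Zacharia
    "צ": ("tz", "ts", "z"),   # tsadi, as in Zadok / Tzadok
}


def hebrew_candidates(latin):
    """Hebrew letters that a Latin spelling may stand for."""
    return [hebrew for hebrew, spellings in LATIN_SPELLINGS.items()
            if latin.lower() in spellings]


def search_key(name):
    """Fold a leading tz/ts into z so variant spellings share one key."""
    key = name.strip().lower()
    for variant in ("tz", "ts"):
        if key.startswith(variant):
            return "z" + key[len(variant):]
    return key


same_key = search_key("Tzadok") == search_key("Zadok")
```

Folding discards the distinction between the two Hebrew letters, so a key like this suits searching only. The transliteration chosen by the team should still be stored unchanged.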
Quality at Verification
Two Steps
1. Unit Test (column-by-column)
2. Integration Test (correlate fields)
Unit Test: Types of Errors Detected
• Unexpected characters in field value
• Variant spellings of the same name (suspect)
• Letter characters embedded in a numeric field (e.g. ‘O’ instead of zero)
• Invalid and out-of-range values (e.g. for dates, ages)
• Inconsistent usage of acronyms
• Inconsistent transliteration
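Several of these column-level checks are easy to script. The Python sketch below flags unexpected characters, non-numeric content in a numeric field, and out-of-range values for a single column. The function name, the allowed-character pattern, and the age range of 0 to 120 are assumptions chosen for the example, and the sample values are invented.

```python
import re


def check_column(values, pattern=None, numeric_range=None):
    """Return (row, value, problem) for each suspicious cell in one column."""
    problems = []
    for row, value in enumerate(values):
        text = value.strip()
        if not text:
            continue  # blank cells mean "unknown" and are not errors
        if pattern is not None and not re.fullmatch(pattern, text):
            problems.append((row, value, "unexpected characters"))
        if numeric_range is not None:
            if not re.fullmatch(r"[0-9]+", text):
                # Catches, for example, the letter O typed instead of zero.
                problems.append((row, value, "not a number"))
            elif not numeric_range[0] <= int(text) <= numeric_range[1]:
                problems.append((row, value, "out of range"))
    return problems


# Invented sample columns.
ages = ["34", "4O", "", "210", "7"]
surnames = ["DAYAN", "D4YAN", "BROD-KARA"]

age_problems = check_column(ages, numeric_range=(0, 120))
name_problems = check_column(surnames, pattern=r"[A-Za-z' -]+")
```

The function reports suspects rather than correcting them, because each flagged cell still has to be compared with the original source before it is changed.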
Unit Test: Derived Benefits
• Maximum and Minimum Values
• List of Distinct Values
• Distribution of Values (Frequency)
Frequency of Values
Number of Tombs per Country
Country      Tombs      %
Belarus       1579  19.7%
Lithuania      788   9.8%
Poland         715   8.9%
Ukraine        224   2.8%
Hungary        159   2.0%
Israel          85   1.1%
Russia          60   0.7%
Latvia          55   0.7%
Slovakia        40   0.5%
Romania         26   0.3%
Bulgaria        23   0.3%
Turkey          17   0.2%
Bosnia          16   0.2%
MONTEFIORE - 1875 CENSUS - SUMMARY
                                                    # Records  Percentage
Number of index records                                  9955      100.0%
Number of original records                               4728       47.5%
Number of individuals with known surname                 5421       54.5%
Number of individuals with known first name              7421       74.5%
Number of individuals with known father name             3384       34.0%
Number of individuals with known mother name             1692       17.0%
Number of individuals with known grandfather name         298        3.0%
Number of individuals with known title                    132        1.3%
Number of individuals with known spouse name             4132       41.5%
Number of individuals with known year of birth           4759       47.8%
Number of individuals with known place of birth          3835       38.5%
Number of individuals with known country of birth        4161       41.8%
Number of individuals with known year of alyah           1960       19.7%
Number of individuals with known occupation              2101       21.1%
Unit Test: How to Proceed?
• sort
• auto-filter
• advanced filter
• pivot table
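The same column profile that sorting, filtering and pivot tables give in a spreadsheet can also be computed in a few lines of Python. The sketch below returns the minimum, maximum, distinct values and frequency distribution of one column. The function name and the sample column are invented for the example.

```python
from collections import Counter


def column_profile(values):
    """Summarize one column: extremes, distinct values and frequencies."""
    known = [v for v in values if v != ""]
    counts = Counter(known)
    return {
        "min": min(known, default=None),
        "max": max(known, default=None),
        "distinct": sorted(counts),
        # (value, count, share of all rows in percent), most frequent first
        "frequency": [
            (value, n, round(100 * n / len(values), 1))
            for value, n in counts.most_common()
        ],
    }


# Invented sample column; the empty string marks an unknown country.
countries = ["Poland", "Belarus", "Belarus", "", "Lithuania", "Belarus", "Poland"]
profile = column_profile(countries)
```

Reading down the list of distinct values is often the quickest way to spot variant spellings of the same name, since near-duplicates sort next to each other.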
Integration Test
• Redundancy in the Source Document
– check that the various correlated values do not contradict each other
• No Redundancy in Source Document
– find recurring patterns and implicit rules inherent to the nature of the document
– verify that these patterns are respected
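Both kinds of integration test can be sketched in Python. The first function uses redundancy between two correlated fields: the same place of birth should always map to the same country. The second checks implicit rules for an 1875 census: nobody is born or immigrates after the census year, and nobody immigrates before being born. The field names and sample records are invented, and the rules are assumptions about what such a document implies.

```python
from collections import defaultdict

CENSUS_YEAR = 1875


def conflicting_places(records):
    """Redundancy check: places recorded with more than one country."""
    countries = defaultdict(set)
    for record in records:
        if record.get("place") and record.get("country"):
            countries[record["place"]].add(record["country"])
    return {place: sorted(found)
            for place, found in countries.items() if len(found) > 1}


def rule_violations(records):
    """Implicit-rule check on year of birth and year of aliyah."""
    violations = []
    for row, record in enumerate(records):
        born, aliyah = record.get("born"), record.get("aliyah")
        if born is not None and born > CENSUS_YEAR:
            violations.append((row, "born after census year"))
        if aliyah is not None and aliyah > CENSUS_YEAR:
            violations.append((row, "aliyah after census year"))
        if born is not None and aliyah is not None and aliyah < born:
            violations.append((row, "aliyah before birth"))
    return violations


# Invented sample records.
RECORDS = [
    {"place": "Minsk", "country": "Belarus", "born": 1830, "aliyah": 1860},
    {"place": "Minsk", "country": "Lithuania", "born": 1845, "aliyah": 1840},
    {"place": "Vilna", "country": "Lithuania", "born": 1880, "aliyah": None},
]

conflicts = conflicting_places(RECORDS)
violations = rule_violations(RECORDS)
```

A flagged row is not necessarily a transcription error, since the contradiction may already exist in the source. The check only points to the rows worth comparing against the original.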
Redundancy
Implicit Rules & Patterns
Conclusion on Accuracy Assessment
Common Verification Procedure for any kind of database
1. Check Column-by-Column
2. Check internal redundancy and implicit internal rules
Conclusion on Digitization Challenges for [Jewish] Genealogy (1)
• No systematic preservation of the genealogical heritage
• No conservation of genealogy; only sources for genealogical re-construction
• Genealogical sources preserved for non-genealogy purposes
Conclusion on Digitization Challenges for [Jewish] Genealogy (2)
• Many technical challenges
• Standards for cataloguing / retrieval
• No standards for family trees, DNA samples, soundexes
• Challenges not Jewish-specific