Digitization Challenges for [Jewish] Genealogy. Jean-Pierre Stroweis. EVA Minerva, Jerusalem, November 2006.

Page 1: Digitization Challenges  for [Jewish] Genealogy

Digitization Challenges for [Jewish] Genealogy

Jean-Pierre Stroweis

EVA Minerva

Jerusalem, November 2006

Page 2: Digitization Challenges  for [Jewish] Genealogy

Genealogy, a Cultural Heritage?

Page 3: Digitization Challenges  for [Jewish] Genealogy

Culture: Ethnographic View (Edward Tylor)

תרבות היא אותה שלמות מורכבת של ידע, אמונה, אומנות, מוסר, חוק, מנהגים וכל אותם הכשרונות וההרגלים שהאדם רוכש אותם בהיותו חלק מהחברה.

Culture is that complex whole which includes knowledge, belief, art, morals, law, custom, and any other capabilities and habits acquired by man as a member of society.

Page 4: Digitization Challenges  for [Jewish] Genealogy

Genealogy a Cultural Heritage?

• Monotheistic Religions
– Judaism
– Christianity
– Islam

Page 5: Digitization Challenges  for [Jewish] Genealogy

Genealogy in the Torah

1. This is the book of the generations of Adam. In the day that G-d created man, in the likeness of G-d made He him;

2. male and female created He them, and blessed them, and called their name Adam, in the day when they were created.

3. And Adam lived a hundred and thirty years, and begot a son in his own likeness, after his image; and called his name Seth.

4. And the days of Adam after he begot Seth were eight hundred years; and he begot sons and daughters.

5. And all the days that Adam lived were nine hundred and thirty years; and he died.

6. And Seth lived a hundred and five years, and begot Enosh.

7. And Seth lived after he begot Enosh eight hundred and seven years, and begot sons and daughters.

Genesis, 5:1-7

Page 6: Digitization Challenges  for [Jewish] Genealogy

Genealogy in the New Testament

12 After the exile to Babylon: Jeconiah was the father of Shealtiel, Shealtiel the father of Zerubbabel,

13 Zerubbabel the father of Abiud, Abiud the father of Eliakim, Eliakim the father of Azor,

14 Azor the father of Zadok, Zadok the father of Akim, Akim the father of Eliud,

15 Eliud the father of Eleazar, Eleazar the father of Matthan, Matthan the father of Jacob,

16 and Jacob the father of Joseph, the husband of Mary, of whom was born Jesus, who is called Christ

Matthew 1:12–16

+Luke 3:21-38

Page 7: Digitization Challenges  for [Jewish] Genealogy

Genealogy in Islam: Extracanonical Traditions

Muhammad bin ‘Abdullah bin ‘Abdul-Muttalib (who was called Shaiba) bin Hashim, (named ‘Amr) bin ‘Abd Munaf (called Al-Mugheera) bin Qusai (also called Zaid) bin Kilab bin Murra bin Ka‘b bin Lo’i bin Ghalib bin Fahr (who was called Quraish and whose tribe was called after him) bin Malik bin An-Nadr (so called Qais) bin Kinana bin Khuzaiman bin Mudrikah (who was called ‘Amir) bin Elias bin Mudar bin Nizar bin Ma‘ad bin ‘Adnan.

References: Ibn Hisham 1/1,2; Talqeeh Fuhoom Ahl Al-Athar, p. 5-6; Rahmat-ul-lil'alameen 2/11-14,52

Page 8: Digitization Challenges  for [Jewish] Genealogy

Genealogy a Cultural Heritage?

• Monotheistic Religions
– Judaism
– Christianity
– Islam

• Governments
– Ontario Ministry of Culture
– French Ministry of Culture

Page 9: Digitization Challenges  for [Jewish] Genealogy

Ontario Ministry of Culture

Page 10: Digitization Challenges  for [Jewish] Genealogy

French Ministry of Culture

Page 11: Digitization Challenges  for [Jewish] Genealogy

Genealogy Heritage: Who's in Charge of Conservation?

• Families? Usually no
• Administrations? Their own records
• Towns? The cemetery, not the tombs
• Genealogical Societies? Nothing systematic
• Historical Museums and Archives? Out of their scope

No systematic preservation of the genealogical heritage!

Page 12: Digitization Challenges  for [Jewish] Genealogy

Preserving Art versus Genealogy

ART
• Created/tangible items
• Selected items are collected
• Public sphere
• Value to society: museums, maintenance, academic research, intellectual rights, cost-value

GENEALOGY
• Re-construction
• Each individual is a subject of study
• Private sphere
• Value to family: no government support, little academic research, privacy rights

Page 13: Digitization Challenges  for [Jewish] Genealogy

Genealogy Heritage Conservation

• No systematic preservation of the genealogical heritage

• No conservation of genealogy per se; Preservation of the sources that will enable future genealogical re-construction

• Genealogical sources are usually preserved by institutions for which genealogy is not the primary purpose

Page 14: Digitization Challenges  for [Jewish] Genealogy

Genealogy Heritage: Players

• LDS Church (Mormons): accessible records

• Archives of former Administrations: their own records
– Ellis Island
– Hamburg Staatsarchiv
– Red Cross ITS Arolsen

• Holocaust Memorials: collected records
– Yad Vashem
– USHMM
– Mémorial de la Shoah

Page 15: Digitization Challenges  for [Jewish] Genealogy

Genealogy Heritage: More Players

• Private Companies
– Ancestry.com
– FamilyDNA.com

• National Archives
• Genealogical Societies and SIGs
• Genealogical Libraries
– Genealogical Library, Germantown, Tennessee
– DNA Library, Glasgow, Scotland

Page 16: Digitization Challenges  for [Jewish] Genealogy

Genealogy Heritage: [Jewish World] Players

• Individual Initiatives
– Jewish Genealogical Family Finder, JewishGen, Jewish Records Indexing-Poland, Routes-to-Roots Foundation, Istanbul Rabbinate Records, One-Step Web Site…

• IAJGS
• Center for Jewish History (NYC)
• Hevrot Kaddisha (Burial Societies)

Page 17: Digitization Challenges  for [Jewish] Genealogy

Preservation Life Cycle for Genealogical Sources

1. Acquisition
2. Authentication
3. Translation/Transliteration
4. Accuracy Assessment
5. Soundexing
6. Storage
7. Cataloguing
8. Access rights
9. Query & Retrieval Tool
10. Publication/Distribution

Page 19: Digitization Challenges  for [Jewish] Genealogy

Data Acquisition

• Interviews (Text, Audio, Video)
• Scanning Documents, Family Tree Charts, Pictures
• On-site visits to Archives & Cemeteries
• Manual Data Entry
• Optical Character Recognition

Page 21: Digitization Challenges  for [Jewish] Genealogy

Digital Formats of Genealogical Data

• Family lore, artifacts and biographies
  Format: TEXT / IMAGE / AUDIO / VIDEO / 3D

• Documented events
  Format: TEXT / IMAGE / SPREADSHEET / DATABASE

• Physical traits
  Format: IMAGE / TAGGED FORMAT TBD

• Genetic profile
  Format: TEXT / TAGGED FORMAT TBD

• Family Trees
  Format: GEDCOM

Page 22: Digitization Challenges  for [Jewish] Genealogy

GEDCOM

• GEDCOM: GEnealogical Data COMmunication
– Neutral format for exchange of genealogical data
– Specification written by LDS Church (www.familysearch.org)

• GEDCOM Version 5.5 (1996)
– Text-based
– ANSEL character encoding
– Widely used
– www.familysearch.org/GEDCOM/GEDCOM55.exe

• GEDCOM Version 6.0 (Draft 2002)
– XML-based
– Unicode characters
– Not implemented
– www.familysearch.org/GEDCOM/GedXML60.pdf
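
To make the text-based GEDCOM 5.5 layout more concrete, here is a minimal Python sketch that parses a few hand-written lines into (level, xref, tag, value) tuples. The sample individual and family are invented for illustration, the file is not a complete valid GEDCOM header, and the parser covers only the basic line grammar (level, optional @xref@, tag, optional value), not the full specification.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]  # record identifier such as "@I1@", if present
    tag: str
    value: str


# Invented sample data, for illustration only.
SAMPLE = """\
0 HEAD
1 GEDC
2 VERS 5.5
1 CHAR ANSEL
0 @I1@ INDI
1 NAME Moshe /Example/
1 SEX M
1 BIRT
2 DATE 1850
0 @F1@ FAM
1 HUSB @I1@
0 TRLR
"""


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level, optional xref, tag and value."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line: "LEVEL @XREF@ TAG [value]"
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        # Ordinary line: "LEVEL TAG [value]"
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return GedcomLine(level, xref, tag, value)


records = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
for rec in records:
    print(rec)
```

The level numbers carry the hierarchy: a line at level 2 (the birth DATE) belongs to the nearest preceding line at level 1 (BIRT), which in turn belongs to the level 0 record (the individual @I1@).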

Page 24: Digitization Challenges  for [Jewish] Genealogy

Standard for cataloguing

Body: Dublin Core Metadata Initiative

Goal: Development of interoperable online metadata standards

Standard: The Dublin Core Element Set

Web: www.dublincore.org

Page 25: Digitization Challenges  for [Jewish] Genealogy

Dublin Core Element Set

• Version 1.1, 2004
• Standard for cross-domain information resource description
• Metadata elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights
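
As an illustration of how a genealogical source could be catalogued with these elements, the following Python sketch builds a small XML record using the Dublin Core element namespace. The wrapper element name `record` and all field values other than the census title are assumptions made for this example, not part of the presentation.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> ET.Element:
    """Wrap name/value pairs as dc:* child elements of a <record> element.

    The <record> wrapper is a hypothetical container chosen for this sketch.
    """
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Illustrative values only; just the title is taken from the slides.
record = dublin_core_record({
    "title": "1875 Census of the Jewish Population of Eretz Israel",
    "date": "1875",
    "format": "text/csv",
    "language": "en",
    "source": "Montefiore census manuscripts",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```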

Page 27: Digitization Challenges  for [Jewish] Genealogy

Standard for retrieval

Body: Open Archives Initiative

Goal: Promotes interoperability standards that aim to facilitate the efficient dissemination of content

Standard: The Open Archives Initiative Protocol for Metadata Harvesting

Web: www.openarchives.org

Page 28: Digitization Challenges  for [Jewish] Genealogy

Open Archives Initiative Protocol for Metadata Harvesting

• An application-independent interoperability framework based on metadata harvesting
– Data Providers: administer systems that support the OAI-PMH as a means of exposing metadata
– Service Providers: use metadata harvested via the OAI-PMH as a basis for building value-added services
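
The split between data providers and service providers can be sketched from the service provider's side. The Python example below builds an OAI-PMH `ListRecords` request URL and extracts identifiers and titles from a response. The base URL and the sample response are hypothetical and hand-written for this sketch; no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://archive.example.org/oai"  # hypothetical data provider

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request; oai_dc is the Dublin Core format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hand-written sample response, standing in for what a provider might return.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample burial register</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvest_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        pairs.append((ident, title))
    return pairs


print(list_records_url(BASE_URL))
print(harvest_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens and handle protocol errors, which this sketch leaves out.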

Page 29: Digitization Challenges  for [Jewish] Genealogy

OAI - Architecture

Source: www.culture.gouv.fr/culture/dll/OAI-PMH.htm

Page 30: Digitization Challenges  for [Jewish] Genealogy

OAI Example: Nomina (http://nomina.france-genealogie.fr)

Page 32: Digitization Challenges  for [Jewish] Genealogy

Our Experience

• Helkat Mehokek: Index of Gravestone Hebrew Inscriptions on Mount of Olives Cemetery
• 1875 Census of the Jewish Population of Eretz Israel, Ordered by Sir Moses Montefiore
• Paul Jacobi's Index of the Names (listed in monographs)
• Name Changes in the Palestine Gazette

Different types of records, similar verification process

Page 33: Digitization Challenges  for [Jewish] Genealogy

Helkat Mehokek

Page 34: Digitization Challenges  for [Jewish] Genealogy

Montefiore Census 1875

Page 35: Digitization Challenges  for [Jewish] Genealogy

Paul Jacobi’s Index

Page 36: Digitization Challenges  for [Jewish] Genealogy

Name Changes in the Palestine Gazette

Page 37: Digitization Challenges  for [Jewish] Genealogy

What is Quality?

• Accuracy
• Integrity with Original Source
• Internal Consistency
• Completeness
• Simplicity / Ease of Use

Page 38: Digitization Challenges  for [Jewish] Genealogy

The Process

Source → Excel Table → Searchable database

Page 41: Digitization Challenges  for [Jewish] Genealogy

Quality during Design

• Goals
– Index or Full Extract?

• Team Policy

• Conventions
– Reference to source
– Structure
– Fields
– Transliteration

Page 42: Digitization Challenges  for [Jewish] Genealogy

Structure

Page 43: Digitization Challenges  for [Jewish] Genealogy

Fields Semantics

Source entry: Rabbi Schimon III "DAYAN"-"BROD"-"KARA" MI-WINA

Searchable Fields
– Title: Rabbi
– First Name: Schimon
– Surname: DAYAN-BROD-KARA
– Known as: from Wien

Non-Searchable Field
– Full Name: Schimon III DAYAN WIENER-BROD-KARA
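
One way to picture the split between searchable and non-searchable fields is a small record type whose search method looks only at the parsed fields. This Python sketch assumes that Title, First Name, Surname and Known as are the searchable fields and that Full Name is kept for display only; the class and method names are hypothetical.

```python
from dataclasses import dataclass

# Assumption for this sketch: these four parsed fields are the searchable ones.
SEARCHABLE = ("title", "first_name", "surname", "known_as")


@dataclass
class NameRecord:
    title: str
    first_name: str
    surname: str
    known_as: str
    full_name: str  # kept as transcribed, for display only

    def matches(self, term: str) -> bool:
        """Case-insensitive substring search over the searchable fields."""
        term = term.casefold()
        return any(term in getattr(self, f).casefold() for f in SEARCHABLE)


rec = NameRecord(
    title="Rabbi",
    first_name="Schimon",
    surname="DAYAN-BROD-KARA",
    known_as="from Wien",
    full_name="Schimon III DAYAN WIENER-BROD-KARA",
)

print(rec.matches("brod"))    # found in the surname field
print(rec.matches("wiener"))  # appears only in the full name, so not matched
```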

Page 44: Digitization Challenges  for [Jewish] Genealogy

Transliteration Issues

Z like Zacharia (ז)

or

Z like Zadok (צ), also transliterated as Tzadok
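
The Z/Tz ambiguity is one reason phonetic indexing (the "Soundexing" step of the life cycle) matters. As a minimal sketch, assuming classic American Soundex rather than whichever scheme the projects described here actually used, the following Python function shows how the two spellings of the same name are coded.

```python
# Classic American Soundex digit groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of a Latin-script name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate letters that share a code.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels reset prev, so a repeated code after a vowel is kept.
        prev = code
    return (result + "000")[:4]


for spelling in ("Zadok", "Tzadok", "Zacharia"):
    print(spelling, soundex(spelling))
```

Here Zadok and Tzadok receive different codes (Z320 and T232), because American Soundex always keeps the first letter. Plain Soundex would therefore not bring the two transliterations together; schemes designed with Eastern European and Jewish names in mind, such as Daitch-Mokotoff Soundex, aim to address this kind of case.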

Page 45: Digitization Challenges  for [Jewish] Genealogy

Quality at Verification

Two Steps

1. Unit Test (column-by-column)

2. Integration Test (correlate fields)

Page 46: Digitization Challenges  for [Jewish] Genealogy

Unit Test: Types of Errors Detected

• Unexpected characters in field value
• Variant spellings of the same name (suspect)
• Letter characters embedded in a numeric field (e.g. 'O' instead of zero)
• Invalid and out-of-range values (e.g. for dates, ages)
• Inconsistent usage of acronyms
• Inconsistent transliteration
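
Several of these column-level checks are easy to automate. The Python sketch below runs a column-by-column pass over a few invented rows and reports unexpected characters, letters inside a numeric field, and out-of-range years. The column names, the allowed character set, and the year range are assumptions for illustration.

```python
import re

# Assumed conventions for this sketch: names use Latin letters, apostrophes,
# spaces and hyphens; birth years fall between 1775 and 1875.
NAME_PATTERN = r"[A-Za-z' \-]+"
YEAR_RANGE = (1775, 1875)


def check_name(value):
    if re.fullmatch(NAME_PATTERN, value):
        return []
    return ["unexpected characters"]


def check_year(value):
    if not re.fullmatch(r"[0-9]+", value):
        if re.search(r"[A-Za-z]", value):
            return ["letter in numeric field"]
        return ["not a number"]
    low, high = YEAR_RANGE
    return [] if low <= int(value) <= high else ["out of range"]


CHECKS = {"surname": check_name, "birth_year": check_year}


def unit_test(rows):
    """Check one column at a time; return (row index, column, problem)."""
    errors = []
    for column, check in CHECKS.items():
        for index, row in enumerate(rows):
            for problem in check(row[column]):
                errors.append((index, column, problem))
    return errors


# Invented sample rows with typical data-entry slips.
rows = [
    {"surname": "Cohen", "birth_year": "1840"},
    {"surname": "Lev1", "birth_year": "18O2"},   # digit in name, letter O in year
    {"surname": "Mizrahi", "birth_year": "1902"},  # later than the census
]
errors = unit_test(rows)
for error in errors:
    print(error)
```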

Page 47: Digitization Challenges  for [Jewish] Genealogy

Unit Test: Derived Benefits

• Maximum and Minimum Values
• List of Distinct Values
• Distribution of Values (Frequency)
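
These by-products can be computed for any column with a few lines of code. The following Python sketch profiles a column by its minimum, maximum, distinct values and value frequencies; the sample values are invented, and the misspelling "Belorus" is planted to show how a list of distinct values exposes variant spellings.

```python
from collections import Counter


def column_profile(values):
    """Summarize one column, ignoring empty cells."""
    present = [v for v in values if v not in ("", None)]
    frequency = Counter(present)
    return {
        "min": min(present, default=None),
        "max": max(present, default=None),
        "distinct": sorted(frequency),
        "frequency": frequency.most_common(),
    }


# Invented sample columns, for illustration only.
countries = ["Poland", "Belarus", "Belarus", "Lithuania", "Belorus", ""]
birth_years = [1840, 1812, None, 1859]

country_profile = column_profile(countries)
year_profile = column_profile(birth_years)

print(country_profile["distinct"])
print(country_profile["frequency"])
print(year_profile["min"], year_profile["max"])
```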

Page 48: Digitization Challenges  for [Jewish] Genealogy

Frequency of Values

Number of Tombs per Country

Country      Tombs   %
Belarus       1579   19.7%
Lithuania      788    9.8%
Poland         715    8.9%
Ukraine        224    2.8%
Hungary        159    2.0%
Israel          85    1.1%
Russia          60    0.7%
Latvia          55    0.7%
Slovakia        40    0.5%
Romania         26    0.3%
Bulgaria        23    0.3%
Turkey          17    0.2%
Bosnia          16    0.2%

MONTEFIORE - 1875 CENSUS - SUMMARY

                                                     # Records   Percentage
Number of index records                                   9955       100.0%
Number of original records                                4728        47.5%
Number of individuals with known surname                  5421        54.5%
Number of individuals with known first name               7421        74.5%
Number of individuals with known father name              3384        34.0%
Number of individuals with known mother name              1692        17.0%
Number of individuals with known grandfather name          298         3.0%
Number of individuals with known title                     132         1.3%
Number of individuals with known spouse name              4132        41.5%
Number of individuals with known year of birth            4759        47.8%
Number of individuals with known place of birth           3835        38.5%
Number of individuals with known country of birth         4161        41.8%
Number of individuals with known year of aliyah           1960        19.7%
Number of individuals with known occupation               2101        21.1%

Page 49: Digitization Challenges  for [Jewish] Genealogy

Unit Test: How to Proceed?

• Sort
• Auto-filter
• Advanced filter
• Pivot table

Page 50: Digitization Challenges  for [Jewish] Genealogy

Integration Test

• Redundancy in the Source Document
– Check that the various correlated values do not contradict each other

• No Redundancy in Source Document
– Find recurring patterns and implicit rules inherent to the nature of the document
– Verify that these patterns are respected
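
Both kinds of integration check can be expressed as rules over several fields of the same record. The Python sketch below gives one hypothetical example of each: a redundancy check (a stated age should agree with the birth year and the census year) and an implicit rule (nobody arrives before being born). The field names, the one-year tolerance, and the sample records are assumptions; the actual rules depend on the source document.

```python
CENSUS_YEAR = 1875  # year of the census used as the example source


def integration_test(record):
    """Cross-check fields of one record; return a list of problems found."""
    problems = []
    birth = record.get("birth_year")
    age = record.get("age")
    arrival = record.get("arrival_year")

    # Redundancy: age and birth year describe the same fact twice.
    # A tolerance of one year allows for birthdays later in the census year.
    if birth is not None and age is not None:
        if abs(CENSUS_YEAR - birth - age) > 1:
            problems.append("age contradicts birth year")

    # Implicit rule: a person cannot arrive before being born.
    if birth is not None and arrival is not None and arrival < birth:
        problems.append("arrival precedes birth")

    return problems


# Invented sample records, for illustration only.
samples = [
    {"birth_year": 1840, "age": 35},
    {"birth_year": 1840, "age": 53},            # digits probably transposed
    {"birth_year": 1850, "arrival_year": 1845},
]
results = [integration_test(r) for r in samples]
for record, found in zip(samples, results):
    print(record, found)
```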

Page 51: Digitization Challenges  for [Jewish] Genealogy

Redundancy

Page 52: Digitization Challenges  for [Jewish] Genealogy

Implicit Rules & Patterns

Page 53: Digitization Challenges  for [Jewish] Genealogy

Conclusion on Accuracy Assessment

Common verification procedure for any kind of database:

1. Check column-by-column
2. Check internal redundancy and implicit internal rules

Page 54: Digitization Challenges  for [Jewish] Genealogy

Conclusion on Digitization Challenges for [Jewish] Genealogy (1)

• No systematic preservation of the genealogical heritage

• No conservation of genealogy; only sources for genealogical re-construction

• Genealogical sources preserved for non-genealogy purposes

Page 55: Digitization Challenges  for [Jewish] Genealogy

Conclusion on Digitization Challenges for [Jewish] Genealogy (2)

• Many technical challenges
• Standards exist for cataloguing / retrieval
• No standards for family trees, DNA samples, soundexes
• Challenges are not Jewish-specific
