Digitization Challenges for [Jewish] Genealogy

Jean-Pierre Stroweis (stroweis@zahav.net.il)

EVA Minerva

Jerusalem, November 2006

Genealogy, a Cultural Heritage?

Culture: Ethnographic view (Edward Tylor)

Culture is that complex whole which includes knowledge, belief, art, morals, law, custom, and any other capabilities and habits acquired by man as a member of society.

Genealogy, a Cultural Heritage?

• Monotheist Religions
  – Judaism
  – Christianity
  – Islam

Genealogy in the Torah

1. This is the book of the generations of Adam. In the day that G-d created man, in the likeness of G-d made He him;

2. male and female created He them, and blessed them, and called their name Adam, in the day when they were created.

3. And Adam lived a hundred and thirty years, and begot a son in his own likeness, after his image; and called his name Seth.

4. And the days of Adam after he begot Seth were eight hundred years; and he begot sons and daughters.

5. And all the days that Adam lived were nine hundred and thirty years; and he died.

6. And Seth lived a hundred and five years, and begot Enosh.

7. And Seth lived after he begot Enosh eight hundred and seven years, and begot sons and daughters.

Genesis, 5:1-7

Genealogy in the New Testament

12 After the exile to Babylon: Jeconiah was the father of Shealtiel, Shealtiel the father of Zerubbabel,

13 Zerubbabel the father of Abiud, Abiud the father of Eliakim, Eliakim the father of Azor,

14 Azor the father of Zadok, Zadok the father of Akim, Akim the father of Eliud,

15 Eliud the father of Eleazar, Eleazar the father of Matthan, Matthan the father of Jacob,

16 and Jacob the father of Joseph, the husband of Mary, of whom was born Jesus, who is called Christ

Matthew 1:12–16; see also Luke 3:21-38

Genealogy in Islam: Extracanonical traditions

Muhammad bin ‘Abdullah bin ‘Abdul-Muttalib (who was called Shaiba) bin Hashim, (named ‘Amr) bin ‘Abd Munaf (called Al-Mugheera) bin Qusai (also called Zaid) bin Kilab bin Murra bin Ka‘b bin Lo’i bin Ghalib bin Fahr (who was called Quraish and whose tribe was called after him) bin Malik bin An-Nadr (so called Qais) bin Kinana bin Khuzaiman bin Mudrikah (who was called ‘Amir) bin Elias bin Mudar bin Nizar bin Ma‘ad bin ‘Adnan.

References: Ibn Hisham 1/1,2; Talqeeh Fuhoom Ahl Al-Athar, p. 5-6; Rahmat-ul-lil'alameen 2/11-14,52

Genealogy, a Cultural Heritage?

• Monotheist Religions
  – Judaism
  – Christianity
  – Islam
• Governments
  – Ontario Ministry of Culture
  – French Ministry of Culture

Ontario Ministry of Culture

French Ministry of Culture

Genealogy Heritage: Who’s in charge of Conservation?

• Families? Usually no
• Administrations? Their own records
• Towns? Cemetery, not tombs
• Genealogical Societies? Nothing systematic
• Historical Museums and Archives? Out of their scope

No systematic preservation of the genealogical heritage!

Preserving Art versus Genealogy

ART
• Created/tangible items
• Selected items are collected
• Public sphere
• Value to society: museums, maintenance, academic research, intellectual rights, cost-value

GENEALOGY
• Re-construction
• Each individual is a subject of study
• Private sphere
• Value to family: no government support, little academic research, privacy rights

Genealogy Heritage Conservation

• No systematic preservation of the genealogical heritage

• No conservation of genealogy per se; only preservation of the sources that will enable future genealogical re-construction

• Genealogical sources are usually preserved by institutions for which genealogy is not the primary purpose

Genealogy Heritage: Players

• LDS Church (Mormons): accessible records
• Archives of former Administrations: their own records
  – Ellis Island
  – Hamburg Staatsarchiv
  – Red Cross ITS Arolsen
• Holocaust Memorials: collected records
  – Yad Vashem
  – USHMM
  – Mémorial de la Shoah

Genealogy Heritage: More Players

• Private Companies
  – Ancestry.com
  – FamilyDNA.com
• National Archives
• Genealogical Societies and SIGs
• Genealogical Libraries
  – Genealogical Library, Germantown, Tennessee
  – DNA Library, Glasgow, Scotland

Genealogy Heritage: [Jewish World] Players

• Individual Initiatives
  – Jewish Genealogical Family Finder, JewishGen, Jewish Records Indexing-Poland, Routes-to-Roots Foundation, Istanbul Rabbinate Records, One-Step Web Site…
• IAJGS
• Center for Jewish History (NYC)
• Hevrot Kaddisha (Burial Societies)

Preservation Life Cycle for Genealogical Sources

1. Acquisition
2. Authentication
3. Translation/Transliteration
4. Accuracy Assessment
5. Soundexing
6. Storage
7. Cataloguing
8. Access rights
9. Query & Retrieval Tool
10. Publication/Distribution


Data Acquisition

• Interviews (Text – Audio – Video)
• Scanning Documents, Family Tree Charts, Pictures
• On-site visits to Archives & Cemeteries
• Manual Data Entry
• Optical Character Recognition


Digital Formats of Genealogical Data

• Family lore, artifacts and biographies
  Format: TEXT / IMAGE / AUDIO / VIDEO / 3D
• Documented events
  Format: TEXT / IMAGE / SPREADSHEET / DATABASE
• Physical traits
  Format: IMAGE / TAGGED FORMAT TBD
• Genetic profile
  Format: TEXT / TAGGED FORMAT TBD
• Family Trees
  Format: GEDCOM

GEDCOM

• GEDCOM: GEnealogical Data COMmunication
  – Neutral format for exchange of genealogical data
  – Specification written by LDS Church (www.familysearch.org)
• GEDCOM Version 5.5 (1996)
  – Text-based
  – ANSEL character encoding
  – Widely used
  – www.familysearch.org/GEDCOM/GEDCOM55.exe
• GEDCOM Version 6.0 (Draft 2002)
  – XML-based
  – Unicode characters
  – Not implemented
  – www.familysearch.org/GEDCOM/GedXML60.pdf
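To make "text-based" concrete: a GEDCOM 5.5 file is a sequence of lines, each carrying a level number, an optional cross-reference identifier, a tag and an optional value. The following is a minimal sketch of a line parser in Python, not a full GEDCOM reader. The sample records are invented for illustration (the names are borrowed from the Genesis passage), and `GedcomLine` and `parse_gedcom_line` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # nesting depth: 0 starts a new record
    xref: Optional[str]   # cross-reference id such as "@I1@", if present
    tag: str              # GEDCOM tag, e.g. INDI, NAME, FAM
    value: str            # rest of the line, possibly empty


# level, optional @XREF@, tag, optional value
_LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


def parse_gedcom_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into its four parts."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")


# Invented sample: one individual linked to one family record.
SAMPLE = """\
0 @I1@ INDI
1 NAME Seth
1 SEX M
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I2@
1 CHIL @I1@
"""

records = [parse_gedcom_line(line) for line in SAMPLE.splitlines()]
for record in records:
    print(record)
```

Records are linked only through the `@...@` identifiers, which is why a consumer has to resolve `FAMC`, `HUSB` and `CHIL` pointers before it can draw a tree.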


Standard for cataloguing

Body: Dublin Core Metadata Initiative

Goal: Development of interoperable online metadata standards

Standard: The Dublin Core Element Set

Web: www.dublincore.org

Dublin Core Element Set

• Version 1.1, 2004
• Standard for cross-domain information resource description
• Metadata Elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights
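As an illustration of what cataloguing with this element set might look like, here is a small Python sketch that serializes a description into XML using the Dublin Core 1.1 namespace. The `record` wrapper element, the function name and the sample values other than the title are assumptions made for this example. The fifteen element names are those of version 1.1.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields: dict) -> str:
    """Serialize a flat {element: value} mapping as dc:* elements."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core 1.1 element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; just the title is taken from the slides.
record_xml = dublin_core_xml({
    "title": "1875 Census of the Jewish Population of Eretz Israel",
    "type": "Dataset",
    "format": "text/csv",
    "language": "en",
})
print(record_xml)
```

Rejecting unknown element names early is a design choice for the sketch: it keeps catalogue entries comparable across collections, which is the point of a shared element set.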


Standard for retrieval

Body: Open Archives Initiative

Goal: Promotes interoperability standards that aim to facilitate the efficient dissemination of content

Standard: The Open Archives Initiative Protocol for Metadata Harvesting

Web: www.openarchives.org

Open Archives Initiative Protocol for Metadata Harvesting

• An application-independent interoperability framework based on metadata harvesting
  – Data Providers: administer systems that support the OAI-PMH as a means of exposing metadata
  – Service Providers: use metadata harvested via the OAI-PMH as a basis for building value-added services
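From the service provider's side, harvesting comes down to plain HTTP requests carrying a `verb` parameter. The sketch below only builds such a request URL in Python and performs no network access. The base URL is a placeholder and `oai_request_url` is a hypothetical helper. The six verbs and the `oai_dc` metadata prefix come from the OAI-PMH specification.

```python
from urllib.parse import urlencode

# The six request types defined by OAI-PMH.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build the URL a harvester would fetch from a data provider."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Placeholder base URL; a real data provider publishes its own.
url = oai_request_url(
    "https://archive.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",  # unqualified Dublin Core
)
print(url)
```

The response to `ListRecords` is an XML document of metadata records, so a service provider can index many archives without holding copies of their documents.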

OAI - Architecture

Source: www.culture.gouv.fr/culture/dll/OAI-PMH.htm

OAI Example: Nomina (http://nomina.france-genealogie.fr)


Our Experience

• Helkat Mehokek: Index of Gravestone Hebrew Inscriptions on Mount of Olives Cemetery
• 1875 Census of the Jewish Population of Eretz Israel, Ordered by Sir Moses Montefiore
• Paul Jacobi’s Index of the Names (listed in monographs)
• Name Changes in the Palestine Gazette

Different types of records, similar verification process

Helkat Mehokek

Montefiore Census 1875

Paul Jacobi’s Index

Name Changes in the Palestine Gazette

What is Quality?

• Accuracy
• Integrity with Original Source
• Internal Consistency
• Completeness
• Simplicity / Ease of Use

The Process

Source → Excel Table → Searchable database

Quality during Design

• Goals
  – Index or Full Extract?
• Team Policy
• Conventions
  – Reference to source
  – Structure
  – Fields
  – Transliteration

Structure

Fields Semantics

Rabbi Schimon III "DAYAN"-"BROD"-"KARA" MI-WINA

Searchable Fields
• Title: Rabbi
• First Name: Schimon
• Surname: DAYAN-BROD-KARA
• Known as: from Wien

Non Searchable Field
• Full Name: Schimon III DAYAN WIENER-BROD-KARA
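One way to carry the searchable / non-searchable distinction into a database layer is to mark each field in the record definition. The Python sketch below is an illustration under that assumption, not the project's actual schema. `IndexEntry`, `searchable_fields` and `matches` are hypothetical names, and the sample values are the ones shown on the slide.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class IndexEntry:
    # Fields a query tool may match against.
    title: str = field(metadata={"searchable": True})
    first_name: str = field(metadata={"searchable": True})
    surname: str = field(metadata={"searchable": True})
    known_as: str = field(metadata={"searchable": True})
    # Kept for display and for checking against the source only.
    full_name: str = field(metadata={"searchable": False})


def searchable_fields(entry: IndexEntry) -> dict:
    """Return only the fields flagged as searchable."""
    return {
        f.name: getattr(entry, f.name)
        for f in fields(entry)
        if f.metadata.get("searchable")
    }


def matches(entry: IndexEntry, term: str) -> bool:
    """Case-insensitive substring match over searchable fields."""
    needle = term.casefold()
    return any(needle in value.casefold()
               for value in searchable_fields(entry).values())


entry = IndexEntry(
    title="Rabbi",
    first_name="Schimon",
    surname="DAYAN-BROD-KARA",
    known_as="from Wien",
    full_name="Schimon III DAYAN WIENER-BROD-KARA",
)
print(matches(entry, "brod"), matches(entry, "wiener"))
```

In this sketch "wiener" finds nothing because it occurs only in the full name, which is stored but never searched. That is the trade-off the field semantics have to settle at design time.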

Transliteration Issues

• Z like Zacharia: ז
• Z like Zadok: צ (alternative transliteration: Tzadok)
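The practical effect of this ambiguity on phonetic matching can be shown with the classic American Soundex code, sketched below in Python. This is the general-purpose algorithm, not necessarily the scheme used in the projects described here. Because Soundex keeps the first letter of a name, two transliterations of the same Hebrew name can land under different codes.

```python
# Classic American Soundex: first letter plus three digits.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a Latin-script name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets the run of identical codes
    return (first + "".join(digits) + "000")[:4]


for spelling in ("Zadok", "Tzadok", "Tsadok"):
    print(spelling, soundex(spelling))
```

Here "Zadok" codes as Z320 while "Tzadok" and "Tsadok" both code as T232, so a search on one spelling misses the others unless the transliteration convention is fixed up front. Schemes such as Daitch-Mokotoff Soundex were devised for this kind of variation and are not implemented in this sketch.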

Quality at Verification

Two Steps

1. Unit Test (column-by-column)

2. Integration Test (correlate fields)

Unit Test: Types of Errors Detected

• Unexpected characters in field value
• Variant spellings of the same name (suspect)
• Letter characters embedded in a numeric field (e.g. ‘O’ instead of zero)
• Invalid and out-of-range values (e.g. for dates, ages)
• Inconsistent usage of acronyms
• Inconsistent transliteration
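Several of these checks are mechanical and can be scripted once the data sits in a table. The Python sketch below covers three of them for a single column of years: letters in a numeric field, other unexpected characters, and out-of-range values. The function name, the sample values and the accepted range are assumptions chosen for illustration.

```python
import re


def check_year_column(values, low, high):
    """Return (row, raw_value, problem) for each suspicious cell."""
    problems = []
    for row, raw in enumerate(values):
        text = raw.strip()
        if not text:
            continue  # empty cells are a completeness issue, not an error here
        if re.search(r"[A-Za-z]", text):
            problems.append((row, raw, "letter in numeric field"))
        elif not (text.isascii() and text.isdigit()):
            problems.append((row, raw, "unexpected character"))
        elif not low <= int(text) <= high:
            problems.append((row, raw, "out of range"))
    return problems


# Invented sample column; the range 1775-1875 is an assumed bound.
birth_years = ["1832", "18O5", "1790?", "2875", "", "1851"]
issues = check_year_column(birth_years, 1775, 1875)
for issue in issues:
    print(issue)
```

Variant spellings, acronyms and transliteration are harder to automate, since deciding that two values are "the same name" needs human judgment. A script can only shortlist the suspects.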

Unit Test: Derived Benefits

• Maximum and Minimum Values
• List of Distinct Values
• Distribution of Values (Frequency)
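These by-products fall out of a single pass over a column. A minimal Python sketch, using an invented sample column and the hypothetical helper name `column_profile`:

```python
from collections import Counter


def column_profile(values):
    """Summarize one column: extremes, distinct values, frequencies."""
    present = [v for v in values if v]  # ignore empty cells
    if not present:
        raise ValueError("column has no values")
    counts = Counter(present)
    total = len(present)
    return {
        "min": min(present),
        "max": max(present),
        "distinct": sorted(counts),
        # (value, count, percentage), most frequent first
        "frequency": [(value, n, round(100 * n / total, 1))
                      for value, n in counts.most_common()],
    }


# Invented sample with one deliberate misspelling.
countries = ["Belarus", "Poland", "Belarus", "Lithuania",
             "Belaruss", "Poland", "Belarus", ""]
profile = column_profile(countries)
print(profile["distinct"])
print(profile["frequency"])
```

In the list of distinct values the misspelled "Belaruss" sits right next to "Belarus", and in the frequency list it appears with a count of one. This is how a column-by-column pass surfaces variant spellings.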

Frequency of Values

Number of Tombs per Country

Country      Tombs       %
Belarus       1579   19.7%
Lithuania      788    9.8%
Poland         715    8.9%
Ukraine        224    2.8%
Hungary        159    2.0%
Israel          85    1.1%
Russia          60    0.7%
Latvia          55    0.7%
Slovakia        40    0.5%
Romania         26    0.3%
Bulgaria        23    0.3%
Turkey          17    0.2%
Bosnia          16    0.2%

MONTEFIORE - 1875 CENSUS - SUMMARY

                                                    # Records  Percentage
Number of index records                                  9955      100.0%
Number of original records                               4728       47.5%
Number of individuals with known surname                 5421       54.5%
Number of individuals with known first name              7421       74.5%
Number of individuals with known father name             3384       34.0%
Number of individuals with known mother name             1692       17.0%
Number of individuals with known grandfather name         298        3.0%
Number of individuals with known title                    132        1.3%
Number of individuals with known spouse name             4132       41.5%
Number of individuals with known year of birth           4759       47.8%
Number of individuals with known place of birth          3835       38.5%
Number of individuals with known country of birth        4161       41.8%
Number of individuals with known year of alyah           1960       19.7%
Number of individuals with known occupation              2101       21.1%

Unit Test: How to Proceed?

• Sort
• Auto-filter
• Advanced filter
• Pivot table

Integration Test

• Redundancy in the Source Document
  – Check that the various correlated values do not contradict each other
• No Redundancy in Source Document
  – Find recurring patterns and implicit rules inherent to the nature of the document
  – Verify that these patterns are respected
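Both kinds of integration check can be expressed as rules over a whole record. The Python sketch below assumes a census-like record with a year of birth, a year of aliyah and, hypothetically, a stated age. The field names, the one-year tolerance and the rules themselves are illustrative assumptions, not the rules actually applied to the 1875 census.

```python
CENSUS_YEAR = 1875


def cross_check(record: dict) -> list:
    """Return the list of rule violations found in one record."""
    problems = []
    birth = record.get("year_of_birth")
    aliyah = record.get("year_of_aliyah")
    age = record.get("age")  # hypothetical redundant field

    # Implicit rules: nothing can postdate the census itself,
    # and immigration cannot precede birth.
    if birth is not None and birth > CENSUS_YEAR:
        problems.append("birth after census")
    if aliyah is not None and aliyah > CENSUS_YEAR:
        problems.append("aliyah after census")
    if birth is not None and aliyah is not None and aliyah < birth:
        problems.append("aliyah before birth")

    # Redundancy: a stated age should agree with the year of birth,
    # give or take one year.
    if birth is not None and age is not None:
        if abs((CENSUS_YEAR - birth) - age) > 1:
            problems.append("age contradicts year of birth")
    return problems


print(cross_check({"year_of_birth": 1830, "year_of_aliyah": 1850, "age": 45}))
print(cross_check({"year_of_birth": 1850, "year_of_aliyah": 1840, "age": 40}))
```

A flagged record is not necessarily a data-entry error. The contradiction may already be present in the source, which is why each flag needs to be checked against the original document.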

Redundancy

Implicit Rules & Patterns

Conclusion on Accuracy Assessment

Common verification procedure for any kind of database

1. Check column-by-column
2. Check internal redundancy and implicit internal rules

Conclusion on Digitization Challenges for [Jewish] Genealogy (1)

• No systematic preservation of the genealogical heritage

• No conservation of genealogy; only sources for genealogical re-construction

• Genealogical sources preserved for non-genealogy purposes

Conclusion on Digitization Challenges for [Jewish] Genealogy (2)

• Many technical challenges
• Standards for cataloguing / retrieval
• No standards for family trees, DNA samples, soundexes
• Challenges not Jewish-specific

