ua - gt aligner - icoc

Aligning images with ground truth transcriptions

Rafael C. Carrasco ([email protected]), Departamento de Lenguajes y Sistemas Informáticos

Upload: impact-centre-of-competence

Post on 15-Apr-2017



Impact Ground Truth

Over 30,000 pages of high-quality transcriptions.

difinicion à lo difinido: y antes de contarla, no dexè
dicho quienes y quales fueron mis padres, y confuſo na-
cimiento, que en ſu tanto, ſi dellos huuiera de eſcreuir-
ſe, fuera ſin duda mas agradable y bien recebida que eſta

Identification of words and characters

Impact ground truth identifies regions (paragraphs).
Lines can usually be identified with geometric methods.
The identification of words and characters is not straightforward because the separation between them is variable.
Character breaking, overlapping and kerning are frequent.

Gap analysis is not sufficient

Bars mark the position of vertical gaps.
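The gap analysis mentioned on this slide can be sketched as a column projection over a binarized line image. This is only an illustration: the function name `vertical_gaps`, the 0/1 pixel encoding and the toy image are assumptions, not part of the Impact tools. In the toy line, a broken glyph produces a gap inside one character, while two touching glyphs produce no gap between them, which is why gaps alone do not give character boundaries.

```python
from typing import List, Tuple


def vertical_gaps(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain no ink.

    `image` is a binarized text line: rows of 0 (background) and 1 (ink).
    This encoding is an assumption made for the example.
    """
    if not image:
        return []
    width = len(image[0])
    # Column projection: number of ink pixels in every column.
    projection = [sum(row[x] for row in image) for x in range(width)]
    gaps = []
    start = None
    for x, ink in enumerate(projection):
        if ink == 0 and start is None:
            start = x                      # a run of empty columns begins
        elif ink != 0 and start is not None:
            gaps.append((start, x))        # the run ends before column x
            start = None
    if start is not None:
        gaps.append((start, width))        # run reaching the right border
    return gaps


# Hypothetical toy line: two touching glyphs in columns 0-3 (no gap between
# them) and one broken glyph in columns 5-7 (a gap inside it, at column 6).
line = [
    [1, 1, 1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 1, 0, 1],
]
print(vertical_gaps(line))  # [(4, 5), (6, 7)]
```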

Objectives

Apply standard geometric methods to separate (and deskew) the lines in the image.
Use probabilistic models to identify the best segmentation of the characters in every line.
Enrich the Impact ground truth with the additional information (map between characters and images).
Publish source code in the Impact Centre GitHub.
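The second objective, finding the best segmentation of a line given its transcription, can be illustrated with a small dynamic program. The slides do not specify the probabilistic model, so everything here is an assumption: each character is given an expected width, deviations are scored with a Gaussian-style squared term, and a cut between two inked columns is penalized. The names `best_cuts`, `sigma` and `ink_penalty` are hypothetical.

```python
from typing import List


def best_cuts(ink: List[int], widths: List[float],
              sigma: float = 1.0, ink_penalty: float = 1.0) -> List[int]:
    """Return cut positions 0 = c_0 < c_1 < ... < c_N = len(ink) of minimum cost.

    ink[x]    : number of ink pixels in column x of the line image.
    widths[k] : expected width of the k-th transcribed character (assumed known).
    The cost is a stand-in for a negative log-probability, not the author's model.
    """
    n_cols, n_chars = len(ink), len(widths)
    if n_chars == 0 or n_chars > n_cols:
        raise ValueError("need between 1 and len(ink) characters")
    inf = float("inf")
    # cost[k][x]: best cost of fitting the first k characters into columns [0, x).
    cost = [[inf] * (n_cols + 1) for _ in range(n_chars + 1)]
    back = [[0] * (n_cols + 1) for _ in range(n_chars + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_chars + 1):
        for x in range(k, n_cols + 1):
            # A cut between columns x-1 and x is penalized when both carry ink;
            # the end of the line is never penalized.
            cut = ink_penalty * min(ink[x - 1], ink[x]) if x < n_cols else 0.0
            for prev in range(k - 1, x):
                if cost[k - 1][prev] == inf:
                    continue
                width_term = (x - prev - widths[k - 1]) ** 2 / (2 * sigma ** 2)
                total = cost[k - 1][prev] + width_term + cut
                if total < cost[k][x]:
                    cost[k][x] = total
                    back[k][x] = prev
    # Follow the back-pointers from the end of the line to recover the cuts.
    cuts = [n_cols]
    for k in range(n_chars, 0, -1):
        cuts.append(back[k][cuts[-1]])
    return cuts[::-1]


# Hypothetical line of 9 columns holding 4 characters; the second and third
# touch through a thin joint at column 5, so no empty column separates them.
print(best_cuts([3, 3, 0, 3, 3, 1, 3, 0, 3], [3, 2, 2, 2]))  # [0, 3, 5, 7, 9]
```

Unlike plain gap analysis, the transcription fixes the number of characters, so the search can place a boundary through the thin joint when the expected widths call for it.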

Character features

Candidate features: weight, shadow, gauge, profile, ...
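The slide only names the candidate features, so the following sketch rests on assumed definitions: "weight" is taken to be the fraction of ink pixels in the glyph box, and "profile" the fraction of ink in each column. "Shadow" and "gauge" are left out because their definitions cannot be recovered from the transcript. The function name `glyph_features` is hypothetical.

```python
from typing import Dict, List


def glyph_features(glyph: List[List[int]]) -> Dict[str, object]:
    """Compute two illustrative features of a binarized glyph (0/1 pixels).

    The definitions below are assumptions made for this example, not the
    definitions used in the original work.
    """
    height = len(glyph)
    width = len(glyph[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("empty glyph")
    # Ink pixels per column of the glyph box.
    columns = [sum(row[x] for row in glyph) for x in range(width)]
    return {
        # Assumed "weight": share of the box covered by ink.
        "weight": sum(columns) / (height * width),
        # Assumed "profile": share of ink in every column, left to right.
        "profile": [c / height for c in columns],
    }


features = glyph_features([[1, 0],
                           [1, 1]])
print(features)  # {'weight': 0.75, 'profile': [1.0, 0.5]}
```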

Methodology

Explore which character features are best for alignment.
Employ simple training methods which (in contrast to HMM) require short training times.
Font size and type (bold, slanted, etc.) are not declared in the ground truth files and must therefore be addressed in a second phase of this project.

Applications

Training OCR engines, such as Tesseract, with large samples of characters can be automated.
Adaptation of OCR engines to a particular book or collection could be feasible with the manual transcription of only a few pages.

Note: Work on TEI P5

The Miguel de Cervantes library has created about 10,000 books with TEI2 markup.
TEI P5 has associated stylesheets, for example, to create e-books automatically. However, some limitations were found when migrating to TEI P5:

Little support for indentation (normal/hanging).
Automatic numbering of verse lines.
No style support for nested annotations.
Headings cannot be marked for inclusion/exclusion in the (automatically generated) table of contents.

This experience can be an opportunity for cooperation between the Centre and the TEI consortium.
