University of Quebec in Chicoutimi
BALSAC Project
Canada ✧ 2019-2021
Project
Extraction of genealogical information for the Quebec civil status register database.
Corpus
2.8 million pages of parish registers (1850-1920) from Quebec's civil registration records, mainly birth, baptism and death certificates.
Processing workflow
- Text line detection
- Text line recognition using a model specifically trained for the corpus
- Grouping of lines into acts
- Named entity recognition (names of people, dates, places and professions)
- Anomaly detection based on the detected text lines
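
The two line-based steps above (grouping lines into acts, and anomaly detection) can be illustrated with a minimal sketch. This is not the project's actual method: the `TextLine` structure, the vertical-gap rule for splitting acts, the median-based anomaly rule, and the `max_gap` and `tolerance` values are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TextLine:
    page: int    # page number in the register (hypothetical field)
    top: float   # vertical position of the line on the page, in pixels
    text: str    # transcription produced by the recognition step


def group_into_acts(lines, max_gap=60.0):
    """Group the lines of one page into acts.

    Assumed heuristic: a vertical gap larger than max_gap between two
    consecutive lines marks the start of a new act.
    """
    acts, current = [], []
    for line in sorted(lines, key=lambda item: item.top):
        if current and line.top - current[-1].top > max_gap:
            acts.append(current)
            current = []
        current.append(line)
    if current:
        acts.append(current)
    return acts


def flag_anomalous_pages(lines_per_page, tolerance=0.5):
    """Return the pages whose detected line count is unusual.

    Assumed heuristic: a page is flagged when its line count differs from
    the median count by more than tolerance times the median.
    """
    typical = median(lines_per_page.values())
    return sorted(
        page
        for page, count in lines_per_page.items()
        if abs(count - typical) > tolerance * typical
    )
```

For example, five lines at vertical positions 10, 40, 70, 200 and 230 would be split into two acts with the default `max_gap`, and a page with 5 detected lines among pages with roughly 40 would be flagged for review.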
