
University of Quebec in Chicoutimi

BALSAC Project

Canada ✧ 2019-2021

Project


Extraction of genealogical information for the Quebec civil status register database.


Corpus


2.8 million pages of parish registers (1850-1920) from the civil registration records of Quebec, mainly birth, baptism and death certificates

Processing 

-> Automatic Text Recognition

-> Segmentation 

-> Information Extraction

Development of a document processing platform


Processing workflow 


  • Text line detection

  • Text line recognition using a model specifically trained for the corpus

  • Grouping of lines into acts

  • Named entity recognition (names of people, dates, places and professions)

  • Anomaly detection based on detected lines of text
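The page does not say how lines are grouped into acts or how anomalies are detected. The sketch below is a rough illustration only. It assumes a simple vertical-gap heuristic: lines are sorted top to bottom, and a new act starts whenever the gap between two consecutive lines exceeds a threshold. Acts with fewer lines than expected are then flagged as candidate anomalies. The `Line` class, `group_lines_into_acts`, `flag_short_acts`, the threshold and the toy coordinates are all hypothetical. They are not TEKLIA's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    """A detected text line (hypothetical structure, for illustration only)."""
    text: str
    y_top: float     # top of the line's bounding box, in pixels
    y_bottom: float  # bottom of the line's bounding box, in pixels


def group_lines_into_acts(lines: list[Line], gap_threshold: float) -> list[list[Line]]:
    """Sort lines top to bottom and start a new act whenever the vertical gap
    between a line and the previous one exceeds gap_threshold (assumed heuristic)."""
    acts: list[list[Line]] = []
    previous: Optional[Line] = None
    for line in sorted(lines, key=lambda l: l.y_top):
        if previous is None or line.y_top - previous.y_bottom > gap_threshold:
            acts.append([])  # first line or large gap: open a new act
        acts[-1].append(line)
        previous = line
    return acts


def flag_short_acts(acts: list[list[Line]], min_lines: int) -> list[int]:
    """Return the indices of acts with fewer than min_lines lines (candidate anomalies)."""
    return [i for i, act in enumerate(acts) if len(act) < min_lines]


# Toy page: two acts separated by a large vertical gap.
page = [
    Line("act 1, line 1", 100, 130),
    Line("act 1, line 2", 135, 165),
    Line("act 2, line 1", 260, 290),
    Line("act 2, line 2", 295, 325),
    Line("act 2, line 3", 330, 360),
]
acts = group_lines_into_acts(page, gap_threshold=40)
print([len(act) for act in acts])          # [2, 3]
print(flag_short_acts(acts, min_lines=3))  # [0]
```

A fixed pixel threshold would be fragile across scans of different resolutions and layouts. It is used here only to keep the sketch short.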



About the BALSAC project 


The BALSAC project maintains and updates a database of digitized vital event records (birth, death and marriage certificates) covering all of Quebec, from the first European settlements in the 17th century to the present day.
These records are interconnected using a linkage method based on nominative information, which allows for the automatic recreation of genealogical relations and kinship structures in the Quebec population.
TEKLIA will handle the automated transcription, named entity recognition and information extraction for over 6 million digitized parish register entries (mainly birth/baptism and death records) dating from 1850 to 1920.
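The page does not detail BALSAC's linkage method. The sketch below is a minimal deterministic illustration of what linking records "based on nominative information" can mean. It assumes hypothetical record fields (`given_name`, `surname`, `father_given_name`, `mother_surname`). Names are normalized for case, accents and whitespace. A death record is then linked to a baptism record when their normalized keys match exactly and the years are consistent. This is not the BALSAC algorithm. The functions, fields and toy records are invented for illustration.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A transcribed vital event record (hypothetical fields)."""
    record_id: str
    kind: str  # "baptism" or "death"
    year: int
    given_name: str
    surname: str
    father_given_name: str
    mother_surname: str


def normalize(name: str) -> str:
    """Lowercase, remove accents and collapse whitespace, so that 'Gagné' and 'GAGNE' match."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def link_key(r: Record) -> tuple[str, str, str, str]:
    """Nominative key: the person's names plus father's given name and mother's surname."""
    return (normalize(r.given_name), normalize(r.surname),
            normalize(r.father_given_name), normalize(r.mother_surname))


def link_baptisms_to_deaths(records: list[Record]) -> list[tuple[str, str]]:
    """Return (baptism_id, death_id) pairs whose keys match exactly and whose
    baptism year is not after the death year. Ambiguous keys yield several pairs."""
    baptisms_by_key: dict[tuple[str, str, str, str], list[Record]] = defaultdict(list)
    for r in records:
        if r.kind == "baptism":
            baptisms_by_key[link_key(r)].append(r)
    links = []
    for r in records:
        if r.kind == "death":
            for b in baptisms_by_key.get(link_key(r), []):
                if b.year <= r.year:
                    links.append((b.record_id, r.record_id))
    return links


records = [
    Record("B1", "baptism", 1872, "Joseph", "Gagné", "Pierre", "Tremblay"),
    Record("B2", "baptism", 1875, "Marie", "Côté", "Louis", "Simard"),
    Record("D1", "death", 1901, "JOSEPH", "Gagne", "Pierre", "Tremblay"),
]
print(link_baptisms_to_deaths(records))  # [('B1', 'D1')]
```

Real-world linkage also has to cope with spelling variants, missing parents' names and ambiguous matches. This exact-match sketch does not attempt any of that.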


Project management in Québec

The BALSAC database is the joint property and responsibility of the Université du Québec à Chicoutimi, Université Laval, McGill University and Université de Montréal. The project is managed by the Université du Québec à Chicoutimi; Hélène Vézina, a professor in the Department of Human and Social Sciences at UQAC, is the current head of the project.