
University of Quebec in Chicoutimi

BALSAC Project

Canada ✧ 2019-2021

Project


Extraction of genealogical information for the Quebec civil status register database.


Corpus


2.8 million pages of parish registers (1850-1920) from the civil registration records of Quebec, mainly birth, baptism and death certificates

Processing 

-> Automatic Text Recognition

-> Segmentation 

-> Information Extraction

Development of a document processing platform


Processing workflow 


  • Text line detection

  • Text line recognition using a model specifically trained for the corpus

  • Grouping of lines into acts

  • Named entity recognition (names of people, dates, places and professions)

  • Anomaly detection based on detected lines of text
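Two of the steps above, grouping lines into acts and flagging anomalies from the detected lines, can be illustrated with a minimal sketch. It is not the project's actual method: the `TextLine` and `Act` classes, the vertical-gap heuristic, the `max_gap` and `min_lines` thresholds and the sample transcriptions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextLine:
    # Hypothetical representation of one detected line on a page.
    y: int          # vertical position of the line, in pixels
    text: str = ""  # transcription filled in by the recognition step


@dataclass
class Act:
    # Hypothetical container for the lines that make up one act.
    lines: List[TextLine] = field(default_factory=list)


def group_lines_into_acts(lines: List[TextLine], max_gap: int = 60) -> List[Act]:
    """Start a new act whenever the vertical gap between two
    consecutive lines exceeds max_gap (an assumed heuristic)."""
    acts: List[Act] = []
    previous = None
    for line in sorted(lines, key=lambda l: l.y):
        if previous is None or line.y - previous.y > max_gap:
            acts.append(Act())
        acts[-1].lines.append(line)
        previous = line
    return acts


def flag_anomalies(acts: List[Act], min_lines: int = 2) -> List[int]:
    """Return the indices of acts that have too few lines or that
    contain a line with an empty transcription."""
    flagged = []
    for index, act in enumerate(acts):
        too_short = len(act.lines) < min_lines
        has_empty = any(not line.text.strip() for line in act.lines)
        if too_short or has_empty:
            flagged.append(index)
    return flagged


# Made-up sample page: three visual blocks of lines.
page_lines = [
    TextLine(100, "Le dix mai mil huit cent soixante"),
    TextLine(140, "nous avons baptisé Joseph"),
    TextLine(300, "Le douze mai mil huit cent soixante"),
    TextLine(340, ""),  # recognition produced nothing for this line
    TextLine(600, "signature"),
]

acts = group_lines_into_acts(page_lines)
suspicious = flag_anomalies(acts)
print(len(acts), suspicious)
```

In a real pipeline the grouping would more plausibly rely on a trained model than on a fixed pixel gap; the sketch only shows how the output of line detection and recognition can feed the later steps.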

The BALSAC project

-> on the UQAC website