
Defence Historical Service

ArchivIA Project

France ✧ 2024-2027

Corpus


Maritime records of Brest, Cherbourg and Toulon

Archives of the Toulon Penal Colony

Batches 1 to 7: 890,467 images (double pages)

Objectives 


Framework agreement for the processing of archives of the Defence Historical Service (Ministry of the Armed Forces)

Extraction of personal information from maritime registration records and penal colony records.




Processing workflow


Batch processing:

  • Analysis of the documents and definition of sub-batches to be processed automatically or manually (using clustering from batch 4 onward).
  • For batches 1 to 3: segmentation of the area containing the information to be extracted; from batch 4 onward, information is extracted at the single-page level (segmentation of double pages).
  • Training of information extraction models: one per type of structure (sub-batch).
  • Matching of certain fields against reference data (municipality, department, country, profession, rank).
  • Model evaluation; predictions that fall below the quality threshold are sent for manual transcription.
  • Export: EAD XML files (one per register) and a single CSV file per batch containing the information on individuals.
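
The last three steps (reference matching, threshold-based routing, CSV export) can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the reference list, the threshold value of 0.8, the record fields (`name`, `municipality`, `confidence`) and the function names are all hypothetical, and fuzzy string matching from the standard library stands in for whatever matching method the project uses.

```python
import csv
import difflib
import io

# Hypothetical reference list; the project's real reference data are not shown in the source.
REFERENCE_MUNICIPALITIES = ["Brest", "Cherbourg", "Toulon", "Lorient"]

# Assumed quality threshold, chosen for illustration only.
QUALITY_THRESHOLD = 0.8


def match_reference(value, reference, cutoff=0.8):
    """Return the closest reference entry, or None if nothing is similar enough."""
    lowered = {entry.lower(): entry for entry in reference}
    hits = difflib.get_close_matches(
        value.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


def route(records, threshold=QUALITY_THRESHOLD):
    """Split records into accepted predictions and ones sent to manual transcription."""
    accepted, manual = [], []
    for record in records:
        target = accepted if record["confidence"] >= threshold else manual
        target.append(record)
    return accepted, manual


def to_csv(records):
    """Serialise records into a single CSV string (one row per individual)."""
    fields = ["name", "municipality", "confidence"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    predictions = [
        {"name": "Jean", "municipality": "Toulom", "confidence": 0.95},
        {"name": "Yves", "municipality": "Brest", "confidence": 0.40},
    ]
    for record in predictions:
        matched = match_reference(record["municipality"], REFERENCE_MUNICIPALITIES)
        # Keep the raw transcription when no reference entry is close enough.
        record["municipality"] = matched or record["municipality"]
    accepted, manual = route(predictions)
    print(to_csv(accepted))
    print(f"{len(manual)} record(s) sent to manual transcription")
```

Normalising a field before routing means a misread such as "Toulom" can still be resolved to a reference entry, while the confidence score alone decides whether a record is exported automatically or queued for manual transcription.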