Defence Historical Service
ArchivIA Project
France ✧ 2024-2027
Corpus
Maritime records of Brest, Cherbourg, Toulon
Archives of the Toulon Penal Colony
Batches 1 to 7: 890,467 images (double pages)
Artificial intelligence technologies used
Objectives
Framework agreement for processing the archives of the Defence Historical Service (Ministry of the Armed Forces)
Extraction of personal information from maritime registration records and penal colony records.
Processing workflow
Batch processing:
- Analysis of documents and definition of sub-batches to be processed automatically or manually (from batch 4 onward, this involves clustering)
- For batches 1 to 3: segmentation of the area containing the information to be extracted; from batch 4 onward, information is extracted at the single-page level (which requires segmenting the double pages).
- Training of information extraction models: one per type of structure (sub-batch).
- Matching certain fields with reference data (municipality, department, country, profession, rank)
- Model evaluation; predictions that fall below the quality threshold are sent for manual transcription
- Export: EAD XML files (one per register) and a single batch-level CSV file containing the individual information
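As an illustration only, the reference-matching, quality-threshold and CSV export steps listed above could be sketched as follows. This is a minimal sketch and not the project's actual implementation: the fuzzy matching via Python's standard `difflib`, the function names, the threshold value, the reference list and the sample records are all assumptions made for the example.

```python
import csv
import difflib
import io

# Hypothetical reference list of municipalities (illustrative values only).
REFERENCE_MUNICIPALITIES = ["Brest", "Cherbourg", "Toulon", "Lorient"]

# Hypothetical quality threshold; the project's actual value is not stated.
QUALITY_THRESHOLD = 0.80


def match_reference(value, reference, cutoff=0.8):
    """Return the closest reference entry, or None if nothing is close enough."""
    by_lower = {entry.lower(): entry for entry in reference}
    hits = difflib.get_close_matches(
        value.strip().lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[hits[0]] if hits else None


def route(predictions, threshold=QUALITY_THRESHOLD):
    """Split predictions into accepted rows and rows sent to manual transcription."""
    accepted, manual = [], []
    for pred in predictions:
        (accepted if pred["confidence"] >= threshold else manual).append(pred)
    return accepted, manual


def to_csv(rows):
    """Serialise rows to CSV text with a fixed, assumed column layout."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "municipality", "confidence"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample predictions, standing in for the output of an extraction model.
predictions = [
    {"name": "Jean Martin", "municipality": "Tolon", "confidence": 0.93},
    {"name": "Pierre Le Gall", "municipality": "Brst", "confidence": 0.41},
]

# Align each extracted municipality with the reference list when a match exists.
for pred in predictions:
    pred["municipality"] = (
        match_reference(pred["municipality"], REFERENCE_MUNICIPALITIES)
        or pred["municipality"]
    )

accepted, manual = route(predictions)
print(to_csv(accepted))
print(f"{len(manual)} record(s) sent for manual transcription")
```

In this sketch a misspelled municipality is replaced by its closest reference entry, and any record whose confidence falls below the assumed threshold is set aside for manual transcription instead of being exported.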
