Automatic information extraction for cataloging
The Bibliothèque Sainte-Geneviève (BSG) is one of the most important French university libraries and the heir to the library of the Abbey of Sainte-Geneviève. The BSG asked Teklia to automatically extract information from a subject file and a printed catalogue, both of which were available only as images. The aim was to enrich the library's digital catalogue with the indexing information contained in these two printed sources. This retro-conversion is part of a wider project to catalogue the library's collection of over two million items by subject.
Processing the subject file
The subject file consists of 550,000 cards, already digitised and therefore available as images. These cards provide thematic access to the works through the following information: title, author's name, subject, bibliographical reference, description and call number.
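To make the target of the extraction concrete, the six pieces of information listed above can be pictured as one structured record per card. The sketch below is only an illustration: the class name, the field names and the placeholder values are assumptions made for this example, not the schema used by Teklia or the BSG.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class SubjectCard:
    """One card of the subject file (hypothetical field names)."""
    title: Optional[str] = None
    author: Optional[str] = None
    subject: Optional[str] = None
    bibliographic_reference: Optional[str] = None
    description: Optional[str] = None
    call_number: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of the fields that are empty or absent."""
        return [name for name, value in asdict(self).items() if not value]


# Placeholder values, invented for the example (not real BSG data).
card = SubjectCard(
    title="Placeholder title",
    author="Placeholder author",
    subject="Placeholder subject",
    call_number="PLACEHOLDER 123",
)
print(card.missing_fields())
```

A record with empty fields, as reported by `missing_fields`, is the kind of case that could be routed to manual checking rather than loaded into the catalogue directly.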
Training and extraction
Error recovery
Result
In the end, more than 85% of the 550,000 cards were processed fully automatically. The remaining cards, roughly 15%, fall into three groups: some were deliberately excluded (title pages, duplicate records, crushed documents); others could not be processed because of the format of the data (series with more than 5 volumes, records with several call numbers, handwritten records that were sometimes illegible, etc.); and still others could not be processed because of OCR errors.

Processing of the Poirée-Lamouroux catalogue
The Catalogue abrégé de la Bibliothèque Sainte-Geneviève, compiled by Elie Poirée and Georges Lamouroux at the end of the 19th century, lists most of the works held by the library at the time. The three volumes of the work are divided into sections delimited by titles and subsections indicated by numbers. A table of correspondences makes it possible to associate each subsection number with a sub-subject.
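The role of the table of correspondences can be sketched as a simple lookup from subsection number to sub-subject. The function name, the data layout and the sample entries below are assumptions made for illustration; they do not describe Teklia's actual pipeline or the real contents of the catalogue. Subsection numbers are kept as strings here in case some carry a suffix.

```python
from typing import Dict, List, Tuple


def assign_sub_subjects(
    entries: List[Tuple[str, str]],
    table: Dict[str, str],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Attach a sub-subject to each (subsection_number, entry_text) pair.

    Returns the resolved (entry_text, sub_subject) pairs and, separately,
    the entries whose subsection number is absent from the table.
    """
    resolved: List[Tuple[str, str]] = []
    unresolved: List[Tuple[str, str]] = []
    for number, text in entries:
        sub_subject = table.get(number)
        if sub_subject is None:
            unresolved.append((number, text))
        else:
            resolved.append((text, sub_subject))
    return resolved, unresolved


# Invented sample data, not taken from the Poirée-Lamouroux catalogue.
table = {"12": "Placeholder sub-subject"}
entries = [("12", "Entry text A"), ("99", "Entry text B")]

resolved, unresolved = assign_sub_subjects(entries, table)
print(resolved)
print(unresolved)
```

Keeping the unresolved entries in a separate list, instead of dropping them, leaves them available for manual review.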
Training and extraction
Database integration
Result
A significant cost reduction
Teklia's models made it possible to automate more than 90% of the processing of the subject file and the catalogue. Manual validation is still required, but the human effort is significantly reduced, with a major impact on the cost of retro-conversion.
"The use of AI allowed us to process a considerable amount of data (550,000 records in the subject file and almost 6,000 pages in the printed catalogue). It would have been unthinkable for us to rely on manual processing and we were looking for a service provider who could help us with this retro-conversion project, which had a dual objective: to enrich the records of our online catalogue with very valuable indexing data that was previously difficult to use because it could only be queried in printed form; and to support us in our project to evaluate and map our collections, by partially automating the counting operations.
From our point of view, the initial objectives have been fully met. Despite a very heterogeneous data format, difficulties linked to the complexity of our call number and indexing system, and new needs that were sometimes expressed late as our tests progressed, Teklia was always very available and willing to adapt the working method to the difficulties and problems that arose. In this respect, we are fully satisfied with our collaboration with the Teklia teams."
Timothée RONY, Documentary Policy Department, Bibliothèque Sainte-Geneviève
