
Musée du Louvre

Department of Islamic Arts

France ✧ 2025

Project


Indexing of auction catalogues from the documentation library to improve the referencing of works in the collection. 


Corpus


Auction catalogues in English and French (Sotheby’s, Bonhams, Christie’s…)

150,000 pages

Processing workflow


  • Optical Character Recognition (OCR)

  • Segmentation of elements on the page and matching captions to images

  • Metadata creation using automatic key/value pair recognition: extraction of elements from captions (title, date, media…) to a spreadsheet

  • Extracting inventory numbers and associating them with the corresponding image for object indexing
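
As an illustration only, the last two steps (caption fields to a spreadsheet, inventory numbers tied to an image) could be sketched as follows. This is a minimal sketch, not the project's actual pipeline: the comma-separated caption layout, the inventory-number pattern, and every function and field name are assumptions made for the example.

```python
import csv
import io
import re

# Assumption: captions follow a "title, date, medium" order separated by commas.
CAPTION_FIELDS = ("title", "date", "medium")

# Assumption: an inventory number is 2-3 capital letters followed by digits,
# e.g. "MAO 449". Real catalogues will need a pattern tuned to the collection,
# and this one can yield false positives such as "AD 1250".
INVENTORY_RE = re.compile(r"\b([A-Z]{2,3})\s?(\d{1,6})\b")


def parse_caption(caption: str) -> dict:
    """Split a caption into key/value pairs, padding missing fields with ''."""
    parts = [p.strip() for p in caption.split(",", len(CAPTION_FIELDS) - 1)]
    parts += [""] * (len(CAPTION_FIELDS) - len(parts))
    return dict(zip(CAPTION_FIELDS, parts))


def find_inventory_numbers(text: str) -> list:
    """Return inventory numbers found in OCR text, normalised as 'PREFIX digits'."""
    return [f"{prefix} {digits}" for prefix, digits in INVENTORY_RE.findall(text)]


def index_image(image_id: str, caption: str, page_text: str) -> dict:
    """Build one spreadsheet row linking an image to its caption fields and numbers."""
    row = {"image": image_id, **parse_caption(caption)}
    row["inventory_numbers"] = "; ".join(find_inventory_numbers(page_text))
    return row


def to_csv(rows: list) -> str:
    """Serialise rows to CSV text, one line per indexed image."""
    buffer = io.StringIO()
    fieldnames = ["image", *CAPTION_FIELDS, "inventory_numbers"]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the corpus.
    row = index_image(
        "page_012_img_1",
        "Ewer, 13th century, inlaid brass",
        "Compare the example in the Louvre, MAO 449.",
    )
    print(to_csv([row]))
```

Keeping caption parsing and inventory-number matching in separate functions lets each be adjusted per auction house, since caption layouts differ between catalogues.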