Arkindex
Arkindex is TEKLIA's platform for managing and processing large collections of digitised documents. We have been actively developing Arkindex since 2019 and use it intensively in all our projects.
- Arkindex source code on GitLab
- Arkindex documentation
- Contribute code to Arkindex
- Self-host Arkindex
- Arkindex tutorials on YouTube
Callico
Callico is the annotation and validation platform for digitised documents developed by TEKLIA. We use it in all our projects to generate training data for our deep learning models. It is open source.
Deep Learning libraries and tools
We publish and maintain our code as open source on GitLab.
- Doc-UFCN, a library for detecting objects in scanned documents. See it on PyPI and GitLab.
- PyLaia, a handwriting recognition library. See it on PyPI and GitLab.
- Nerval, a library for evaluating named entity recognition. See it on GitLab.
- DISS, a document image segmentation scoring library. See it on GitLab.
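Nerval evaluates named entity recognition. To illustrate what entity-level evaluation measures, here is a minimal Python sketch that scores predicted entities against reference entities by exact matching of (start, end, label) spans. It is a simplified stand-in, not Nerval's API, and it does not reproduce Nerval's own matching rules; the function `entity_scores` and the span format are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical span representation: (start offset, end offset, entity label).
Entity = Tuple[int, int, str]


def entity_scores(gold: List[Entity], pred: List[Entity]) -> Dict[str, float]:
    """Entity-level precision, recall and F1 with exact span-and-label matching."""
    # Counter intersection keeps the minimum count of each span, so a duplicated
    # prediction cannot match the same gold entity twice.
    true_positives = sum((Counter(gold) & Counter(pred)).values())
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [(0, 5, "PER"), (10, 16, "LOC")]
pred = [(0, 5, "PER"), (10, 16, "ORG")]  # second entity: right span, wrong label
print(entity_scores(gold, pred))  # → {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Here one of the two predictions matches exactly, so precision, recall and F1 are all 0.5. Exact matching is deliberately strict; evaluating noisy OCR or handwriting recognition output usually calls for a more tolerant matching rule.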
Open deep learning models
We publish our models openly on Hugging Face:
- Handwriting recognition models for PyLaia
- Document layout analysis models for Doc-UFCN
- Named entity recognition models for spaCy
Data tools
- Transkribus client and PAGE XML parser
- Virtual keyboard for eScriptorium, as a web extension
Arkindex tools
Open-source tools to interact with Arkindex, the document processing platform.
- Arkindex command-line client: a command-line interface to an Arkindex instance. See it on PyPI and GitLab, and see its documentation.
- Arkindex API client: a Python library for communicating with the Arkindex API. See it on PyPI and GitLab, and see its documentation.
- Arkindex Export: a library for exploring and using Arkindex exports in SQLite format. See it on PyPI and GitLab.
- Arkindex base worker: a base class for integrating processing algorithms into Arkindex. See it on PyPI and GitLab.
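Because Arkindex exports are SQLite databases, they can also be inspected with generic SQLite tooling. The following is a minimal sketch that assumes only Python's standard `sqlite3` module, not the Arkindex Export library's own API: the hypothetical helper `list_tables` opens an export read-only and reports each table with its row count. It does not assume any particular table names from the export schema.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def list_tables(db_path: str) -> Dict[str, int]:
    """Hypothetical helper: map each table in an SQLite file to its row count."""
    # Open read-only through an SQLite URI so a mistyped path raises an error
    # instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the table name as an SQL identifier, escaping embedded quotes.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `list_tables` on a downloaded export gives a quick overview of what the file contains before querying it further, with the Arkindex Export library or with plain SQL.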
Public datasets from TEKLIA projects
We publish ready-to-use datasets on Hugging Face:
- RIMES: a dataset of handwritten documents in French
- NorHand: a dataset for handwritten text recognition in Norwegian
- SIMARA: a dataset of handwritten index cards