
Arkindex

Our document processing software



Arkindex is a web-based platform built for processing large collections of scanned or digitized documents.


ORGANIZE, ANNOTATE, ACCESS YOUR DATA in a single customizable workflow.

About Arkindex


Since 2019, TEKLIA has been developing Arkindex, a large-scale open-source platform for digital and digitized document processing. 

Arkindex is designed with openness and flexibility as core principles. Here are the key elements of this open design and how they benefit our users: 

Integration of a wide range of algorithms and models

One of Arkindex's key advantages is its ability to integrate any type of algorithm or model. Through the use of Docker and a highly generic algorithm and model management system, new algorithms or models can be easily and quickly integrated into Arkindex, often in a matter of hours. This feature is particularly important given the rapid evolution of automatic document processing. 

Flexibility in corpus and document structure

Arkindex offers complete flexibility in document structuring. You can create any type of hierarchy within Arkindex to display and analyze the structure of your corpus and the elements contained in your documents. This flexibility is essential for adapting to the diverse and often complex nature of archives and heritage collections. 
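
As an illustration of what a free-form hierarchy can look like, here is a minimal Python sketch. It is not the Arkindex data model: the `Element` class, its fields, and the element type names ("volume", "page", "paragraph", "text_line") are assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """A node in a hypothetical corpus hierarchy (illustrative only)."""

    type: str  # free-form type name, e.g. "volume" or "page"
    name: str
    children: list[Element] = field(default_factory=list)

    def add(self, child: Element) -> Element:
        """Attach a child element and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0):
        """Yield (depth, element) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

    def count_by_type(self) -> dict[str, int]:
        """Count the elements of each type in this subtree."""
        counts: dict[str, int] = {}
        for _, element in self.walk():
            counts[element.type] = counts.get(element.type, 0) + 1
        return counts


# Any nesting is allowed: here a volume holds pages, and a page holds
# a paragraph and a text line side by side.
register = Element("volume", "Parish register 1850")
page = register.add(Element("page", "folio 1r"))
page.add(Element("paragraph", "baptism record"))
page.add(Element("text_line", "line 1"))
register.add(Element("page", "folio 1v"))

for depth, element in register.walk():
    print("  " * depth + f"{element.type}: {element.name}")
```

Because the type is a plain string rather than a fixed enumeration, the same structure could describe a register, a newspaper issue, or a box of loose letters.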

Full access and control with a comprehensive API 

All interactions with Arkindex are facilitated by a fully documented and tested API. This API is the backbone of our system and provides clients with full access and control over their data and processes. The availability of multiple software clients, ranging from a web-based interface to an open source command line interface (CLI), ensures that clients can interact with Arkindex in the most convenient way for their workflow. 

Arkindex Features

Arkindex is easy to use and can be customized to suit your needs.

AI Integration Flexibility (Docker and Python)

Each edition of Arkindex allows the integration of custom AI models via Docker and Python, giving users the flexibility to integrate any document processing library or model.

Full API Access

Full API access in all editions ensures that developers can build and integrate custom applications that extend the functionality of the Arkindex platform.


Command Line Access

The availability of several software clients, ranging from a web interface to an open source command line interface (CLI), allows you to tailor your use of Arkindex to your needs.


Unlimited Data and Metadata Volume

There is no limit to the amount of data and metadata that can be processed across all editions, ensuring that users can confidently manage large and complex datasets.


Unlimited Number of Users

All editions support an unlimited number of users, ensuring scalability and collaboration flexibility for teams of any size.


Data Export Formats

Export options include formats such as PDF, CSV, PAGE XML, ALTO, Microsoft Word (DOCX), and SQLite. Custom export formats can also be integrated.

The following algorithms and models are currently integrated in Arkindex:

Semantic segmentation: Doc-UFCN, Mask R-CNN, YOLOv8, Kraken, Grounding DINO, TableTransformer, LayoutParser
Text recognition: Tesseract, PyLaia, Kraken, DAN, Google Vision
Named-entity recognition: spaCy, Flair, Stanza
Image classification: ResNet, YOLOv8
Image description: LLaVA
Machine translation: MarianMT
Large language models (LLM): Qwen, OpenAI, Claude, Gemini

Arkindex Editions

Arkindex offers three distinct plans (Community, Academia and Enterprise) to meet the needs of organizations with varying levels of resources, technical expertise, and infrastructure.

Arkindex Community (Open Source)

The Community edition gives small teams or research groups full control over deployment (self-hosting), without licensing costs. It is ideal when internal IT capabilities exist and customisation is important.  

Arkindex Academia

The Academia edition is a discounted license offer exclusively for university research projects in France and internationally. The edition offers an access control list for teamwork. The number of documents that can be processed is limited to 250,000 images. 
 

Arkindex Enterprise

The Enterprise edition adds advanced features and support, such as more granular access control, integration with high-performance computing clusters, and optional assistance from TEKLIA, while still supporting self-hosting. It is suited for larger organizations that want to scale up.


Pricing

Arkindex Community (Open Source)
Free


  •  Self-hosted
  •  Unlimited Document Capacity
  •  Public or single-user projects only
  •  High-Performance Computing (HPC) not included
  •  Self-service Installation and Maintenance

Arkindex Academia 
€1,500 / month

  •   Self-hosting or TEKLIA hosting available
  •  Document Capacity limited to 250,000 images
  •  Advanced Access Control List (ACL)
  •  High-Performance Computing (HPC) integrated  
  •   Self-service Installation and Maintenance 
  •  Option for TEKLIA's assistance services 

Arkindex Enterprise 
€4,000 / month

  •   Self-hosting or TEKLIA hosting available
  •  Unlimited Document Capacity
  •  Advanced Access Control List (ACL)
  •  High-Performance Computing (HPC) integrated  
  •   Self-service Installation and Maintenance 
  •  Option for TEKLIA's assistance services 

Arkindex Specifications

Document import


Arkindex allows you to easily create elements from your images in a pre-defined data structure. You can import a small number of images either from your computer or from an IIIF server.

Import and organize document images from files (JPEG, TIFF, PNG), PDFs, or IIIF manifests.
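
To make the supported sources concrete, the sketch below sorts candidate inputs into the categories listed above. It is only an illustration, not Arkindex's import logic: the function names `source_kind` and `is_iiif_manifest` are hypothetical, and the extension list is an assumption derived from the formats named here. The manifest check relies on the `type` values defined by the IIIF Presentation API (versions 2 and 3).

```python
from pathlib import PurePosixPath

# Extensions assumed to cover the image formats named above (JPEG, TIFF, PNG).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def source_kind(filename: str) -> str:
    """Classify a local file by its extension (hypothetical helper)."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix == ".pdf":
        return "pdf"
    return "unsupported"


def is_iiif_manifest(document: dict) -> bool:
    """Check whether a parsed JSON document declares itself an IIIF manifest.

    Presentation API 2.x uses "@type": "sc:Manifest";
    Presentation API 3.0 uses "type": "Manifest".
    """
    return (
        document.get("@type") == "sc:Manifest"
        or document.get("type") == "Manifest"
    )


for name in ["scan_001.TIFF", "register.pdf", "notes.txt"]:
    print(name, "->", source_kind(name))
```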

Import Documentation ->

Organization Documentation ->

Annotation



Arkindex allows you to produce the annotations needed to process your documents. Before starting a project, you need to manually annotate a few examples of the expected output so that the automatic analysis can be targeted accordingly.

Annotate your images with:

  • zones of elements on the image, with type and position
  • text transcriptions at any level (page, paragraph, line, word)
  • classifications
  • metadata
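
The four kinds of annotation listed above can be pictured as fields on a single record. The following Python sketch is an assumption made for illustration: the `Zone` class and its field names are hypothetical and do not reflect Arkindex's internal schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Zone:
    """A hypothetical annotated zone on an image (illustrative only)."""

    type: str                       # element type, e.g. "text_line"
    polygon: list                   # position as (x, y) pixel coordinates
    transcription: Optional[str] = None
    classifications: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def bounding_box(self) -> tuple:
        """Return (min_x, min_y, max_x, max_y) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)


line = Zone(
    type="text_line",
    polygon=[(10, 20), (200, 20), (200, 60), (10, 60)],
    transcription="Baptised on 3 May 1850",
    classifications=["handwritten"],
    metadata={"language": "en"},
)
print(line.type, line.bounding_box(), line.transcription)
```

Keeping the transcription optional reflects the fact that a zone can be drawn first and transcribed later, at whatever level (page, paragraph, line, word) the project requires.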

TEKLIA can also provide you with the Callico interface upon request, which integrates with Arkindex for advanced collaborative annotation campaigns.


Annotation Documentation ->

Processing


Arkindex is a platform for executing any document processing algorithm: OCR, HTR, feature extraction, captioning, translation, etc. Its architecture has been designed to be generic, enabling it to store any type of result, with generic and configurable types.

The following processing types are possible with Arkindex:

Processing type: Description

Image Classification: Associate a class with an image or a portion of an image.
Object Detection: Detect an object in an image using a bounding box and identify its type.
Object Segmentation: Detect the precise outline of an object in an image and identify its type.
Image Captioning: Generate a caption or a tag for an image.
Transcription: Transcribe printed or handwritten text from an image.
Classification: Associate a class with a text.
Key-value Extraction: Extract information from an image or text in the form of a key-value association.
Table Recognition: Detect and transcribe information presented in the form of a table while preserving its structure.
Named Entity Recognition: Detect and classify named entities (such as persons, places, or dates) in a text.
Entity Linking: Link a named entity to an existing reference system.
Translation: Translate a text from a source language to a target language.
Geolocation: Associate GPS coordinates with an image or text.
Grouping Objects: Group elements in the same structure.


See our video tutorials

Workflow


Arkindex offers extensive capabilities for managing complex workflows tailored to your document processing needs:

  1. Customisable Workflow Design: Arkindex gives you the freedom to define complex workflows tailored to your unique processing requirements. From layout analysis and classification to text recognition (OCR/HTR), named entity recognition and metadata generation, you can curate each step to achieve your desired outcome.

  2. Real-time monitoring: With Arkindex, you can monitor the progress of each task within your workflow in real time. It provides an estimated completion time for each step, so you can make informed decisions and adjust resources as necessary.

  3. Error Analysis & Rerun: Not all processes run perfectly every time. Arkindex understands this and provides tools to analyse any errors that may occur in your workflow. Once identified, you can easily rerun processes for those specific elements, ensuring consistency and accuracy. 

  4. Flexible Processing Nodes: To accommodate different infrastructure requirements, Arkindex provides the flexibility to distribute your processing tasks across multiple nodes. Whether it's on-premises, in a cloud environment or even on high-performance computing clusters using SLURM, we've got you covered.

  5. Integration with custom & open source components: Arkindex is not limited to its built-in functionality. You can effortlessly define your processing steps using your proprietary code or benefit from the vast ocean of open source components available. Docker integration makes integrating these components easy.
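
The "run, analyse errors, rerun only what failed" pattern described in the list above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Arkindex schedules work: `run_workflow`, the step functions, and the simulated failure are all hypothetical.

```python
from typing import Callable

Step = Callable[[str], str]


def run_workflow(elements: list, steps: list) -> tuple:
    """Run each named step in order on every element.

    Returns (results, errors): results maps an element to its final
    output, errors maps a failed element to the failing step and message.
    """
    results, errors = {}, {}
    for element in elements:
        value = element
        try:
            for name, step in steps:
                value = step(value)
            results[element] = value
        except Exception as exc:
            errors[element] = f"{name}: {exc}"
    return results, errors


# Simulated failure: "page-2" cannot be transcribed until the cause is fixed.
broken = {"page-2"}


def layout(item: str) -> str:
    return item + "|lines"


def transcribe(item: str) -> str:
    if item.split("|")[0] in broken:
        raise RuntimeError("image unreadable")
    return item + "|text"


steps = [("layout", layout), ("transcription", transcribe)]
results, errors = run_workflow(["page-1", "page-2", "page-3"], steps)
print("failed:", errors)

# After analysing the error and fixing its cause, rerun only the failures.
broken.clear()
retried, remaining = run_workflow(list(errors), steps)
results.update(retried)
print("still failing:", remaining)
```

Recording the failing step name alongside the message is what makes targeted reruns possible: only the affected elements are resubmitted, and the successful ones are left untouched.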




Arkindex code and releases


At TEKLIA, we aim to produce high-quality open-source software:

Access the source code of the Python backend, the Vue.js frontend and more Arkindex tools and documentation.

View the details of our latest releases.

A question about Arkindex? 

Contact our team to learn more!
You can also visit our forum