cBAD – table subset

Description

As part of our work on text-line localization in handwritten documents containing tables, we evaluate our system on a subset of the cBAD (Competition on Baseline Detection, ICDAR 2017 [1]) dataset (track B) that contains only documents with tabular structures. The dataset of the cBAD competition is available here.

Deciding which structures in the cBAD dataset should count as tabular is not always obvious, so we used the following rule to select the documents of the subset.

A document is considered to contain a tabular structure if at least one of the following two properties holds:

• the tabular structure is materialized by vertical and horizontal rulings,

• columns of the tabular structure are materialized by vertical rulings and those columns have names.
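The selection rule above can be expressed as a simple predicate. This is a minimal sketch, assuming each document has been annotated by hand with three hypothetical boolean flags (`has_vertical_rulings`, `has_horizontal_rulings`, `has_named_columns`); cBAD does not provide such flags, and all names here are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TableAnnotation:
    """Hypothetical per-document flags set by a human annotator (not part of cBAD)."""
    has_vertical_rulings: bool
    has_horizontal_rulings: bool
    has_named_columns: bool


def contains_tabular_structure(doc: TableAnnotation) -> bool:
    # Property 1: the table is materialized by both vertical and horizontal rulings.
    fully_ruled = doc.has_vertical_rulings and doc.has_horizontal_rulings
    # Property 2: columns are materialized by vertical rulings and have names.
    named_ruled_columns = doc.has_vertical_rulings and doc.has_named_columns
    # The document is kept if at least one property holds.
    return fully_ruled or named_ruled_columns


if __name__ == "__main__":
    grid = TableAnnotation(True, True, False)
    headed_columns = TableAnnotation(True, False, True)
    unruled_headers = TableAnnotation(False, False, True)
    print(contains_tabular_structure(grid))             # True
    print(contains_tabular_structure(headed_columns))   # True
    print(contains_tabular_structure(unruled_headers))  # False
```

Note that vertical rulings are required by both properties: named columns without rulings do not qualify on their own.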

The table subset consists of 315 documents (51,084 text lines).

[1] M. Diem, F. Kleber, S. Fiel, T. Grüning, and B. Gatos. cBAD: ICDAR2017 Competition on Baseline Detection. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1, pp. 1355-1360. IEEE, November 2017.

Data

Download the list of images chosen for our table subset.
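To rebuild the subset locally and check its size, one can keep only the listed images and count their annotated text lines. The sketch below is a hedged illustration, not our own tooling. It assumes the downloaded list holds one image file name per line, and that each image's ground truth is a PAGE XML file stored in a `page/` folder next to the image with the same file stem (a hypothetical layout; adapt the path logic to your copy of the dataset). cBAD ground truth is distributed as PAGE XML, and the sketch counts `TextLine` elements whatever the schema namespace version. All function names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def read_subset_list(list_file: Path) -> set[str]:
    """Assumption: one image file name per line; blank lines are ignored."""
    with list_file.open(encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def count_text_lines(page_xml: Path) -> int:
    """Count TextLine elements in a PAGE XML file, ignoring the namespace URI."""
    root = ET.parse(page_xml).getroot()
    # Tags look like "{namespace}TextLine"; compare only the local name.
    return sum(
        1
        for el in root.iter()
        if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == "TextLine"
    )


def subset_statistics(dataset_dir: Path, names: set[str]) -> tuple[int, int]:
    """Return (documents, text lines) for the listed images under dataset_dir.

    Assumption (hypothetical layout): the ground truth of <stem>.<ext> is
    stored in page/<stem>.xml, in the same folder as the image.
    """
    n_docs = n_lines = 0
    for image in sorted(dataset_dir.rglob("*")):
        if not image.is_file() or image.name not in names:
            continue
        xml_path = image.parent / "page" / (image.stem + ".xml")
        if xml_path.exists():
            n_docs += 1
            n_lines += count_text_lines(xml_path)
    return n_docs, n_lines


if __name__ == "__main__" and len(sys.argv) == 3:
    docs, lines = subset_statistics(Path(sys.argv[1]), read_subset_list(Path(sys.argv[2])))
    print(f"{docs} documents, {lines} text lines")
```

Run it with the dataset folder and the downloaded list as arguments; if the layout assumption matches your copy, the printed totals can be compared with the figures given above.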

IntuiDoc team