cBAD – table subset

Description

In the context of our work on text-line localization in handwritten documents containing tables, we chose to evaluate our system on a subset of the cBAD (Competition on Baseline Detection, ICDAR 2017 [1]) dataset (track B) that contains only documents with tabular structures. The dataset of the cBAD competition is available here.

Identifying which structures in the cBAD dataset can be considered tabular is not always obvious. We therefore used the following rule to select the documents of the subset.

A document is considered to contain a tabular structure if at least one of these two properties holds:

• the tabular structure is materialized by vertical and horizontal rulings,

• the columns of the tabular structure are materialized by vertical rulings and have names.
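The selection rule above can be written as a simple predicate. This is a minimal sketch that assumes each document has been annotated by hand with three boolean flags; the `TableAnnotation` class and its field names are hypothetical and are not part of the cBAD ground truth.

```python
from dataclasses import dataclass


@dataclass
class TableAnnotation:
    """Hypothetical manual annotation of one cBAD page (not part of the cBAD ground truth)."""
    has_vertical_rulings: bool
    has_horizontal_rulings: bool
    columns_have_names: bool


def contains_tabular_structure(doc: TableAnnotation) -> bool:
    # Property 1: the table is materialized by both vertical and horizontal rulings.
    ruled_grid = doc.has_vertical_rulings and doc.has_horizontal_rulings
    # Property 2: the columns are materialized by vertical rulings and have names.
    named_ruled_columns = doc.has_vertical_rulings and doc.columns_have_names
    # The document is selected if at least one of the two properties holds.
    return ruled_grid or named_ruled_columns


# A page with vertical rulings and named columns, but no horizontal rulings, is selected.
print(contains_tabular_structure(TableAnnotation(True, False, True)))  # True
```

Written this way, it is visible that both properties require vertical rulings, so a page with only horizontal rulings is never selected under this rule.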

The table subset consists of 315 documents (51,084 text lines).

[1] M. Diem, F. Kleber, S. Fiel, T. Grüning, B. Gatos (2017, November). cBAD: ICDAR2017 Competition on Baseline Detection. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (Vol. 1, pp. 1355-1360). IEEE.

Data

Download the list of images chosen for our table subset.
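As a sketch of how the list might be used, the code below reads it and counts the text lines of the selected pages, for comparison with the text-line count reported above. It assumes that the list contains one image file name per line, that the ground truth is stored as PAGE XML, and that each image has an XML file with the same stem in a `page/` folder under a given directory; this folder layout and the function names are assumptions, not part of the published data. `TextLine` elements are matched by their local name, so the PAGE schema namespace does not matter.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_image_list(list_path: Path) -> list[str]:
    """Return the non-empty lines of the list file (assumed: one image file name per line)."""
    lines = list_path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def count_text_lines(page_xml: bytes) -> int:
    """Count TextLine elements in a PAGE XML document, ignoring the namespace."""
    root = ET.fromstring(page_xml)
    # Namespaced tags look like "{namespace}TextLine"; compare only the local name.
    return sum(1 for elem in root.iter() if elem.tag.rsplit("}", 1)[-1] == "TextLine")


def count_subset_text_lines(list_path: Path, data_dir: Path) -> int:
    """Sum text lines over all listed images (hypothetical layout: data_dir/page/<stem>.xml)."""
    total = 0
    for name in read_image_list(list_path):
        xml_path = data_dir / "page" / (Path(name).stem + ".xml")
        total += count_text_lines(xml_path.read_bytes())
    return total
```

If the downloaded cBAD data is organized differently (for example, one folder per collection), only the path construction in `count_subset_text_lines` needs to change.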