ImadocSen-OnDB: Imadoc on-line handwritten sentence database


ImadocSen-OnDB is a database containing on-line handwritten sentences made up of English words in lowercase letters. It can be used to train and evaluate handwriting recognition systems, either on sentences or on isolated words, or for writer identification tasks. The data were collected on Tablet PCs.

The first version of this database was published in [Quiniou2005], at ICDAR 2005. The size of the database has been growing ever since [Quiniou2009a] (see References for further details).

The sentences were copied by the writers from texts of the Brown corpus [Francis1979] (see References). The database consists of files in the InkML format; each file stores one sentence together with information on the acquisition device, the writer, and the sentence transcription. The words of each sentence, which were extracted manually, are also provided and can be used, for example, for isolated word recognition.
The data collection protocol and the storage format are described in more detail in the files included in the zipfile (see dataset_infos.txt).
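
As an illustration of how the pen trajectories in an InkML file might be read, here is a minimal Python sketch. It relies only on the general InkML convention that each trace element holds comma-separated points whose channel values are separated by whitespace. The function name read_traces and the sample string are hypothetical, and the sketch assumes plain absolute numeric values; the actual channels and annotations used in ImadocSen-OnDB are documented in dataset_infos.txt and may differ.

```python
import xml.etree.ElementTree as ET


def read_traces(inkml_text):
    """Return a list of strokes; each stroke is a list of coordinate tuples.

    Assumption: every point is a whitespace-separated list of plain
    numbers (no InkML difference prefixes such as ' or ").
    """
    root = ET.fromstring(inkml_text)
    strokes = []
    for elem in root.iter():
        # Compare the local tag name so that namespaced and
        # non-namespaced files are both handled.
        if elem.tag.rsplit("}", 1)[-1] != "trace":
            continue
        points = []
        for point in (elem.text or "").split(","):
            values = point.split()
            if values:
                points.append(tuple(float(v) for v in values))
        strokes.append(points)
    return strokes


# Hypothetical two-stroke sample, not taken from the database.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace>10 20, 11 22, 13 25</trace>
  <trace>30 40, 31 41</trace>
</ink>"""

strokes = read_traces(SAMPLE)
print(len(strokes), "strokes;", len(strokes[0]), "points in the first one")
```

To read a file from the archive, the same function can be given the file contents, for example read_traces(open(path, encoding="utf-8").read()), where path points to one of the InkML files.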

Examples of on-line handwritten sentences from the database


As of 11/20/2010, ImadocSen-OnDB contains:

* 51 writers (including 42 different writers)
* 1,017 handwritten sentences
* 15,849 extracted words


The handwritten sentence database can be downloaded as a zipfile containing all the data, together with information on how the data were collected.

