Bidimensional visual languages integrating the user-interaction concept

The knowledge associated with the structure of documents is modeled using bidimensional grammars and visual languages, and we also study new approaches based on constraint multiset grammars. The aim is to design generic methods for structured document analysis and composition.
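To make the idea of a constraint multiset grammar more concrete, here is a minimal sketch under our own assumptions: a page is a multiset of typed elements with bounding boxes, and one hypothetical production, `Section ::= TextLine, Paragraph where TextLine is above Paragraph`, rewrites every matching pair into a single composite element. The element kinds, the `above` constraint and the `apply_rule` helper are illustrative names only. They are not part of the team's formalism.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Elem:
    """A typed document element with a bounding box (y grows downward)."""
    kind: str
    x0: float
    y0: float
    x1: float
    y1: float


def above(a, b):
    """Spatial constraint: a ends before b starts vertically and they overlap horizontally."""
    return a.y1 <= b.y0 and min(a.x1, b.x1) > max(a.x0, b.x0)


def apply_rule(multiset, lhs, rhs_kind, constraint):
    """Exhaustively apply the production `rhs_kind ::= lhs[0], lhs[1] where constraint(a, b)`."""
    items = list(multiset)
    changed = True
    while changed:
        changed = False
        for a, b in permutations(items, 2):
            if (a.kind, b.kind) == lhs and constraint(a, b):
                # Consume both parts and add the composite element with their joint box.
                items.remove(a)
                items.remove(b)
                items.append(Elem(rhs_kind, min(a.x0, b.x0), min(a.y0, b.y0),
                                  max(a.x1, b.x1), max(a.y1, b.y1)))
                changed = True
                break
    return items


page = [
    Elem("TextLine", 0, 0, 100, 10),
    Elem("Paragraph", 0, 15, 100, 80),
    Elem("Figure", 120, 0, 200, 80),
]
result = apply_rule(page, ("TextLine", "Paragraph"), "Section", above)
print(sorted(e.kind for e in result))  # ['Figure', 'Section']
```

The figure is left untouched because no production mentions it. A real grammar would hold many productions and would have to choose among competing rewrites, which this sketch does not attempt.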

Introducing the user into the structured document recognition process requires modeling this interaction, so as to describe which interactions are possible for the user, in association with the structural model of the document (bidimensional grammars).

Once the user takes part in the analysis process, the requests addressed to the user must be controlled. When the interaction concerns a single isolated document, it can be synchronous. When a large collection of documents is processed, however, the challenge for the analyzer is to collect its requests and postpone the interaction with the user, thus building an asynchronous interaction.
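The difference between synchronous and asynchronous interaction can be sketched as a small request manager. This is a hypothetical illustration: `InteractionManager`, `ask_user` and `flush` are invented names. It also assumes, as an extra design choice, that identical questions raised by several documents of a collection are grouped so the user answers each one only once.

```python
from collections import defaultdict


class InteractionManager:
    """Route analyzer questions to the user synchronously or in a deferred batch."""

    def __init__(self, ask_user, synchronous):
        self.ask_user = ask_user          # callback: question -> user's answer
        self.synchronous = synchronous
        self.pending = defaultdict(list)  # question -> ids of documents waiting for it

    def request(self, doc_id, question):
        if self.synchronous:
            # Single document: ask now and wait for the answer.
            return {(doc_id, question): self.ask_user(question)}
        # Large collection: record the request and let the analyzer move on.
        self.pending[question].append(doc_id)
        return {}

    def flush(self):
        """Ask each distinct deferred question once and dispatch the answer to every waiting document."""
        resolved = {}
        for question, doc_ids in self.pending.items():
            answer = self.ask_user(question)
            for doc_id in doc_ids:
                resolved[(doc_id, question)] = answer
        self.pending.clear()
        return resolved


asked = []


def user(question):
    asked.append(question)
    return "date"


manager = InteractionManager(user, synchronous=False)
for doc in ["d1", "d2", "d3"]:
    manager.request(doc, "label of the top-right zone?")
answers = manager.flush()
print(len(asked), len(answers))  # 1 3
```

In asynchronous mode the three documents produce a single question for the user, and the analyzer can resume each document once `flush` has returned.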

To build evolving systems for structured document recognition, we explore grammatical inference. This objective, already a real challenge for one-dimensional grammars, is far more complex for bidimensional grammars. Our strategy is to perform this inference with the help of user interaction and to focus it on physical structure analysis.

Combining points of view for image interpretation

Combining several ways of interpreting the content of a document can improve its recognition. We therefore study knowledge fusion mechanisms that combine the results of various document analysis techniques that are usually studied separately. This fusion must be as flexible as possible and, when necessary, asynchronous.
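The text does not say how results are fused. As one simple possibility, the sketch below uses the classic weighted sum rule over per-label confidences. Sources that have not answered yet are just absent, so the fusion can be recomputed when late results arrive. The source names, labels and reliability weights are hypothetical.

```python
from collections import defaultdict


def fuse(hypotheses, weights):
    """Weighted-sum fusion of label scores coming from several analysis processes.

    hypotheses: {source: {label: confidence in [0, 1]}}
    weights:    {source: reliability weight}
    Missing sources are skipped, so the result can be refreshed when a late
    (asynchronous) result arrives.
    """
    totals = defaultdict(float)
    norm = 0.0
    for source, scores in hypotheses.items():
        w = weights.get(source, 0.0)
        norm += w
        for label, conf in scores.items():
            totals[label] += w * conf
    if norm == 0.0:
        return {}
    return {label: s / norm for label, s in totals.items()}


ocr = {"invoice": 0.6, "involce": 0.4}
spotting = {"invoice": 0.9}
fused = fuse({"ocr": ocr, "word_spotting": spotting}, {"ocr": 1.0, "word_spotting": 2.0})
print(max(fused, key=fused.get))  # invoice
```

Here the word spotter is assumed to be twice as reliable as the OCR, so "invoice" receives (0.6 + 2 × 0.9) / 3 = 0.8.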

First, we study different levels of image analysis: the analysis of multiresolution images draws inspiration from human visual perception, which detects salient objects in a document without any specific knowledge.
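A minimal sketch of the multiresolution idea follows, assuming a binary page image with ink = 1. An image pyramid is built by averaging 2×2 blocks, and cells of the coarsest level whose ink density exceeds a threshold are reported as salient, with no document-specific model. The `pyramid` and `salient_cells` helpers and the threshold are hypothetical. They are not the team's method.

```python
import numpy as np


def pyramid(img, levels):
    """Return [img, img/2, img/4, ...] by averaging non-overlapping 2x2 blocks."""
    out = [img.astype(float)]
    for _ in range(levels):
        a = out[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2  # drop an odd last row/column
        a = a[:h, :w]
        out.append(a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return out


def salient_cells(img, levels=3, ink_ratio=0.5):
    """Coarse cells whose ink density exceeds `ink_ratio`, as boxes
    (row0, col0, row1, col1) in full-resolution pixel coordinates."""
    coarse = pyramid(img, levels)[-1]
    step = 2 ** levels
    rows, cols = np.nonzero(coarse > ink_ratio)
    return [(int(r) * step, int(c) * step, (int(r) + 1) * step, (int(c) + 1) * step)
            for r, c in zip(rows, cols)]


page = np.zeros((64, 64))
page[8:24, 40:56] = 1.0  # a dense block, e.g. a logo or a stamp
print(salient_cells(page))  # [(8, 40, 16, 48), (8, 48, 16, 56), (16, 40, 24, 48), (16, 48, 24, 56)]
```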

Secondly, we explore low-level image processing techniques to extract local primitives: line segment extraction with Kalman filtering, text line extraction with particle filters, word spotting using interest point detectors, texture analysis… We also introduce knowledge coming from other processes, such as the output of commercial OCR. All these kinds of content have to be combined, depending on the type of document studied, to produce the best results.
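Line segment extraction with Kalman filtering can be illustrated by a generic tracker that follows a line column by column. The sketch assumes a constant-slope state `[row, slope]`, one row observation per column, hypothetical noise values, and a gating rule that ends the segment when an observation is inconsistent with the prediction. It is a textbook Kalman filter, not a description of the team's extractor.

```python
import numpy as np


def track_segment(rows, q=1e-4, r=0.5, gate=3.0):
    """Follow a line column by column with a constant-slope Kalman filter.

    rows[k] is the observed row of the line in column k; the state is
    [row, slope]. Tracking stops (end of segment) when an observation falls
    more than `gate` standard deviations away from the predicted row.
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # row advances by the slope at each column
    H = np.array([[1.0, 0.0]])              # only the row is observed
    Q = q * np.eye(2)                        # process noise (assumed value)
    R = np.array([[r]])                      # measurement noise (assumed value)
    x = np.array([rows[0], 0.0])
    P = np.eye(2)
    states = []
    for z in rows:
        # Predict the state in the next column.
        x = F @ x
        P = F @ P @ F.T + Q
        # Innovation and its variance.
        y = np.array([z]) - H @ x
        S = H @ P @ H.T + R
        if abs(y[0]) > gate * np.sqrt(S[0, 0]):
            break  # observation inconsistent with the segment: stop here
        # Correct the prediction with the observation.
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        states.append(x.copy())
    return np.array(states)


rows = [10 + 0.5 * k for k in range(40)] + [60.0] * 5
states = track_segment(rows)
print(len(states))  # 40: the jump to row 60 ends the segment
```

The gate is what turns a tracker into a segment extractor: the filter smooths small deviations but refuses a sudden jump, which marks the segment's end.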

Finally, the originality of our work lies in combining structural analysis with statistical information. This combination exploits the expressive power of structural analysis while benefiting from the wide range of statistical approaches.

Incremental learning and evolving fuzzy classifiers

To develop robust and contextual recognition of the elements that form a printed or handwritten document, we design hybrid (statistical/structural) recognition methods, which rely in particular on fuzzy logic theory to handle the imprecision of handwritten strokes.

Traditionally, a classification system is trained on a learning dataset under the supervision of an expert who controls and optimizes the learning process. The performance of the system fundamentally depends on the learning algorithm and the learning dataset. The classification system is then delivered to the end user for use in real application contexts. Typically, no learning algorithm is available on the user side.

The main weakness of this design paradigm is that the knowledge base is constrained by the learning dataset available on the expert side and cannot be extended with the data provided on the user side. These drawbacks increase the need for a new type of classification system that can learn, adapt, and evolve continuously throughout its lifetime.

For example, in on-the-fly document composition, it is useful to let users choose their own set of gestures and assign them to different symbols or commands. In interactive document recognition, it is essential to learn unknown symbols from user interactions by dynamically integrating new classes of these symbols into the recognition system.

In evolving systems, incremental learning algorithms learn from the data samples provided by the user, who sends a validation or correction signal to confirm or change the label suggested by the classifier. Contrary to the traditional paradigm, evolving classification systems make no separation between the learning phase and the operation phase. One of the key features of evolving classifiers is that incoming samples may bring in new, unseen classes, which the classifier learns without destroying its knowledge base or forgetting the existing classes.
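The operating loop described above, in which the classifier predicts, the user confirms or corrects, and the classifier learns from that one sample, can be shown with the simplest evolving classifier we could think of: a nearest-class-mean model with running means. This model is swapped in purely for illustration and is not the fuzzy approach of this section. `IncrementalNCM` is a hypothetical name.

```python
import math


class IncrementalNCM:
    """Evolving nearest-class-mean classifier: no separate training phase."""

    def __init__(self):
        self.means = {}   # label -> running mean vector
        self.counts = {}  # label -> number of samples seen

    def predict(self, x):
        if not self.means:
            return None
        return min(self.means, key=lambda c: math.dist(x, self.means[c]))

    def learn(self, x, label):
        """Incorporate one sample whose label was confirmed or corrected by the user."""
        if label not in self.means:
            # Unseen class: create it without touching the existing ones.
            self.means[label] = [float(v) for v in x]
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        # Running mean update: mean += (x - mean) / n
        self.means[label] = [m + (v - m) / n for m, v in zip(self.means[label], x)]


clf = IncrementalNCM()
for x, label in [((0, 0), "circle"), ((1, 1), "circle"), ((10, 10), "arrow")]:
    guess = clf.predict(x)  # suggestion shown to the user
    clf.learn(x, label)     # label validated or corrected by the user
print(clf.predict((9, 9)), clf.means["circle"])  # arrow [0.5, 0.5]
```

Adding the "arrow" class only creates a new entry, so the "circle" knowledge is kept intact, which is the property the paragraph emphasizes.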

IntuiDoc designs new incremental approaches for learning classification models based on first-order Takagi-Sugeno fuzzy inference systems. This approach includes, on the one hand, the adaptation of the consequents of the fuzzy rules using the recursive least-squares method and, on the other hand, the incremental learning of the rule antecedents according to the evolution of data density in the input space.
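The paragraph names the ingredients but not the exact formulas. The sketch below is one plausible instantiation and rests on several assumptions. Antecedents are Gaussians with a fixed spread. A simple activation threshold stands in for the data-density criterion that decides when to create a rule. Rule centers move by running means. Each rule's linear consequent is adapted by fuzzily weighted recursive least squares, one common RLS variant in evolving fuzzy systems. Class labels are assumed to be consecutive integers. All names and parameter values are hypothetical and this is not IntuiDoc's implementation.

```python
import numpy as np


class TakagiSugenoClassifier:
    """First-order Takagi-Sugeno classifier that evolves one sample at a time.

    Rule i: IF x is near center c_i THEN scores_i = [1, x] @ Pi_i
    Output: scores = sum_i beta_i(x) * scores_i (beta = normalized activations).
    """

    def __init__(self, n_features, sigma=1.0, new_rule_act=0.1, omega=1e3):
        self.d = n_features
        self.k = 0                        # number of classes seen so far
        self.sigma = sigma                # Gaussian spread of antecedents (assumed)
        self.new_rule_act = new_rule_act  # activation below which a rule is created (assumed)
        self.omega = omega                # initial RLS covariance scale (assumed)
        self.centers, self.counts, self.Pi, self.C = [], [], [], []

    def _logits(self, x):
        return np.array([-np.sum((x - c) ** 2) / (2 * self.sigma ** 2) for c in self.centers])

    def _beta(self, x):
        z = self._logits(x)
        w = np.exp(z - z.max())  # normalized Gaussian activations, computed stably
        return w / w.sum()

    def predict(self, x):
        if not self.centers:
            return None
        x = np.asarray(x, float)
        xe = np.concatenate(([1.0], x))
        scores = sum(b * (xe @ P) for b, P in zip(self._beta(x), self.Pi))
        return int(np.argmax(scores))

    def learn(self, x, label):
        x = np.asarray(x, float)
        xe = np.concatenate(([1.0], x))
        if label >= self.k:  # unseen class: widen every consequent with zero columns
            extra = label + 1 - self.k
            self.Pi = [np.hstack([P, np.zeros((self.d + 1, extra))]) for P in self.Pi]
            self.k = label + 1
        # Antecedents: new rule in a low-density region, else move the nearest center.
        if not self.centers or np.exp(self._logits(x).max()) < self.new_rule_act:
            self.centers.append(x.copy())
            self.counts.append(1)
            self.Pi.append(np.zeros((self.d + 1, self.k)))
            self.C.append(self.omega * np.eye(self.d + 1))
        else:
            i = int(np.argmax(self._logits(x)))
            self.counts[i] += 1
            self.centers[i] += (x - self.centers[i]) / self.counts[i]
        # Consequents: weighted RLS per rule,
        # g = b C xe / (1 + b xe^T C xe); Pi += g (y - xe^T Pi); C -= g xe^T C
        y = np.zeros(self.k)
        y[label] = 1.0
        for i, b in enumerate(self._beta(x)):
            Cx = self.C[i] @ xe
            gain = b * Cx / (1.0 + b * (xe @ Cx))
            self.Pi[i] += np.outer(gain, y - xe @ self.Pi[i])
            self.C[i] -= np.outer(gain, Cx)


data = [((0, 0), 0), ((0.5, 0), 0), ((0, 0.5), 0),
        ((5, 5), 1), ((5.5, 5), 1), ((5, 5.5), 1)]
clf = TakagiSugenoClassifier(n_features=2)
for _ in range(20):
    for x, label in data:
        clf.learn(x, label)
for _ in range(5):
    clf.learn((10, 0), 2)  # a class that did not exist before
print(clf.predict((0.2, 0.1)), clf.predict((5.1, 4.9)), clf.predict((10, 0)))  # 0 1 2
```

Because each rule's RLS update is weighted by its activation, samples far from a rule leave its consequent almost unchanged. This is how the sketch adds class 2 without degrading classes 0 and 1.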

Pen- and Gesture-Based Interaction

Accuracy and robustness of the developed systems are key elements for user acceptance. To meet these requirements, the recognition systems have to be adjustable while they are being used in the application. We design an evolving recognition engine with an online, incremental, and lifelong learning process.

Portable touch- and pen-enabled devices such as smartphones, tablets, and multitouch surfaces are becoming ubiquitous. Such devices allow natural interaction via handwriting and gestures. For this kind of use, we designed a personalisable gesture recognition engine. We aim to let users define their own gesture library for various activities. For instance, we work on intuitive mechanisms to collect user feedback on the recognizer's answers, which allow the recognizer to improve its performance continuously. We also design direct object manipulation, such as rotation, zoom, or translation…
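For the direct manipulations mentioned above, a common technique is to read rotation, zoom and translation from two touch points as the similarity transform between their start and current positions. The sketch below computes it. The function name and the choice of the fingers' midpoint as the reference for translation are our assumptions, not a description of the team's engine.

```python
import math


def two_finger_transform(p1, p2, q1, q2):
    """Similarity transform implied by two touch points moving from (p1, p2) to (q1, q2).

    Returns (dx, dy, scale, angle): translation of the fingers' midpoint,
    zoom factor, and rotation in radians, wrapped to [-pi, pi].
    """
    vx, vy = p2[0] - p1[0], p2[1] - p1[1]  # finger-to-finger vector before
    wx, wy = q2[0] - q1[0], q2[1] - q1[1]  # finger-to-finger vector after
    scale = math.hypot(wx, wy) / math.hypot(vx, vy)
    angle = math.atan2(wy, wx) - math.atan2(vy, vx)
    angle = math.atan2(math.sin(angle), math.cos(angle))  # wrap the angle difference
    dx = (q1[0] + q2[0]) / 2 - (p1[0] + p2[0]) / 2
    dy = (q1[1] + q2[1]) / 2 - (p1[1] + p2[1]) / 2
    return dx, dy, scale, angle


# Fingers start at (0, 0) and (1, 0); the second finger moves to (0, 2).
print(two_finger_transform((0, 0), (1, 0), (0, 0), (0, 2)))
```

In this example the distance between the fingers doubles and their direction turns by a quarter turn, so the object would be zoomed by 2 and rotated by pi/2 radians.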

Some complex applications require many commands, so defining gesture commands and memorizing them become important tasks. Our objective is to obtain natural, fluid gestures and to help users learn them as quickly as possible. The key point is to provide a complete and customizable set of gestural commands for interacting with applications: this requires designing a self-evolving gesture recognition system and, at the same time, a framework that helps users memorize their gestural command set.

The main approaches to helping users learn gestures are based on Marking Menus, which offer two modes of use: a novice mode, in which menus are displayed to help the user complete the gesture, and an expert mode, in which the user only draws the required gesture and the recognizer tries to identify the invoked command. All these approaches help users memorize gestures by making them practice drawing them. As a consequence, the final form of the gestures strongly depends on the ergonomics of the menu.
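The two modes of a classic marking menu can be sketched as follows. If the pen stays nearly still for a short dwell time after pen-down, the menu is displayed (novice mode). Otherwise the stroke direction alone selects one of the radial items (expert mode). The dwell time, movement tolerance and eight-item layout are hypothetical parameters. The sketch covers classic direction-based marking menus only, not customizable gestures.

```python
import math


def menu_mode(samples, dwell_s=0.3, move_px=5.0):
    """Decide novice vs. expert mode from pen samples [(t_seconds, x, y), ...].

    If the pen stays within `move_px` of the pen-down point for at least
    `dwell_s` seconds, the menu is displayed (novice mode); otherwise the user
    is drawing directly (expert mode).
    """
    t0, x0, y0 = samples[0]
    still_until = samples[-1][0]
    for t, x, y in samples:
        if math.hypot(x - x0, y - y0) > move_px:
            still_until = t
            break
    return "novice" if still_until - t0 >= dwell_s else "expert"


def mark_item(samples, n_items=8):
    """Expert mode: map the stroke direction to a radial menu item index.

    Item 0 is to the right; indices increase counter-clockwise (screen y grows downward).
    """
    _, x0, y0 = samples[0]
    _, x1, y1 = samples[-1]
    angle = math.atan2(-(y1 - y0), x1 - x0)
    return round(angle / (2 * math.pi / n_items)) % n_items


quick_up = [(0.00, 100, 100), (0.05, 100, 80), (0.10, 100, 60)]
hesitant = [(0.00, 100, 100), (0.40, 101, 100), (0.50, 130, 100)]
print(menu_mode(quick_up), mark_item(quick_up))  # expert 2
print(menu_mode(hesitant))                       # novice
```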

Building on this, we design Customizable Gesture Menus, which combine the advantages of marking menus with the ability to personalize gestures, giving users effective memorization support for a customizable set of gestures.

In this field, it is vital to take into account the user and therefore actual usage. This is why IntuiDoc collaborates actively with the multidisciplinary research platform LOUSTIC to conduct experiments on gestural commands and on learning strategies that explicitly involve the end user.