This text was written in the frame of a European research project, GUIB (EC/DGXIII/TIDE 103: Graphical User Interfaces for the Blind). The text is a working document for the participants of the project; some parts are only relevant as such. The full text is about 50 pages long. The file is in RTF format, which you can read with MS-Word. Its size is 150 K.


The increasing use of graphical user interfaces (GUIs) in today's computers is introducing major problems in access to information and information processing by blind computer users. GUIs implicitly assume the understanding of graphical image information. Understanding images such as raised line drawings through touch may in general be difficult for blind people, but experience shows that an understanding of well-designed graphics is possible.

In designing technology for improving the access to computerised information by blind people, it is important to avoid, wherever possible, factors that limit the information flow between its source (the computer) and the (blind) end user. This can be achieved by modelling the information flow and by analysing the design within the framework of the model. This text proposes a model for image understanding, situates it with respect to other approaches, and analyses the model step by step.

In different models, the same words may correspond to different concepts. For example, image understanding in robotics usually means the interpretation of camera images for correct positioning, whereas in pattern recognition applications such as remote sensing it means the correct detection and classification of objects. In our model, image understanding means the extraction from an image of as much relevant information as is present in it and in its context. We thereby assume that our perceptive channels, as well as our classification and recognition schemes (whether direct or indirect), are goal dependent. We also broaden the field of image understanding to encompass the integration of non-pictorial knowledge, such as the imaging context (e.g. location and time), ecological information or socio-cultural information (e.g. historical data). This is of special relevance to blind people, who have to rely on all information channels, and therefore on all directly and indirectly related information which is available to substitute for the missing visual information.

The model for image understanding which we propose assumes that one can learn to understand images and that a learning methodology, associated with an explicit model based on active exploration, can be described. Because of the importance of vision for image understanding, the model often refers to this sense, but the model is built with tactual perception in mind.

A first important fact on which the model is based is that perception is goal oriented. This is made explicit by describing four classes of `looking' in order to `see':

• to orient (using landmarks)
• to take (distance measures)
• to classify and recognise (pattern recognition)
• to explore (different pathways)

It is shown that whereas the first three give rise to apparently `direct' perception, exploratory perception is `indirect'. The text also points to some fundamental differences between the senses:

• vision is the most important quantitative source of knowledge gathering for sighted people
• it presents itself in a strongly structured manner
• there is a high coherence between vision and the mental image of the world which a sighted person has
• vision is always external, as opposed to touch, which is a surface (skin) phenomenon (sometimes artificially extended by a cane)
• touch is limited to what is within reach, and therefore spatial structuring based on touch is essentially limited to body size, whereas vision extends to infinity
• in vision and in hearing, the source and the strength of the external signal often drive our perception because they attract and stimulate our attention. Visual and auditory perception is therefore only partly dependent on exploration strategies. In tactual perception, there is no attraction by external signals: we need direct contact. Tactual perception is therefore much more strategy dependent.

The model which we will propose takes these observations as starting points and describes the tactile picture exploratory understanding process as an indirect process, modelled by a dynamic information flow through a set of sequential modules representing:

In contrast to this approach, the ecological perception theory developed by Gibson states that we perceive what objects and events can afford us as individuals because they reflect light in such a way that invariant characteristics of the object are directly specified by the structure of the light. Even complex objects can therefore be directly perceived. This theory is thus not compatible with our indirect model, which is based on the perception of elementary objects, indirectly structured in our brain into complex ones. Both models are explained in the text, and the reasons why the ecological direct perception theory is not considered adequate for exploratory picture understanding are discussed.

The ecological theory states that all perception occurs without inference from memory or knowledge. We now have evidence from studies by means of positron emission tomography that the same stimulus can be perceived in multiple ways, according to the goal of the perception, and that memory and knowledge influence the perception mechanism. The ecological theory is not compatible with this experimental finding.

Gibson's theory assumes an exclusively optical basis for perception, but the same approach can obviously be extended to the acoustic perception of sounds and to the tactile perception of textures. However, based on the theory, Gibson and others predicted that haptic pictures would be meaningless to blind people. Enough experiments have since shown the opposite, demonstrating that the traditional ecological model cannot be used to explain tactile picture processing.

A fundamental criticism of the ecological model, one which applies to both visual and tactual picture perception, involves the problem of duality and ambiguity. Pictures can be observed in two ways: either as a flat set of lines and patches, or for what they may represent. Even in a picture as simple as a circular line on a flat white sheet of paper, what is represented: the circular line, a coin, the outline of a sphere, a hole? The ecological theory offers no valid framework for understanding the dual nature of pictures and the ambiguity of pictorial information. For this reason, we do not consider the ecological perception theory an adequate model for describing the understanding of pictorial information, including GUIs.

The detailed description of the proposed `indirect' exploratory perception model starts with a short overview of techniques which can or could be used for the computer output of graphics:

Each of these techniques (except Nomad, which allows the association of sound with locations on a picture) varies in terms of resolution, production time, ease of use, sensorial quality, and availability.

The physical perception paragraph emphasises that the blind person's perception is task related, as it depends on the goal he wants to achieve. The acuity or two-point threshold is a logical parameter to consider when thinking of the different technical means for transduction. It is the minimal distance between two points which is needed to perceive them as separate points, and is typically smaller than two millimetres (±1.66 mm). For the active perception of a three-dimensional object, however, the two-point threshold is not a highly relevant parameter.

When both hands and fingers are moved during exploration, the movements of the hands can be coded as kinesthetic sequences (left, right, half turn, ...), with the textural perception by the fingertips associated with them. To code such information, one needs additional cues or supplementary information to serve as a reference cue or frame. This may be sound, or the body, which can act as a coordinate reference frame for touch (given that touch is always to objects within reach of the body).

It is often stated that blind people have exceptional hearing and touch compared to sighted people. There is no experimental evidence that hearing ability improves as a function of loss of sight, but there is evidence that training by blind and sighted people improves the use of the perceived sound. Concerning touch, the two-point threshold is not significantly different for sighted and blind people, except for those blind people suffering from diabetic retinopathy: their sense of touch is impaired.

The third step in the model is classification, recognition and mental representation building. Indeed, the perceived stimuli have to be structured mentally into spatial representations.
This structuring is not unique, in the sense that the same image or picture can be structured, and therefore described, in several ways, depending on the goal of the structuring. Structuring into more or less complex objects or entities is only possible when the concepts underlying the structuring elements are known to the observer; hence the extensive discussion on the importance of concepts and the information on how to test or teach concepts. A special section is devoted to the concepts associated with the graphical elements in GUIs.

Recognition-perception cycles are used in the model to describe the iterative process which leads to a valid mental representation. A mental representation is assumed valid by an observer when the representation corresponds to what he perceives. Ideally, the mental representation is now a faithful visual or tactile 'photograph' of the picture, and only that.

Up to this point in the model, perception and representation involved only pictorial information. The model now integrates general know-how, such as causality, and personal and cultural factors. Psychological elements are important in this discussion: although touch is the most fundamental and vital of our senses, and therefore of major importance for the harmonious development of our personality, we are often embarrassed when touching. In observing blind people touching artwork, we have found that blind people who feel free to touch and like to do so show in general good tactual recognition performance, and also feel more emotion towards what they touch. The ensemble of cultural and psychological characteristics will influence the observer's personal interpretation of the perceived information. He now has to use his understanding to achieve his original goal and to integrate this experience for future use. This step closes the description of the model.

A comprehensive reference list and an annex which lists important and common concepts close the report.