The combination of vision and speech, together with the resulting need for formal representations, forms a central component
of an autonomous system. A robot that is supposed to navigate autonomously through space must be able to perceive its environment
as automatically as possible. Every recognition system, however, has inherent limits. In particular, a robot whose task is to
navigate through unknown terrain has to deal with unidentified or even unknown objects, which compounds the recognition problem
further. The system described in this paper takes this into account by trying to identify objects based on their functionality
where possible. To handle cases where recognition is insufficient, we examine two further strategies: first, linguistic
reference to and labeling of the unidentified objects and, second, ontological deduction. This approach
connects the probabilistic domain of object recognition with the logical domain of formal reasoning. To support
formal reasoning, the recognition system has to supply additional relational scene information. Moreover, a sound
ontological basis for these reasoning tasks requires a domain ontology that represents
real-world objects and their corresponding spatial relations in both linguistic and physical terms. Physical spatial relations
and objects are measured by the visual system, whereas linguistic spatial relations and objects are required for interactions
with a user.
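
The interplay of the three strategies can be illustrated with a minimal sketch. It is not the system described in this paper: the toy ontology, the property names, the confidence threshold, the `Detection` and `identify` names, and the order in which the fallback strategies are tried are all assumptions made for illustration, and spatial relations are left out for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical toy ontology: each class lists the functional
# properties an object must show to count as an instance of it.
ONTOLOGY = {
    "Chair": {"supports_sitting", "has_backrest"},
    "Stool": {"supports_sitting"},
    "Table": {"supports_objects"},
}


@dataclass
class Detection:
    # label -> recognizer confidence in [0, 1] (probabilistic side)
    candidates: dict = field(default_factory=dict)
    # functional properties observed for the object (logical side)
    properties: set = field(default_factory=set)


def deduce(properties):
    """Return all ontology classes whose required properties are observed,
    most specific class (largest requirement set) first."""
    matches = [cls for cls, required in ONTOLOGY.items() if required <= properties]
    return sorted(matches, key=lambda cls: len(ONTOLOGY[cls]), reverse=True)


def identify(detection, threshold=0.8, user_label=None):
    """Return (label, strategy). The order of the fallbacks is an assumption."""
    # 1. Accept the recognizer's best hypothesis if it is confident enough.
    if detection.candidates:
        label, confidence = max(detection.candidates.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            return label, "recognition"
    # 2. Otherwise use a label supplied linguistically by the user, if any.
    if user_label is not None:
        return user_label, "linguistic labeling"
    # 3. Otherwise deduce a class from observed functionality.
    deduced = deduce(detection.properties)
    if deduced:
        return deduced[0], "ontological deduction"
    return None, "unidentified"


if __name__ == "__main__":
    unsure = Detection(
        candidates={"Chair": 0.4, "Table": 0.3},
        properties={"supports_sitting", "has_backrest"},
    )
    print(identify(unsure))
    print(identify(unsure, user_label="armchair"))
```

In this sketch the recognizer's confidences stay on the probabilistic side, while `deduce` performs a purely logical subset test, so the two sides meet only in `identify`.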