This paper presents ongoing work at LAAS/CNRS on the scene interpretation required for manipulation tasks performed by a mobile
arm. Such a task is composed of two steps: the approach of the mobile platform alongside the manipulation site, and the grasping
itself. The paper focuses on object recognition and localization; the approach step is performed by a simple laser-based
navigation procedure. For the grasping step, we use a CAD model of the object and discuss the problems linked with such
a representation: visibility information must be added so that recognition and grasping strategies can be selected in a
formal way. For recognition, initial matches on discriminant patterns make it possible to generate a first prediction about
the object's pose, from which an optimal verification viewpoint can be computed. From this new camera position, we search for maximal
sets of matched image features and model primitives; the recognition hypothesis with the highest score is retained. If no
prediction can be determined, the system may switch to other discriminant patterns or move the camera, subject to the arm
and robot constraints.
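
The recognition strategy summarized above (predict from a discriminant pattern, verify from a new viewpoint, keep the best-scoring hypothesis, otherwise fall back) can be illustrated by a minimal sketch. It is not the implementation described in the paper: the names `Hypothesis`, `score`, `recognize`, `predict` and `verify` are hypothetical, and the score is assumed here to be simply the number of matched feature/primitive pairs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence


@dataclass(frozen=True)
class Hypothesis:
    """A candidate object pose and the matches that support it (hypothetical type)."""
    pose: tuple          # illustrative pose parameters, e.g. (x, y, theta)
    matches: frozenset   # set of (image_feature_id, model_primitive_id) pairs


def score(hypothesis: Hypothesis) -> int:
    # Assumption: a hypothesis is scored by the size of its matched set.
    return len(hypothesis.matches)


def recognize(
    patterns: Sequence[str],
    predict: Callable[[str], Optional[tuple]],
    verify: Callable[[tuple], Iterable[Hypothesis]],
    min_score: int = 1,
) -> Optional[Hypothesis]:
    """Try each discriminant pattern in turn and return the best verified hypothesis.

    predict(pattern) returns a predicted pose, or None if the pattern is not found.
    verify(pose) returns the hypotheses obtained from the verification viewpoint.
    """
    for pattern in patterns:
        predicted_pose = predict(pattern)
        if predicted_pose is None:
            continue  # no prediction: switch to the next discriminant pattern
        candidates = [h for h in verify(predicted_pose) if score(h) >= min_score]
        if candidates:
            return max(candidates, key=score)  # best score wins
    # No pattern produced a usable hypothesis; the caller may move the camera and retry.
    return None
```

In this sketch, moving the camera under arm and robot constraints is left to the caller, which would invoke `recognize` again after a `None` result.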