This paper explores high-level scene interpretation with logic-based conceptual models. The main interest is in aggregates,
which describe interesting co-occurrences of physical objects and their respective views in a scene. Interpretations consist
of instantiations of aggregate concepts supported by evidence from a scene. It is shown that the approach permits flexible interpretation
strategies that are important for cognitive vision, e.g. mixed bottom-up and top-down interpretation, exploitation of context,
recognition of intentions, and task-driven focussing. The knowledge representation language is designed to map easily into a Description
Logic (DL); however, current DL systems do not (yet) offer services that match the requirements of high-level vision interpretation.
A table-laying scene is used as a guiding example. The work is part of the EU project CogVis.