Describing audio-visual documents amounts to considering both documentary aspects (the structure) and conceptual aspects (the
content). In this paper, we propose an architecture that formally describes the content of videos and constrains
the structure of their descriptions. This work builds on the languages and technologies underlying the Semantic Web, in particular
ontologies. We therefore propose combining emerging Web standards, namely MPEG-7/XML Schema for the structural part and
OWL/RDF for the knowledge part of the description. Finally, our approach offers reasoning support for both aspects when querying
a database of videos.