Abstract in another language
Digital multimedia content is omnipresent on the Web: in August 2005 Google reported a total of 2,187,212,422 indexed images and Yahoo estimated that its index covered 1.5 billion images at that time, while current statistics show a continuous growth in these numbers (indicatively, Flickr uploads amount to an average of about 3,000 images per minute). Given such numbers, the availability of machine-processable semantic descriptions for this content becomes a key factor for the realisation of applications of practical interest, perpetuating the challenge of what constitutes the multimedia community's holy grail, i.e. the semantic gap between the representations that can be automatically extracted and the underlying meaning.

In the late 1970s and early 1980s, influenced by the AI paradigm, the analysis and understanding of audiovisual content became a problem of achieving intelligent behaviour by simulating what humans know through computational means. Hence, the first attempts towards knowledge-directed image analysis emerged. A period of explosive growth in knowledge-conditioned approaches followed: varying knowledge representation and reasoning schemes, in accordance with the AI assets of the time, were proposed, and knowledge was called upon to address all aspects involved, ranging from the perceptual characteristics of visual manifestations to control strategies. The broad and ambitious scope targeted by the use of knowledge resulted in representations and reasoning mechanisms that exhibited high complexity and inflexibility, while the lack of well-founded semantics further reduced efficacy and interoperability. Research focus then shifted to machine learning, which gained particular popularity as a means for capturing knowledge that cannot be represented effectively or explicitly. Multimedia analysis has recently reached a point where detectors can be learned in a generic fashion for a significant number of conceptual entities. The obtained performance, however, varies considerably, reflecting the implications of training set selection, similarities in the visual manifestations of distinct conceptual entities, and appearance variations of the conceptual entities. A factor partially accountable for these limitations is that machine learning techniques realise the transition from visual features to conceptual entities based solely on information regarding perceptual features. Hence, a significant part of the knowledge pertaining to the semantics underlying the interpretation is missed.

The advent of the Semantic Web ushered in a new era in knowledge sharing, reuse and interoperability by making formal semantics explicit and machine-understandable rather than just machine-processable. The multimedia community embraced the new technologies, utilising ontologies at first in order to attach explicit meaning to the produced annotations (at the content and the media layers), and subsequently as a means for assisting the very extraction of the annotations. The state of the art with respect to the latter approaches is characterised by particular features, among which are the poor handling of uncertainty, the restricted utilisation of formal semantics and inference services, and a focus on representing associations between perceptual features and domain entities rather than on logical relations between the domain entities or on modelling analysis aspects.
This thesis addresses the problem of how enhanced semantic descriptions of visual content may be automatically derived through the utilisation of formal semantics and reasoning, and how the domain-specific descriptions can be transparently integrated with media-related ones referring to the structure of the content. The central contributions of the thesis lie in: i) the definition of a unified representation of the domain-specific knowledge required for the extraction of semantics and of the analysis-specific knowledge that implements the process of extraction, ii) the development of a formal reasoning framework that supports uncertainty handling for the purpose of the semantic integration and enrichment of initial descriptions deriving from different analysis systems, and iii) the definition of an MPEG-7 compliant ontology that formally captures the structure of multimedia content, allowing for precise semantics and serving as a means for the definition of mappings between the existing ontologies addressing the structural aspects of multimedia content.

The first contribution refers to a unified ontology-based knowledge representation framework that allows one to model the process of extracting semantic descriptions in accordance with the perceptual and conceptual aspects of the knowledge characterising the specific domain. The use of ontologies for both knowledge components enhances the potential for sharing and reuse of the respective components, but most importantly enables the extensibility of the framework to other application domains and its sharing across different systems. More specifically, semantic concepts in the context of the examined domain are defined in an ontology and enriched with qualitative attributes (e.g., color homogeneity), low-level features (e.g., color model components distribution), object spatial relations, and multimedia processing methods (e.g., color clustering). The RDF(S) language has been used for the representation of the developed domain and analysis ontologies, while the rules that determine how multimedia analysis tools should be applied, depending on concept attributes and low-level features, are expressed in F-Logic.

The second contribution refers to the development of a fuzzy DL-based reasoning framework that integrates image annotations at the scene and region levels into a semantically consistent final description, further enhanced by means of inference. The fuzzy DL semantics allow the uncertainty that characterises multimedia analysis and understanding to be handled formally, while the use of DLs makes it possible to benefit from their high expressivity and efficient reasoning algorithms in the management of the domain-specific semantics. The initial annotations forming the input may come from different modalities and analysis implementations, and their degrees can be re-adjusted using weights that specify the reliability of the corresponding analysis technique or modality.

Finally, the third contribution tackles the engineering of a multimedia ontology, and more specifically of one addressing aspects related to the structure and decomposition schemes of multimedia content. The existing MPEG-7-based ontologies, despite being induced by the need for formal descriptions and precise semantics, raise new interoperability issues as they build on different rationales and are set to serve varying roles. The ontology developed within the context of this thesis re-engineers part of the MPEG-7 specifications to ensure precise semantics and transparency of meaning.
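To make the first contribution more concrete, the following is a minimal, purely illustrative sketch of how a domain concept could be enriched with a qualitative attribute and a suggested processing method in RDF(S), written here with Python's rdflib. The namespace and the names Sky, hasQualitativeAttribute and suggestedProcessingMethod are hypothetical; the sketch does not reproduce the actual domain and analysis ontologies of the thesis, nor the F-Logic rules that drive the analysis tools.

from rdflib import Graph, Namespace, RDF, RDFS

# Hypothetical namespace used only for illustration.
EX = Namespace("http://example.org/domain#")

g = Graph()
g.bind("ex", EX)

# A domain concept enriched with a qualitative attribute and a suggested
# processing method, in the spirit of the domain/analysis ontologies above.
g.add((EX.Sky, RDF.type, RDFS.Class))
g.add((EX.Sky, RDFS.subClassOf, EX.DomainConcept))
g.add((EX.Sky, EX.hasQualitativeAttribute, EX.ColorHomogeneity))
g.add((EX.Sky, EX.suggestedProcessingMethod, EX.ColorClustering))

print(g.serialize(format="turtle"))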
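The abstract does not spell out how the reliability weights of the second contribution are combined with the initial degrees before they enter the fuzzy DL knowledge base. The sketch below shows one plausible scheme under that assumption: each degree is weighted by the reliability of the analysis module that produced it, and the weighted degrees are combined with a simple fuzzy union (max). The function and module names are illustrative only.

def fuse_degrees(degrees, reliability):
    """Fuse membership degrees for one concept on one region, where each
    degree comes from a different analysis technique or modality.

    degrees:     dict mapping analysis module -> degree in [0, 1]
    reliability: dict mapping analysis module -> reliability weight in [0, 1]
    """
    # Re-adjust each degree by the reliability of the module that produced it,
    # then combine the weighted degrees with a simple fuzzy union (max).
    weighted = {m: d * reliability.get(m, 1.0) for m, d in degrees.items()}
    return max(weighted.values())

# e.g. a region labelled "Sea" by a colour-based and a texture-based detector
print(fuse_degrees({"colour": 0.8, "texture": 0.6},
                   {"colour": 0.9, "texture": 0.5}))  # -> 0.72

Any other fuzzy t-conorm (e.g., the probabilistic sum) could serve as the union operator; the choice only affects how strongly several weak detections reinforce each other.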