| Intro
Although many efforts aim at the automatic discovery of equivalence relations between the elements of ontologies, we believe that this is not enough: to deal effectively with the ontology alignment problem, we also have to discover subsumption relations among ontology elements. This is particularly true when we deal with ontologies at different “granularity levels”, i.e. where the elements of one ontology are more generic than the elements formalized by the other. Although in some cases subsumption relations between the elements of two ontologies may be deduced from the equivalence relations of other elements (by means of the reasoning mechanisms used), in the general case where no equivalence relations exist, this cannot be done. In any case, we conjecture that the discovery of subsumption relations between elements of different ontologies may further facilitate the discovery/filtering of equivalence relations, and vice versa, augmenting the effectiveness of our ontology alignment and merging methods. This is of great importance when dealing with real-world ontologies, where, as is also stated in the conclusions of the Consensus Track of OAEI 06, current state-of-the-art systems “confuse” subsumption relations with equivalence ones.
CSR briefly
CSR computes subsumption relations between concept pairs of two distinct ontologies by means of a classification task, using decision trees and exploiting the selected features. Given a pair of concepts, the supervised machine learning method “locates”, in a space of hypotheses, the hypothesis concerning their relation that best fits the training examples without being restricted to them, thus generalizing beyond them. Concept pairs are represented as feature vectors whose length equals the number of distinct selected features of the source and target ontologies. The training examples for the learning method are generated from the source and target ontologies.
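The pair representation described above can be sketched as follows. This is a minimal illustration, not the CSR implementation: the text does not say how each vector component is valued, so the 0/1/2/3 encoding below is an assumption, and the function name `pair_vector` and the toy concepts are hypothetical.

```python
def pair_vector(features_a, features_b, all_features):
    """Encode a concept pair over a fixed, ordered feature list.

    Assumed encoding (not specified in the text above):
    0 = feature in neither concept, 1 = only in the first,
    2 = only in the second, 3 = in both.
    """
    return [
        (1 if f in features_a else 0) + (2 if f in features_b else 0)
        for f in all_features
    ]


# Hypothetical toy data: concept name -> set of property names.
source = {"Publication": {"title", "year"}}
target = {"Book": {"title", "year", "isbn"}}

# The vector length equals the number of distinct features
# found in the source and target ontologies together.
all_features = sorted(set().union(*source.values(), *target.values()))

vector = pair_vector(source["Publication"], target["Book"], all_features)
print(all_features, vector)
```

Vectors of this kind, labelled with the relation holding between the two concepts, are what a decision tree learner would take as training input.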
Features
Although other features may be used, currently we study the importance of:
- concepts’ properties in assessing the subsumption between concepts: This is an important first step in assessing subsumption relations among concepts, since (a) it appeals to our intuition about the importance of properties as distinguishing characteristics of classes of entities, (b) it makes the least possible commitment to the precision of any method for the discovery of equivalence relations among ontology elements, and (c) it provides a basic method that can be further enhanced with other distinguishing features of concepts (e.g., concepts in a given vicinity) and combined with other alignment methods.
- words occurring in both input ontologies. Specifically, for each concept of the input ontologies, words are extracted from its “vicinity”, as specified by the following rule: Given a concept, the method extracts words occurring in the local name, label and comments of the concept, in all of its properties (exploiting the properties’ local names, labels and comments), and in all of its related concepts. Finally, words are extracted from all instances of the concept. As far as the use of words is concerned, (a) their use for describing the intended meaning of concepts appeals to our intuition, and (b) it does not necessitate the use of any method for the discovery of equivalence relations among ontology elements.
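The word-extraction rule for a concept’s “vicinity” can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the helper names (`tokenize`, `element_words`, `vicinity_words`) and the camelCase splitting are hypothetical, and the text does not say which texts of related concepts and instances are used, so local names, labels and comments are assumed for all of them.

```python
import re


def tokenize(text):
    """Split a local name, label or comment into lowercase words."""
    # Insert a space at camelCase boundaries, then keep alphabetic runs.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return {w.lower() for w in re.findall(r"[A-Za-z]+", spaced)}


def element_words(element):
    """Words from the local name, label and comment of one element."""
    words = set()
    for key in ("local_name", "label", "comment"):
        words |= tokenize(element.get(key, ""))
    return words


def vicinity_words(concept):
    """Words from the concept, its properties, related concepts, instances."""
    words = element_words(concept)
    for group in ("properties", "related", "instances"):
        for element in concept.get(group, []):
            words |= element_words(element)
    return words


# Hypothetical concept description used only for illustration.
book = {
    "local_name": "Book",
    "label": "book",
    "comment": "A bound publication",
    "properties": [{"local_name": "hasAuthor", "label": "has author"}],
    "related": [{"local_name": "Publisher"}],
    "instances": [{"local_name": "MobyDick"}],
}

words = vicinity_words(book)
print(sorted(words))
```

The distinct words collected this way over both input ontologies would then serve as the feature set from which concept-pair vectors are built.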
Why Machine Learning
The machine learning approach has been chosen since (a) there are no evident generic rules directly capturing the existence of a subsumption relation between ontology elements (e.g., by means of their surface appearance, such as same or similar labels) and (b) concept pairs of the same ontology provide examples of the subsumption relation, making the method self-adapting to the idiosyncrasies of specific domains and independent of external resources. |