Undergraduate Thesis: Network structural properties and their application to missing property prediction
The volume of available structured data is increasing, particularly in the form of Linked Data, where relationships between individual pieces of data are encoded in a graph-like structure. Despite the growing scale of this data, the usefulness and applicability of these resources are currently limited by mistakes and omissions in the Linked Data.
In this diploma thesis, I study the problem of predicting missing relation-types (properties) for nodes (objects) in large-scale knowledge graphs.
The image on the left depicts a tiny example of a knowledge graph with numerous objects and the relations between them. Three objects are of particular interest: Audi, Mercedes-Benz and Fiat.
My thesis addresses the problem of learning from the graph structure to predict missing relations: for example, learning from the objects Audi and Mercedes-Benz that Fiat is missing the relations name, subsidiary and parentCompany.
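To make the task concrete, here is a minimal sketch of the Audi / Mercedes-Benz / Fiat example as a set of triples. The triples are invented for illustration and do not reproduce the edges of the figure, and the helper name candidate_missing_relations is hypothetical. The sketch simply proposes every relation that all reference objects have and the target lacks; it is a naive baseline for stating the problem, not the method developed in the thesis.

```python
from collections import defaultdict

# Toy (subject, relation, object) triples; invented for illustration only.
TRIPLES = [
    ("Audi", "name", "Audi AG"),
    ("Audi", "subsidiary", "Lamborghini"),
    ("Audi", "parentCompany", "Volkswagen Group"),
    ("Audi", "locationCountry", "Germany"),
    ("Mercedes-Benz", "name", "Mercedes-Benz"),
    ("Mercedes-Benz", "subsidiary", "Mercedes-AMG"),
    ("Mercedes-Benz", "parentCompany", "Daimler AG"),
    ("Mercedes-Benz", "locationCountry", "Germany"),
    ("Fiat", "locationCountry", "Italy"),
]


def outgoing_relations(triples):
    """Map each subject to the set of relation types it has."""
    rels = defaultdict(set)
    for subject, relation, _ in triples:
        rels[subject].add(relation)
    return rels


def candidate_missing_relations(triples, target, references):
    """Relations shared by all reference objects but absent on the target.

    Assumes `references` is non-empty.
    """
    rels = outgoing_relations(triples)
    shared = set.intersection(*(rels[ref] for ref in references))
    return sorted(shared - rels[target])


print(candidate_missing_relations(TRIPLES, "Fiat", ["Audi", "Mercedes-Benz"]))
# prints ['name', 'parentCompany', 'subsidiary']
```

Requiring a relation on every reference object is a deliberately strict choice; a learned model would instead weigh evidence from many objects.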
I address the problem by encoding the objects with several local and global graph descriptors. The local descriptors are the distributions of relations to neighbours and the directionality of relations; the global descriptors are shortest paths and diffusion kernels on graphs. These object descriptors are then used to learn to predict the missing properties of objects.
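The following is a rough sketch of what two of these descriptors could look like on a toy graph: a local descriptor that records the relative frequency of each relation type together with its direction, and a global one based on shortest-path (hop) distances. The edge list and the function names local_descriptor and shortest_path_lengths are assumptions made for illustration; the exact feature definitions used in the thesis may differ, and diffusion kernels are not shown.

```python
from collections import Counter, defaultdict, deque

# Toy directed, labelled edges (source, relation, target); invented for illustration.
EDGES = [
    ("Audi", "parentCompany", "Volkswagen"),
    ("Audi", "locationCountry", "Germany"),
    ("Lamborghini", "parentCompany", "Audi"),
    ("Mercedes-Benz", "locationCountry", "Germany"),
    ("Fiat", "locationCountry", "Italy"),
]


def local_descriptor(edges, node):
    """Relative frequency of each (direction, relation) pair incident to node."""
    counts = Counter()
    for source, relation, target in edges:
        if source == node:
            counts[("out", relation)] += 1
        if target == node:
            counts[("in", relation)] += 1
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()} if total else {}


def shortest_path_lengths(edges, source):
    """Breadth-first hop counts from source, ignoring edge direction and labels.

    Nodes that cannot be reached are absent from the result.
    """
    adjacency = defaultdict(set)
    for s, _, t in edges:
        adjacency[s].add(t)
        adjacency[t].add(s)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


print(local_descriptor(EDGES, "Audi"))
print(shortest_path_lengths(EDGES, "Audi"))
```

In this toy graph Audi has two outgoing relations and one incoming relation, so each (direction, relation) pair gets weight 1/3, and Mercedes-Benz is two hops away via Germany. Such per-object vectors could then be fed to any standard classifier, one binary target per relation type.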
I apply the methods to large knowledge bases, DBpedia (based on Wikipedia) and Freebase (a subset of the Google Knowledge Graph), with hundreds of millions of nodes (objects) and edges (relations).