Thematic signatures for cleansing and enriching place-related linked data

Abstract

There has been significant progress in transforming semi-structured data about places into knowledge graphs that can be used in a wide variety of geographic information systems such as digital gazetteers or geographic information retrieval systems. For instance, in addition to information about events, actors, and objects, DBpedia contains data about hundreds of thousands of places from Wikipedia and publishes it as Linked Data. Repositories that store data about places are among the most interlinked hubs on the Linked Data cloud. However, most content about places resides in unstructured natural language text and is therefore not captured in these knowledge graphs. Instead, place representations are limited to facts such as population counts, geographic locations, and relations to other entities, for example, headquarters of companies or historical figures. In this paper, we present a novel method to enrich the information stored about places in knowledge graphs using thematic signatures that are derived from unstructured text through topic modeling. As a proof of concept, we demonstrate that this enables the automatic categorization of articles into place types defined in the DBpedia ontology (e.g., mountain) and also provides a mechanism to infer relationships between place types that are not captured in existing ontologies. This method can also be used to uncover miscategorized places, a common problem arising from the automatic lifting of unstructured and semi-structured data.
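To make the idea of a thematic signature concrete, the following is a minimal sketch and not the method as implemented in the paper. It assumes that a topic model has already assigned each article a probability distribution over topics, that the signature of a place type is the mean of the distributions of its articles, and that articles are compared to signatures with the Jensen-Shannon divergence. The abstract does not specify the aggregation or the similarity measure, so both are assumptions, as are the function names and the toy numbers.

```python
import math
from collections import defaultdict


def thematic_signatures(doc_topics, doc_types):
    """Average the topic distributions of all articles that share a place type.

    Assumption: the signature is a simple mean; the paper may aggregate differently.
    """
    sums = {}
    counts = defaultdict(int)
    for topics, place_type in zip(doc_topics, doc_types):
        if place_type not in sums:
            sums[place_type] = [0.0] * len(topics)
        sums[place_type] = [s + t for s, t in zip(sums[place_type], topics)]
        counts[place_type] += 1
    return {pt: [s / counts[pt] for s in vec] for pt, vec in sums.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two probability vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def closest_type(topics, signatures):
    """Return the place type whose signature is least divergent from the article."""
    return min(signatures, key=lambda pt: js_divergence(topics, signatures[pt]))


# Toy per-article topic distributions over three topics (made-up values).
doc_topics = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
doc_types = ["Mountain", "Mountain", "River", "River"]

signatures = thematic_signatures(doc_topics, doc_types)
predicted = closest_type([0.75, 0.15, 0.10], signatures)
```

Under these assumptions, an article whose assigned type differs from the type returned by `closest_type` would be a candidate for a miscategorized place, and a low divergence between two signatures would hint at a relationship between the corresponding place types.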

Publication
International Journal of Geographical Information Science