Where to go and what to do: Extracting leisure activity potentials from Web data on urban space


Web data is the most prominent source of information for deciding where to go and what to do. Exploiting this source for geographic analysis, however, is not without difficulties. First, the amount and diversity of Web information about urban space have exploded in recent years, making it increasingly difficult to survey and exploit. Second, the bulk of this information is unstructured and therefore difficult for computers to process and interpret. Third, semi-structured sources, such as Web rankings, geolocated tags, check-ins, or mobile sensor data, do not fully reflect the more subtle qualities of a place, including the particular functions that make it attractive. In this article, we explore a method to capture leisure activity potentials from Web data on urban space using semantic topic models. We test three supervised multi-label machine learning strategies that exploit geolocated webtexts and place tags to estimate whether a given type of leisure activity is afforded or not. We train and validate these models on a manually curated dataset labeled with leisure ontology classes for the city of Zwolle, and discuss their potential for urban leisure and tourism research and for related city policies and planning. We find that multi-label affordance estimation is not straightforward, but can be made to work using both official webtexts and user-generated content at a medium semantic level. This opens up new opportunities for data-driven approaches to urban leisure and tourism studies.

Computers, Environment and Urban Systems