OpenStreetMap: Access, data mining and data manipulation

2 minute read

Published:


OpenStreetMap: Access, data mining and data manipulation

OSM OpenStreetMap (OSM) is a wonderful source of geospatial data, constantly growing with the support of amazing volunteers. To use the data for any purpose, I like to query it via overpass-turbo by using its overpass query language (OQL). For a small number of requests, we can use the default web-page for our query and export the data directly in a geojson (gpx and kml are also possible). We can also access data programmatically with Python very easily. overpy assist in the access and basic processing of this data. An example can be found in a previous post where in the first part, we get the data from OSM using OQL itself. Additionally, this is a great article, to begin with OQL in Python. For heavy usage, I would recommend setting your own overpass server, which with docker, is quite easy. You can check out the docker setup from the github which is forked from an existing repository which I basically updated for a newer working version of the overpass and some notes to troubleshoot. If you are on a smaller memory machine then you might quickly run out of memory. To overcome this, configure the docker settings along with the smaller OSM file (e.g. cropping the OSM data to a city level). One of the alternatives is to crop the data to an ROI is described in the following.

The OSM dataset can be downloaded from the generous servers of geofgeofabrik. They have organized the OSM data at different levels (e.g. planet>continent>country>state). For some regions (e.g. India), the dataset is not divided into states. In addition to it, in a certain situation (like to out of memory issue with docker as above-mentioned), we might only want to have the only subset of the entire dataset (e.g. a city). In these scenarios, osmconvert comes handy. The binaries can directly be downloaded and then we are good to go. With this tool, we can define a rectangle (polygon is also possible) to crop the dataset. For a detailed list of options just execute: osmconvert --help. After downloading data for India, we can crop the approximate region for Delhi by:

osmconvert india-latest.osm.pbf -b=76.908869,28.345084,77.443079,28.732346 --complete-ways -o=delhi.pbf

It is important to note that we have to provide the southwestern corner first and then the northeastern corner. Besides, we need to switch the coordinates what we get from google map.