Efficient spacial indexing with R-trees
We’re halfway through oSoc right now, and we’ve made quite a bit of progress already. For those of you who aren’t already familiar with the #LopenInGent project, it’s about generating customizable, dynamic and interesting running routes through the city of Ghent. Our first problem was creating an accessible and single point of access for all of our city data. I’ll explain how we went about this issue and what issues we’re still facing going into the future.
Assessing data sources
First of all, we had to assess all of our data sources. Our primary data source comes from OpenStreetMap, which provides several APIs for accessing its data. Our secondary source is a combination of latitudes and longitudes of interesting places in Ghent, categorised by activity. To actually make this data useful we had to convert it into the data structure used by the already existing routing algorithm. This is currently a combination of vertices connected by edges. For those of you not familiar with graph theory, an edge connects two or more vertices together and it usually has a length property used to determine the shortest path between them. So you can think of an edge as a road connecting intersections with each other.
Producing usable data
Turning the OpenStreetMap data into usable data was really easy thanks to the already existing Osmosis tool. You can pass it certain parameters and it will give you the usable roads according to the configuration you provide it. However, mapping the interesting places to the edges proved to be a much harder task. We went about this issue by employing a grid-like data structure called an R-tree to map the intersections of edges and points.
You can compare an R-tree to a giant 2D or 3D grid of points, but instead of points we’re using boxes. To initialize the R-tree we had to convert about 75 000 edges into boxes. We went about this by taking the biggest and smallest latitude and longitude out of every edge and applying a padding to it, effectively creating a big box around every edge. After this, it was just a matter of feeding the R-tree a bunch of boxes which represent interesting points, whenever there’s an intersection we give the intersecting edge the label of the interesting point, like park, water or tourism.
Now that every edge or road has the appropriate labels. Whenever the user decides he wants to run in parks, the algorithm will favor edges with the park label, effectively making the algorithm biased towards parks. With this part done, we saved our progress to a PostgreSQL database for easy access by our application.
This week, we’re actually building the part that does the edge weighting manipulations so the algorithm knows how to favor certain labels. It’s totally head-spinning stuff at some times, but it’s really rewarding once it works.