GTFS Part 2: Think out of the box

For the GTFS project, we (@timtijssens and @brechtvdv) are publishing data from the Belgian railway company NMBS as Open Data. In this blog post we'll explain some of the technical issues we ran into and how we solved them.

If you don't know what GTFS is about, you can read our first blog post.

Train schedules

On which days does a certain train run? That's one of the main questions we had to solve. The GTFS reference says you have to create a calendar file in which you specify on which weekdays (Monday, Tuesday …) the train runs. You also have to specify the start and end dates between which this pattern applies.
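To make this concrete, here is a minimal Python sketch of one row of the GTFS calendar file. The column names come from the GTFS reference; the service ID, the dates and the helper names are made up for illustration, and this is not the code we use in the project.

```python
import csv
import io
from datetime import date

# Weekday columns in the order the GTFS reference lists them for calendar.txt.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
FIELDNAMES = ["service_id", *WEEKDAYS, "start_date", "end_date"]


def calendar_row(service_id, running_days, start, end):
    """Build one calendar.txt row: 1 for days the train runs, 0 otherwise."""
    row = {"service_id": service_id}
    for day in WEEKDAYS:
        row[day] = "1" if day in running_days else "0"
    # GTFS dates are written as YYYYMMDD.
    row["start_date"] = start.strftime("%Y%m%d")
    row["end_date"] = end.strftime("%Y%m%d")
    return row


def write_calendar(rows):
    """Serialise rows to CSV text with the calendar.txt header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical service: runs Monday to Friday between two made-up dates.
example = calendar_row(
    "weekday_train",
    {"monday", "tuesday", "wednesday", "thursday", "friday"},
    date(2015, 1, 5),
    date(2015, 6, 26),
)
print(write_calendar([example]))
```

One row like this covers a whole period, which is why the calendar file looks so attractive at first.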

Ok, this doesn’t sound so difficult, does it?

Getting the right data is the real snag. We were able to scrape text that describes these calendars from the NMBS website (see the picture in the previous blog post). The only thing left to do was to write a parser that converts this text into useful data.

[Image: Jenga tower]
Parsing, it will be fun, they said

(c) https://en.wikipedia.org/wiki/Jenga

After we had written a lot of PHP code, the parser started to handle most use cases pretty well, but every time we thought we had the solution, a new use case arose. Later on, we found out that the information on the website isn't consistent: exceptions prove the rule, right?

 

Our solution:

[Image: embrace exceptions]
To deliver high-quality data, we had to accept that there are a LOT of exceptions to the days on which trains run. Luckily, the GTFS reference says we can use calendar_dates to add exceptions. So instead of describing weekly patterns, we added an entry to this file for every single day a train runs. This introduces a lot of entries, but we no longer depend on the parser guessing the right pattern, so we are confident about the quality of the data.
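As a rough illustration of this approach, the Python sketch below turns a list of running days into calendar_dates entries. The column names and the exception_type value 1 ("service added") come from the GTFS reference; the service ID, the dates and the function names are hypothetical, and this is not our actual PHP code.

```python
import csv
import io
from datetime import date

# GTFS exception_type values: 1 = service added on this date, 2 = service removed.
SERVICE_ADDED = "1"
FIELDNAMES = ["service_id", "date", "exception_type"]


def calendar_dates_rows(service_id, running_dates):
    """Return one 'service added' entry per day the train runs, sorted by date."""
    return [
        {
            "service_id": service_id,
            "date": day.strftime("%Y%m%d"),  # GTFS dates are YYYYMMDD
            "exception_type": SERVICE_ADDED,
        }
        for day in sorted(set(running_dates))
    ]


def write_calendar_dates(rows):
    """Serialise rows to CSV text with the calendar_dates.txt header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical train that runs on three made-up, irregular dates.
rows = calendar_dates_rows(
    "irregular_train",
    [date(2015, 4, 6), date(2015, 4, 3), date(2015, 4, 10)],
)
print(write_calendar_dates(rows))
```

The file grows with every running day, but each entry is an explicit fact rather than a pattern inferred from inconsistent text.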

 

Beta coming this week!

We will open a beta version of the GTFS files and the GTFS-RealTime feed this week, so stay tuned for our next blog post!
