TomTom handles large volumes of geospatial data. The shape of this data poses some unique challenges, but also some opportunities to exploit, when it comes to distributed processing. In this talk we shed some light on the data processing pipeline we have built and do a deep dive into geospatial indexing on top of HBase.
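A common way to build a spatial index on top of HBase's lexicographically sorted row keys is a space-filling curve: quantize latitude and longitude and interleave their bits (a Z-order / Morton code), so nearby points share long key prefixes. The talk doesn't spell out TomTom's exact key schema, so this is a minimal sketch of the general technique, with illustrative coordinates:

```python
def morton_key(lat: float, lon: float, bits: int = 32) -> bytes:
    """Quantize lat/lon to `bits`-bit integers and interleave their bits
    (Z-order / Morton code) so nearby points share row-key prefixes."""
    y = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    x = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # lon bit at even position
        z |= ((y >> i) & 1) << (2 * i + 1)   # lat bit at odd position
    # Big-endian bytes preserve numeric order under HBase's byte-wise sort.
    return z.to_bytes(2 * bits // 8, "big")

# Nearby points yield keys with a long shared prefix; distant points don't:
a = morton_key(52.3676, 4.9041)     # Amsterdam
b = morton_key(52.3702, 4.8952)     # also Amsterdam
c = morton_key(-33.8688, 151.2093)  # Sydney
```

Because HBase sorts rows by key bytes, features that are close on the map end up close on disk, which is what makes spatial range reads cheap later on.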
So how do we handle that multitude of concurrent writes? A PostgreSQL cluster handles them, giving us transactional integrity as well.
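The value of routing writes through a transactional store is that a batch of feature updates lands atomically or not at all. A minimal sketch of that pattern, with sqlite3 standing in for the PostgreSQL cluster and an illustrative table schema (the transaction semantics shown are the same):

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; table and column names are
# illustrative, not the actual production schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, wkt TEXT NOT NULL)")

def write_batch(conn, rows):
    """Write a batch of features atomically: all rows land or none do."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.executemany("INSERT INTO features (id, wkt) VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False

ok = write_batch(conn, [(1, "POINT(4.9 52.4)"), (2, "POINT(4.8 52.3)")])
# Duplicate primary key: the whole batch rolls back, including row 3.
dup = write_batch(conn, [(3, "POINT(0 0)"), (1, "POINT(1 1)")])
count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
```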
However, we also need to read lots of features to create our output. How do we do this?
So we can store the data in a read-optimized way. How can we read from it in a performant way?
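With spatially sorted row keys, "read all features in an area" becomes a range scan between a start and a stop row rather than a full-table read. A sketch of that prefix scan over an in-memory sorted key list (the geohash-style keys are illustrative, not the actual schema):

```python
from bisect import bisect_left

# HBase keeps rows lexicographically sorted by key; a prefix scan touches
# only the contiguous range of matching keys.
rows = sorted([b"u173z:1", b"u176b:2", b"u176c:3", b"u176x:4", b"u33d0:5"])

def prefix_scan(rows, prefix: bytes):
    """Return all row keys starting with `prefix`, visiting only that range."""
    start = bisect_left(rows, prefix)  # like an HBase startRow
    out = []
    for key in rows[start:]:
        if not key.startswith(prefix):
            break  # past the range: stop early, like an HBase stopRow
        out.append(key)
    return out

hits = prefix_scan(rows, b"u176")
```

An actual HBase scan would set `startRow`/`stopRow` (or a prefix filter) on a `Scan` object the same way; the point is that the read cost is proportional to the size of the area, not the size of the table.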
We are now capable of reading efficiently from HBase. So how do we transform that data?
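Because the scan returns rows in key order, downstream transformation can exploit that locality: grouping rows by a key prefix yields spatially coherent partitions that can each be processed independently, e.g. one task per tile. A small sketch under that assumption (key format and tile prefix length are illustrative):

```python
from collections import defaultdict

def partition_by_tile(rows, tile_len: int = 4):
    """Group sorted row keys by their leading `tile_len` bytes, so each
    partition covers one spatially contiguous tile."""
    parts = defaultdict(list)
    for key in rows:
        parts[key[:tile_len]].append(key)
    return dict(parts)

parts = partition_by_tile([b"u173z:1", b"u176b:2", b"u176c:3", b"u33d0:5"])
```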