What to do with all that memory in a Hadoop cluster? Should we load all of our data into memory to process it?
The goal should be to put memory into its right place in the storage hierarchy, alongside disk and solid-state drives (SSD). Data should reside in the right place for how it is being used, and should be organized appropriately for where it resides. This proposed solution requires a new kind of data set called the Discardable, In-Memory, Materialized Query (DIMMQ).
In this session we will talk through how we can build on existing Hadoop facilities to deliver three key underlying concepts that enable this approach.
This software does not yet exist. I hope and believe that it will. I don’t believe in promoting vaporware! I want to have a discussion about where Hadoop is going so as many of us can shape it. This seems like a great time and place to have it. My background is in database. I think like a database guy. I’ve been thinking really hard about how to bring the “good stuff” from databases over to Hadoop I have a deep knowledge of BI, and what BI tools need from their underlying data system, having written Mondrian
Hadoop today brings a lot of brute force.
It can be more effective if it were a bit smarter. We cannot make effective use of memory unless we are a bit smarter.
Lots of memory now. But still not enough
Bringing all kinds of data into the Hadoop tent - Batch, interactive, streaming, iterative Declarative, algebraic, transparent - Applications use different copies of data without knowing it Making Hadoop smarter – Adding brains to compliment Hadoop’s formidable brawn
Discardable In-Memory Materialized Queries With Hadoop