I presented this talk at the Open World Forum in Paris in 2013. The idea is that you can build basic recommendations, as well as extended forms such as intelligent search, cross-recommendation, and multi-modal recommendation, using Mahout's cooccurrence analysis together with a search engine.
Note to speaker: Move quickly through the first two slides. They only set the tone: familiar use cases backed by somewhat complicated under-the-covers math and algorithms. You don’t need to explain or discuss these examples at this point; just mention one or two.
Talk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering out that nasty spam from email…
Note to trainers: the next series of slides starts with a cartoon example, just to set the pattern of how to find co-occurrence and use it to find indicators of what to recommend. Of course, real examples require a LOT of user-item interaction history to actually work, so this is just an analogy to get the idea across…
* A history of what everybody has done. Obviously this is just a cartoon, because large numbers of users and interactions with items would be required to build a recommender.
* Next step will be to predict what a new user might like…
* Bob is the “new user”, and getting an apple is his history.
* Here is where the recommendation engine needs to go to work…
Note to trainer: you might see if the audience calls out the answer before revealing the next slide…
Now you see the idea of co-occurrence as a basis for recommendation…
*Now we have a new user, Amelia. Like everybody else, she gets a pony… what should the recommender offer her based on her history?
* Pony is not interesting because it is so widespread that it does not differentiate a pattern
Note to trainer: This situation is similar to the one in which we started, with three users in our history. The difference is that now everybody got a pony. Bob has apple and pony but not a puppy…yet
*Binary matrix is stored sparsely
* Convert by MapReduce into a binary matrix
Note to trainer: Whether to consider apple to have co-occurred with itself is an open question
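Sketch for trainers: the conversion from history to a sparse binary matrix, and from there to co-occurrence counts, can be illustrated on one machine in a few lines of Python. This is only a cartoon of what the MapReduce job does at scale. The users other than Bob and the exact item assignments are assumptions made up for the illustration, and leaving out self co-occurrence is just one way to settle the open question above.

```python
from collections import Counter
from itertools import permutations

# Hypothetical cartoon history; names and item assignments are assumptions.
history = {
    "alice": ["apple", "puppy", "pony"],
    "charles": ["bicycle", "pony"],
    "bob": ["apple", "pony"],
}

# Sparse binary matrix: keep only the (user, item) cells that are 1.
binary = {user: set(items) for user, items in history.items()}

# Co-occurrence count for (a, b): number of users who have both a and b.
# Self co-occurrence (apple with apple) is deliberately left out here.
cooccurrence = Counter()
for items in binary.values():
    for a, b in permutations(sorted(items), 2):
        cooccurrence[(a, b)] += 1

for pair, count in sorted(cooccurrence.items()):
    print(pair, count)
```

Raw counts alone make pony look like the strongest partner of apple, which is why the counts still need to be filtered for co-occurrence that is actually interesting.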
Old joke: all the world can be divided into 2 categories: Scotch tape and non-Scotch tape… This is a way to think about the co-occurrence
The only important co-occurrence is that puppy follows apple
* Take that row of the matrix and combine it with all the meta data we might have…
* The important thing to get from the co-occurrence matrix is this indicator.
* Cool thing: analogous to what a lot of recommendation engines do
* This row forms the indicator field in a Solr document containing meta-data (you do NOT have to build a separate index for the indicators)
* Find the useful co-occurrence and get rid of the rest. Sparsify and get the anomalous co-occurrence
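Sketch for trainers: these notes do not name the test used to decide which co-occurrence is anomalous. One common choice, and the one assumed here, is a log-likelihood ratio (G²) score computed from a 2x2 contingency table of users. The cartoon history below is hypothetical, and with only three users the scores are tiny; the point is only the ordering.

```python
import math

# Hypothetical cartoon history; names and item assignments are assumptions.
history = {
    "alice": ["apple", "puppy", "pony"],
    "charles": ["bicycle", "pony"],
    "bob": ["apple", "pony"],
}

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    # k11: both items, k12: a only, k21: b only, k22: neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def score(history, a, b):
    users = [set(items) for items in history.values()]
    k11 = sum(1 for u in users if a in u and b in u)
    k12 = sum(1 for u in users if a in u and b not in u)
    k21 = sum(1 for u in users if a not in u and b in u)
    k22 = len(users) - k11 - k12 - k21
    return llr(k11, k12, k21, k22)

apple_puppy = score(history, "apple", "puppy")
apple_pony = score(history, "apple", "pony")
print("apple/puppy:", apple_puppy)
print("apple/pony: ", apple_pony)
```

On this cartoon data apple with pony scores essentially zero, because every user has a pony and so it carries no information, while apple with puppy scores higher. Keeping only the pairs above some threshold is what sparsifies the matrix into indicators.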
Note to trainer: take a little time to explore this here and on the next couple of slides. Details enlarged on next slide
* This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identified significant or interesting co-occurrence).
* Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains meta data for the item in question
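Sketch for trainers: the shape of such a document can be shown as a plain Python dictionary that carries the item's meta data and its indicator row side by side. The field names (`id`, `title`, `category`, `indicators`) and the helper `build_item_document` are hypothetical, chosen only for this illustration; the actual schema on the slide may differ.

```python
import json

def build_item_document(item_id, metadata, indicator_row):
    """Merge item meta data and its indicator row into one document.

    Hypothetical helper: field names are assumptions, not a real schema.
    """
    doc = {"id": item_id}
    doc.update(metadata)
    # The indicator row lives in the same document as the meta data,
    # so no separate index is needed for the indicators.
    doc["indicators"] = sorted(indicator_row)
    return doc

# From the cartoon: apple is the item that indicates puppy.
puppy_doc = build_item_document(
    "puppy",
    {"title": "Puppy", "category": "pets"},
    {"apple"},
)

# Serialized form that could be handed to an indexing step.
payload = json.dumps([puppy_doc], indent=2)
print(payload)
```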
Note to trainer: you could ask the class to consider which data is related… for example, the first 3 bullets of the query relate to meta data for the item, not to data produced by the recommendation algorithm. The last 3 bullets refer to data in the sample query related to data in the indicator field(s) that were produced by the Mahout recommendation engine.
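Sketch for trainers: to make the split between meta data clauses and indicator clauses concrete, a query string can be assembled from both kinds of input. This does not reproduce the sample query on the slide. The field names, the helper `build_query`, and the example values are hypothetical, and no escaping of special characters is attempted.

```python
from urllib.parse import urlencode

def build_query(user_history, metadata_filters):
    """Combine meta data clauses with an indicator clause.

    Hypothetical helper: `indicators` and the meta data field names
    are assumptions made for this illustration.
    """
    # Clauses that relate to meta data for the item.
    clauses = [f'{field}:"{value}"'
               for field, value in sorted(metadata_filters.items())]
    # Clause that relates to the indicator field: the user's recent
    # history is matched against each item's indicators.
    if user_history:
        clauses.append("indicators:(" + " ".join(sorted(user_history)) + ")")
    return " ".join(clauses)

q = build_query({"apple", "pony"}, {"category": "pets"})
params = urlencode({"q": q, "rows": 5})
print(q)
print(params)
```

The search engine then ranks items whose indicator field matches the user's history, alongside whatever the meta data clauses ask for, which is how recommendation and ordinary search end up in one query.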