4. Multiple ad systems and content platforms. Content platforms: Blogsmith, Huffington Post (Movable Type), 5min, Truveo, StudioNow. Some ad systems: AdTech, Advertising.com, Feedpoint/Dynamic Banners. DAM New York 2011 Page 4
5. All speaking different languages… Tag.aol.com: “beyonce”; Tag…: “beyonceknowles”; AOL Music: “beyonce”; AOL Music: “beyonceknowles”; Moviefone: “beyonceknowles”; Huffington Post: “beyonce”; H… Post: “beyonceknowles”.
6. What we were asked to do. Effectively and granularly classify content: for improved ad sales; to relate content within and between the brands; in some cases, to assist editors with external-facing tags; all sorts of other bits of magic (which will be touched on later).
8. Faceted Ontology. “…structural frameworks for organizing information on the semantic Web and within semantic enterprises. They provide unique benefits in discovery, flexible access, and information integration due to their inherent connectedness; that is, their ability to represent conceptual relationships.” - M.K. Bergman, “An Executive Intro to Ontologies”, http://www.mkbergman.com/900/an-executive-intro-to-ontologies/
9. Subjects. We have approx. 6800 subjects. Generally hierarchical, but some associative relationships. Iterative process with editors (subject specialists). 12 top levels (or classes): Arts and Humanities, Education, Entertainment, Health and Medicine, Lifestyle, Money and Finance, News and Politics, Science and Tech, Social Sciences, Sports, Transportation, Travel and Tourism.
10. Entities. Proper nouns (specific persons, places, things). Not hierarchical, but rather associative relationships. 7 entity vocabularies: Named Things (includes persons), Locations, Works, Events, Groups, Brands, Products.
13. HELLO TEL AVIV! When we were tasked with this, we had very little direct communication with the team in Tel Aviv that runs the classification engine… We also were under the impression that auto-classification was their issue and they’d just have to classify with whatever we gave them. This was WRONG!
14. Train in vain? Example subject: ‘Women's Shoes’. We had to find training data for each subject in the taxonomy… and are continually doing so to improve classification.
15. Getting to Know You. More contact with the classification team. Providing feedback on tagging results. Collaborating on priorities. What data is most valuable to the tagger?
16. Turning large amounts of data into an ontology. More data sources means multiple records for the same entity. More sources = more effort required in merging records. Example: Name: Beyoncé; MusicPerson; MoviePerson; Alias (synonym): Beyonce Knowles; Alias (synonym): Beyonce; Source: Wikipedia; Source: AolMusicDB; Source: AolMovieDB. After merge, one record remains with metadata and relationships from all sources. More sources = more valuable records.
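The merge step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual pipeline: the `EntityRecord` class, its field names, and the assignment of names and types to particular sources are all assumptions made for the example. It also assumes the records have already been matched as the same entity, which is where most of the real effort goes.

```python
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    """One source's view of an entity (class and field names are illustrative)."""
    name: str
    node_types: set = field(default_factory=set)
    aliases: set = field(default_factory=set)
    sources: set = field(default_factory=set)


def merge_records(records):
    """Fold several records for the same entity into a single record.

    Assumes the caller has already decided that the records describe the
    same entity; that matching step is not shown here.
    """
    if not records:
        raise ValueError("need at least one record to merge")
    # The first record's name is kept as the preferred name.
    merged = EntityRecord(name=records[0].name)
    for rec in records:
        merged.node_types |= rec.node_types
        merged.sources |= rec.sources
        merged.aliases |= rec.aliases
        if rec.name != merged.name:
            # A differing name from another source becomes an alias.
            merged.aliases.add(rec.name)
    merged.aliases.discard(merged.name)
    return merged


# Which source uses which name or type is assumed for this example.
records = [
    EntityRecord("Beyoncé", sources={"Wikipedia"}),
    EntityRecord("Beyonce Knowles", {"MusicPerson"}, {"Beyonce"}, {"AolMusicDB"}),
    EntityRecord("Beyonce", {"MoviePerson"}, set(), {"AolMovieDB"}),
]
merged = merge_records(records)
```

After the merge, `merged` carries both node types, both aliases, and all three sources, which is the "one record remains" outcome the slide describes.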
18. Integrating with advertising systems. Our subjects can be mapped to advertising categories to serve ads for related products. Current Department Store campaign:
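One way to picture the subject-to-ad-category mapping is a lookup that falls back to a parent subject when the exact subject has no category of its own. The sketch below is hypothetical: the intermediate subjects ("Shoes", "Fashion"), the ad category names, and the function name are invented for illustration, and the real ad systems may work quite differently.

```python
# Hypothetical slice of the subject hierarchy: child -> parent.
SUBJECT_PARENT = {
    "Women's Shoes": "Shoes",
    "Shoes": "Fashion",
    "Fashion": "Lifestyle",
}

# Hypothetical mapping from subjects to advertising categories.
SUBJECT_TO_AD_CATEGORY = {
    "Shoes": "Retail/Footwear",
    "Lifestyle": "Retail/General",
}


def ad_category_for(subject):
    """Return the ad category of the subject or of its nearest mapped ancestor."""
    seen = set()
    current = subject
    # Walk up the hierarchy; `seen` guards against accidental cycles.
    while current is not None and current not in seen:
        if current in SUBJECT_TO_AD_CATEGORY:
            return SUBJECT_TO_AD_CATEGORY[current]
        seen.add(current)
        current = SUBJECT_PARENT.get(current)
    return None
```

With this fallback, only a handful of subjects need an explicit mapping; an article tagged with a narrow subject still resolves to the closest category defined above it.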
21. On the Roadmap… More projects with advertising teams. More data in our ontology to make classification better. Refining the ontology, because it’s a living thing.
23. Life lessons… Keep your eye on the prize. Expect people to think this is a much smaller task than it is. Don’t reinvent the wheel. Never underestimate the power of the ability to manipulate data.
Editor's Notes
How many of you knew that all of these are owned by Aol? How many of these were purchased since we started the taxonomy process?
Photo platform (mention it). At a minimum, 3 ad systems that we’ve had to deal with.
URL to link out here
1. Ad sales: so products with some relation to the article can be served.
2. Relating content. Within: e.g. someone on Aol Music can see all Beyonce articles. Between: see Beyonce articles on Moviefone, Stylelist, Popeater; keep people on Aol sites instead of linking out.
3. Assist editors: standardize tags so content is not lost for lack of relationships; you can’t find it if it is not tagged properly.
Difference between taxonomy and ontology
Be flexible and remember your purpose (for us it’s Aol content). Subjects may be called topics/categories in other places. Subjects describe the ‘aboutness’ of an article. E.g. a report on the World Series is about ‘Baseball’. E.g. an article about the best airlines is about ‘Air Travel’.
We have around 3.8 million entities and counting. Together, subjects and entities make up the taxonomy.
More contact with the classification team. Providing feedback on tagging results. Collaborating on priorities. Focus on what is most valuable to the tagger.
Mix of NLP and machine learning. Picks up important related terms that imply content is about a subject (heels, flats, etc.), brands, etc. Mention that now entities extracted can actually improve subject tagging. DMOZ: voluntary human-edited directory of the web; lists of websites by subject.
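The idea that related terms such as "heels" and "flats" signal a subject can be shown with a toy term-count scorer. This is only an illustration of the intuition, not the classification engine itself, which the notes describe as a mix of NLP and machine learning. The term lists and function names below are made up for the example; a real tagger would learn such signals from training data.

```python
import re
from collections import Counter

# Hypothetical indicator terms per subject; a real tagger would learn these.
SUBJECT_TERMS = {
    "Women's Shoes": {"heels", "flats", "pumps", "sandals"},
    "Baseball": {"inning", "pitcher", "homer", "shortstop"},
}


def score_subjects(text):
    """Count indicator-term occurrences per subject in the text."""
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    scores = {}
    for subject, terms in SUBJECT_TERMS.items():
        hits = sum(tokens[term] for term in terms)
        if hits:
            scores[subject] = hits
    return scores


def best_subject(text):
    """Return the highest-scoring subject, or None if no term matched."""
    scores = score_subjects(text)
    return max(scores, key=scores.get) if scores else None


example = "These heels and flats are on sale; heels sell fastest."
```

Calling `best_subject(example)` picks ‘Women's Shoes’ because three indicator terms match, even though the phrase "women's shoes" never appears in the text.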
One record will have multiple node types and aliases; metadata will be brought together: albums, date of birth, marriedto, spokesperson for brand. Very rich records result: opportunity to create multiple relationships.
Subjects and entities. We met with teams; one thing they liked was the fact that they could tag a ‘master version’ with a taxonomy ID. Bring all articles mentioning ‘Charlie Sheen’ together, just like the Beyonce example, not different versions like charliesheen, charlie sheen, charlie+sheen.
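Resolving tag variants to one ‘master version’ can be sketched as normalizing each tag to a comparison key and looking that key up against known aliases. The taxonomy IDs, the alias table, and the function names here are hypothetical, and the normalization rule (drop accents, case, and anything that is not a letter or digit) is an assumption that would be too aggressive for some names.

```python
import re
import unicodedata


def normalize_tag(tag):
    """Reduce a free-text tag to a key of lowercase ASCII letters and digits."""
    decomposed = unicodedata.normalize("NFKD", tag)
    # Dropping non-ASCII bytes removes the combining accents split off by NFKD.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


# Hypothetical taxonomy IDs and their known aliases.
ALIASES = {
    "ent:0001": ["Charlie Sheen"],
    "ent:0002": ["Beyoncé", "Beyonce Knowles", "Beyonce"],
}

# Index every alias by its normalized key.
KEY_TO_ID = {
    normalize_tag(alias): tax_id
    for tax_id, names in ALIASES.items()
    for alias in names
}


def taxonomy_id(tag):
    """Return the taxonomy ID for a tag variant, or None if it is unknown."""
    return KEY_TO_ID.get(normalize_tag(tag))
```

Under these assumptions, `charliesheen`, `charlie sheen`, and `charlie+sheen` all resolve to the same ID, so articles tagged with any of them can be brought together.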