A Drupal case study on developing the Australian Broadcasting Corporation's Dig Music website. I gave this talk at Drupal Downunder #ddu2011 in Brisbane, Australia (Jan 23, 2011).
I discuss how the Semantic Web was used to create a real-time snapshot of a musical artist, using artist information pulled live from the digital radio broadcast.
I also talk about performance issues we encountered and ways that they were overcome.
21. Finding a Solution
• Which APIs to use
• Which APIs can we use
• How can we combine data from multiple sources
• How can we automate it
22. The Curse of too Much
• There are over 50 APIs listed on programmableweb.com
• Too many to look into
• Each has its own API methods and return data formats
– JSON, XML, RSS, RDF !!!
23. Take your Pick
• APIs everywhere
– BBC Music
– Discogs
– Last.fm
– MusicBrainz
– Yahoo Music
– Flickr
– YouTube
– The Hype Machine
24. Finding the Key
• One common feature was the use of a MusicBrainz ID
– Last.fm
– Discogs
– Freebase
– Wikipedia/DBpedia
– BBC
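To make the idea of a shared key concrete, here is a minimal sketch of joining records from several sources on a MusicBrainz ID. The record shape (plain dicts with an `mbid` field), the source names, and the function name `merge_by_mbid` are assumptions for illustration only, not the format any of these APIs actually returns and not the code used on the Dig Music site.

```python
from collections import defaultdict


def merge_by_mbid(sources):
    """Merge per-source artist records that share a MusicBrainz ID.

    `sources` maps a source name to a list of dict records. Each record
    is assumed (hypothetically) to carry the shared key under 'mbid'.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            mbid = record.get("mbid")
            if not mbid:
                # Without the shared key the record cannot be joined.
                continue
            for field, value in record.items():
                if field != "mbid":
                    # Prefix each field with its source to avoid clashes.
                    merged[mbid][f"{source_name}.{field}"] = value
    return dict(merged)
```

Records that lack the key are skipped rather than guessed at, which is the point of the slide: the ID is what makes automated combination possible.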
25. Eureka!
• Great, now all I had to do was use the MusicBrainz API to look up the ID and I was done. Easy...
• :(
• The search API sucked. It returned too many fuzzy results
• crap
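The fuzzy-results problem can be illustrated with a small sketch: an automated lookup can only trust a search response when exactly one candidate matches. The result shape used here (dicts with `id`, `name` and `score`), the threshold of 90, and the function name `pick_unambiguous` are hypothetical and are not the MusicBrainz response format.

```python
def pick_unambiguous(results, wanted_name, min_score=90):
    """Return the single confident match, or None if results are ambiguous.

    `results` is a list of dicts with 'name' and 'score' keys
    (an assumed shape for illustration).
    """
    wanted = wanted_name.strip().lower()
    candidates = [
        r for r in results
        if r["score"] >= min_score and r["name"].strip().lower() == wanted
    ]
    # Zero matches or several matches both mean we cannot decide automatically.
    return candidates[0] if len(candidates) == 1 else None
```

When two artists share a name, this returns `None`, which is exactly the case a name-based search cannot resolve on its own.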
26. Back to the Future
• This is where the Semantic Web enters the picture
– All that stuff about storytelling
– Shared understanding
– URIs (web links)
33. Raw Data
• Not too pretty to look at
• But computers LOVE this stuff
34. So, what do we get
• Disambiguation
• MusicBrainz ID
• Discography
• Related Artists
• Official homepage
• Bio
• Credit card details (sometime in 2012)
35. The Rosetta Stone
• MusicBrainz ID is our key to the wild web of APIs
• Wikipedia URL is the key to the Semantic Web
• One happy family :)
http://www.flickr.com/photos/vportals/
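As a sketch of why the Wikipedia URL works as a key: English DBpedia names its resources after English Wikipedia article titles, so one URL can be turned into the other. The function name `wikipedia_to_dbpedia` is made up for this example, and the code only handles the simple `en.wikipedia.org/wiki/Title` case; it is an illustration, not the lookup used on the site.

```python
from urllib.parse import urlparse

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(wikipedia_url):
    """Map an English Wikipedia article URL to its DBpedia resource URI."""
    parsed = urlparse(wikipedia_url)
    prefix = "/wiki/"
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith(prefix):
        raise ValueError("not an English Wikipedia article URL: %r" % wikipedia_url)
    # The article title (with underscores) is reused as the resource name.
    title = parsed.path[len(prefix):]
    if not title:
        raise ValueError("missing article title: %r" % wikipedia_url)
    return DBPEDIA_PREFIX + title
```

For example, `wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Nick_Cave")` gives the matching `http://dbpedia.org/resource/Nick_Cave` URI, which can then be queried for linked data.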
40. Don’t use Drupal
• To get the best performance out of Drupal 6, don’t use Drupal 6!
41. Pressflow
• Key patches and enhancements
• Releases mirror official Drupal releases
• Big players are using it
– Drupal.org
– ABC
– Music labels
– Newspapers
42. Start your Engines
• MySQL base install is ... lacking
• MyISAM == slow
• Use Percona XtraDB
• ... or ... InnoDB
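Switching storage engines comes down to one `ALTER TABLE ... ENGINE=InnoDB` statement per table. The sketch below generates those statements from a mapping of table name to current engine; the function name `innodb_alter_statements` and the table names in the usage note are illustrative assumptions, and the statements should be reviewed (and the database backed up) before running them.

```python
def innodb_alter_statements(table_engines):
    """Build ALTER TABLE statements for every table still on MyISAM.

    `table_engines` maps table name -> current storage engine name,
    e.g. as gathered from MySQL's information_schema.
    """
    statements = []
    for table, engine in sorted(table_engines.items()):
        if engine.lower() == "myisam":
            # Backticks guard against table names that are reserved words.
            statements.append(f"ALTER TABLE `{table}` ENGINE=InnoDB;")
    return statements
```

Called with `{"node": "MyISAM", "users": "InnoDB", "cache": "MyISAM"}`, it emits statements for `cache` and `node` only and leaves the table already on InnoDB alone.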
44. Search
• Drupal’s built-in search can be a dawg
• Solr
– Much faster search
– Offers faceting
– Can become a platform in its own right
45. A Fresh Coat of Paint
• Varnish
– Last but certainly not least
– Up to millions of hits per hour
46. Performance Optimisations
• Switch host to Linode
• Two-server architecture - db server and app server
• Master-slave relationship for MySQL
• Migrated Drupal to Pressflow
• Changed tables to InnoDB
• Varnish for serving pages
• memcached for caching
• Set up Munin to monitor servers