Hacking RSS: Filtering & Processing Obscene Amounts of Information (short version)
1. Hacking RSS: Filtering & Processing Obscene Amounts of Information
#hackingRSS
Dawn Foster
Intel Community Manager for MeeGo
dawn@fastwonder.com
2. Information Overload
CD Photo: http://www.flickr.com/photos/chefranden/2751354004/
3. Who Cares?
● Most of it is …
– complete crap
– out of date / obsolete
– not interesting to you
– irrelevant for you
Junk Pile: http://www.flickr.com/photos/zen/4013525/
4. You Want to Find the Needle
Haystacks: http://www.flickr.com/photos/rasekh/4911673659/
5. RSS Alone is a Start
● Sources you care about delivered right to you. But …
– Do you care about everything in each feed?
– What about the feeds you aren't subscribed to?
– Can you keep up with what you have?
6. Prioritize Your Reader
● Put things you care about at the top
● Categorize
● Don't try to read everything
7. The Real Magic is in Filtering RSS
Diagram labels: Complete Crap, Interesting, Maybe Relevant, Yay!
● In my Google Reader right now:
– Analyst research blogs mentioning Online Community
– Analyst research blogs mentioning MeeGo
– Searches across social sites mentioning me, my projects, my
websites etc. - filtering out things I don't care about
– My favorite blogs filtered using PostRank to find only the
ones with a lot of comments or social mentions
8. RSS Filtering Tools
● Yahoo Pipes (my favorite)
– More powerful & flexible: options to filter any data found in
any field in the RSS feed (URL, title, description, author …)
– Downside: takes some time to learn & can be a little flaky at
times. Also a single point of failure if Yahoo ever kills it.
● Other Options
– FeedRinse: easy to use, not as flexible. Import RSS feeds,
add filters, get new RSS feeds out.
– RSS readers with filtering / alerts (FeedDemon)
– Code: write your own filters
– Note: many free RSS filtering services have gone out of
business – can be bandwidth intensive & costly to host.
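For the "write your own filters" option, a minimal sketch in Python using only the standard library. The sample feed, the function names, and the choice to match on title and description are assumptions made for illustration; a real filter would fetch the feed over HTTP and handle Atom as well.

```python
import xml.etree.ElementTree as ET

# Invented sample feed, used only to demonstrate the filter.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Sample feed</title>
<item><title>MeeGo tablet demo</title><link>http://example.com/1</link>
<description>Hands-on video</description></item>
<item><title>Celebrity gossip</title><link>http://example.com/2</link>
<description>Nothing about MeeGo here</description></item>
<item><title>Weather update</title><link>http://example.com/3</link>
<description>Rain</description></item>
</channel></rss>"""


def filter_feed(rss_text, keyword, fields=("title", "description")):
    """Return RSS text keeping only items whose chosen fields mention keyword."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    needle = keyword.lower()
    # Copy the list first so removing items does not disturb iteration.
    for item in list(channel.findall("item")):
        haystack = " ".join((item.findtext(f) or "") for f in fields).lower()
        if needle not in haystack:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


def titles(rss_text):
    """Helper for inspecting a feed: list the item titles in order."""
    return [item.findtext("title") for item in ET.fromstring(rss_text).iter("item")]


if __name__ == "__main__":
    print(titles(filter_feed(SAMPLE_FEED, "meego")))
```

Passing `fields=("title",)` narrows the match to titles only, which is the kind of per-field control the Yahoo Pipes filter module offers.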
10. PostRank
● Best posts in a feed
● Ranked on engagement (links, sharing, comments)
● Can get output as RSS feed
● Feed includes PostRank number as a field
11. What's In a Feed? PostRank (Yahoo Pipes View)
● Content in feeds varies wildly depending on site.
● Common: title, author, pubDate, link, content, description
● Site-specific: postrank, lat/long, image links, username,
twitter source … (most RSS readers don't show these)
● API: usually has additional data & can output RSS
● If it's in the feed, you can use it!
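To see what a feed really contains, including fields a reader hides, you can dump every child element of each item. This is a rough sketch; the `ex:postrank` element and its namespace are hypothetical stand-ins for whatever site-specific fields a given feed carries.

```python
import xml.etree.ElementTree as ET

# Invented feed with one standard and one site-specific (namespaced) field.
SAMPLE = """<rss version="2.0" xmlns:ex="http://example.com/ns"><channel>
<item><title>Post</title><link>http://example.com/p</link>
<ex:postrank>7.5</ex:postrank></item>
</channel></rss>"""


def item_fields(rss_text):
    """Return one {field name: text} dict per item, namespace URIs stripped."""
    result = []
    for item in ET.fromstring(rss_text).iter("item"):
        fields = {}
        for child in item:
            # ElementTree spells namespaced tags as "{uri}name"; keep only the name.
            name = child.tag.split("}", 1)[-1]
            fields[name] = (child.text or "").strip()
        result.append(fields)
    return result


if __name__ == "__main__":
    for fields in item_fields(SAMPLE):
        print(fields)
```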
12. Reformatting / Modifying RSS Feeds
Don't be satisfied with default RSS feed formats!
Twitter Search → Twitter RSS Feed
Modify & more quickly scan key data
13. Yahoo Pipes: Reformat Twitter Feed
● Input:
– Twitter Search feed
● Loop String Build:
– Author
– : (spacing)
– Title
● Loop Assign:
– Store result back into title
● Output:
– 1 RSS feed
– Efficient format
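The same String Build + Assign idea can be sketched outside Pipes. The example below assumes a plain RSS feed with an `author` element per item, which may not match the actual Twitter Search feed format; the sample data and function name are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample; the second item has no author and is left alone.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Trying out Yahoo Pipes</title><author>geekygirldawn</author></item>
<item><title>No author here</title></item>
</channel></rss>"""


def prefix_titles(rss_text, field="author"):
    """Rewrite each item title as "<field value>: <title>" and return the tree root."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        title = item.find("title")
        value = item.findtext(field)
        if title is not None and value:
            # String build, then assign the result back into the title.
            title.text = f"{value}: {title.text or ''}"
    return root


if __name__ == "__main__":
    for item in prefix_titles(SAMPLE).iter("item"):
        print(item.findtext("title"))
```

Passing a different `field` (for example a score element, if the feed has one) gives the same reformatting for other feeds.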
14. BackTweets (BackType API)
● Data about links on Twitter
● Finds links regardless of shortening service
● No RSS Feeds
● But … You can use API + Pipes to build one!
15. BackType + Twitter API + Pipes Output
● Data from BackType + Twitter
● Built an RSS feed using Yahoo Pipes
● Included the information relevant for me
● Could have included or filtered on: name, listed count,
location, profile image, user URL, ...
16. Admit it, we ALL do vanity searches
● You can enter your search queries in Google, Twitter,
Flickr …
– Add a new project & have to update all of them
– Can be hard to filter out some results
– May have duplicates from multiple searches
● Yahoo Pipes
– Update keywords in a CSV file
– Use CSV file as input into a bunch of searches (RSS or
API inputs)
– Filter out what you don't want
– Get 1 filtered RSS feed as output
2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
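A rough Python sketch of the keyword-CSV approach: read keywords once, expand them into search URLs, then merge, de-duplicate, and filter the results. The URL templates, the CSV column name, and the item dict shape are all hypothetical; fetching and parsing the feeds is left out.

```python
import csv
import io
from urllib.parse import quote_plus

# Invented keyword file contents; in practice this would be a CSV you maintain.
KEYWORDS_CSV = "keyword\nMeeGo\nfast wonder\n"

# Hypothetical search-feed URL templates; real services define their own.
TEMPLATES = [
    "http://search.example.com/feed?q={q}",
    "http://photos.example.org/rss?tags={q}",
]


def load_keywords(csv_text):
    """Read the 'keyword' column, skipping blank entries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["keyword"].strip() for row in reader if row["keyword"].strip()]


def search_urls(keywords, templates):
    """One URL per keyword per search service."""
    return [t.format(q=quote_plus(k)) for k in keywords for t in templates]


def merge_results(result_lists, blocked=()):
    """Merge item lists, dropping duplicate links and titles with blocked words."""
    seen, merged = set(), []
    for items in result_lists:
        for item in items:
            text = item["title"].lower()
            if item["link"] in seen or any(b.lower() in text for b in blocked):
                continue
            seen.add(item["link"])
            merged.append(item)
    return merged


if __name__ == "__main__":
    for url in search_urls(load_keywords(KEYWORDS_CSV), TEMPLATES):
        print(url)
```

Adding a project then means adding one row to the CSV rather than editing every saved search.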
17. How Should / Shouldn't You Use All of This?
● Do:
– Use this for personal productivity
– Play around, create prototypes and understand the possibilities
● Don't:
– Don't violate licenses on content or republish w/o permission
– Don't use in critical or production environments
● For production use or putting data on websites:
– Re-write in a real programming language with cached results
and error checking
XKCD Comic: http://xkcd.com/327/
18. Learn More
About Dawn:
● Intel Community Manager for MeeGo
● Author of Companies and Communities
● More Info: http://fastwonderblog.com
● Dawn@FastWonder.com
● @geekygirldawn on Twitter
Additional Reading & audio from 1 hour version of this talk:
● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/
21. Yahoo Pipes: Reformat PostRank Feed
● Input:
– 3 PostRank feeds
● Loop String Build:
– PostRank
– : (spacing)
– Title
● Loop Assign:
– Store result back into title
● Output:
– 1 RSS feed
– Efficient format
22. Yahoo Pipes PostRank Example
● Input PostRank Feeds:
– Engadget
– CrunchGear
– Boy Genius
● Filter by content
– Tablet
● Sort:
– PostRank
● Output
– 1 RSS feed
– Best tablet posts
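The merge, filter, and sort steps above could look roughly like this in Python. Items are shown as already-parsed dicts with invented titles and scores, and the filter checks only the title, which is a simplification of "filter by content".

```python
# Invented sample items standing in for three parsed PostRank feeds.
ENGADGET = [
    {"title": "New tablet review", "postrank": "8.2"},
    {"title": "Phone news", "postrank": "9.9"},
]
CRUNCHGEAR = [{"title": "Tablet rumor roundup", "postrank": "6.1"}]
BOY_GENIUS = [{"title": "Cheap Android tablet", "postrank": "9.4"}]


def best_posts(feeds, keyword, score_field="postrank"):
    """Merge feeds, keep items mentioning keyword, sort by score (highest first)."""
    needle = keyword.lower()
    matches = [
        item
        for feed in feeds
        for item in feed
        if needle in item["title"].lower()
    ]
    # Scores arrive as text in a feed, so convert before comparing.
    return sorted(matches, key=lambda item: float(item[score_field]), reverse=True)


if __name__ == "__main__":
    for post in best_posts([ENGADGET, CRUNCHGEAR, BOY_GENIUS], "tablet"):
        print(post["postrank"], post["title"])
```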
23. Using Web APIs 101
● Many API calls are basically URLs
● Constructing URLs
– Use API documentation/examples to
format the URL
– http://api.twitter.com/1/statuses/show/ID.xml
● Version 1 of the API: show the status for ID
in the requested format (here .xml)
● API keys
– Tell the API who you are (works like a password)
● Rate limiting
– Only get so much & you're cut off
– Limited by IP or API key
– Chill out for a while & come back
XKCD Comic: http://xkcd.com/844/
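A small sketch of both ideas: building an API URL from its parts, and backing off when rate limited. The URL pattern is the one shown on the slide and may no longer be live; the retry status codes, wait times, and function names are assumptions, not documented behavior of any particular API.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def status_url(tweet_id, fmt="xml", base="http://api.twitter.com/1"):
    """Build the slide's URL pattern: <base>/statuses/show/<ID>.<format>."""
    return f"{base}/statuses/show/{tweet_id}.{fmt}"


def fetch_with_backoff(url, opener=urlopen, retries=3, wait=1.0,
                       retry_codes=(429, 503), sleep=time.sleep):
    """Fetch url; on a rate-limit style error, chill out and try again.

    retry_codes is an assumption: check the API docs for the codes it uses.
    opener and sleep are parameters so the function can be tried offline.
    """
    for attempt in range(retries):
        try:
            with opener(url) as response:
                return response.read()
        except HTTPError as err:
            # Give up on other errors, or once the last attempt has failed.
            if err.code not in retry_codes or attempt == retries - 1:
                raise
            sleep(wait * 2 ** attempt)  # wait 1x, 2x, 4x ... between tries


if __name__ == "__main__":
    print(status_url(12345))
```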
24. Backtweets API + Twitter API + Yahoo Pipes
● What we want to do:
– Start with a set of URLs (blog posts in a feed)
– Find any tweet mentioning those URLs
– Return the tweet and data about the person who posted it
● Mission: Build feed using only data from these 2 APIs
● BackType API provides Tweet ID (not humanly useful)
– http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY
– List of Twitter Status IDs for Tweets linking to URL
– Note: I think this feature may be deprecated
● Twitter API uses Tweet ID to get everything else
– http://api.twitter.com/1/statuses/show/ID.xml
– Returns a single status with all relevant data for ID
25. BackTweets API: Get Tweet ID
● Take WebWorkerDaily Author Feed
● Use WWD URLs to build URLs for BackType API call
● Fetch data from BackType URLs to get Tweet ID
26. Twitter API: Get Data Based on Tweet ID
● Use BackType tweet ID to build URL for Twitter API
● Fetch data about Tweet & User from Twitter API
● Re-Build title to show “user (followers): tweet”
27. Add Filters to BackType + Twitter Example
● Show only tweets from people with 1000+ followers
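The pipeline from the last few slides, sketched in Python under some assumptions. The BackType URL parameters come from the slide; the status XML layout (`text`, `user/screen_name`, `user/followers_count`) is assumed for illustration, and the sample statuses are invented. Parsing the BackType response is omitted because its format is not shown here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def backtype_links_url(post_url, key):
    """Build the BackType call from the slide: q=URL, mode=batch, key=KEY."""
    query = urlencode({"q": post_url, "mode": "batch", "key": key})
    return f"http://api.backtype.com/tweets/search/links.xml?{query}"


# Invented responses in an assumed status layout.
SAMPLE_STATUSES = [
    "<status><text>Great post http://example.com/p</text><user>"
    "<screen_name>alice</screen_name><followers_count>2500</followers_count>"
    "</user></status>",
    "<status><text>Reading http://example.com/p</text><user>"
    "<screen_name>bob</screen_name><followers_count>40</followers_count>"
    "</user></status>",
]


def parse_status(xml_text):
    """Pull out the tweet text plus the user data we care about."""
    status = ET.fromstring(xml_text)
    return {
        "user": status.findtext("user/screen_name"),
        "followers": int(status.findtext("user/followers_count") or 0),
        "tweet": status.findtext("text"),
    }


def feed_titles(status_xmls, min_followers=1000):
    """Rebuild titles as "user (followers): tweet", keeping only big accounts."""
    tweets = [parse_status(x) for x in status_xmls]
    return [
        f"{t['user']} ({t['followers']}): {t['tweet']}"
        for t in tweets
        if t["followers"] >= min_followers
    ]


if __name__ == "__main__":
    print(backtype_links_url("http://example.com/p", "KEY"))
    print(feed_titles(SAMPLE_STATUSES))
```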