Hacking RSS:
        Filtering & Processing
    Obscene Amounts of Information
              #hackingRSS

       Dawn Foster
Intel Community Manager
        for MeeGo
 dawn@fastwonder.com
Information Overload




                       CD Photo: http://www.flickr.com/photos/chefranden/2751354004/
Who Cares?


●   Most of it is …
    –   complete crap
    –   out of date / obsolete
    –   not interesting to you
    –   irrelevant for you




                                 Junk Pile: http://www.flickr.com/photos/zen/4013525/
You Want to Find the Needle




                      Haystacks: http://www.flickr.com/photos/rasekh/4911673659/
RSS Alone is a Start
●   Sources you care about delivered right to you. But …
    –   Do you care about everything in each feed?
    –   What about the feeds you aren't subscribed to?
    –   Can you keep up with what you have?
Prioritize Your Reader



●   Put things you care about at the top
●   Categorize
●   Don't try to read everything
The Real Magic is in Filtering RSS
                       Complete Crap
                         Interesting
                        Maybe Relevant
                               Yay!
●   In my Google Reader right now:
    –   Analyst research blogs mentioning Online Community
    –   Analyst research blogs mentioning MeeGo
    –   Searches across social sites mentioning me, my projects, my
        websites etc. - filtering out things I don't care about
    –   My favorite blogs filtered using PostRank to find only the
        ones with a lot of comments or social mentions
RSS Filtering Tools
●   Yahoo Pipes (my favorite)
    –   More powerful & flexible: options to filter any data found in
        any field in the RSS feed (URL, title, description, author …)
    –   Downside: takes some time to learn & can be a little flaky at
        times. Also a single point of failure if Yahoo ever killed it.



●   Other Options
    –   FeedRinse: easy to use, not as flexible. Import RSS feeds,
        add filters, get new RSS feeds out.
    –   RSS readers with filtering / alerts (FeedDemon)
    –   Code: write your own filters
    –   Note: many free RSS filtering services have gone out of
        business – can be bandwidth intensive & costly to host.
Yahoo Pipes Filtering Example
●   Input:
    –   WebWorkerDaily
    –   ReadWriteWeb
●   Filter by content:
    –   Collaborate
    –   Collaboration
    –   Collaborative
●   Output:
    –   1 RSS Feed
    –   Matching 3 keywords




          2 Minute Yahoo Pipe Video How-to's: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
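
The same keyword filter can be sketched in code. This is a minimal illustration, not how Yahoo Pipes works internally: the two sample feeds, their URLs, and the function name `matching_items` are made up, and it returns a list of matches instead of writing a new RSS feed.

```python
import xml.etree.ElementTree as ET

# The three keywords from the slide, lowercased for case-insensitive matching.
KEYWORDS = ("collaborate", "collaboration", "collaborative")


def matching_items(rss_documents, keywords=KEYWORDS):
    """Return (title, link) pairs for items whose title or description mentions a keyword."""
    matches = []
    for xml_text in rss_documents:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            title = item.findtext("title", default="")
            description = item.findtext("description", default="")
            haystack = (title + " " + description).lower()
            if any(word in haystack for word in keywords):
                matches.append((title, item.findtext("link", default="")))
    return matches


# Made-up stand-ins for the two input feeds.
SAMPLE_FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Tools that help teams collaborate</title><link>http://example.com/a1</link><description>Remote work tips</description></item>
<item><title>New phone released</title><link>http://example.com/a2</link><description>Specs and price</description></item>
</channel></rss>"""

SAMPLE_FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Weekly roundup</title><link>http://example.com/b1</link><description>A collaborative editing startup raises funds</description></item>
<item><title>Browser update</title><link>http://example.com/b2</link><description>Faster tabs</description></item>
</channel></rss>"""

for title, link in matching_items([SAMPLE_FEED_A, SAMPLE_FEED_B]):
    print(title, link)
```

Matching on substrings keeps the sketch short; a stricter filter would match whole words only.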
PostRank
●   Best Posts in a
    feed
●   Ranked on
    engagement (links,
    sharing, comments)
●   Can get output as
    RSS feed
●   Feed includes
    postrank number as
    a field
What's In a Feed? PostRank (Yahoo Pipes View)




●   Content in feeds varies wildly depending on site.
●   Common: title, author, pubDate, link, content, description
●   Site-specific: postrank, lat/long, image links, username,
    twitter source … (most RSS readers don't show these)
●   API: usually has additional data & can output RSS
●   If it's in the feed, you can use it!
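
One way to see what a feed really carries is to dump every field of each item. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample feed and the `http://example.com/ns` namespace are invented for illustration, and real site-specific fields will use their own namespaces.

```python
import xml.etree.ElementTree as ET

# Made-up feed with one common set of fields plus one site-specific, namespaced field.
SAMPLE = """<rss version="2.0" xmlns:ex="http://example.com/ns">
<channel><title>Sample</title>
<item>
<title>Post one</title>
<link>http://example.com/1</link>
<pubDate>Mon, 01 Nov 2010 10:00:00 GMT</pubDate>
<ex:postrank>7.5</ex:postrank>
</item>
</channel></rss>"""


def item_fields(xml_text):
    """Return one dict per item, mapping each child tag to its text."""
    root = ET.fromstring(xml_text)
    fields = []
    for item in root.iter("item"):
        fields.append({child.tag: (child.text or "").strip() for child in item})
    return fields


for fields in item_fields(SAMPLE):
    for tag, text in fields.items():
        # Namespaced tags appear as "{namespace-uri}name".
        print(tag, "=", text)
```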
Reformatting / Modifying RSS Feeds
   Don't be satisfied with default RSS feed formats!

 Twitter
 Search




 Twitter
 RSS
 Feed

           Modify & more quickly scan key data
Yahoo Pipes: Reformat Twitter Feed
●   Input:
    –   Twitter Search
        feed
●   Loop String Build:
    –   Author
    –   : (spacing)
    –   Title
●   Loop Assign:
    –   Store result back
        into title
●   Output:
    –   1 RSS feed
    –   Efficient format
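
Roughly the same string-build-and-assign step, sketched in Python under the assumption that each item carries a plain `author` element. The sample feed is made up, and real search feeds may store the author differently.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for a search feed; the second item has no author.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Search results</title>
<item><title>Trying out Yahoo Pipes</title><author>alice</author></item>
<item><title>Untitled by nobody</title></item>
</channel></rss>"""


def prefix_titles_with_author(xml_text):
    """Rewrite each item title as "author: title" and return the feed as text."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        title = item.find("title")
        author = item.findtext("author", default="").strip()
        # Leave the title alone when there is nothing to prepend.
        if title is not None and author:
            title.text = f"{author}: {title.text or ''}"
    return ET.tostring(root, encoding="unicode")


print(prefix_titles_with_author(SAMPLE_FEED))
```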
BackTweets (BackType API)
●   Data about links on
    Twitter
●   Finds links regardless of
    shortening service
●   No RSS Feeds
●   But … You can use
    API + Pipes to build
    one!
BackType + Twitter API + Pipes Output
●   Data from BackType + Twitter
●   Built an RSS feed using Yahoo Pipes
●   Included the information relevant for me
●   Could have included or filtered on: name, listed count,
    location, profile image, user URL, ...
Admit it, we ALL do vanity searches
 ●   You can enter your search queries in Google, Twitter,
     Flickr …
       –   Add a new project & have to update all of them
       –   Can be hard to filter out some results
       –   May have duplicates from multiple searches
 ●   Yahoo Pipes
       –   Update keywords in a CSV file
       –   Use CSV file as input into a bunch of searches (RSS or
           API inputs)
       –   Filter out what you don't want
       –   Get 1 filtered RSS feed as output



2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
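
The CSV-driven approach might look like this in code. Everything named here is an assumption for illustration: the search endpoints are placeholders, the `q` parameter is a guess at a typical search API, and entries are plain dicts with `title` and `link` keys.

```python
import csv
import io
from urllib.parse import urlencode

# Placeholder search endpoints; real services each have their own URL format.
SEARCH_ENDPOINTS = [
    "http://search.example.com/feed",
    "http://photos.example.com/search.rss",
]


def load_keywords(csv_text):
    """Read keywords from CSV text, one or more per row, skipping blanks."""
    reader = csv.reader(io.StringIO(csv_text))
    return [cell.strip() for row in reader for cell in row if cell.strip()]


def build_search_urls(keywords, endpoints=SEARCH_ENDPOINTS):
    """Build one search URL per keyword per endpoint."""
    return [
        f"{endpoint}?{urlencode({'q': keyword})}"
        for keyword in keywords
        for endpoint in endpoints
    ]


def filter_and_dedupe(entries, blocked_words):
    """Drop entries containing a blocked word, then drop repeated links."""
    seen = set()
    kept = []
    for entry in entries:
        text = entry["title"].lower()
        if any(word in text for word in blocked_words):
            continue
        if entry["link"] in seen:
            continue
        seen.add(entry["link"])
        kept.append(entry)
    return kept


keywords = load_keywords("Dawn Foster\nMeeGo\n")
for url in build_search_urls(keywords):
    print(url)
```

Adding a new project then means adding one line to the CSV file, not editing every search.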
How Should / Shouldn't You Use All of This?
●   Do:
    –   Use this for personal productivity
    –   Play around, create prototypes and understand the possibilities
●   Don't:
    –   Don't violate licenses on content or republish w/o permission
    –   Don't use in critical or production environments




●   For production use or putting data on websites:
    –   Re-write in a real programming language with cached results
        and error checking
                       XKCD Comic: http://xkcd.com/327/
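
As a rough idea of what "cached results and error checking" could mean, here is a small sketch. The class name `CachedFetcher`, the five-minute default, and the choice to serve a stale copy on failure are all assumptions, and the `fetch` callable is supplied by the caller.

```python
import time


class CachedFetcher:
    """Wrap a fetch function with a time-limited cache and a stale fallback."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # url -> (time fetched, body)

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough, skip the network
        try:
            body = self._fetch(url)
        except OSError:
            # Network-style failure: serve the old copy if there is one.
            if cached is not None:
                return cached[1]
            raise
        self._cache[url] = (now, body)
        return body


# Usage with a stand-in fetch function that never touches the network.
fetcher = CachedFetcher(lambda url: f"<rss>{url}</rss>")
print(fetcher.get("http://example.com/feed"))
```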
Learn More
About Dawn:
● Intel Community Manager for MeeGo

● Author of Companies and Communities

● More Info: http://fastwonderblog.com

● Dawn@FastWonder.com

● @geekygirldawn on Twitter






Additional Reading & audio from 1 hour version of this talk:
● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/


                            Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/
Backup
Outsource / Crowdsource New Sources
Yahoo Pipes: Reformat PostRank Feed
●   Input:
    –   3 PostRank feeds
●   Loop String Build:
    –   PostRank
    –   : (spacing)
    –   Title
●   Loop Assign:
    –   Store result back
        into title
●   Output:
    –   1 RSS feed
    –   Efficient format
Yahoo Pipes PostRank Example
●   Input PostRank
    Feeds:
    –   Engadget
    –   CrunchGear
    –   Boy Genius
●   Filter by content
    –   Tablet
●   Sort:
    –   PostRank
●   Output
    –   1 RSS feed
    –   Best tablet posts
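
A code version of this filter-and-sort might look like the sketch below. The `pr` namespace URI and the sample feeds are invented; the real PostRank feeds would have their own field name and namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the score field; check the real feed for the actual one.
NS = {"pr": "http://example.com/postrank"}

FEED_X = """<rss version="2.0" xmlns:pr="http://example.com/postrank"><channel><title>X</title>
<item><title>Tablet review roundup</title><description>Hands-on</description><pr:postrank>6.2</pr:postrank></item>
<item><title>Phone news</title><description>Carrier update</description><pr:postrank>9.9</pr:postrank></item>
</channel></rss>"""

FEED_Y = """<rss version="2.0" xmlns:pr="http://example.com/postrank"><channel><title>Y</title>
<item><title>New slate announced</title><description>A 10-inch tablet</description><pr:postrank>8.7</pr:postrank></item>
</channel></rss>"""


def best_posts(rss_documents, keyword="tablet"):
    """Keep items mentioning the keyword and list them by score, highest first."""
    posts = []
    for xml_text in rss_documents:
        for item in ET.fromstring(xml_text).iter("item"):
            title = item.findtext("title", default="")
            description = item.findtext("description", default="")
            if keyword not in (title + " " + description).lower():
                continue
            score = float(item.findtext("pr:postrank", default="0", namespaces=NS))
            posts.append((score, title))
    posts.sort(key=lambda post: post[0], reverse=True)
    # Put the score in front of the title so it is visible in any reader.
    return [f"{score:.1f}: {title}" for score, title in posts]


for line in best_posts([FEED_X, FEED_Y]):
    print(line)
```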
Using Web APIs 101
●   Many API calls are basically URLs
●   Constructing URLs
    –   Use API documentation/examples to
        format the URL
    –   http://api.twitter.com/1/statuses/show/ID.xml
         ●   Version 1 of the API: show the status for ID
             in the requested format (.xml here)
●   API keys
    –   Tells the API who you are (treat it like a password)
●   Rate limiting
    –   Only get so much & you're cut off
    –   Limited by IP or API key
    –   Chill out for a while & come back
                                                   XKCD Comic: http://xkcd.com/844/
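
The two ideas above, building the URL and backing off when rate limited, can be sketched together. The URL pattern comes from the slide; the `RateLimited` exception, the doubling delays, and the caller-supplied `fetch` function are assumptions, since each API signals its limits differently.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the API refused the call because of rate limiting."""


def status_url(status_id, fmt="xml"):
    """Build the status URL from the slide's pattern: version 1, show, ID, format."""
    return f"http://api.twitter.com/1/statuses/show/{status_id}.{fmt}"


def call_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); when rate limited, wait longer each time before retrying."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller decide
            sleep(base_delay * 2 ** attempt)


print(status_url(12345))
```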
Backtweets API + Twitter API + Yahoo Pipes
●   What we want to do:
    –   Start with a set of URLs (blog posts in a feed)
    –   Find any tweet mentioning those URLs
    –   Return the tweet and data about the person who posted it
●   Mission: Build feed using only data from these 2 APIs
●   BackType API provides Tweet ID (not humanly useful)
    –   http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY
    –   List of Twitter Status IDs for Tweets linking to URL
    –   Note: I think this feature may be deprecated
●   Twitter API uses Tweet ID to get everything else
    –   http://api.twitter.com/1/statuses/show/ID.xml
    –   Returns a single status with all relevant data for ID
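
The chaining of the two calls can be sketched as follows. Only the two URL patterns come from the slides; `fetch_tweet_ids` and `fetch_status` are hypothetical callables that the caller would write to fetch and parse each response, since the response formats are not shown here.

```python
from urllib.parse import urlencode


def backtype_links_url(post_url, api_key):
    """Build the BackType call that lists tweets linking to post_url."""
    query = urlencode({"q": post_url, "mode": "batch", "key": api_key})
    return f"http://api.backtype.com/tweets/search/links.xml?{query}"


def twitter_status_url(tweet_id):
    """Build the Twitter call that returns full data for one tweet ID."""
    return f"http://api.twitter.com/1/statuses/show/{tweet_id}.xml"


def tweets_for_posts(post_urls, api_key, fetch_tweet_ids, fetch_status):
    """For each post URL, look up tweet IDs, then fetch each tweet's details."""
    results = []
    for post_url in post_urls:
        for tweet_id in fetch_tweet_ids(backtype_links_url(post_url, api_key)):
            results.append(fetch_status(twitter_status_url(tweet_id)))
    return results


print(backtype_links_url("http://example.com/post", "KEY"))
```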
BackTweets API: Get Tweet ID




●   Take WebWorkerDaily Author Feed
●   Use WWD URLs to build URLs for BackType API call
●   Fetch data from BackType URLs to get Tweet ID
Twitter API: Get Data Based on Tweet ID




●   Use BackType tweet ID to build URL for Twitter API
●   Fetch data about Tweet & User from Twitter API
●   Re-Build title to show “user (followers): tweet”
Add Filters to BackType + Twitter Example
●   Show only tweets from people with 1000+ followers
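
The follower filter and the rebuilt title could be sketched like this, assuming each tweet has already been reduced to a dict with `user`, `followers`, and `text` keys (names chosen for illustration only).

```python
def popular_tweet_titles(tweets, min_followers=1000):
    """Keep tweets from accounts with enough followers, as "user (followers): tweet"."""
    return [
        f"{tweet['user']} ({tweet['followers']}): {tweet['text']}"
        for tweet in tweets
        if tweet["followers"] >= min_followers
    ]


# Made-up sample data.
sample = [
    {"user": "alice", "followers": 2500, "text": "Great post on RSS filtering"},
    {"user": "bob", "followers": 40, "text": "Reading this now"},
]
for title in popular_tweet_titles(sample):
    print(title)
```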

Hacking RSS: Filtering & Processing Obscene Amounts of Information (short version)

  • 1. Hacking RSS: Filtering & Processing Obscene Amounts of Information #hackingRSS Dawn Foster Intel Community Manager for MeeGo dawn@fastwonder.com
  • 2. Information Overload CD Photo: http://www.flickr.com/photos/chefranden/2751354004/
  • 3. Who Cares? ● Most of it is … – complete crap – out of date / obsolete – not interesting to you – irrelevant for you Junk Pile: http://www.flickr.com/photos/zen/4013525/
  • 4. You Want to Find the Needle Haystacks: http://www.flickr.com/photos/rasekh/4911673659/
  • 5. RSS Alone is a Start ● Sources you care about delivered right to you. But … – Do you care about everything in each feed? – What about the feeds you aren't subscribed to? – Can you keep up with what you have?
  • 6. Prioritize Your Reader ● Put things you care about at the top ● Categorize ● Don't try to read everything
  • 7. The Real Magic is in Filtering RSS Complete Crap Interesting Maybe Relevant Yay! ● In my Google Reader right now: – Analyst research blogs mentioning Online Community – Analyst research blogs mentioning MeeGo – Searches across social sites mentioning me, my projects, my websites etc. - filtering out things I don't care about – My favorite blogs filtered using PostRank to find only the ones with a lot of comments or social mentions
  • 8. RSS Filtering Tools ● Yahoo Pipes (my favorite) – More powerful & flexible: options to filter any data found in any field in the RSS feed (URL, title, description, author …) – Downside: takes some time to learn & can be a little flaky at times. Also a single point of failure if Yahoo ever killed it. ● Other Options – FeedRinse: easy to use, not as flexible. Import RSS feeds, add filters, get new RSS feeds out. – RSS readers with filtering / alerts (FeedDemon) – Code: write your own filters – Note: many free RSS filtering services have gone out of business – can be bandwidth intensive & costly to host.
  • 9. Yahoo Pipes Filtering Example ● Input: – WebWorkerDaily – ReadWriteWeb ● Filter by content: – Collaborate – Collaboration – Collaborative ● Output: – 1 RSS Feed – Matching 3 keywords 2 Minute Yahoo Pipe Video How-to's: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
  • 10. PostRank ● Best Posts in a feed ● Ranked on engagement (links, sharing, comments) ● Can get output as RSS feed ● Feed includes postrank number as a field
  • 11. What's In a Feed? PostRank (Yahoo Pipes View) ● Content in feeds varies wildly depending on site. ● Common: title, author, pubDate, link, content, description ● Site-specific: postrank, lat/long, image links, username, twitter source … (most RSS readers don't show these) ● API: usually has additional data & can output RSS ● If it's in the feed, you can use it!
  • 12. Reformatting / Modifying RSS Feeds Don't be satisfied with default RSS feed formats! Twitter Search Twitter RSS Feed Modify & more quickly scan key data
  • 13. Yahoo Pipes: Reformat Twitter Feed ● Input: – Twitter Search feed ● Loop String Build: – Author – : (spacing) – Title ● Loop Assign: – Store result back into title ● Output: – 1 RSS feed – Efficient format
  • 14. BackTweets (BackType API) ● Data about links on Twitter ● Finds links regardless of shortening service ● No RSS Feeds ● But … You can use API + Pipes to build one!
  • 15. BackType + Twitter API + Pipes Output ● Data from BackType + Twitter ● Built an RSS feed using Yahoo Pipes ● Included the information relevant for me ● Could have included or filtered on: name, listed count, location, profile image, user URL, ...
  • 16. Admit it, we ALL do vanity searches ● You can enter your search queries in Google, Twitter, Flickr … – Add a new project & have to update all of them – Can be hard to filter out some results – May have duplicates from multiple searches ● Yahoo Pipes – Update keywords in a CSV file – Use CSV file as input into a bunch of searches (RSS or API inputs) – Filter out what you don't want – Get 1 filtered RSS feed as output 2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
  • 17. How Should / Shouldn't You Use All of This? ● Do: – Use this for personal productivity – Play around, create prototypes and understand the possibilities ● Don't: – Don't violate licenses on content or republish w/o permission – Don't use in critical or production environments ● For production use or putting data on websites: – Re-write in a real programming language with cached results and error checking XKCD Comic: http://xkcd.com/327/
  • 18. Learn More About Dawn: ● Intel Community Manager for MeeGo ● Author of Companies and Communities ● More Info: http://fastwonderblog.com ● Dawn@FastWonder.com ● @geekygirldawn on Twitter Additional Reading & audio from 1 hour version of this talk: ● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/ Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/
  • 20. Outsource / Crowdsource New Sources
  • 21. Yahoo Pipes: Reformat PostRank Feed ● Input: – 3 PostRank feeds ● Loop String Build: – PostRank – : (spacing) – Title ● Loop Assign: – Store result back into title ● Output: – 1 RSS feed – Efficient format
  • 22. Yahoo Pipes PostRank Example ● Input PostRank Feeds: – Engadget – CrunchGear – Boy Genius ● Filter by content – Tablet ● Sort: – PostRank ● Output – 1 RSS feed – Best tablet posts
  • 23. Using Web APIs 101 ● Many API calls are basically URLs ● Constructing URLs – Use API documentation/examples to format the URL – http://api.twitter.com/1/statuses/show/ID.xml ● Version 1 of the API, show status for ID, in .format ● API keys – Tells API who you are (password) ● Rate limiting – Only get so much & you're cut off – Limited by IP or API key – Chill out for a while & come back XKCD Comic: http://xkcd.com/844/
  • 24. Backtweets API + Twitter API + Yahoo Pipes ● What we want to do: – Start with a set of URLs (blog posts in a feed) – Find any tweet mentioning those URLs – Return the tweet and data about the person who posted it ● Mission: Build feed using only data from these 2 APIs ● BackType API provides Tweet ID (not humanly useful) – http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY – List of Twitter Status IDs for Tweets linking to URL – Note: I think this feature may be deprecated ● Twitter API uses Tweet ID to get everything else – http://api.twitter.com/1/statuses/show/ID.xml – Returns a single status with all relevant data for ID
  • 25. BackTweets API: Get Tweet ID ● Take WebWorkerDaily Author Feed ● Use WWD URLs to build URLs for BackType API call ● Fetch data from BackType URLs to get Tweet ID
  • 26. Twitter API: Get Data Based on Tweet ID ● Use BackType tweet ID to build URL for Twitter API ● Fetch data about Tweet & User from Twitter API ● Re-Build title to show “user (followers): tweet”
  • 27. Add Filters to BackType + Twitter Example ● Show only tweets from people with 1000+ followers
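
Slide 17 suggests rewriting pipes in a real programming language for anything beyond prototyping. As a rough illustration only, the keyword filter from slide 9 and the "author: title" reformatting from slide 13 might look something like the Python sketch below. It is not how Yahoo Pipes works internally. The sample feed, its item contents, and the function names `filter_and_reformat` and `titles` are all made up for this example, and it parses an inline string rather than fetching a live feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed (invented items) standing in for a real RSS feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <link>http://example.com/</link>
    <description>Made-up items for illustration</description>
    <item>
      <title>Tools that help teams collaborate</title>
      <author>alice</author>
      <link>http://example.com/1</link>
      <description>A look at shared editing.</description>
    </item>
    <item>
      <title>New phone released</title>
      <author>bob</author>
      <link>http://example.com/2</link>
      <description>Specs and pricing.</description>
    </item>
    <item>
      <title>Remote work tips</title>
      <author>carol</author>
      <link>http://example.com/3</link>
      <description>Better collaboration across time zones.</description>
    </item>
  </channel>
</rss>"""

# The three keywords from the slide 9 example.
KEYWORDS = ("collaborate", "collaboration", "collaborative")


def filter_and_reformat(rss_text, keywords=KEYWORDS):
    """Drop items that mention none of the keywords in their title or
    description, and rewrite each remaining title as "author: title"."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    for item in channel.findall("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        haystack = (title + " " + description).lower()
        if not any(word in haystack for word in keywords):
            channel.remove(item)  # filter step: no keyword matched
            continue
        title_el = item.find("title")
        if title_el is not None:
            author = item.findtext("author", default="unknown")
            # "string build" + "assign": store the result back into title
            title_el.text = f"{author}: {title}"
    return root


def titles(root):
    """Return the titles of all items left in the feed."""
    return [item.findtext("title") for item in root.iter("item")]


filtered = filter_and_reformat(SAMPLE_RSS)
print(titles(filtered))
# Serialize back to RSS so a feed reader could consume the result.
filtered_xml = ET.tostring(filtered, encoding="unicode")
```

The same loop could filter on any other field present in the feed, such as a follower count or a PostRank score, in the spirit of "if it's in the feed, you can use it". A production version would also need the caching and error checking that slide 17 calls for.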