1. Get yourself ready
• Search Google for ‘Google Refine download’, or go to
http://code.google.com/p/google-refine/wiki/Downloads
• Download and install Google Refine
• Open it: it should launch in your browser at http://127.0.0.1:3333/
Saturday, 15 October 2011
2. Google Refine
Combining data
OnlineJournalismBlog.com
Twitter.com/PaulBradshaw
3. In a nutshell...
cell.cross("GPdata2008", "Practice Code").cells["Total Listsize"].value[0]
• Using GREL to combine datasets
• Using APIs to grab geographical data
• Using reconciliation services to grab company data
4. GREL
Google Refine Expression Language
5. cell.cross("GPdata2008", "Practice Code").cells["Total Listsize"].value[0]
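In GREL, cell.cross() looks in another project (here "GPdata2008") for rows whose "Practice Code" cell matches the current cell. .cells["Total Listsize"].value then pulls that column's value from each matching row, and [0] keeps the first one. The sketch below mirrors that lookup in plain Python. The sample rows and codes are invented, and returning None when nothing matches is this sketch's own choice.

```python
# Hypothetical rows standing in for the "GPdata2008" project (made-up values)
gp_data_2008 = [
    {"Practice Code": "P001", "Total Listsize": 4000},
    {"Practice Code": "P002", "Total Listsize": 5200},
]


def cross(rows, key_column, value):
    """Return every row whose key_column cell equals value, like cell.cross()."""
    return [row for row in rows if row.get(key_column) == value]


def lookup_listsize(practice_code):
    """Mimic cross(...).cells["Total Listsize"].value[0] for one cell."""
    matches = cross(gp_data_2008, "Practice Code", practice_code)
    # .cells["Total Listsize"].value: one value per matching row
    values = [row["Total Listsize"] for row in matches]
    # [0]: keep the first match; this sketch returns None when nothing matched
    return values[0] if values else None


print(lookup_listsize("P002"))  # 5200
```

Both projects need a column holding the same codes in the same form. A stray space or a different case means cross() finds no match.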
6. Using APIs
Getting contextual data
7. What’s an API again?
Ask it a question and it gives you an answer:
“For each of these codes, give me the region.”
“For each of these names, tell me their political party.”
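In practice, "asking the question" means building one request URL per cell and reading the reply, which is roughly what Refine's "Add column by fetching URLs" option does for each row. The sketch below shows both halves in Python. The endpoint https://api.example.org/postcode/ and the {"region": ...} reply format are hypothetical placeholders, not a real API.

```python
import json
from urllib.parse import quote

# Hypothetical endpoint standing in for any "which region is this code in?" API
BASE_URL = "https://api.example.org/postcode/"


def build_request_url(code):
    """Turn one cell value into a request URL, percent-encoding spaces etc."""
    return BASE_URL + quote(code)


def region_from_response(body):
    """Read the answer, assuming (hypothetically) a reply like {"region": "..."}."""
    return json.loads(body).get("region")


print(build_request_url("AB1 2CD"))  # https://api.example.org/postcode/AB1%202CD
print(region_from_response('{"region": "North West"}'))  # North West
```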
8. Useful APIs
Geo: UK Postcodes, Google Maps
Social: Twitter, Facebook, Flickr
Politics: They Work For You, Data.gov.uk
News: Guardian, NYT, USA Today, NPR
Health, business, etc.
Or search for an API covering your specific topic
9. API keys
Sometimes needed: apply for one through the API provider's site
Include it in each request, like a password
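A minimal sketch of "using the key in the request", assuming the API expects it as a query parameter. The parameter name "key" and the URL are hypothetical. Each API documents its own parameter name, and some expect the key in a header instead.

```python
from urllib.parse import urlencode


def build_url(base, params, api_key):
    """Append the query parameters plus an API key to a base URL."""
    # Assumption: the key travels as a query parameter named "key";
    # real APIs each document their own name (or use a header instead)
    query = dict(params, key=api_key)
    return base + "?" + urlencode(query)


print(build_url("https://api.example.org/search", {"q": "Tesco"}, "YOUR_KEY"))
# https://api.example.org/search?q=Tesco&key=YOUR_KEY
```

Because the key works like a password, keep it out of anything you publish, including shared Refine projects.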
10. API limits
Caps on the number of requests can stop you getting data for all of your records.
Try multiple APIs, split your data into multiple sheets, or buy a licence
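Splitting your data so each part stays under the cap is simple chunking. The sketch below assumes a hypothetical limit of 1,000 requests per day. Each batch could then go into its own sheet and be run on a different day or against a different API.

```python
def split_into_batches(records, limit):
    """Split records into consecutive chunks of at most `limit` items each."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [records[i:i + limit] for i in range(0, len(records), limit)]


# e.g. 2,500 records against a hypothetical cap of 1,000 requests per day
batches = split_into_batches(list(range(2500)), 1000)
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```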
15. Walkthrough: Reconciliation with Open Corporates
• Click the arrow at the top of the column
• Select Reconcile > Start reconciling...
• Click Add Standard Service... and enter
http://opencorporates.com/reconcile
• Then start reconciling
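Behind the scenes, Refine sends your cell values to the reconciliation service. My understanding of the Refine reconciliation API is that several queries are batched as JSON in one form field called "queries", each with a "query" string and an optional "limit". Treat that as an assumption to check against the service's own documentation. The sketch below only builds that request body and does not contact OpenCorporates.

```python
import json
from urllib.parse import urlencode


def build_reconcile_queries(names, limit=3):
    """Batch several cell values into one reconciliation request body."""
    # Keys "q0", "q1", ... label each query so answers can be matched back
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def encode_request(queries):
    # The whole batch travels as a single form field holding JSON
    return urlencode({"queries": json.dumps(queries)})


queries = build_reconcile_queries(["Tesco PLC", "Sainsbury's"])
print(queries["q0"])  # {'query': 'Tesco PLC', 'limit': 3}
```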
16. Walkthrough: Reconciliation with Open Corporates
• Click ‘Search for Match’ and select the right company
• Click the double-tick icon to bulk reconcile (apply the match to every identical cell)
• Reconcile > Actions > Match each cell to its best candidate
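"Match each cell to its best candidate" picks one of the candidates the service returned. The sketch below approximates "best" as the highest score. The response shape (a "result" list of candidates with "id", "name", "score" and "match") is assumed from the Refine reconciliation API, and every id, name and score here is invented.

```python
# Hypothetical reply for query "q0"; ids, names and scores are invented
sample_response = {
    "q0": {
        "result": [
            {"id": "example-1", "name": "Example Stores Ltd", "score": 62.0, "match": False},
            {"id": "example-2", "name": "Example PLC", "score": 91.5, "match": False},
        ]
    }
}


def best_candidate(response, query_key):
    """Return the highest-scoring candidate for one query, or None if there are none."""
    candidates = response.get(query_key, {}).get("result", [])
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate["score"])


print(best_candidate(sample_response, "q0")["name"])  # Example PLC
```

Accepting the top candidate blindly can attach the wrong company, so spot-check low scores by hand with ‘Search for Match’.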
25. Walkthrough: Using Google Refine to pull out data
> Create new column based on this one...
GREL:
value.parseJson().item1.part2[1]
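value.parseJson() turns the text in a cell into structured data. Each .name step then walks into a named field, and [1] picks the second item of a list, because counting starts at 0. The same walk in Python is shown below. The JSON is hypothetical, shaped so that the slide's placeholder path item1.part2[1] resolves.

```python
import json

# Hypothetical JSON shaped so the slide's placeholder path item1.part2[1] resolves
raw = '{"item1": {"part2": ["first", "second", "third"]}}'


def extract(raw_json):
    data = json.loads(raw_json)       # value.parseJson()
    return data["item1"]["part2"][1]  # .item1.part2[1]: index 1 is the second element


print(extract(raw))  # second
```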
26. Links
Delicious.com/paulb/kiev11
Delicious.com/paulb/googlerefine
OnlineJournalismBlog.com/tag/google-refine