Gors appropriate
Tony Hirst
GORS appropriate with notes
1.
Appropri-ut, in the sense of not inappropriate (the "right thing" to use), as well as appropri-ate, as in co-opt, or use for something it perhaps wasn't originally intended for.
2.
So for example, one thing I do is appropriate openly licensed media resources for my own slides. In this case, I want to set the scene for this presentation as one in which I haven't been afraid to get my hands dirty, but I have also played with and explored a particular medium (in this case, various digital technologies) and created my own things which may also, ultimately, be of direct use to others. You might also say they're at best half-baked, if not completely unbaked ;-)
3.
The tools I'm going to talk about are situated within a data context. I spend a lot of time playing with openly licensed datasets, working across the whole data pipeline. This example, taken from the third year undergrad equivalent OU course TM351 "Data Analysis and Management", provides a simplistic view of some of the processes involved in working with data. (We all know it's not quite that straightforward, and often involves a lot of iteration or backtracking, but as well as "The role of the academic [making] everything less simple", as Mary Beard put it in an Observer interview a few weeks ago, the academic also simplifies and idealises through abstraction and revisionist storytelling, particularly when it comes to describing processes.) So what I plan to do is spend a few minutes showing you some of the tools and emerging approaches I use working across the various steps of this pipeline.
4.
So, the first thing to note is that I'm a technology optimist: I believe technology can help make our lives simpler, even if at first it may look as if we are making it more complex by introducing yet more tools to learn, and to install on computers that our IT department would rather we left under their control. Taking control of your computing destiny is another theme of this talk... In this example, the box diagram I showed on the first line was /written/ rather than drawn. If I want to add steps, or have sub-branches added to the diagram, I don't need to start faffing around in PowerPoint or Word figures trying to line things up and get them sized right and so on. I let the machine do it. In this particular online tool, blockdiag (you can see the URL in the screenshot at the top of the slide; I'll pop a copy of the annotated slides online, and also let Alan have a copy), there are other diagram types available. The underlying code is also open source and available as a Python package, so you can write diagrams such as these in a Jupyter notebook, for example. I'll have more to say about Jupyter notebooks later.
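As a rough illustration of what "writing" a diagram can look like, the sketch below builds blockdiag-style source text from a list of steps. The helper name `pipeline_to_blockdiag` and the step names are assumptions made for the example; only the general `blockdiag { A -> B; }` text form is taken from the tool, and rendering the text to an image is left to blockdiag itself.

```python
def pipeline_to_blockdiag(steps, branches=None):
    """Return blockdiag source text for a linear pipeline with optional sub-branches.

    `steps` is an ordered list of box labels; `branches` maps a box label to a
    list of extra boxes hanging off it. Both are illustrative inputs.
    """
    def quote(label):
        # Quote every label so that names containing spaces stay one node.
        return '"' + label.replace('"', '\\"') + '"'

    lines = ["blockdiag {"]
    lines.append("  " + " -> ".join(quote(step) for step in steps) + ";")
    for parent, children in (branches or {}).items():
        for child in children:
            lines.append(f"  {quote(parent)} -> {quote(child)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical pipeline steps, with one sub-branch added by editing a list
# rather than by redrawing boxes.
source = pipeline_to_blockdiag(
    ["Acquire", "Clean", "Analyse", "Present"],
    branches={"Acquire": ["Scrape PDF"]},
)
print(source)
```

Adding a step or a branch is then a one-line change to the input, and the layout is left to the machine.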
5.
One other point to note (and a bit of blatant self-promotion here): most of the individual slides within this talk are backed up by one or more posts on my personal blog, Ouseful.info. I've been writing this blog for many years and it represents a reasonably complete notebook of a lot of the ideas I've explored over that time. In many cases, the posts are comprehensive and self-contained: they record all the steps I took to do something in case I need to remind myself later.
6.
So, the pipeline. The first step, acquisition, relates to how we get hold of data. This may be from downloaded data files, such as Excel spreadsheet documents (which are actually zip files; you know you can change the xlsx suffix to zip and unzip them, right? Same with docx Word document files and pptx PowerPoint files), from databases, or from online APIs (application programming interfaces), but it may also be scraped from other sorts of document: web pages, for example, or PDF documents (even though PDF documents are horrible, it's often quite easy to extract data tables from them). I'm not going to talk about the mechanics of scraping, but journalism lecturer Paul Bradshaw has a good intro to a variety of tools and techniques in his Leanpub book "Scraping for Journalists".
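The "xlsx is really a zip file" point can be checked without renaming anything, because Python's standard `zipfile` module will open such a file directly. The sketch below is a minimal example: to stay self-contained it builds a tiny stand-in archive in memory, and the part names are typical of a workbook but are used here purely for illustration. With a real file you would pass its path instead of the buffer.

```python
import io
import zipfile


def list_archive_parts(file_or_path):
    """Return the sorted names of the parts inside a zip-based document."""
    with zipfile.ZipFile(file_or_path) as archive:
        return sorted(archive.namelist())


# Build a small stand-in for an .xlsx file in memory (illustrative parts only;
# a real workbook contains more parts than this).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
buffer.seek(0)

parts = list_archive_parts(buffer)
print(parts)
```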
7.
I will briefly mention a couple of tools I use though. morph.io is a site hosted by an Australian open data group that is actually a fork of a tool by UK Liverpudlian start-up ScraperWiki. morph.io will run a scraper of your own writing, hosted on GitHub, once a day and pop the results into a SQLite database that you can download. The slide shows a scraper I use for scraping licence applications made to the Isle of Wight council.
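The "pop the results into a SQLite database" step can be sketched with the standard `sqlite3` module. This is not the scraper shown on the slide: the table name, column names and rows below are made-up placeholders, and the scraping itself is omitted. The point of the sketch is the upsert, which lets a scraper that runs every day write the same records again without creating duplicates.

```python
import sqlite3


def save_rows(conn, rows):
    """Create the table if needed and upsert rows keyed on their reference."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data ("
        "ref TEXT PRIMARY KEY, applicant TEXT, received TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO data (ref, applicant, received) "
        "VALUES (:ref, :applicant, :received)",
        rows,
    )
    conn.commit()


# An in-memory database keeps the example self-contained; a hosted scraper
# would write to a file on disk instead.
conn = sqlite3.connect(":memory:")

# Placeholder records standing in for whatever the scraper extracted.
scraped = [
    {"ref": "LIC/0001", "applicant": "Example Cafe", "received": "2016-01-04"},
    {"ref": "LIC/0002", "applicant": "Example Bar", "received": "2016-01-11"},
]

save_rows(conn, scraped)
save_rows(conn, scraped)  # a second daily run rewrites the same two rows

count = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
print(count)
```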
8.
Another tool I use a lot is Tabula. Tabula is a Java application with a browser based user interface that will extract data tables from PDF documents. You simply drag to select the area of the page you want to scrape (you can mirror the same area over multiple pages or define different areas on each).
9.
The heart of the application is actually a command line engine, recently wrapped by the R tabulizer package. This means you can automate the use of Tabula in order to scrape tabular data from PDF documents within R, getting the data back as an R data frame. That's tabulizer: very nice, and the developer (on GitHub) is quite responsive.
10.
Another tool I use from time to time is Apache Tika: this can extract text from PDFs, Word documents and so on, as well as from images. There are quite a few online OCR services now, many of them appearing as part of "AI toolsets", offering a range of commodity AI API services; IBM, Microsoft and Google all have them, for example. So as well as OCR text extraction, they do face and emotion detection in images, semantic tagging / entity labeling within documents, automatic image tagging, speech to text, and so on. All with varying degrees of success. But all of them steadily improving.
11.
After data acquisition, we're often faced with cleaning a dataset. A tool I use for cleaning data is another Java application, again accessed via a browser, called OpenRefine. OpenRefine will open a wide range of document types (spreadsheets, CSV or tabbed data files, XML, JSON, HTML), either locally or from the web, and present the data in a spreadsheet style UI. A wide range of options are provided for applying a particular transformation to each cell in a particular column (you can also script your own in a custom scripting language, or Python), as well as tools for faceting and filtering the display of rows based on values within one or more columns. The clustering tools are useful for finding and correcting partial matches, so for example, you can normalise MyCo Ltd, with MyCo Ltd., with MyCo Limited, and so on.
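To give a feel for how clustering of near-duplicate values can work, here is a simplified "fingerprint" key-collision sketch: each value is reduced to a normalised key, and values sharing a key are grouped. It is a hand-rolled approximation written for this note, not OpenRefine's own code, and the sample names are invented.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Normalise a value: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(set(members)) > 1]


names = ["MyCo Ltd", "MyCo Ltd.", "myco  ltd", "MyCo Limited", "Other Co"]
clusters = cluster(names)
print(clusters)
```

Note that this simple key only catches differences in case, spacing, punctuation and word order. Matching "MyCo Limited" to "MyCo Ltd" needs something extra, such as an abbreviation table or a nearest-neighbour (edit distance) method.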
12.
OpenRefine can also provide support for a limited range of data reshaping actions. I've described a few of them in this post, which takes a messy local election results data set and shows how to clean and reshape it. OpenRefine also has a templated export, so we can generate simple "line at a time" reports from a filtered dataset.
13.
One of the things I try to look for in applications is whether they are open source and whether they provide a browser based UI: if you can use it via a browser, you should be able to use it on your own local machine or from a remotely hosted version accessed over the web. OpenRefine meets both these criteria, which means it's no problem for someone like IBM to make it available via their DataScientistWorkbench site. (It's also not too hard to roll your own version of something like this site.) The other tools currently provided by this site are RStudio, a powerful (and friendly) IDE for the R programming language, and Jupyter notebooks.
14.
One reason why it's getting easier to expose these applications over the web in a scaleable way is through containerisation. Containerisation is a form of application virtualisation where one or more applications can be wired together and isolated from each other within a multi-tenanted virtual machine. Docker containers offer the promise of being able to "run anywhere", or at least, anywhere where the container platform can operate. Docker is the most popular route to this at the moment. The application shown here is called Kitematic. It lets you search for public application containers, and download them and run them locally on your own computer. The example shows various containers I've put together for OpenRefine (some are different versions, others are experiments / demos I really should delete). So rather than install Java on your computer and then download and install OpenRefine, you can just one-click in Kitematic and it will get a prepackaged OpenRefine container for you that includes all that OpenRefine needs to run.
15.
One of the spin-offs from the early days of OpenRefine was the notion of a "reconciliation service", whereby you could look up each item in an OpenRefine column against a web service that would try to match it to (reconcile it with) a known entity. A partial / fuzzy matching lookup against a controlled vocabulary, essentially. OpenCorporates, the open data international company lookup service, offers a reconciliation endpoint. It's easy enough to package up your own lookup tables and this recipe describes how to do it using a homebrewed reconciliation container. I did one for MPs, for example.
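The core idea, a fuzzy lookup against a controlled vocabulary, can be sketched with Python's standard `difflib` module. This only illustrates the matching step: a real reconciliation service wraps something like it in a web API, and the `reconcile` helper, the cutoff value and the company names below are all assumptions made for the example.

```python
import difflib


def reconcile(name, vocabulary, cutoff=0.6, limit=3):
    """Return up to `limit` (entry, score) candidates for `name`, best first."""
    # Compare case-insensitively, but report the canonical spelling.
    lookup = {entry.lower(): entry for entry in vocabulary}
    query = name.lower()
    candidates = difflib.get_close_matches(query, list(lookup), n=limit, cutoff=cutoff)
    return [
        (lookup[candidate], round(difflib.SequenceMatcher(None, query, candidate).ratio(), 2))
        for candidate in candidates
    ]


# A made-up controlled vocabulary of canonical entity names.
vocabulary = ["Example Widgets Limited", "Sample Holdings PLC", "Demo Trading Ltd"]

matches = reconcile("Exmaple Widgets Ltd", vocabulary)
no_matches = reconcile("zzzz", vocabulary)
print(matches)
print(no_matches)
```

Returning a score alongside each candidate leaves the accept-or-reject decision with the person doing the cleaning, which is how reconciliation tends to be used in practice.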
16.
Just as an aside, when putting together reconciliation services, we ideally want a canonical list of entities or entity names we want to reconcile against. Registers can be a good source of these. But it's also worth noting that registers can also be used to generate derived datasets. For example, I wanted a list of UK prisons with location information. In the absence of a single openly licensed dataset with this information (a website with one prison per page was the closest I found, which I could have scraped but chose not to), I instead did a lookup via the Food Standards Agency, which has inspection information for public food outlets. (Another source might have been the CQC, with a search for health surgeries or dental treatment centres, filtered by "HMP" or "prison".)
17.
RStudio is another application that can be freely redistributed and exposed via a browser. These posts show how to run an RStudio application in the cloud using a simple container management dashboard formerly known as Tutum, now available as Docker Cloud. I've also described how to package a Shiny application in a container so you can deploy it anywhere. Does anyone use Shiny? Shiny is a rapid prototyping tool for building browser-based, HTML5 interactive applications and dashboards (RStudio released a new dashboarding framework over the last couple of weeks) that makes it relatively easy to build interactive data exploration tools against an R environment.
18.
One really nice component of the Docker ecosystem is docker-compose, formerly known as fig, which allows you to orchestrate the launch of several interlinked containers, so you can easily access one from another. The example here shows how to link RStudio and Jupyter notebooks to a neo4j database.
19.
I've mentioned Jupyter a few times. Does anyone use Jupyter notebooks? IPython notebooks? The browser based notebook UI lets you enter text (as markdown) and executable code (in a variety of languages) and then run the code and display the results of the code execution back in the notebook. One thing I've been exploring recently is a way of calling command line application functions packaged in a container from a notebook cell, and returning the output of the containerised command line function as a shared file. This post describes how I package the Contentmine tools (a set of tools for harvesting scientific journal papers and extracting knowledge from them, which are a real pain to set up normally) and then use them via a notebook.
20.
Just by the by, if you want to try the notebooks out, there's a live demo available. (I also did a post on "Seven Ways to Run Jupyter Notebooks" which describes several other ways of running the notebooks.) The code example here shows all the code needed to open an Excel file containing average travel times to GP surgeries by LSOA, filter the data down to a particular local authority area, pull in an openly licensed geojson shapefile for that area, and then plot (and embed) an interactive choropleth map via the folium Python package (using Google maps, I think, though it may be OpenStreetMap?).
21.
One problem with producing interactive maps is that sometimes you actually want an image. It turns out that web testing frameworks like Selenium make it easy to grab screenshots from test pages rendered in a test browser, so I co-opted the idea to produce a routine that lets me grab a png snapshot of a map.
22.
That example was actually created for a side project I dabbled with alongside our hyperlocal news outlet on the Isle of Wight, OnTheWight. OnTheWight have been reporting monthly job figures for years, so I thought I'd have a go at automating the production of the reports from nomis data, as well as producing a few charts. The report is just a literal reporting, although I do try to add some colour and a tiny amount of analysis, for example by using directional and magnitude terms: "the numbers went UP SLIGHTLY from last month, although they are SIGNIFICANTLY DOWN from the same time last year". And so on.
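The directional and magnitude terms can be produced by a very small rule: compute the relative change, pick "up" or "down" from its sign, and pick a qualifier from its size. The sketch below is an illustration rather than the code behind the reports; the 2% and 10% thresholds and the figures are assumptions chosen for the example.

```python
def describe_change(current, previous, slight=0.02, significant=0.10):
    """Describe the relative change between two counts in plain words."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to compute a relative change")
    change = (current - previous) / previous
    if change == 0:
        return "were unchanged"
    direction = "up" if change > 0 else "down"
    size = abs(change)
    if size < slight:
        qualifier = " slightly"
    elif size >= significant:
        qualifier = " significantly"
    else:
        qualifier = ""
    return f"went {direction}{qualifier}"


def report(latest, last_month, last_year):
    """Assemble a one-sentence comparison against last month and last year."""
    return (
        f"The numbers {describe_change(latest, last_month)} from last month "
        f"and {describe_change(latest, last_year)} from the same time last year."
    )


# Made-up figures: a 1% rise on the month, a fall of about 22% on the year.
sentence = report(1010, 1000, 1300)
print(sentence)
```

Keeping the thresholds as parameters makes the editorial judgement (what counts as "slightly"?) explicit and easy for a journalist to challenge.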
23.
On my own site, I started trying to pull out some geographical insight, automatically reporting on areas with noticeably high unemployment compared to other areas by gender. The map does look like a population map, but the unemployment rate is actually higher in some of the more heavily populated areas!
24.
Just a side note: the idea of being able to build something once then deploy it more widely for no extra effort really appeals to me. In the case of national datasets broken down to local level, building a solution for a local area you know about and understand helps get you started on automatically detecting and pulling out stories or features, but the same code can then run for other areas.
25.
The pain points often come in splitting the data down to local areas and then generating the stories.
26.
But if you automate a pain point away for one local area, you've solved the problem for all of them. The approach I've been taking is to think in terms of producing press releases rather than finished stories, relying on the journalist, or some other editorial role, to act as the final arbiter of the quality and relevance of the press release style communication. The implication is also that more work needs to be done checking and working up the press release for the final story (if, indeed, there is any story).
27.
So picking up on this idea of reuse (or laziness), the nomis data to text engine can be easily wrapped to provide a conversational UI for it. In this example, I can ask the service for the latest JSA figures in a particular area. Although not shown, you can put in a postcode, for example, and get the figures back for the local authority area containing that postcode. At the time I did this demo, I was half thinking of trying to persuade Johnston Press to give me some pin money to play with, so I scraped a list of Johnston Press papers, found the postcode of each paper's office, and used it as the basis for a lookup of jobless figures by newspaper title area.
28.
Having got some machinery set up to work with Slack, I could also use it as an interface for a simple "spreadsheet row to paragraph of text" toy I was trying to put together. So here, for example, I'm looking up latest figures for CQC care home inspections. (Actually, I think this is based on a scraper of the CQC website rather than a data file download.)
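A "spreadsheet row to paragraph of text" toy can be as small as a sentence template filled in from each row. The sketch below shows the idea using only the standard library; the column names, the template wording and the sample row are invented for illustration and are not taken from the CQC data.

```python
import csv
import io

# Hypothetical template: each {placeholder} must match a column heading.
TEMPLATE = "{name} in {town} was rated {rating} following an inspection on {date}."


def rows_to_sentences(csv_text, template=TEMPLATE):
    """Turn each row of a CSV document into one sentence via the template."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [template.format(**row) for row in reader]


# A made-up one-row dataset standing in for a spreadsheet export.
sample = (
    "name,town,rating,date\n"
    "Example House,Newport,Good,2016-02-01\n"
)

sentences = rows_to_sentences(sample)
print(sentences[0])
```

A chat interface then only needs to pick the row (for example by name or postcode) and send the resulting sentence back as the reply.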
29.
The original experiments had the Slack bot code running on my personal computer. More recently, I started looking at how things like Amazon AWS Lambda functions, essentially serverless remote procedure calls, could be used to host the bot. The examples here make use of the UK Parliament API to provide the content, allowing me to look up recent reports, or committee memberships, for example.
30.
The data2text area is a rich one, and one thing I find reflecting on my own exploratory data activities is that I often look to charts (which are often custom, multilayered charts of my own devising; ggplot is great for that) for inspiration. Working in education, where we have a legal requirement to make our teaching materials accessible, charts and figures often require written descriptions. So one thing I've started wondering recently is whether we can introspect on chart objects created using things like ggplot as a "data basis" for a textualisation of the chart components (and then do data2text analysis for the simple analytics insight reporting). And it seems we can: ggplot chart objects, for example, have a ggplot_build() introspector, and we can also get access directly to chart objects.
31.
When I posted about my ggplot2text experiment, I idly wondered whether we could do the same for matplotlib chart objects. And it seems we can, as this demo shared via a commenter shows. #Lazyweb ftw, you might say :-)
32.
As I was looking at the Parliament API backend for a simple conversational search agent, the ONS Beta website became the live site. One of the nice things about the new ONS site is that a JSON feed alternative is available for much of the HTML content on the site. Which means we can repurpose that website content directly as a response to a conversational search.
33.
Finally, I want to return to the Jupyter ecosystem. I absolutely love the notebook environment: it provides a great environment for writing literate, reproducible data analysis scripts (several news outlets are starting to publish Jupyter notebooks showing the analysis behind their news stories; Buzzfeed is a great example of this, as with their recent tennis match fixing / betting scam story, for example), as well as providing a great environment for documenting exploratory data analyses. But the Jupyter ecosystem is already much richer than that. I haven't described the dashboard toolkit for creating live dashboards, the slideshow view that lets you create interactive slides with live code execution, the range of programming language kernels (not just Python and R) or the kernel wrapper that lets you define an API via a notebook. But I do just want to quickly mention remote kernels.
34.
At the moment, we're rewriting a day long residential school activity that uses Lego robots. Until this year, we've used the original yellow Lego Mindstorms RCX brick. This year, we're using the Lego EV3 brick, which has wifi and can be set up to run Linux and a Python shell that can access the robot's bits. The approach I've been exploring is to run a remote IPython kernel on the brick, and a Jupyter server on a desktop machine, and then connect a notebook to the remote kernel via the Jupyter server. Running the notebook server on the desktop machine removes the load of running the server from the brick. (The same approach can be, and is, used to run large tasks on supercomputer clusters.) The notebooks also allow us to create simple interactive UIs: just like R has the Shiny framework, the Jupyter notebooks can run interactive ipywidgets directly wired to Python state. In the example above, I have a slider for controlling motor speed (actually, the duty cycle of the stepper motor) and another widget that displays the value being seen by a particular sensor. (Again, there's a tiny element of simplistic data2text contextualisation in the display.)
35.
So that's me done. Some of the tools and technologies that I think are appropriate for, or can be appropriated for, data related tasks. Sometimes a pen will do as well as a spoon.
36.
And finally, a last bit of blatant self-promotion. In the same way that maths has recreational maths (fun puzzles in the Sunday papers), I engage in recreational data activities. And as with the blog, I keep a record of what I've done. Several years ago, I started to learn R, and used Formula One results and timing sheets data as context for that. Over the years, I've pulled various tricks and techniques together into this evolving book. (Actually, the book was also another experiment: Leanpub encourages you to publish as you write, and uses markdown for the manuscript. I was looking for an opportunity to explore whether we might be able to use something like RStudio, and in particular Rmd (R Markdown), for authoring OU course materials, so this gave me a reason, and a context, for exploring such a workflow.) It's still a work in progress, but at over 400 pages already it represents a reasonably deep dive into the different things you can do with a limited range of datasets on a particular topic, as well as exploring a variety of ways of using, and appropriating, R to help us find stories in data.