With Dr Ian Hopkinson 
Data Science for Social Scientists 
LJMU 2014-09-12 
ian@scraperwiki.com
Aims 
●Explain “Data Science” and “Big Data” 
●Show some tools 
●Show some examples 
●Take home 
●New methodology 
●Plan a project 
Overview 
●Introductions ~15 minutes 
●What is Data Science? ~40 minutes
●What is Big Data? ~20 minutes
●Coffee Break ~15 minutes
●Case Studies ~90 minutes
●Discussion ~30 minutes
Background 
●Background in physics 
●Computer fiddler for many years 
●Academic at Cambridge and UMIST 8 years 
●Unilever PLC (large FMCG) for 8 years 
●Lots of experience with all sorts of data 
Mindset 
●What can I do with this data? 
●Is there a plot I can do with other data? 
●Can I make a map? 
●Just how many bottles are there in the Science Museum collection? 
●How do I automate this? 
How do you work? 
What is Data Science? 
Data science 
Classification of data analysts 
Source: Enterprise Data Analysis and Visualization: An Interview Study
Tools 
●Excel – analysis, processing, visualisation
●Tableau (Public) – better visualisation
●Python – a programming language
●Gephi – network visualisation
●R – statistics
●Databases (SQLite, MySQL, PostgreSQL)
●… 
Data Science – what does it look like?
Workflow: Discover → Wrangle → Profile → Model → Report
Wrangling – joining and cleaning data
●Manual 
●Dictionary 
●Substitutions 
●Geocoding 
●NewsReader 
●Natural language processing 
●Face recognition 
●…
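The dictionary and substitution approaches listed on this slide might look like the following minimal Python sketch. The place names, the `CANONICAL` lookup table and the `clean_name` helper are all hypothetical, made up for illustration; they are not taken from the workshop.

```python
import re

# Hypothetical lookup table mapping messy variants to one canonical name.
CANONICAL = {
    "l'pool": "Liverpool",
    "liverpool": "Liverpool",
    "manchester": "Manchester",
    "mcr": "Manchester",
}

def clean_name(raw):
    """Tidy whitespace and case, then apply the dictionary lookup."""
    # Substitution: collapse runs of whitespace, strip the ends, lower-case.
    key = re.sub(r"\s+", " ", raw).strip().lower()
    # Dictionary: fall back to the tidied input when there is no entry.
    return CANONICAL.get(key, key.title())

cleaned = [clean_name(n) for n in ["  LIVERPOOL ", "Mcr", "l'pool", "new   brighton"]]
print(cleaned)
```

Values with no dictionary entry pass through in tidied form, so they can be reviewed by hand (the "Manual" step) and added to the lookup later.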
Data Science in the wild… 
●Google Flu Trends, credit scoring, recommendation systems, fleet management, search engines, price comparisons, Google Translate… 
●Commercially: all about prediction 
Data Science 
●Statistics 
●Machine Learning 
●Natural Language Processing 
●Computer Vision 
●Data visualisation 
Machine Learning
●Classification
●“Supervised” – training set
●“Unsupervised” – no training set
●Algorithms
●Naïve Bayes, logistic regression, support vector machines, decision trees…
Machine Learning 
Source: scikit-learn, Supervised learning
Classifier evaluation
Source: Understanding uncertainty, visualising probabilities
●Confusion matrix: true positives (TP), false negatives (FN), false positives (FP), true negatives (TN)
●Precision = TP/(TP + FP) = 9/(9 + 99) = 8.3%
●Recall = TP/(TP + FN) = 9/(9 + 1) = 90%
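The two percentages on this slide can be reproduced with a few lines of Python. This sketch assumes the counts implied by the slide's arithmetic (9 true positives, 99 false positives, 1 false negative) and assumes the two ratios are what are usually called precision and recall; the function names are illustrative.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that the classifier finds."""
    return tp / (tp + fn)

# Counts read off the slide's arithmetic: 9/(9 + 99) and 9/(9 + 1).
tp, fp, fn = 9, 99, 1
p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision = {p:.1%}, recall = {r:.1%}")
```

The gap between the two numbers is the point of the slide: a test can find 90% of real cases and still be wrong for most of the cases it flags when the condition is rare.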
Natural Language Processing 
●Parts of speech 
●Named entity recognition 
●Sentiment analysis 
●Summarisation 
●Document search
Computer vision 
●Face / object recognition 
●Image forensics 
●Camera pose
What is Big Data? 
Big Data 
●Velocity, value, variability… 
Source: Doug Laney at Gartner 
●N = all, messy, correlation not causation 
Source: Big Data by Mayer-Schönberger and Cukier
●Thirty other definitions… 
Source: What is Big Data?
Source: Analyzing the Analyzers
Big Data – my view
●Lots of data, collected almost casually 
●Cloud storage and processing 
●Computing frameworks 
●Data mining algorithms, visualisation
Hadoop Ecosystem 
Source: ADASTRA
Big Data is like teenage sex…
●Everyone talks about it,
●nobody really knows how to do it,
●everyone thinks everyone else is doing it,
●so everyone claims they are doing it too.
What is your data? 
Case Studies
●BIG Lottery
●MOT data
●NewsReader
●Inspiring Women
●London Underground
●Machine learning
●Face Recognition
●UN democracy
●Google Ngram
Google Ngram 
●How does the popularity of scientists vary over time? 
●Ngrams (frequency of word combinations)
●6 million digitized books 
●Big Data in the wild 
Source: Google Ngram, the data
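To make "frequency of word combinations" concrete, here is a small sketch that counts bigrams in a sentence using only the Python standard library. The sample sentence and the `ngrams` helper are invented for illustration; the Google dataset applies the same idea to millions of books.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return an iterator over each run of n consecutive tokens, as tuples."""
    return zip(*(tokens[i:] for i in range(n)))

# Made-up sample text; real use would tokenise whole books.
text = "the laws of motion and the laws of thermodynamics"
tokens = text.lower().split()

bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(2))
```

Changing the second argument of `ngrams` gives trigrams, 4-grams and so on; tracking how these counts change by publication year gives the kind of popularity curve shown on the Ngram site.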
William Thomson aka Lord Kelvin
An assortment
Isaac Newton and Albert Einstein
Google Ngram – lessons
●Huge dataset drives the site (1TB for just two corpuses) 
●No book metadata 
BIG Lottery 
●Where is lottery money allocated? 
●BIG Lottery data 
●ONS population data 
●Sport data 
●Fun with natural language processing 
Visualisation and blog: BIG Lottery
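Combining grant data with population data is essentially a join on area name followed by a division. The sketch below shows that step with made-up area names and figures; it does not use the real BIG Lottery or ONS data, and the variable names are illustrative.

```python
# Hypothetical grant totals (pounds) and populations, keyed by area name.
grants = {"Area A": 1_200_000, "Area B": 300_000, "Area C": 450_000}
population = {"Area A": 400_000, "Area B": 150_000}

per_head = {}
unmatched = []
for area, total in grants.items():
    if area in population:
        # Funding per head of population for areas present in both datasets.
        per_head[area] = total / population[area]
    else:
        # Record rows that fail to join rather than dropping them silently.
        unmatched.append(area)

print(per_head)
print("No population figure for:", unmatched)
```

Keeping the unmatched list matters in practice: names that fail to join are often the same places spelled differently, which is where the wrangling effort goes.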
NLP demo 
BIG Lottery – lessons
●Initial (geographic) analysis didn’t work as expected 
●Revealed some organisational history 
NewsReader World Cup Hack Day 
●How can we explore and understand huge volumes of news articles? 
●300,000 news articles on the World Cup 
●Cutting edge NLP/semantic web 
●Making an API 
Visualisation and blog: NewsReader
NewsReader Demo 
https://newsreader.scraperwiki.com/
Gephi Demo
NewsReader – lessons
●Proper research project – this is actually hard stuff!
#InspiringWomen on Twitter
●Who are the #InspiringWomen? 
●40,000 Tweets from a hashtag 
●Tableau for quick and easy visualisation 
Visualisation and blog: InspiringWomen
#InspiringWomen on Twitter
Top 5 retweets
1. Emma Watson
2. Ada Lovelace
3. Delia Derbyshire
4. JK Rowling
5. Hedy Lamarr
Twitter Demo 
#InspiringWomen – lessons
●Spamming on Twitter
●Dynamics of retweeting
MOT Data 
●Are some makes of car prone to particular faults at MOT? 
●Data on every single MOT test 
●Handling a large dataset 
●180,000,000 data points, 20GB 
SQL Demo 
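A query of the kind such a demo might run can be sketched with Python's built-in `sqlite3` module. The `mot_tests` table, its two columns and the five rows are hypothetical stand-ins, not the real MOT schema or data; the point is only the shape of the question "which makes fail most often?".

```python
import sqlite3

# In-memory database with a made-up, heavily simplified table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mot_tests (make TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO mot_tests VALUES (?, ?)",
    [("FORD", "PASS"), ("FORD", "FAIL"), ("FORD", "PASS"),
     ("VOLVO", "PASS"), ("VOLVO", "FAIL")],
)

# In SQLite a comparison evaluates to 0 or 1, so AVG gives the failure rate.
rows = conn.execute(
    """
    SELECT make,
           COUNT(*) AS n_tests,
           AVG(result = 'FAIL') AS fail_rate
    FROM mot_tests
    GROUP BY make
    ORDER BY make
    """
).fetchall()

for make, n_tests, fail_rate in rows:
    print(make, n_tests, round(fail_rate, 2))
```

Pushing the aggregation into the database means only one row per make comes back, which is what makes a dataset of this size workable on an ordinary machine.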
MOT Data – lessons
●Easy to get lost in a huge dataset 
●Even sharing the derived data can be difficult 
London Underground Visualisation 
●How do passenger numbers vary across the London Underground? 
●Wikipedia 
●OpenStreetMap
●London Transport 
●Melding/tidying data from disparate sources 
●Fancy visualisation 
Visualisation and blog: London Underground
London Underground – can I walk it?
Visualisation and blog: London Underground – can I walk it?
London Underground – lessons
●Fixed a problem with Table Xtract
●Whizzy visualisations can have a big impact
Twitter Machine Learning 
●ScraperWiki's Twitter followers
●Which websites are businesses? 
●1,000 website URLs 
●Machine learning to classify websites 
Blog: Twitter Machine Learning
Twitter Machine Learning – lift curve
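One common way to build a lift curve is to rank items by classifier score and track how quickly the true positives accumulate. The sketch below does that in plain Python; the scores, the labels and the `cumulative_gains` helper are invented for illustration and are not the data or code behind the slide's plot.

```python
def cumulative_gains(scores, labels):
    """Return (fraction of sample seen, fraction of positives found) pairs.

    Items are ranked by classifier score, highest first.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    found = 0
    points = []
    for i, (_, label) in enumerate(ranked, start=1):
        found += label
        points.append((i / len(ranked), found / total_pos))
    return points

# Hypothetical scores and true labels (1 = business website, 0 = not).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

curve = cumulative_gains(scores, labels)
# Lift after looking at the top 25% of items, relative to picking at random.
lift_at_quarter = curve[1][1] / curve[1][0]
print(curve)
print(round(lift_at_quarter, 2))
```

A lift above 1 means the classifier surfaces positives faster than random selection would; a curve that hugs the diagonal means the scores carry little information.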
Twitter Machine Learning – lessons
●Scikit-learn libraries make this easy
●Try out different algorithms – it’s cheap
●Getting the training set is the key problem
Face recognition 
●What are the demographics of Twitter users?
●Followers of @ScraperWiki 
●Face recognition, online services 
Blog: Face ReKognition
Face Recognition – lessons
●Complex analysis as a service
●Works reasonably well
UN Democracy 
●Improving access to the UN verbatim proceedings 
●Processing PDF 
UN Democracy – lessons
●Extracting data from websites can be hard
●PDF is an important resource – general extraction mechanisms are hard
Digital Humanities on the web 
Source: @Sharon_Howard
Digital Humanities on the web (1) 
●Digging into Data Challenge 
●Nineteenth Century Scholarship Online
●Eighteenth Century Scholarship Online 
●Medieval Electronic Scholarly Alliance 
●The Great Parchment Book 
Digital Humanities on the web (2) 
●Quantifying Kissinger 
●Swansea –City Witness 
●Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic 
Digital Humanities on the web (3) 
●City and Region – Urban and Agricultural Rent in England, 1400-1914
●The Proceedings of the Old Bailey 1674-1913
●Mapping the Republic of Letters
Bibliography 
●Natural Language Processing with Python by Steven Bird, Ewan Klein & Edward Loper
●Mining the Social Web by Matthew A. Russell
●Data Mining – Practical Machine Learning Tools and Techniques by Witten, Frank and Hall
●Data Science for Business by Foster Provost and Tom Fawcett
●Big Data by Viktor Mayer-Schönberger and Kenneth Cukier
●…more book reviews at ScraperWiki 
Aims 
●Explain “Data Science” and “Big Data” 
●Show some tools 
●Show some examples 
●Take home 
●New methodology 
●Plan a project 
Thank You! Please fill in survey 
ian@scraperwiki.com

 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media PlatformsMahmoud Yasser
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptxFurkanTasci3
 
Unleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IMUnleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IMMarco Wobben
 
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...Neo4j
 
Brain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxBrain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxShammiRai3
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Báo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingBáo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingMarketingTrips
 
How to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentHow to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentAggregage
 

Recently uploaded (20)

Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptx
 
The market for cross-border mortgages in Europe
The market for cross-border mortgages in EuropeThe market for cross-border mortgages in Europe
The market for cross-border mortgages in Europe
 
Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1
 
Empowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded AnalyticsEmpowering Decisions A Guide to Embedded Analytics
Empowering Decisions A Guide to Embedded Analytics
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
 
Microeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdfMicroeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdf
 
Understanding the Impact of video length on student performance
Understanding the Impact of video length on student performanceUnderstanding the Impact of video length on student performance
Understanding the Impact of video length on student performance
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media Platforms
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
 
Unleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IMUnleashing Datas Potential - Mastering Precision with FCO-IM
Unleashing Datas Potential - Mastering Precision with FCO-IM
 
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
Deloitte+RedCross_Talk to your data with Knowledge-enriched Generative AI.ppt...
 
Brain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxBrain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptx
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Báo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingBáo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân Marketing
 
How to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentHow to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product Development
 
Target_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110millionTarget_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110million
 

Data Science For Social Scientists Workshop

  • 1. With Dr Ian Hopkinson Data Science for Social Scientists LJMU 2014-09-12 ian@scraperwiki.com
  • 2. Aims ●Explain “Data Science” and “Big Data” ●Show some tools ●Show some examples ●Take home ●New methodology ●Plan a project Data Science for Social Scientists With Dr Ian Hopkinson from ScraperWiki
  • 3. Overview ●Introductions ~15 minutes ●What is Data Science? ~40 minutes ●What is Big Data? ~20 minutes ●Coffee Break ~15 minutes ●Case Studies ~90 minutes ●Discussion ~30 minutes
  • 4. Background ●Background in physics ●Computer fiddler for many years ●Academic at Cambridge and UMIST for 8 years ●Unilever PLC (large FMCG) for 8 years ●Lots of experience with all sorts of data
  • 5. Mindset ●What can I do with this data? ●Is there a plot I can do with other data? ●Can I make a map? ●Just how many bottles are there in the Science Museum collection? ●How do I automate this?
  • 6. How do you work?
  • 7. What is Data Science?
  • 8. Data science
  • 9. Classification of data analysts Source: Enterprise Data Analysis and Visualization: An Interview Study
  • 10. Tools ●Excel – analysis, processing, visualisation ●Tableau (Public) – better visualisation ●Python – a programming language ●Gephi – network visualisation ●R – statistics ●Databases (SQLite, MySQL, PostgreSQL) ●…
  • 11. Data Science – what does it look like? Workflow: Discover → Wrangle → Profile → Model → Report
  • 12. Wrangling – joining and cleaning data ●Manual ●Dictionary ●Substitutions ●Geocoding ●NewsReader ●Natural language processing ●Face recognition ●…
  • 13. Data Science in the wild… ●Google Flu Trends, credit scoring, recommendation systems, fleet management, search engines, price comparisons, Google Translate… ●Commercially: all about prediction
  • 14. Data Science ●Statistics ●Machine Learning ●Natural Language Processing ●Computer Vision ●Data visualisation
  • 15. Machine Learning ●Classification ●“Supervised” – training set ●“Unsupervised” – no training set ●Algorithms ●Naïve Bayesian, logistic regression, support vector machines, decision trees…
  • 16. Machine Learning Source: scikit-learn, Supervised learning
  • 17. Classifier evaluation Source: Understanding uncertainty, visualising probabilities ●Confusion matrix: TP, FN, FP, TN ●Precision = TP/(TP + FP) = 9/(9 + 99) = 8.3% ●Recall = TP/(TP + FN) = 9/(9 + 1) = 90%
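
The two ratios on slide 17, 9/(9 + 99) and 9/(9 + 1), correspond to what are usually called precision and recall. The sketch below recomputes them, assuming the slide's arithmetic implies 9 true positives, 99 false positives and 1 false negative; the function names are illustrative and not taken from the slides.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are actually positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the classifier finds."""
    return tp / (tp + fn)


# Counts read off the slide's arithmetic (an assumption, see lead-in).
tp, fp, fn = 9, 99, 1

print(f"precision = {precision(tp, fp):.1%}")  # 8.3%
print(f"recall    = {recall(tp, fn):.1%}")     # 90.0%
```

A classifier can therefore find 90% of the real cases while most of its positive calls are still wrong, which is why both numbers are worth reporting.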
  • 18. Natural Language Processing ●Parts of speech ●Named entity recognition ●Sentiment analysis ●Summarisation ●Document search
  • 19. Computer vision ●Face / object recognition ●Image forensics ●Camera pose
  • 20. What is Big Data?
  • 21. Big Data ●Velocity, value, variability… (Source: Doug Laney at Gartner) ●N = all, messy, correlation not causation (Source: Big Data by Mayer-Schönberger and Cukier) ●Thirty other definitions… (Source: What is Big Data?)
  • 22. Source: Analyzing the Analyzers
  • 23. Big Data – my view ●Lots of data, collected almost casually ●Cloud storage and processing ●Computing frameworks ●Data mining algorithms, visualisation
  • 24. Hadoop Ecosystem Source: ADASTRA
  • 25. Big Data is like teenage sex… ●Everyone talks about it, ●nobody really knows how to do it, ●everyone thinks everyone else is doing it, ●so everyone claims they are doing it too.
  • 26. What is your data?
  • 27. Case Studies
  • 28. Case Studies ●BIG Lottery ●MOT data ●NewsReader ●Inspiring Women ●London Underground ●Machine learning ●Face Recognition ●UN democracy ●Google Ngram
  • 29. Google Ngram ●How does the popularity of scientists vary over time? ●Ngrams (frequency of word combinations) ●6 million digitized books ●Big Data in the wild Source: Google Ngram, the data
  • 30. William Thomson aka Lord Kelvin
  • 31. An assortment
  • 32. Isaac Newton and Albert Einstein
  • 33. Google Ngram – lessons ●Huge dataset drives the site (1TB for just two corpuses) ●No book metadata
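
To make "frequency of word combinations" concrete, here is a minimal sketch of n-gram counting on a toy sentence. The sentence and the function names are invented for illustration; this is not how Google builds its corpus, only the basic counting idea.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text as a tuple."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_frequency(text, n):
    """Relative frequency of each n-gram within the given text."""
    grams = ngrams(text, n)
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


# Toy text, made up for this example.
text = "the laws of isaac newton and the optics of isaac newton"
bigram_freq = ngram_frequency(text, 2)
print(bigram_freq[("isaac", "newton")])
```

Tracking a name over time then amounts to repeating this count separately for the text published in each year.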
  • 34. BIG Lottery ●Where is lottery money allocated? ●BIG Lottery data ●ONS population data ●Sport data ●Fun with natural language processing Visualisation and blog: BIG Lottery
  • 35. BIG Lottery (Visualisation and blog: BIG Lottery)
  • 36. BIG Lottery (Visualisation and blog: BIG Lottery)
  • 37. BIG Lottery (Visualisation and blog: BIG Lottery)
  • 38. BIG Lottery (Visualisation and blog: BIG Lottery)
  • 39. NLP demo
  • 40. BIG Lottery (Visualisation and blog: BIG Lottery)
  • 41. BIG Lottery (Visualisation and blog: BIG Lottery)
  • 42. BIG Lottery – lessons ●Initial (geographic) analysis didn’t work as expected ●Revealed some organisational history Visualisation and blog: BIG Lottery
  • 43. NewsReader World Cup Hack Day ●How can we explore and understand huge volumes of news articles? ●300,000 news articles on the World Cup ●Cutting edge NLP/semantic web ●Making an API Visualisation and blog: NewsReader
  • 44. NewsReader World Cup Hack Day (Visualisation and blog: NewsReader)
  • 45. NewsReader World Cup Hack Day (Visualisation and blog: NewsReader)
  • 46. NewsReader World Cup Hack Day (Visualisation: NewsReader)
  • 47. NewsReader Demo https://newsreader.scraperwiki.com/
  • 48. Gephi Demo
  • 49. NewsReader – lessons ●Proper research project – this is actually hard stuff!
  • 50. #InspiringWomen on Twitter ●Who are the #InspiringWomen? ●40,000 Tweets from a hashtag ●Tableau for quick and easy visualisation Visualisation and blog: InspiringWomen
  • 51. #InspiringWomen on Twitter (Visualisation and blog: InspiringWomen)
  • 52. #InspiringWomen on Twitter Top 5 retweets: 1. Emma Watson 2. Ada Lovelace 3. Delia Derbyshire 4. JK Rowling 5. Hedy Lamarr Visualisation and blog: InspiringWomen
  • 53. Twitter Demo
  • 54. #InspiringWomen – lessons ●Spamming on Twitter ●Dynamics of retweeting Visualisation and blog: InspiringWomen
  • 55. MOT Data ●Are some makes of car prone to particular faults at MOT? ●Data on every single MOT test ●Handling a large dataset ●180,000,000 data points, 20GB
  • 56. SQL Demo
  • 57. MOT Data
  • 58. MOT Data
  • 59. MOT Data
  • 60. MOT Data – lessons ●Easy to get lost in a huge dataset ●Even sharing the derived data can be difficult
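
The kind of question the MOT case study asks can be sketched as a SQL aggregation. The example below uses Python's built-in `sqlite3` module with a tiny in-memory table; the table name `mot_tests`, its two columns and the six rows are all hypothetical stand-ins, not the real MOT schema or data, and the query is simplified to an overall failure rate per make rather than particular faults.

```python
import sqlite3

# In-memory database with a made-up, heavily simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mot_tests (make TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO mot_tests VALUES (?, ?)",
    [
        ("FORD", "PASS"), ("FORD", "FAIL"), ("FORD", "PASS"), ("FORD", "PASS"),
        ("VAUXHALL", "FAIL"), ("VAUXHALL", "PASS"),
    ],
)

# Count tests per make and average a 0/1 failure flag to get a rate.
query = """
SELECT make,
       COUNT(*) AS tests,
       AVG(CASE WHEN result = 'FAIL' THEN 1.0 ELSE 0.0 END) AS fail_rate
FROM mot_tests
GROUP BY make
ORDER BY fail_rate DESC
"""
failure_rates = conn.execute(query).fetchall()
for make, tests, fail_rate in failure_rates:
    print(f"{make}: {tests} tests, {fail_rate:.0%} failed")
```

Pushing the grouping into the database means only the small summary table comes back, which matters when the underlying data runs to 20GB.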
  • 61. London Underground Visualisation ●How do passenger numbers vary across the London Underground? ●Wikipedia ●OpenStreetMap ●London Transport ●Melding/tidying data from disparate sources ●Fancy visualisation Visualisation and blog: London Underground
  • 62. London Underground Visualisation (Visualisation and blog: London Underground)
  • 63. London Underground – can I walk it? (Visualisation and blog: London Underground – can I walk it?)
  • 64. London Underground – lessons ●Fixed a problem with Table Xtract ●Whizzy visualisations can have a big impact Visualisation and blog: London Underground
  • 65. Twitter Machine Learning ●ScraperWiki's Twitter followers ●Which websites are businesses? ●1,000 website URLs ●Machine learning to classify websites Blog: Twitter Machine Learning
  • 66. Twitter Machine Learning (Blog: Twitter Machine Learning)
  • 67. Twitter Machine Learning – lift curve (Blog: Twitter Machine Learning)
  • 68. Twitter Machine Learning – lessons ●Scikit-learn libraries make this easy ●Try out different algorithms – it’s cheap ●Getting the training set is the key problem Blog: Twitter Machine Learning
  • 69. Face recognition ●What are the demographics of Twitter users? ●Followers of @ScraperWiki ●Face recognition, online services Blog: Face ReKognition
  • 70. Face recognition (Blog: Face ReKognition)
  • 71. Face recognition
  • 72. Face recognition (Blog: Face ReKognition)
  • 73. Face recognition (Blog: Face ReKognition)
  • 74. Face Recognition – lessons ●Complex analysis as a service ●Works reasonably well Blog: Face ReKognition
  • 75. UN Democracy ●Improving access to the UN verbatim proceedings ●Processing PDF
  • 76. UN Democracy
  • 77. UN Democracy
  • 78. UN Democracy – lessons ●Extracting data from websites can be hard ●PDF is an important resource – general extraction mechanisms are hard
  • 79. Digital Humanities on the web Source: @Sharon_Howard
  • 80. Digital Humanities on the web (1) ●Digging into Data Challenge ●Nineteenth Century Scholarship Online ●Eighteenth Century Scholarship Online ●Medieval Electronic Scholarly Alliance ●The Great Parchment Book Source: @Sharon_Howard
  • 81. Digital Humanities on the web (2) ●Quantifying Kissinger ●Swansea – City Witness ●Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic Source: @Sharon_Howard
  • 82. Digital Humanities on the web (3) ●City and Region – Urban and Agricultural Rent in England, 1400-1914 ●The Proceedings of the Old Bailey 1674-1913 ●Mapping the Republic of Letters Source: @Sharon_Howard
  • 83. Bibliography ●Natural Language Processing with Python by Steven Bird, Ewan Klein & Edward Loper ●Mining the Social Web by Matthew A. Russell ●Data Mining – Practical Machine Learning Tools and Techniques by Witten, Frank and Hall ●Data Science for Business by Foster Provost and Tom Fawcett ●Big Data by Viktor Mayer-Schönberger and Kenneth Cukier ●…more book reviews at ScraperWiki
  • 84. Aims ●Explain “Data Science” and “Big Data” ●Show some tools ●Show some examples ●Take home ●New methodology ●Plan a project
  • 85. Thank You! Please fill in survey. ian@scraperwiki.com