R in the Humanities: Text Analysis (2022)
Dr Leah Henrickson
Lecturer in Digital Media
School of Media and Communication
University of Leeds
L.R.Henrickson@leeds.ac.uk
twitter.com/leahhenrickson
Who am I?
• Lecturer in Digital Media
• Programme Leader, MA New Media
• Book historian
• Digital humanist
• Canadian 🍁
Publication in the next issue of Victorian Review: ‘Tangling and Untangling the Trollopes’, with Eleanor Dumbill
Session 1:
Gettin’ to Grips with R
CC Image: https://www.pexels.com/photo/smiling-model-in-pirate-costume-with-smoking-pipe-7000092
Overview
This course is a gentle introduction to R for text analysis. Over two sessions you will learn the basics of this
powerful programming language before getting hands-on experience analysing long-form text in the RStudio
development environment.
By the end of the course, you will be able to:
• Navigate the RStudio development environment
• Prepare long-form prose texts for computational analysis using R
• Conduct basic computational analyses of long-form prose texts
• Construct and explain visualisations of computed results
• Critically apply computational text analysis to complement other analytical methods
To complete this course you will need to install:
• R version 3.6 or higher (download at https://www.r-project.org)
• RStudio Desktop: Open Source Edition 1.2 or higher (download at https://www.rstudio.com/products/rstudio)
Session 1 Agenda
1. What are R and RStudio?
2. What can R help you do?
3. A quick note about Computational Literary Studies
4. Getting started with R
5. Cleaning text
CC Image: https://www.pexels.com/photo/black-cat-holding-persons-arm-1049764
What are R and RStudio?
R is:
• a programming language
• a software environment
• a really fancy calculator
• free/open source
Download: https://cran.r-project.org/mirrors.html
RStudio is:
• an integrated development environment (IDE)
• a great way to make your coding experiences easier, more colourful, and more fun!
Download: https://www.rstudio.com/products/rstudio/download
What can R help you do?
• Count words
• Find linguistic patterns within and across texts
• Compare texts
• Make pretty pictures
But it’s still up to you to explain results.
Also, is R always the most appropriate tool?
CC Image: https://pixabay.com/photos/letters-tiles-word-game-crossword-4938486
A quick note about Computational Literary Studies (CLS)
CLS has a long history (for example, Father Robert Busa, ~1940s),
but has been criticised for:
• Misinterpretation of statistical data (Da)
• Unchecked enthusiasm for technological ‘hype’ (Kirsch)
• Turning literature into data and neglecting reception of works (Marche)
Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, 2019, pp. 601-639.
Kirsch, Adam. “Technology Is Taking Over English Departments.” The New Republic, 2014, https://newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch. Accessed 21 December 2020.
Marche, Stephen. “Literature Is not Data: Against Digital Humanities.” The Los Angeles Review of Books, 2012, https://lareviewofbooks.org/article/literature-is-not-data-against-digital-humanities. Accessed 21 December 2020.
CC Image: https://melissaterras.org/2013/10/15/for-ada-lovelace-day-father-busas-female-punch-card-operatives
Let’s get started!
Open RStudio, then start a new script: File > New File > R Script.
Source (write your script)
Console (run your code and see the output)
Environment (your data)
Everything else!
The Basics (1/2)
Calculating
• 10 + 2 (spaces optional)
• 10 - 2
• 10 * 2
• 10 / 2
Strings and Things
• 1:50
• print("Hello world!")
• [variable name] <- c(1, 2, 3)
• [variable name][2]
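Typed into a script, the two lists above might look like the minimal sketch below. The names `nums`, `scores` and `second` are hypothetical stand-ins for `[variable name]`; any name would do.

```r
# Arithmetic: R works like a calculator
10 + 2    # 12
10 - 2    # 8
10 * 2    # 20
10 / 2    # 5

# 1:50 creates the whole numbers from 1 to 50
nums <- 1:50

# print() displays a value; text goes inside straight quotes
print("Hello world!")

# c() combines values into a vector, stored here under a hypothetical name
scores <- c(1, 2, 3)

# Square brackets pick out an element; R starts counting at 1, not 0
second <- scores[2]    # 2
```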
Meme: https://knowyourmeme.com/memes/math-lady-confused-lady
The Basics (2/2)
• Data types: character, numeric, integer, logical, complex
• Data structures: vector, list, matrix, data frame, factor
• Keep notes using #
• Need help?
• ?____________
• help()
• install.packages("[name of package]")
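One way to get a feel for these types and structures is to create a small example of each and ask R about it with `class()`, `dim()` or `levels()` (each has its own help page, e.g. `?class`). The object names below (`v`, `l`, `m`, `df`, `f`) and their contents are arbitrary examples.

```r
# One value of each basic data type; class() reports which type it is
class("Astounding")    # "character"
class(3.5)             # "numeric"
class(3L)              # "integer" (the L suffix makes a whole number an integer)
class(TRUE)            # "logical"
class(2+3i)            # "complex"

# One small example of each data structure
v  <- c("jan", "feb", "mar")                          # vector: every element has the same type
l  <- list(title = "Astounding Stories", year = 1930) # list: elements can differ in type
m  <- matrix(1:6, nrow = 2)                           # matrix: a 2 x 3 grid of one type
df <- data.frame(month = v, number = 1:3)             # data frame: columns can differ in type
f  <- factor(c("low", "high", "low"))                 # factor: categories with fixed levels

dim(m)        # 2 3
levels(f)     # "high" "low"
```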
Meme: https://www.reddit.com/r/ProgrammerHumor/comments/8w54mx/code_comments_be_like
Tools > Global Options >
Appearance
(You will need to restart
RStudio to apply these
changes).
Let’s clean some text!
CC Image: https://thenounproject.com/term/cleaning/199037
You can use whatever corpus you’d like for this course.
However, I have prepared a corpus for you. You may download it at http://tinyurl.com/n8texts.
This corpus includes six public domain texts: the first six monthly issues of Astounding Stories of Super-Science (1930). A full
corpus for the year, covering all twelve issues listed below, is available at http://tinyurl.com/n8texts2, if you’d like to use it in your own time.
• astoundingjan1930: https://www.gutenberg.org/ebooks/41481
• astoundingfeb1930: https://www.gutenberg.org/ebooks/28617
• astoundingmar1930: https://www.gutenberg.org/ebooks/29607
• astoundingapr1930: https://www.gutenberg.org/ebooks/29390
• astoundingmay1930: https://www.gutenberg.org/ebooks/29809
• astoundingjun1930: https://www.gutenberg.org/ebooks/29848
• astoundingjul1930: https://www.gutenberg.org/ebooks/29198
• astoundingaug1930: https://www.gutenberg.org/ebooks/29768
• astoundingsep1930: https://www.gutenberg.org/ebooks/29255
• astoundingoct1930: https://www.gutenberg.org/ebooks/29882
• astoundingnov1930: https://www.gutenberg.org/ebooks/29919
• astoundingdec1930: https://www.gutenberg.org/ebooks/30691
First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
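install.packages("tm")   # only needed the first time
library(tm)              # load tm at the start of every session
getwd()                  # check which folder R is using
texts <- Corpus(DirSource("[path to working directory]"))   # each file becomes one document
writeLines(as.character(texts[[4]]))   # print the fourth text
?tm_map
getTransformations()     # list tm's built-in cleaning functions
texts1 <- tm_map(texts, removePunctuation)
texts2 <- tm_map(texts1, removeNumbers)
texts3 <- tm_map(texts2, content_transformer(tolower))   # lower-case first: the stopword list is all lower case
texts4 <- tm_map(texts3, removeWords, stopwords("english"))
texts_final <- tm_map(texts4, stripWhitespace)
writeLines(as.character(texts_final[[4]]))   # compare with the uncleaned text
dtm <- DocumentTermMatrix(texts_final)   # rows = documents, columns = terms
inspect(dtm)                             # take a look!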
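To see what each `tm_map()` step does, it can help to apply comparable base-R string operations to a single made-up sentence. This is a sketch, not tm's own code: `my_stopwords` is a hypothetical five-word stand-in for `stopwords("english")`, and the regular expressions only approximate the tm transformations.

```r
# A made-up sentence to clean
text <- "The 3 Martians landed in 1930, and THE ship glowed!"

# Hypothetical stand-in for stopwords("english"), which is much longer
my_stopwords <- c("the", "and", "in", "a", "of")

step1 <- gsub("[[:punct:]]+", "", text)    # roughly removePunctuation
step2 <- gsub("[[:digit:]]+", "", step1)   # roughly removeNumbers
step3 <- tolower(step2)                    # content_transformer(tolower)

# Roughly removeWords: delete whole-word matches only (\\b marks a word boundary),
# so "and" is removed but the "and" inside "landed" is left alone
pattern <- paste0("\\b(", paste(my_stopwords, collapse = "|"), ")\\b")
step4 <- gsub(pattern, "", step3, perl = TRUE)

# Roughly stripWhitespace: collapse runs of spaces to one; trimws() also trims the ends
step5 <- trimws(gsub("[[:space:]]+", " ", step4))
step5    # "martians landed ship glowed"
```

Printing `step1` to `step5` one at a time shows the sentence shrinking stage by stage, which mirrors comparing `texts[[4]]` with `texts_final[[4]]`.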
Help me! (1/3)
R Communities
#rstats (Twitter): https://twitter.com/hashtag/rstats
Forwards: https://forwards.github.io
R-Bloggers: https://www.r-bloggers.com
R-Ladies: https://rladies.org
r/rstats: https://www.reddit.com/r/rstats
RStudio Community: https://community.rstudio.com
Stack Overflow: https://stackoverflow.com/questions/tagged/r
Help me! (2/3)
R Resources
Matthew Jockers, Text Analysis with R for Students of Literature (New York: Springer, 2014)
https://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature/
LinkedIn Learning: R: https://www.linkedin.com/learning/topics/r
Emmanuel Paradis, R for Beginners (2005): https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
Emma Rand, ‘Reproducible Analyses in R’, N8 CIR (2020): https://n8cir.org.uk/events/event-resource/analyses-r
W. N. Venables, D. M. Smith, and the R Core Team, An Introduction to R (2021): https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
Help me! (3/3)
R Packages for Text Analysis
corpustools (tokenised text analysis): https://cran.r-project.org/web/packages/corpustools
gutenbergr (searching/downloading Project Gutenberg): https://cran.r-project.org/web/packages/gutenbergr
quanteda (quantitative text analysis): https://cran.r-project.org/web/packages/quanteda/index.html
stylo (stylometry): https://cran.r-project.org/web/packages/stylo
syuzhet (sentiment analysis): https://cran.r-project.org/web/packages/syuzhet/index.html
tidytext (a bit of everything!): https://cran.r-project.org/web/packages/tidytext
tm (text mining – what we’ve done here): https://cran.r-project.org/web/packages/tm/index.html
Session 2:
Charts, Clouds, and Confidence
Image: https://pixabay.com/illustrations/rainbow-cloud-sunset-colorful-sky-5389074/
Session 2 Agenda
1. Any questions from last week?
2. Review of last week’s session (i.e. cleaning text)
3. Counting words
4. Plotting results
5. Making word clouds
6. Wrapping up
CC Images: https://thenounproject.com/term/graph/21394; https://thenounproject.com/term/word-cloud/195993
First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]
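install.packages("tm")   # only needed the first time
library(tm)              # load tm at the start of every session
getwd()                  # check which folder R is using
texts <- Corpus(DirSource("[path to working directory]"))   # each file becomes one document
writeLines(as.character(texts[[4]]))   # print the fourth text
?tm_map
getTransformations()     # list tm's built-in cleaning functions
texts1 <- tm_map(texts, removePunctuation)
texts2 <- tm_map(texts1, removeNumbers)
texts3 <- tm_map(texts2, content_transformer(tolower))   # lower-case first: the stopword list is all lower case
texts4 <- tm_map(texts3, removeWords, stopwords("english"))
texts_final <- tm_map(texts4, stripWhitespace)
writeLines(as.character(texts_final[[4]]))   # compare with the uncleaned text
dtm <- DocumentTermMatrix(texts_final)   # rows = documents, columns = terms
inspect(dtm)                             # take a look!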
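A document-term matrix has one row per document and one column per distinct term, and each cell counts how often that term appears in that document. The base-R sketch below builds a tiny one by hand from two made-up, already-cleaned documents. It illustrates the idea rather than reproducing `DocumentTermMatrix()`, which stores the result in a sparse format and, by default, also discards words shorter than three characters.

```r
# Two made-up, already-cleaned documents
docs <- c(doc1 = "martians landed ship glowed",
          doc2 = "ship ship crew landed")

# Split each document into its words
tokens <- strsplit(docs, " ")

# Every distinct term in the corpus, in alphabetical order
terms <- sort(unique(unlist(tokens)))

# Count each term in each document (one column per document for now)
counts <- vapply(tokens,
                 function(tk) as.vector(table(factor(tk, levels = terms))),
                 integer(length(terms)))

# Transpose so rows are documents and columns are terms, as in a DTM
dtm_sketch <- t(counts)
dimnames(dtm_sketch) <- list(names(docs), terms)
dtm_sketch
```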
Getting word frequencies and associations:
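freq <- colSums(as.matrix(dtm))   # total count of each term across all texts
freq[1:10]                        # first ten terms, in the matrix's own (not frequency) order
freq_d <- sort(freq, decreasing=TRUE)
freq_d[1:10]                      # the ten most frequent terms
findFreqTerms(dtm, lowfreq=100)   # terms occurring at least 100 times in total
findAssocs(dtm, "man", 0.95)      # terms whose counts correlate with "man" at 0.95 or above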
?findAssocs
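The same three calculations can be tried on a small matrix whose numbers are easy to check by hand. In this sketch the counts are made up, `toy_dtm` is a hypothetical stand-in for `as.matrix(dtm)`, and the last two steps reproduce the idea behind `findFreqTerms()` and `findAssocs()` in base R rather than calling tm itself. Because the correlations are calculated across documents, a six-text corpus gives each correlation only six data points, so a threshold such as 0.95 is worth reading with that in mind.

```r
# Made-up counts: 4 documents (rows) x 3 terms (columns)
toy_dtm <- matrix(c(5, 1, 0,
                    3, 2, 1,
                    8, 0, 2,
                    1, 4, 3),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("doc", 1:4), c("man", "ship", "space")))

# Total count of each term across all documents
toy_freq <- colSums(toy_dtm)                        # man 17, ship 7, space 6
toy_freq_d <- sort(toy_freq, decreasing = TRUE)

# Terms whose total count is at least lowfreq (the idea behind findFreqTerms)
lowfreq <- 7
toy_frequent <- names(toy_freq)[toy_freq >= lowfreq]   # "man" "ship"

# Pearson correlation between the "man" column and every other term's column,
# i.e. how closely their counts rise and fall together across documents
toy_assoc <- cor(toy_dtm[, "man"], toy_dtm[, colnames(toy_dtm) != "man"])
toy_assoc
```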
Making a bar chart (and then making it look nice):
barplot(freq_d[1:10])
?barplot
install.packages("RColorBrewer")
library(RColorBrewer)
?RColorBrewer
display.brewer.all()              # preview every available palette
cols <- brewer.pal(8, "Paired")   # eight colours from the "Paired" palette
barplot(freq_d[1:10], col=cols, main="My Cool Plot", xlab="Word", ylab="Instances")
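Two details are worth knowing when tidying the bar chart. `barplot()` recycles the `col` vector, so with eight colours from `brewer.pal(8, "Paired")` the ninth and tenth bars reuse the first two colours (the "Paired" palette offers up to twelve, so `brewer.pal(10, "Paired")` would give each bar its own). Long word labels can also collide, and `las = 2` turns them vertical. The sketch below uses made-up counts with placeholder words, and base R's `hcl.colors()` (available from R 3.6) in place of RColorBrewer, so it runs without extra packages.

```r
# Made-up counts standing in for freq_d[1:10]
toy_top <- setNames(seq(100, 10, by = -10), paste0("word", 1:10))

# Ten distinct colours from base R (R 3.6 or higher)
bar_cols <- hcl.colors(length(toy_top))

# las = 2 draws the word labels vertically so all ten fit;
# barplot() invisibly returns the x-position of each bar's midpoint
mids <- barplot(toy_top, col = bar_cols, las = 2,
                main = "My Cool Plot", ylab = "Instances")
```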
Making a word cloud (and then making it look nice):
install.packages("wordcloud")
library(wordcloud)
dtm_matrix <- as.matrix(dtm)   # avoid calling this 'matrix', which is also the name of a base R function
wordbank <- sort(colSums(dtm_matrix), decreasing=TRUE)
df <- data.frame(words=names(wordbank), freq=wordbank)
?data.frame
?wordcloud
wordcloud(words=df$words, freq=df$freq, max.words=100, random.order=FALSE, colors=cols)   # colors is wordcloud's colour argument
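Before drawing the cloud it can help to check which rows of `df` will actually appear. `wordcloud()` leaves out words whose frequency is below `min.freq` (3 by default) and draws at most `max.words` of the most frequent words; with `random.order=FALSE` they are plotted in decreasing frequency. The base-R sketch below applies those two filters to made-up frequencies. It approximates wordcloud's selection rather than calling the package, and `toy_bank`, `max_words` and `min_freq` are hypothetical names.

```r
# Made-up frequencies standing in for wordbank, sorted most frequent first
toy_bank <- sort(c(ship = 12, man = 30, space = 7, earth = 5, crew = 2),
                 decreasing = TRUE)

# Same layout as df above: one row per word
toy_df <- data.frame(words = names(toy_bank), freq = toy_bank,
                     stringsAsFactors = FALSE)

# Settings comparable to wordcloud()'s max.words and min.freq arguments
max_words <- 3
min_freq  <- 3

# Keep words at or above min_freq, then the first max_words of those
drawn <- head(toy_df[toy_df$freq >= min_freq, ], max_words)
drawn$words    # "crew" falls below min_freq; "earth" misses the max_words cut
```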
Discussion:
What are the potentials?
What are the limitations?
Is R the best choice?
CC Image: https://www.pexels.com/photo/selective-focus-photography-of-traffic-light-1616781
Thank you!

The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Recently uploaded (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

R in the Humanities: Text Analysis and Visualization

  • 1. R in the Humanities: Text Analysis (2022) Dr Leah Henrickson Lecturer in Digital Media School of Media and Communication University of Leeds L.R.Henrickson@leeds.ac.uk twitter.com/leahhenrickson
  • 2. Who am I? • Lecturer in Digital Media • Programme Leader, MA New Media • Book historian • Digital humanist • Canadian 🍁 L.R.Henrickson@leeds.ac.uk twitter.com/leahhenrickson
  • 3. Publication in the next issue of Victorian Review: ‘Tangling and Untangling the Trollopes’, with Eleanor Dumbill
  • 4. Session 1: Gettin’ to Grips with R CC Image: https://www.pexels.com/photo/smiling-model-in-pirate-costume-with-smoking-pipe-7000092
  • 5. Overview This course is a gentle introduction to R for text analysis. Over two sessions, you will learn the basics of this powerful programming language and then gain hands-on experience analysing long-form text in the RStudio development environment. By the end of the course, you will be able to: • Navigate the RStudio development environment • Prepare long-form prose texts for computational analysis using R • Conduct basic computational analyses of long-form prose texts • Construct and explain visualisations of computed results • Critically apply computational text analysis to complement other analytical methods To complete this course, you will need to install: • R version 3.6 or higher (download at https://www.r-project.org) • RStudio Desktop: Open Source Edition 1.2 or higher (download at https://www.rstudio.com/products/rstudio)
  • 6. Session 1 Agenda 1. What are R and RStudio? 2. What can R help you do? 3. A quick note about Computational Literary Studies 4. Getting started with R 5. Cleaning text CC Image: https://www.pexels.com/photo/black-cat-holding-persons-arm-1049764
  • 7. What are R and RStudio? R is: • a programming language • a software environment • a really fancy calculator • free/open source Download: https://cran.r-project.org/mirrors.html RStudio is: • an integrated development environment (IDE) • a great way to make your coding experiences easier, more colourful, and more fun! Download: https://www.rstudio.com/products/rstudio/download
  • 8. What can R help you do? • Count words • Find linguistic patterns within and across texts • Compare texts • Make pretty pictures But it’s still up to you to explain results. Also, is R always the most appropriate tool? CC Image: https://pixabay.com/photos/letters-tiles-word-game-crossword-4938486
  • 9. A quick note about Computational Literary Studies (CLS) CLS has a long history (for example, Father Robert Busa, ~1940s), but has been criticised for: • Misinterpretation of statistical data (Da) • Unchecked enthusiasm for technological ‘hype’ (Kirsch) • Turning literature into data and neglecting reception of works (Marche) Da, Nan Z. “The Computational Case against Computational Literary Studies.” Critical Inquiry, vol. 45, 2019, pp. 601-639. Kirsch, Adam. “Technology Is Taking Over English Departments.” The New Republic, 2014, https://newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch. Accessed 21 December 2020. Marche, Stephen. “Literature Is not Data: Against Digital Humanities.” The Los Angeles Review of Books, 2012, https://lareviewofbooks.org/article/literature-is-not-data-against-digital-humanities. Accessed 21 December 2020. CC Image: https://melissaterras.org/2013/10/15/for-ada-lovelace-day-father-busas-female-punch-card-operatives
  • 12. Source editor (write your script) Console (run your script) Environment (your data) Files, Plots, Packages, Help, Viewer (everything else!)
  • 13. The Basics (1/2) Calculating • 10 + 2 (spaces optional) • 10 - 2 • 10 * 2 • 10 / 2 Strings and Things • 1:50 • print("Hello world!") • [variable name] <- c(1, 2, 3) • [variable name][2] Meme: https://knowyourmeme.com/memes/math-lady-confused-lady
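The examples from this slide can be run as they stand once two details are fixed: R needs a plain hyphen for subtraction and straight quotes around strings (typographic dashes and curly quotes copied from slides or word processors cause "unexpected input" errors). A minimal sketch, using the hypothetical names `one_to_fifty`, `my_numbers`, and `second` in place of "[variable name]":

```r
# Arithmetic: spaces around operators are optional
10 + 2    # [1] 12
10 - 2    # [1] 8  (a plain hyphen, not a typographic dash)
10 * 2    # [1] 20
10 / 2    # [1] 5

# A sequence of whole numbers from 1 to 50
one_to_fifty <- 1:50

# Printing a string: use straight quotes
print("Hello world!")   # [1] "Hello world!"

# Store a vector in a variable (my_numbers is a hypothetical name)
my_numbers <- c(1, 2, 3)

# Index the vector: R counts from 1, so [2] is the second element
second <- my_numbers[2]
print(second)           # [1] 2
```
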
  • 14. The Basics (2/2) • Data types: character, numeric, integer, logical, complex • Data structures: vector, list, matrix, data frame, factor • Add comments (notes to yourself) using # • Need help? • ?[function name] • help([function name]) • Need a package? • install.packages("[name of package]") Meme: https://www.reddit.com/r/ProgrammerHumor/comments/8w54mx/code_comments_be_like
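A short sketch of each data type and data structure named on this slide. `class()` reports how R sees a value; all variable names and values below are hypothetical. `stringsAsFactors = FALSE` is set explicitly so the data frame behaves the same in R 3.6 (where the default was `TRUE`) and in R 4.x.

```r
# One example of each basic data type; class() reports the type
class("Astounding")   # "character"
class(3.14)           # "numeric"
class(42L)            # "integer" (the L suffix makes a whole number an integer)
class(TRUE)           # "logical"
class(2+3i)           # "complex"

# Data structures (hypothetical contents)
words_vec <- c("moon", "ray", "gun")               # vector: every element the same type
issue     <- list(title = "Astounding", year = 1930) # list: elements may differ in type
counts_m  <- matrix(1:6, nrow = 2)                 # matrix: a table of a single type
words_df  <- data.frame(word = c("moon", "ray"),   # data frame: a table whose columns
                        count = c(12L, 7L),        # may each hold a different type
                        stringsAsFactors = FALSE)
months_f  <- factor(c("jan", "feb", "jan"))        # factor: categorical values

str(words_df)   # compact summary of each column's type

# Help for any function: ?data.frame or help(data.frame)
```
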
  • 15. Tools > Global Options > Appearance (You will need to restart RStudio to apply these changes).
  • 16. Let’s clean some text! CC Image: https://thenounproject.com/term/cleaning/199037
  • 17. You can use whatever corpus you’d like for this course. However, I have prepared a corpus of six texts for you, which you may download at http://tinyurl.com/n8texts. This corpus comprises six public domain texts: the first six months of Astounding Stories of Super-Science (1930). A full corpus for the year is available at http://tinyurl.com/n8texts2, if you’d like to use it in your own time. • astoundingjan1930: https://www.gutenberg.org/ebooks/41481 • astoundingfeb1930: https://www.gutenberg.org/ebooks/28617 • astoundingmar1930: https://www.gutenberg.org/ebooks/29607 • astoundingapr1930: https://www.gutenberg.org/ebooks/29390 • astoundingmay1930: https://www.gutenberg.org/ebooks/29809 • astoundingjun1930: https://www.gutenberg.org/ebooks/29848 • astoundingjul1930: https://www.gutenberg.org/ebooks/29198 • astoundingaug1930: https://www.gutenberg.org/ebooks/29768 • astoundingsep1930: https://www.gutenberg.org/ebooks/29255 • astoundingoct1930: https://www.gutenberg.org/ebooks/29882 • astoundingnov1930: https://www.gutenberg.org/ebooks/29919 • astoundingdec1930: https://www.gutenberg.org/ebooks/30691
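Before building a tm corpus, it can help to confirm that the folder really contains the texts. A base-R sketch, assuming (not confirmed by the slides) that the downloaded corpus is a folder of plain-text files ending in `.txt`. So that it runs anywhere, it writes two hypothetical stand-in files to a temporary folder; with the real corpus, point `corpus_dir` at the folder you downloaded instead.

```r
# Hypothetical stand-in folder with two tiny files, so the sketch is self-contained
corpus_dir <- file.path(tempdir(), "n8texts_demo")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("The moon ray gun fired.", file.path(corpus_dir, "astoundingjan1930.txt"))
writeLines("A ship crossed the void.", file.path(corpus_dir, "astoundingfeb1930.txt"))

# List the .txt files in the folder (pattern is a regular expression)
files <- list.files(corpus_dir, pattern = "\\.txt$", full.names = TRUE)
print(basename(files))

# Read each file into a single string, one element per text
texts_raw <- vapply(files,
                    function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
                    character(1))
# Name each text after its file, without the extension
names(texts_raw) <- sub("\\.txt$", "", basename(files))

# Rough size check: number of characters in each text
print(nchar(texts_raw))
```

If a file is missing or unexpectedly short here, it will also be missing or short in the tm corpus.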
  • 18. First, set your working directory: Session > Set Working Directory > Choose Directory > [folder]. Then run:
      install.packages("tm")      # only needed once
      library(tm)
      getwd()                     # check the current working directory
      texts <- Corpus(DirSource("[path to working directory]"))
      writeLines(as.character(texts[[4]]))         # print the fourth text
      ?tm_map
      getTransformations()        # list tm's built-in transformations
      texts1 <- tm_map(texts, removePunctuation)
      texts2 <- tm_map(texts1, removeNumbers)
      texts3 <- tm_map(texts2, content_transformer(tolower))
      texts4 <- tm_map(texts3, removeWords, stopwords("english"))
      texts_final <- tm_map(texts4, stripWhitespace)
      writeLines(as.character(texts_final[[4]]))   # compare with the original
      dtm <- DocumentTermMatrix(texts_final)
      inspect(dtm)                # take a look!
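To see what each `tm_map()` step does to the text, here is a base-R sketch that mimics the same sequence on one hypothetical sentence. It is an approximation, not tm's implementation: `my_stopwords` is a tiny hypothetical stand-in for `stopwords("english")`, tm's functions differ in some details, and the final `trimws()` is an extra step (tm's `stripWhitespace` only collapses runs of whitespace).

```r
# A hypothetical sentence standing in for one document of the corpus
doc <- "In 1930, the Moon-Men attacked   the Earth!"

# Like removePunctuation: drop punctuation characters
step1 <- gsub("[[:punct:]]+", "", doc)
# Like removeNumbers: drop digits
step2 <- gsub("[[:digit:]]+", "", step1)
# Like content_transformer(tolower): lower-case everything
step3 <- tolower(step2)
# Like removeWords: delete whole words only (\\b marks a word boundary)
my_stopwords <- c("in", "the", "a", "of")   # hypothetical, far shorter than stopwords("english")
pattern <- paste0("\\b(", paste(my_stopwords, collapse = "|"), ")\\b")
step4 <- gsub(pattern, "", step3, perl = TRUE)
# Like stripWhitespace (collapse runs of spaces), plus trimming the ends
step5 <- trimws(gsub("[[:space:]]+", " ", step4))

print(step5)   # [1] "moonmen attacked earth"
```

The sketch also shows why order matters: because punctuation is removed before stopwords, "Moon-Men" becomes "moonmen", and contractions such as "don't" become "dont", which may no longer match a stopword list that spells them with apostrophes.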
  • 19. Help me! (1/3) R Communities #rstats (Twitter): https://twitter.com/hashtag/rstats Forwards: https://forwards.github.io R-Bloggers: https://www.r-bloggers.com R-Ladies: https://rladies.org r/rstats: https://www.reddit.com/r/rstats RStudio Community: https://community.rstudio.com Stack Overflow: https://stackoverflow.com/questions/tagged/r
  • 20. Help me! (2/3) R Resources Matthew Jockers, Text Analysis with R for Students of Literature (New York: Springer, 2014) https://www.matthewjockers.net/text-analysis-with-r-for-students-of-literature/ LinkedIn Learning: R: https://www.linkedin.com/learning/topics/r Emmanuel Paradis, R for Beginners (2005): https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf Emma Rand, ‘Reproducible Analyses in R’, N8 CIR (2020): https://n8cir.org.uk/events/event-resource/analyses-r W. N. Venables, D. M. Smith, and the R Core Team, An Introduction to R (2021): https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
  • 21. Help me! (3/3) R Packages for Text Analysis corpustools (tokenised text analysis): https://cran.r-project.org/web/packages/corpustools gutenbergr (searching/downloading Project Gutenberg): https://cran.r-project.org/web/packages/gutenbergr quanteda (quantitative text analysis): https://cran.r-project.org/web/packages/quanteda/index.html stylo (stylometry): https://cran.r-project.org/web/packages/stylo syuzhet (sentiment analysis): https://cran.r-project.org/web/packages/syuzhet/index.html tidytext (a bit of everything!): https://cran.r-project.org/web/packages/tidytext tm (text mining – what we’ve done here): https://cran.r-project.org/web/packages/tm/index.html
  • 22. Session 2: Charts, Clouds, and Confidence Image: https://pixabay.com/illustrations/rainbow-cloud-sunset-colorful-sky-5389074/
  • 23. Session 2 Agenda 1. Any questions from last week? 2. Review of last week’s session (i.e. cleaning text) 3. Counting words 4. Plotting results 5. Making word clouds 6. Wrapping up CC Images: https://thenounproject.com/term/graph/21394; https://thenounproject.com/term/word-cloud/195993
  • 24. Review of last week: the same text-cleaning code as slide 18.
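  • 25. Getting word frequencies and associations:
      freq <- colSums(as.matrix(dtm))            # total count of each term
      freq[1:10]
      freq_d <- sort(freq, decreasing = TRUE)    # most frequent terms first
      freq_d[1:10]
      findFreqTerms(dtm, lowfreq = 100)          # terms occurring at least 100 times
      findAssocs(dtm, "man", 0.95)               # terms correlated with "man" at 0.95 or above
      ?findAssocs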
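The same steps on a tiny, hand-made document-term matrix make the arithmetic visible. In this base-R sketch (not tm's code), the three documents and their counts are hypothetical; `colSums()` totals each term's column, `sort()` orders the totals, a logical filter plays the role of `findFreqTerms()`, and `cor()` between term columns is, as we understand tm's documentation, the kind of correlation `findAssocs()` reports.

```r
# Rows = documents, columns = terms, cells = counts (hypothetical numbers)
dtm_toy <- matrix(c(4, 0, 2, 1,
                    2, 1, 1, 1,
                    6, 0, 3, 2),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("doc1", "doc2", "doc3"),
                                  c("man", "moon", "ship", "gun")))

# Total count of each term across all documents
freq <- colSums(dtm_toy)
# Most frequent terms first
freq_d <- sort(freq, decreasing = TRUE)
print(freq_d)

# Terms occurring at least 5 times in total (similar in spirit to lowfreq = 5)
frequent <- names(freq)[freq >= 5]

# Correlation of every term's column with the "man" column
assoc_man <- cor(dtm_toy)["man", ]
# Keep other terms correlated with "man" at 0.95 or above
strong <- assoc_man[names(assoc_man) != "man" & assoc_man >= 0.95]
print(round(strong, 2))
```

With only a handful of documents, correlations are computed from very few data points, so high values such as 0.95 are easier to reach than they would be in a large corpus.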
  • 26. Making a bar chart (and then making it look nice):
      barplot(freq_d[1:10])
      ?barplot
      install.packages("RColorBrewer")
      library(RColorBrewer)
      ?RColorBrewer
      display.brewer.all()                  # preview every palette
      cols <- brewer.pal(8, "Paired")       # 8 colours; barplot recycles them across the 10 bars
      barplot(freq_d[1:10], col = cols, main = "My Cool Plot", xlab = "Word", ylab = "Instances")
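A self-contained version of the finished bar chart that runs without RColorBrewer and saves the result to a file. It is a sketch under stated assumptions: the word counts in `freq_d` are hypothetical (not results from the corpus), base R's `hcl.colors()` (available since R 3.6.0) is swapped in for `brewer.pal()`, and the output file name is hypothetical.

```r
# Hypothetical word counts standing in for freq_d (already sorted, largest first)
freq_d <- c(man = 120, ship = 95, space = 80, earth = 74, light = 66,
            moon = 52, ray = 47, gun = 41, star = 33, void = 20)

top10 <- freq_d[1:10]
# One colour per bar from base R's hcl.colors(), instead of RColorBrewer
cols <- hcl.colors(length(top10))

# Draw into a PDF file in a temporary folder (hypothetical file name)
outfile <- file.path(tempdir(), "my_cool_plot.pdf")
pdf(outfile, width = 7, height = 5)
# barplot() returns the x positions of the bar midpoints
mids <- barplot(top10, col = cols, main = "My Cool Plot",
                xlab = "Word", ylab = "Instances")
invisible(dev.off())   # close the file so the chart is written
```

Choosing as many colours as bars avoids the recycling that happens when a palette is shorter than the number of bars.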
  • 27. Making a word cloud (and then making it look nice):
      install.packages("wordcloud")
      library(wordcloud)
      dtm_matrix <- as.matrix(dtm)          # avoids reusing the name of base R's matrix() function
      wordbank <- sort(colSums(dtm_matrix), decreasing = TRUE)
      df <- data.frame(words = names(wordbank), freq = wordbank)
      ?data.frame
      ?wordcloud
      wordcloud(words = df$words, freq = df$freq, max.words = 100, random.order = FALSE, colors = cols)   # cols from slide 26
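The data-frame step can be checked on its own before drawing anything. In this sketch the word counts are hypothetical, the cloud is drawn only if the wordcloud package is installed, `hcl.colors()` stands in for the RColorBrewer palette, and the output file name is hypothetical. `min.freq = 1` is set because wordcloud's default of 3 would silently drop rarer words.

```r
# Hypothetical counts standing in for colSums(as.matrix(dtm))
wordbank <- sort(c(moon = 12, ship = 30, man = 41, ray = 9, void = 5),
                 decreasing = TRUE)

# One row per word, with its frequency alongside
df <- data.frame(words = names(wordbank), freq = wordbank,
                 stringsAsFactors = FALSE)
print(head(df))

# Draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  pdf(file.path(tempdir(), "my_cloud.pdf"))     # hypothetical output file
  wordcloud::wordcloud(words = df$words, freq = df$freq,
                       min.freq = 1,            # keep every word in this tiny example
                       max.words = 100, random.order = FALSE,
                       colors = hcl.colors(4))
  invisible(dev.off())
}
```

With `random.order = FALSE`, the most frequent words are placed at the centre of the cloud.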
  • 28. Discussion: What are the potentials? What are the limitations? Is R the best choice? CC Image: https://www.pexels.com/photo/selective-focus-photography-of-traffic-light-1616781
  • 29-31. Help me! (1/3)-(3/3): the same R communities, resources, and text-analysis packages as slides 19-21.
  • 32. Thank you! Dr Leah Henrickson Lecturer in Digital Media School of Media and Communication University of Leeds L.R.Henrickson@leeds.ac.uk twitter.com/leahhenrickson

Editor's Notes

  1. Matrix = table (every cell holds the same data type). Data frame = table, but more flexible about what can be included in that table (each column can hold a different data type).