Text-Mining PubMed Search Results to Identify Emerging Technologies
Relevant to Medical Librarians
P. F. Anderson(1) <pfa@umich.edu>; Skye Bickett(2); Joanne Doucette(3); Pamela Herring(4); Andrea Kepsel(5); Tierney Lyons(6); Scott McLachlan(7); Carol Shannon(1); Lin Wu(8)
1) University of Michigan-Ann Arbor; 2) Georgia Campus-Philadelphia College of Osteopathic Medicine; 3) MCPHS University, Boston; 4) University of Central Florida College of Medicine; 5) Michigan State University; 6) Cerebros Medical Systems, Jessup, PA; 7) Ruskin College, Oxford; 8) University of Tennessee, Memphis
Objectives
The Emerging Technologies Team, part of the Medical Library Association (MLA)
systematic review (SR) projects, conducted a pilot study to identify emerging
technologies relevant to medical librarians. The team analyzed results from its
previously reported PubMed Search filter using text mining to identify patterns,
themes, and trends important to the practice of medical librarianship and the
communities we support.
Methods
Analysis: Challenges & Solutions
Challenges:
1. FLink exports CSV directly from PubMed, but permits export of only 10,000 records and no abstracts.
2. Inadequate (under-powered) hardware.
3. Large file size created challenges with opening the file and with file conversion.
4. Unable to install the current version of the text mining software (OpenRefine).
5. IT policies (blocking) and support at some institutions.
Text Mining Images
All images were created through the Voyant-Tools analysis: <http://voyant-tools.org/>
Next Steps & Recommendations
Voyant was used for basic analysis to identify major concepts from 2016, but
additional tools will let us dig deeper and surface previously unknown items. Moving
forward, we will extend the years of analysis for trends and patterns and continue
gathering more data. We will continue the text mining process, using tools such as
Google Refine and OpenRefine for analysis in context and AntConc for concordance
and fringe concepts. We will then publish the results.
Once a dataset exists, it can itself be used to look for trends in specific areas,
such as surgery or education, potentially in response to areas of interest within
the library’s target audience.
Results
Sources / Resources
Find us
The finalized search strategy resulted in a five-year file with 162,339 records.
Deduplication in EndNote reduced this to 162,221 records. We tested the analysis with 5-year
[162,221], 3-year [107,531], and 1-year sets for each of the five years. For this poster
we tested the analysis process with the single-year set from 2016 [35,535].
In our initial project planning, we had identified five main areas of interest
(technology, information, public health, education, and the body). This analysis made
clear additional clusters of interesting content, especially new methodologies (e.g.,
big data and data visualization) and emerging interdisciplinary trends (such as
precision medicine). As those had not been included in the original planning, this
showed the potential benefit of text mining for discovering unknown areas of
relevance.
The primary areas of the body which were strongly represented in the data included
blood, bone, brain, and urine. Related concepts which were strongly represented in
the data set included cancer, diagnostics, treatment, and biomarkers.
The three top technologies that arose from the text mining process were robotics,
simulations, and 3D technologies, especially 3D printing. All three were being used
most heavily in surgery. Simulations were also prominent in education/training.
BIBLIOGRAPHY
Higgins JPT, Deeks JJ, (eds.) 2011. Selecting studies and collecting data, London: The
Cochrane Collaboration.
Mane KK, Borner K. 2004. Mapping topics and topic bursts in PNAS. Proceedings of the
National Academy of Sciences 101(suppl. 1), 5287-5290.
Mikova N. 2016. Recent trends in technology mining approaches: Quantitative analysis of GTM
Conference Proceedings. In: Daim TU, Chiavetta D, Porter AL, Saritas O (eds.) Anticipating
future innovation pathways through large data analysis. Cham, Switzerland: Springer
International Publishing.
Porter AL, Cunningham SW. 2005. Tech mining: Exploiting new technologies for competitive
advantage, Hoboken, NJ, John Wiley & Sons, Inc.
Schünemann HJ, Oxman AD, Vist GE, Higgins JPT, Deeks JJ, Glasziou PP, Guyatt GH. 2011.
Interpreting results and drawing conclusions. In: Higgins JPT, Green S (eds.) Cochrane
handbook for systematic reviews of interventions Version 5.1.0 (updated March 2011). London:
The Cochrane Collaboration.
Stevens A, Milne R, Lilford R, Gabbay J. 1999. Keeping pace with new technologies: Systems
needed to identify and evaluate them. BMJ 319, 1291-3.
RESOURCES
EndNote: endnote.com/
Voyant: https://voyant-tools.org/
OpenRefine: http://openrefine.org/
AntConc: www.laurenceanthony.net/software/antconc/
We began by establishing a common competency base through custom training
sessions from higher education data-mining experts. Next, the team 1) reviewed and
finalized the emerging technologies PubMed search strategy created for the project;
2) exported the data; 3) used automated tools to clean extraneous data from the data
set; and 4) tested the data by running preliminary text-mining scans. Steps 3 and 4
were repeated to refine and focus the results. Tools such as GREP, R, FLink, and
pubmed.mineR were evaluated and tested for data export and cleaning, with the
ultimate choices settling on a combination of EndNote, Voyant, and OpenRefine for data
cleaning, and Voyant, OpenRefine/GoogleRefine, and AntConc for analysis.
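As an illustration of the export and cleaning steps, the sketch below converts a MEDLINE-format text export into a CSV with PMID, title, abstract, and MeSH columns. It is a minimal Python sketch, not the workflow the team settled on (which used EndNote with a custom output style); the function names `parse_medline` and `records_to_csv` and the two-record sample are hypothetical. It assumes the usual MEDLINE layout: a tag padded to four characters, a hyphen and a space, continuation lines indented six spaces, and a blank line between records.

```python
import csv
import io


def parse_medline(text):
    """Yield one dict per record, mapping each tag to a list of values."""
    record, tag = {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if record:
                yield record
            record, tag = {}, None
        elif line.startswith("      ") and tag:
            # Continuation lines extend the most recent value of the tag.
            record[tag][-1] += " " + line.strip()
        elif len(line) > 5 and line[4] == "-":
            # New field: tag in columns 1-4, value from column 7 onward.
            tag = line[:4].strip()
            record.setdefault(tag, []).append(line[6:].strip())
    if record:
        yield record


def records_to_csv(records, handle):
    """Write the four fields of interest to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["PMID", "Title", "Abstract", "MeSH"])
    for rec in records:
        writer.writerow([
            " ".join(rec.get("PMID", [])),
            " ".join(rec.get("TI", [])),
            " ".join(rec.get("AB", [])),
            "; ".join(rec.get("MH", [])),
        ])


# Hypothetical two-record sample in MEDLINE layout.
SAMPLE = """PMID- 1
TI  - Robotic surgery
      in practice.
AB  - A short abstract.
MH  - Robotics/methods
MH  - Surgery

PMID- 2
TI  - Second title.
"""

out = io.StringIO()
records_to_csv(parse_medline(SAMPLE), out)
print(out.getvalue())
```

Because the parser is a generator, records are written one at a time, which may help when the export is too large to open comfortably on a desktop machine.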
MLASR6 Google Plus Community: goo.gl/RxtOFg
The Medical Library Association initiated a large systematic review project to assess the level of
evidence available to support the profession and practice of medical librarianship on several
very important questions. Team 6 has been assigned to explore this topic:
The explosion of information, expanding of technology (especially mobile technology), and
complexity of healthcare environment present medical librarians and medical libraries
opportunities and challenges. To live up with the opportunities and challenges, what kinds of
skill sets or information structure do medical librarians or medical libraries are required to have
or acquire so as to be strong partners or contributors of continuing effectiveness to the
changing environment?
Process: Software & Technology
1st stage: Voyant was used to generate a word cloud and identify words for a
custom stoplist. The stop word list was created by the team leader and peer
reviewed within the team. Collocations were used to identify major tech concepts
from the word cloud concepts. Additional visualizations refined understanding of
the top three tech concepts.
2nd stage: OpenRefine will be used to open projects and expand analysis.
(OpenRefine challenges: with the full dataset, it stalls out in the project creation
stage on team desktops. It works in Chrome, but not in Opera or Safari.)
3rd stage: AntConc will be used for a deep dive into the specific terms/phrases
of interest to discover their context and related concepts.
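The first-stage ideas (a custom stoplist, term counts of the kind behind a word cloud, and collocates of a term of interest) can be sketched in a few lines of Python. This is only an approximation of what Voyant does internally, which the poster does not describe; the stoplist, the sample titles, and the function names are hypothetical, and the input is assumed to be already lowercased with punctuation removed.

```python
from collections import Counter

# Hypothetical custom stoplist; a real one would be reviewed by the team.
STOPLIST = {"the", "of", "and", "in", "a", "for", "to", "was", "were", "with"}


def tokenize(text):
    """Split on whitespace and drop stoplisted words."""
    return [tok for tok in text.lower().split() if tok not in STOPLIST]


def term_frequencies(docs):
    """Count how often each non-stoplisted term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def collocates(docs, keyword, window=2):
    """Count terms within `window` tokens of `keyword`.

    The window is measured after stoplisted words have been removed.
    """
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                start = max(0, i - window)
                counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


# Hypothetical, already-cleaned sample titles.
DOCS = [
    "3d printing of bone models for surgery",
    "simulation training for robotic surgery",
    "3d printing in surgery planning",
]

print(term_frequencies(DOCS).most_common(3))
print(collocates(DOCS, "printing").most_common(3))
```

Measuring the window after stoplist removal is a design choice: it keeps function words from crowding out content terms, at the cost of no longer reflecting exact distances in the original text.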
Data Cleaning
After running the finalized search strategy in PubMed, the resulting list was exported
from the database in the MEDLINE format, creating a TXT file. To support the
proposed text mining analysis of this dataset using Voyant or OpenRefine, a CSV
file needed to be built. FLink, an NLM product developed to create CSV files, was
used initially. Unfortunately, the program was only able to handle 10,000 records
and could not produce a CSV file that included abstracts. To create the appropriate
CSV file, results were exported from PubMed as a TXT file, imported into EndNote,
deduplicated within EndNote, then exported using a custom output style created by
the team. The resulting CSV file of 162,339 PubMed records was downloaded to
Excel where all fields except for PMID, Title, Abstract and MeSH or Keywords were
deleted. The remaining content was cleaned by removal of punctuation (using
nested SUBSTITUTE functions) and changing all text to lowercase (using the
LOWER function). Considerations during punctuation removal included separation
of MeSH headings and subheadings by removal of the “/” character, the importance
of “.” character in numerical values, and the decision of whether or not to keep
numeric data.
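The cleaning rules described above can also be expressed outside Excel. The following Python sketch is one possible reading of them, not the team's actual SUBSTITUTE/LOWER formulas: it assumes punctuation is replaced by a space, that "." is kept only between two digits, and that dropping numbers is an option (`keep_numbers`); the function name `clean_text` is hypothetical.

```python
import re

# Matches any "." that is not between two digits, any other punctuation
# character, and the underscore (which \w would otherwise let through).
PUNCTUATION = re.compile(r"(?<!\d)\.|\.(?!\d)|[^\w\s.]|_")

# Matches standalone numbers such as "2016" or "0.05", but not "3d".
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def clean_text(text, keep_numbers=True):
    """Lowercase text and strip punctuation while keeping decimal points."""
    text = text.lower()
    # "/" joins a MeSH heading to its subheading; a space keeps them apart.
    text = text.replace("/", " ")
    text = PUNCTUATION.sub(" ", text)
    if not keep_numbers:
        text = NUMBER.sub(" ", text)
    # Collapse the runs of spaces left behind by the substitutions.
    return " ".join(text.split())


sample = "Robotics/methods: 3D-Printing (p < 0.05)."
print(clean_text(sample))
print(clean_text(sample, keep_numbers=False))
```

Replacing punctuation with a space rather than deleting it avoids fusing hyphenated or slash-separated terms into a single token, though it also splits contractions; either choice should be checked against the terms of interest.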
Figure 1: Word Cloud of Whole Corpus
Figure 2: Top Three Technologies Identified in Analysis
Figure 3: 3D Printing in Context
Solutions:
1. Export full records, import to EndNote, use custom filter for initial cleaning, export to CSV file.
2. Do you have access to more powerful computers elsewhere?
3. Break file into smaller chunks for cleaning; pool for final analysis.
4. Upgrade computer to have more memory, or use more powerful computer elsewhere.
5. Ensure good technical support and administrative backing for project.
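Solution 3 (clean in smaller chunks, then pool) might look like the following in Python. It is a sketch under assumptions: the CSV has a header row, each chunk can be cleaned independently of the others, and the names `iter_chunks`, `pool`, and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice


def iter_chunks(handle, chunk_size):
    """Yield (header, rows) pairs holding at most chunk_size rows each."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        return  # empty file: nothing to yield
    while True:
        rows = list(islice(reader, chunk_size))
        if not rows:
            break
        yield header, rows


def pool(chunks):
    """Recombine cleaned chunks into one header and one list of rows."""
    header, pooled = None, []
    for chunk_header, rows in chunks:
        header = header or chunk_header
        pooled.extend(rows)
    return header, pooled


# Hypothetical five-record file, processed two records at a time.
SAMPLE = (
    "PMID,Title\n"
    "1,Robotic Surgery\n"
    "2,3D Printing\n"
    "3,Simulation Training\n"
    "4,Big Data\n"
    "5,Precision Medicine\n"
)

cleaned_chunks = []
for chunk_header, rows in iter_chunks(io.StringIO(SAMPLE), chunk_size=2):
    # Stand-in cleaning step: lowercase the title column only.
    cleaned = [[pmid, title.lower()] for pmid, title in rows]
    cleaned_chunks.append((chunk_header, cleaned))

header, pooled = pool(cleaned_chunks)
print(header, len(pooled))
```

Only one chunk is held in memory during cleaning, so the chunk size can be tuned to the weakest machine on the team; pooling at the end still needs enough memory for the full cleaned set.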

More Related Content

What's hot

Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
 
Human Genome and Big Data Challenges
Human Genome and Big Data ChallengesHuman Genome and Big Data Challenges
Human Genome and Big Data ChallengesPhilip Bourne
 
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...GrahamSmith646206
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps. Richard Layton
 
Prisma s manuscript preprint
Prisma s manuscript preprintPrisma s manuscript preprint
Prisma s manuscript preprintdaisyfloresc
 
FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_PresentationYatpang Cheung
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)Michael Atkins
 
RES812 U4 Individual Project
RES812  U4 Individual ProjectRES812  U4 Individual Project
RES812 U4 Individual ProjectThienSi Le
 
Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Enayat Rajabi
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataPaul Groth
 
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsEKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsFrancesco Osborne
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphPaul Groth
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Carole Goble
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data ManagementC. Tobin Magle
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the libraryC. Tobin Magle
 
Google Scholar and Web of Science: Similarities and Differences in Citation A...
Google Scholar and Web of Science: Similarities and Differences in Citation A...Google Scholar and Web of Science: Similarities and Differences in Citation A...
Google Scholar and Web of Science: Similarities and Differences in Citation A...Balachandar Radhakrishnan
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalWaqas Tariq
 

What's hot (20)

Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
 
Human Genome and Big Data Challenges
Human Genome and Big Data ChallengesHuman Genome and Big Data Challenges
Human Genome and Big Data Challenges
 
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 
Prisma s manuscript preprint
Prisma s manuscript preprintPrisma s manuscript preprint
Prisma s manuscript preprint
 
FedCentric_Presentation
FedCentric_PresentationFedCentric_Presentation
FedCentric_Presentation
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
 
RES812 U4 Individual Project
RES812  U4 Individual ProjectRES812  U4 Individual Project
RES812 U4 Individual Project
 
Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)Interlinking educational data to Web of Data (Thesis presentation)
Interlinking educational data to Web of Data (Thesis presentation)
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
Curating and Sharing Structures and Spectra for the Environmental Community
Curating and Sharing  Structures and Spectra for the Environmental CommunityCurating and Sharing  Structures and Spectra for the Environmental Community
Curating and Sharing Structures and Spectra for the Environmental Community
 
Payton Eliminating Conflicts in Ebook Metadata
Payton Eliminating Conflicts in Ebook MetadataPayton Eliminating Conflicts in Ebook Metadata
Payton Eliminating Conflicts in Ebook Metadata
 
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsEKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
 
Responsible conduct of research: Data Management
Responsible conduct of research: Data ManagementResponsible conduct of research: Data Management
Responsible conduct of research: Data Management
 
Bringing bioinformatics into the library
Bringing bioinformatics into the libraryBringing bioinformatics into the library
Bringing bioinformatics into the library
 
Google Scholar and Web of Science: Similarities and Differences in Citation A...
Google Scholar and Web of Science: Similarities and Differences in Citation A...Google Scholar and Web of Science: Similarities and Differences in Citation A...
Google Scholar and Web of Science: Similarities and Differences in Citation A...
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
 

Similar to Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant to Medical Librarians

Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi
 
Siena's Clinical Decision Assistant
Siena's Clinical Decision AssistantSiena's Clinical Decision Assistant
Siena's Clinical Decision AssistantMichael Ippolito
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectStuart Chalk
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Dataopenminted_eu
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-finalPeter Embi
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAGopen_phacts
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIAInsight_Altmetrics
 
Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Salam Shah
 
Biomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approachBiomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approachijseajournal
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringKelly Lipiec
 
From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...Catherine Canevet
 
Automatic summarization of the medical literature
Automatic summarization of the medical literatureAutomatic summarization of the medical literature
Automatic summarization of the medical literatureharinithiyagarajan4
 
How to extract data from your paper for systemic review - Pubrica
How to extract data from your paper for systemic review  - PubricaHow to extract data from your paper for systemic review  - Pubrica
How to extract data from your paper for systemic review - PubricaPubrica
 
How to extract data from your paper for systemic review – Pubrica
How to extract data from your paper for systemic review – PubricaHow to extract data from your paper for systemic review – Pubrica
How to extract data from your paper for systemic review – PubricaPubrica
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europeopen_phacts
 

Similar to Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant to Medical Librarians (20)

Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...
 
Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-Review
 
Prosdocimi ucb cdao
Prosdocimi ucb cdaoProsdocimi ucb cdao
Prosdocimi ucb cdao
 
Siena's Clinical Decision Assistant
Siena's Clinical Decision AssistantSiena's Clinical Decision Assistant
Siena's Clinical Decision Assistant
 
ACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP ProjectACS 248th Paper 71 ChAMP Project
ACS 248th Paper 71 ChAMP Project
 
Paul Groth
Paul GrothPaul Groth
Paul Groth
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-final
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
Social Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIASocial Phrases Having Impact in Altmetrics - SOPHIA
Social Phrases Having Impact in Altmetrics - SOPHIA
 
Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...
 
Biomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approachBiomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approach
 
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
 
From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...From data to knowledge – the Ondex System for integrating Life Sciences data ...
From data to knowledge – the Ondex System for integrating Life Sciences data ...
 
Automatic summarization of the medical literature
Automatic summarization of the medical literatureAutomatic summarization of the medical literature
Automatic summarization of the medical literature
 
How to extract data from your paper for systemic review - Pubrica
How to extract data from your paper for systemic review  - PubricaHow to extract data from your paper for systemic review  - Pubrica
How to extract data from your paper for systemic review - Pubrica
 
How to extract data from your paper for systemic review – Pubrica
How to extract data from your paper for systemic review – PubricaHow to extract data from your paper for systemic review – Pubrica
How to extract data from your paper for systemic review – Pubrica
 
As World’s Collide
As World’s CollideAs World’s Collide
As World’s Collide
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe
 

More from University of Michigan Taubman Health Sciences Library

More from University of Michigan Taubman Health Sciences Library (20)

Systematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of Burden
Systematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of BurdenSystematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of Burden
Systematic Reviews, Tech Mining, and Other Knowledge Synthesis Beasts of Burden
 
It's Not Brain Surgery: Graphic Medicine, Graphic Justice, and More About Com...
It's Not Brain Surgery: Graphic Medicine, Graphic Justice, and More About Com...It's Not Brain Surgery: Graphic Medicine, Graphic Justice, and More About Com...
It's Not Brain Surgery: Graphic Medicine, Graphic Justice, and More About Com...
 
Methodology Mashups: Systematic Searches, Plus ...
Methodology Mashups: Systematic Searches, Plus ... Methodology Mashups: Systematic Searches, Plus ...
Methodology Mashups: Systematic Searches, Plus ...
 
#OwnVoices in Graphic Medicine: Creation and Collection
#OwnVoices in Graphic Medicine:  Creation and Collection#OwnVoices in Graphic Medicine:  Creation and Collection
#OwnVoices in Graphic Medicine: Creation and Collection
 
Introducing the "Librome Research Core"
Introducing the "Librome Research Core"Introducing the "Librome Research Core"
Introducing the "Librome Research Core"
 
Storytelling workshop: journeys in health care
Storytelling workshop: journeys in health careStorytelling workshop: journeys in health care
Storytelling workshop: journeys in health care
 
Research Methods: Searches & Systematic Reviews
Research Methods: Searches & Systematic ReviewsResearch Methods: Searches & Systematic Reviews
Research Methods: Searches & Systematic Reviews
 
NISO — Cutting Edges with Company: Emerging Technologies as a Collective Effort
NISO — Cutting Edges with Company: Emerging Technologies as a Collective EffortNISO — Cutting Edges with Company: Emerging Technologies as a Collective Effort
NISO — Cutting Edges with Company: Emerging Technologies as a Collective Effort
 
Ab Errantry: A Game to Build Awareness of the Aberrant and Abhorrent in Teens...
Ab Errantry: A Game to Build Awareness of the Aberrant and Abhorrent in Teens...Ab Errantry: A Game to Build Awareness of the Aberrant and Abhorrent in Teens...
Ab Errantry: A Game to Build Awareness of the Aberrant and Abhorrent in Teens...
 
Making Comics Fast — The Social Justice Version (2017)
Making Comics Fast — The Social Justice Version (2017)Making Comics Fast — The Social Justice Version (2017)
Making Comics Fast — The Social Justice Version (2017)
 
Making Comics Fast (2018)
Making Comics Fast (2018)Making Comics Fast (2018)
Making Comics Fast (2018)
 
Writing A Sexier Research Abstract: Making Research In Life Science More Disc...
Writing A Sexier Research Abstract: Making Research In Life Science More Disc...Writing A Sexier Research Abstract: Making Research In Life Science More Disc...
Writing A Sexier Research Abstract: Making Research In Life Science More Disc...
 
Rapid Reviews 101
Rapid Reviews 101 Rapid Reviews 101
Rapid Reviews 101
 
Making Research in the Life Sciences More Discoverable
Making Research in the Life Sciences More Discoverable Making Research in the Life Sciences More Discoverable
Making Research in the Life Sciences More Discoverable
 
Methods: Searching & Systematic Reviews
Methods: Searching & Systematic ReviewsMethods: Searching & Systematic Reviews
Methods: Searching & Systematic Reviews
 
Reinventing Normal 3: Connie Chang, Fast Forward Medical Innovation
Reinventing Normal 3: Connie Chang, Fast Forward Medical Innovation Reinventing Normal 3: Connie Chang, Fast Forward Medical Innovation
Reinventing Normal 3: Connie Chang, Fast Forward Medical Innovation
 
Reinventing Normal 2: David Chesney, Gaming for the Greater Good
Reinventing Normal 2: David Chesney, Gaming for the Greater Good Reinventing Normal 2: David Chesney, Gaming for the Greater Good
Reinventing Normal 2: David Chesney, Gaming for the Greater Good
 
Reinventing Normal 1: Michelle A. Meade, Technologies for Empowerment
Reinventing Normal 1: Michelle A. Meade, Technologies for Empowerment  Reinventing Normal 1: Michelle A. Meade, Technologies for Empowerment
Reinventing Normal 1: Michelle A. Meade, Technologies for Empowerment
 
Comic creation as an innovative library role
Comic creation as an innovative library roleComic creation as an innovative library role
Comic creation as an innovative library role
 
Try Not to Get Sued! The Pursuit of Accessibility and a Professional Captioni...
Try Not to Get Sued! The Pursuit of Accessibility and a Professional Captioni...Try Not to Get Sued! The Pursuit of Accessibility and a Professional Captioni...
Try Not to Get Sued! The Pursuit of Accessibility and a Professional Captioni...
 

Text-Mining PubMed Search Results to Identify Emerging Technologies Relevant to Medical Librarians

P. F. Anderson (1) <pfa@umich.edu>; Skye Bickett (2); Joanne Doucette (3); Pamela Herring (4); Andrea Kepsel (5); Tierney Lyons (6); Scott McLachlan (7); Carol Shannon (1); Lin Wu (8)
1) University of Michigan-Ann Arbor; 2) Georgia Campus-Philadelphia College of Osteopathic Medicine; 3) MCPHS University, Boston; 4) University of Central Florida College of Medicine; 5) Michigan State University; 6) Cerebros Medical Systems, Jessup, PA; 7) Ruskin College, Oxford; 8) University of Tennessee, Memphis

Objectives
The Emerging Technologies Team, part of the Medical Library Association (MLA) systematic review (SR) projects, conducted a pilot study to identify emerging technologies relevant to medical librarians. The team analyzed results from its previously reported PubMed search filter using text mining to identify patterns, themes, and trends important to the practice of medical librarianship and the communities we support.

Analysis: Challenges & Solutions
Challenges:
1. FLink exports CSV directly from PubMed, but permits export of only 10,000 records and includes no abstracts.
2. Inadequate (under-powered) hardware.
3. Large file size created challenges with opening and converting the file.
4. Unable to install the current version of the text mining software (OpenRefine).
5. IT policies (blocking) and support at some institutions.

Text Mining Images
All images were created through the Voyant-Tools analysis: <http://voyant-tools.org/>

Next Steps & Recommendations
Voyant was used for basic analysis to identify big concepts from 2016, yet we can dig deeper with additional tools to uncover unknown items. Moving forward, we will extend the years of analysis for trends and patterns and continue gathering more data.
We will continue the text mining process, using tools such as Google Refine and OpenRefine for analysis in context and AntConc for concordance and fringe concepts. We will then publish the results. Once one has a dataset, the dataset itself can be used to look for trends in specific areas, such as surgery or education, potentially in response to areas of interest within the library's target audience.

Results
The finalized search strategy resulted in a five-year file with 162,339 records. Deduping in EndNote resulted in 162,221 records. We tested the analysis with 5-year [162,221], 3-year [107,531], and 1-year sets for each of the five years. For this poster we tested the analysis process with the single-year set from 2016 [35,535].

In our initial project planning, we had identified five main areas of interest (technology, information, public health, education, and the body). This analysis made clear additional clusters of interesting content, especially new methodologies (e.g., big data and data visualization) and emerging interdisciplinary trends (such as precision medicine). As those had not been included in the original planning, this showed the potential benefit of text mining for discovering unknown areas of relevance.

The primary areas of the body that were strongly represented in the data included blood, bone, brain, and urine. Related concepts that were strongly represented in the data set included cancer, diagnostics, treatment, and biomarkers.

The three top technologies that arose from the text mining process were robotics, simulations, and 3D technologies, especially 3D printing. All three were being used most heavily in surgery. Simulations were also prominent in education/training.

Sources / Resources
BIBLIOGRAPHY
Higgins JPT, Deeks JJ (eds.). 2011. Selecting studies and collecting data. London: The Cochrane Collaboration.
Mane KK, Borner K. 2004. Mapping topics and topic bursts in PNAS. Proceedings of the National Academy of Sciences 101(suppl. 1), 5287-5290.
Mikova N. 2016. Recent trends in technology mining approaches: Quantitative analysis of GTM Conference Proceedings. In: Daim TU, Chiavetta D, Porter AL, Saritas O (eds.) Anticipating future innovation pathways through large data analysis. Cham, Switzerland: Springer International Publishing.
Porter AL, Cunningham SW. 2005. Tech mining: Exploiting new technologies for competitive advantage. Hoboken, NJ: John Wiley & Sons, Inc.
Schünemann HJ, Oxman AD, Vist GE, Higgins JPT, Deeks JJ, Glasziou PP, Guyatt GH. 2011. Interpreting results and drawing conclusions. In: Higgins JPT, Green S (eds.) Cochrane handbook for systematic reviews of interventions, Version 5.1.0 (updated March 2011). London: The Cochrane Collaboration.
Stevens A, Milne R, Lilford R, Gabbay J. 1999. Keeping pace with new technologies: Systems needed to identify and evaluate them. BMJ 319, 1291-3.

RESOURCES
EndNote: endnote.com/
Voyant: https://voyant-tools.org/
OpenRefine: http://openrefine.org/
AntConc: www.laurenceanthony.net/software/antconc/

Methods
We began by establishing a common competency base through custom training sessions from higher education data-mining experts. Next, the team 1) reviewed and finalized the emerging technologies PubMed search strategy created for the project; 2) exported the data; 3) used automated tools to clean extraneous data from the data set; and 4) tested the data by running preliminary text-mining scans. Steps 3 and 4 were repeated to refine and focus the results. Tools such as GREP, R, FLink, and pubmed.mineR were evaluated and tested for data export and cleaning, with the final choices settling on a combination of EndNote, Voyant, and OpenRefine for data cleaning, and Voyant, OpenRefine/GoogleRefine, and AntConc for analysis.
Find us
MLASR6 Google Plus Community: goo.gl/RxtOFg

The Medical Library Association initiated a large systematic review project to assess the level of evidence available to support the profession and practice of medical librarianship on several very important questions. Team 6 was assigned to explore this question: The explosion of information, expansion of technology (especially mobile technology), and complexity of the healthcare environment present medical librarians and medical libraries with opportunities and challenges. To live up to these opportunities and challenges, what kinds of skill sets or information structures are medical librarians or medical libraries required to have or acquire so as to be strong partners or contributors of continuing effectiveness in the changing environment?

Process: Software & Technology
1st stage: Voyant was used to identify words/word cloud to create a custom stoplist. The stop word list was created by the team leader and peer reviewed within the team. Collocations were used to identify major tech concepts from word cloud concepts. Additional visualizations were used to refine understanding of the top three tech concepts.
2nd stage: OpenRefine will be used to open projects and expand analysis. (OpenRefine challenges: with the full dataset, it stalls out in the project creation stage on team desktops. It works in Chrome, but not in Opera or Safari.)
3rd stage: AntConc will be used for a deep dive into the specific terms/phrases of interest to discover their context and related concepts.

Data Cleaning
After running the finalized search strategy in PubMed, the resulting list was exported from the database in the MEDLINE format, creating a TXT file. To support the proposed text mining analysis of this dataset using Voyant or OpenRefine, a CSV file needed to be built. FLink, an NLM product developed to create CSV files, was used initially. Unfortunately, the program was only able to handle 10,000 records and could not produce a CSV file that included abstracts.
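The TXT-to-CSV conversion described above can also be scripted. The following is a minimal sketch in Python, not the workflow the team used. It assumes the standard MEDLINE tagged layout (a tag of up to four characters, then "- ", with continuation lines indented six spaces, and a blank line between records). The function names, the CSV column names, and the sample records are made up for illustration.

```python
import csv
import io

def parse_medline(text):
    """Yield one dict per record from MEDLINE-format text (tag -> list of values)."""
    record, tag = {}, None
    for line in text.splitlines():
        if not line.strip():                      # blank line ends a record
            if record:
                yield record
            record, tag = {}, None
        elif line[:4].strip() and line[4:6] == "- ":   # new field: "TAG - value"
            tag = line[:4].strip()
            record.setdefault(tag, []).append(line[6:].strip())
        elif tag and line.startswith("      "):   # continuation of the previous field
            record[tag][-1] += " " + line.strip()
    if record:
        yield record

def records_to_csv(records, out):
    """Write PMID, title, abstract, MeSH headings, and keywords as CSV rows."""
    writer = csv.writer(out)
    writer.writerow(["PMID", "Title", "Abstract", "MeSH", "Keywords"])
    for rec in records:
        writer.writerow([
            " ".join(rec.get("PMID", [])),
            " ".join(rec.get("TI", [])),
            " ".join(rec.get("AB", [])),
            "; ".join(rec.get("MH", [])),   # MeSH headings repeat, one per MH line
            "; ".join(rec.get("OT", [])),   # author keywords
        ])

# Two invented records, used only to exercise the parser.
SAMPLE = """PMID- 11111111
TI  - Three-dimensional printing in
      surgical planning.
AB  - An example abstract.
MH  - Printing, Three-Dimensional/methods
MH  - Surgery, Computer-Assisted

PMID- 22222222
TI  - Simulation training for residents.
OT  - simulation
"""

if __name__ == "__main__":
    buffer = io.StringIO()
    records_to_csv(parse_medline(SAMPLE), buffer)
    print(buffer.getvalue())
```

Because the parser reads one line at a time and yields one record at a time, it does not need to hold the whole export in memory, which may help on under-powered hardware.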
To create the appropriate CSV file, results were exported from PubMed as a TXT file, imported into EndNote, deduplicated within EndNote, then exported using a custom output style created by the team. The resulting CSV file of 162,339 PubMed records was opened in Excel, where all fields except PMID, Title, Abstract, and MeSH or Keywords were deleted. The remaining content was cleaned by removing punctuation (using nested SUBSTITUTE functions) and changing all text to lowercase (using the LOWER function). Considerations during punctuation removal included separation of MeSH headings and subheadings by removal of the "/" character, the importance of the "." character in numerical values, and the decision of whether or not to keep numeric data.

Figure 1: Word Cloud of Whole Corpus
Figure 2: Top Three Technologies Identified in Analysis
Figure 3: 3D Printing in Context

Solutions:
1. Export full records, import to EndNote, use custom filter for initial cleaning, export to CSV file.
2. Do you have access to more powerful computers elsewhere?
3. Break file into smaller chunks for cleaning; pool for final analysis.
4. Upgrade computer to have more memory, or use more powerful computer elsewhere.
5. Ensure good technical support and administrative backing for project.
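The punctuation and lowercase cleaning described under Data Cleaning could be expressed in a script instead of nested spreadsheet functions. The sketch below is one possible Python version, not the team's Excel formulas. It assumes that punctuation is replaced by a space rather than deleted, that "." is kept only between two digits, and that dropping numeric data is an option; `clean_text` and `keep_numbers` are hypothetical names.

```python
import re

def clean_text(text, keep_numbers=True):
    """Lowercase text and strip punctuation, roughly like LOWER plus nested SUBSTITUTE."""
    text = text.lower()
    # Replace "/" with a space so MeSH heading/subheading pairs become separate words.
    text = text.replace("/", " ")
    # Drop "." unless it sits between two digits (a decimal point in a numeric value).
    text = re.sub(r"(?<!\d)\.|\.(?!\d)", " ", text)
    # Replace all other punctuation (anything but letters, digits, whitespace, ".") with a space.
    text = re.sub(r"[^\w\s.]|_", " ", text)
    if not keep_numbers:
        # Remove standalone numbers only, so terms such as "3d" survive.
        text = re.sub(r"\b\d+(\.\d+)?\b", " ", text)
    # Collapse the runs of spaces left behind by the substitutions.
    return " ".join(text.split())

if __name__ == "__main__":
    print(clean_text("Printing, Three-Dimensional/methods"))
    print(clean_text("Dose was 0.5 mg in 2016."))
    print(clean_text("3D printing in 2016.", keep_numbers=False))
```

Applied one row at a time, a function like this also fits the chunking approach in Solution 3, since each chunk can be cleaned separately and the results pooled for final analysis.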