The PIMMS project and Natural Language
     Processing for Climate Science
Extending the Chemical Tagger natural language processing tool with
              climate science controlled vocabularies

                   Charlotte Pascoe, Hannah Barjat
                  Peter Murray-Rust and Gerry Devine

                  June 9th 2012, Open Repositories 2012
Portable Infrastructure for the
 Metafor Metadata System
                     http://proj.badc.rl.ac.uk/pimms/
Common Information Model
   • Data: we can talk about DataObjects collected together in any number of ways, stored in a particular medium.
   • Shared: some concepts are shared.
   • ISO: we reuse various ISO classes.
   • Quality: we can record the quality of things.
   • Software: we can talk about hierarchical ModelComponents with ModelProperties, some of which can be coupled together.
   • Grids: we can define a GridSpec or some other geometry.
   • Activity: a particular Activity uses a particular SoftwareComponent. We can talk about Simulations run in support of Experiments. Experiments consist of Requirements; Simulations conform to Requirements.
Mind Maps




Mind maps are used to capture
information requirements from domain
experts and build a controlled vocabulary.
Python Parser
The Python parser processes the XML files generated from the mind maps, for example:
<component name="Radiation">
   <definition status="missing">Definition of component type Radiation required</definition>
   <parameter name="RadiativeTimeStep" choice="keyboard">
    <definition status="missing">Definition of property name RadiativeTimeStep required</definition>
    <value format="numerical" name="time step" units="time units"/>
   </parameter>
   <parametergroup name="Longwave">
    <parameter name="SchemeType" choice="XOR">
     <definition status="missing">Definition of property name SchemeType required</definition>
     <value name="Wide-band model"/>
     <value name="Wide-band (Morcrette)"/>
     <value name="K-correlated"/>
     <value name="K-correlated (RRTM)"/>
     <value name="other"/>
    </parameter>
    <parameter name="Method" choice="XOR">
     <definition status="missing">Definition of property name Method required</definition>
     <value name="Two stream"/>
     <value name="Layer interaction"/>
     <value name="other"/>
    </parameter>
    <parameter name="NumberOfSpectralIntervals" choice="keyboard">
     <definition status="missing">Definition of property name NumberOfSpectralIntervals required</definition>
     <value format="numerical" name=""/>
    </parameter>
   </parametergroup>
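
As a minimal sketch of the kind of processing such a parser might do (this is not the PIMMS parser itself), the following uses Python's standard xml.etree.ElementTree to walk a trimmed copy of the excerpt above and list each parameter with its choice type and allowed values. The function name list_parameters is hypothetical, and a closing </component> tag has been added to the trimmed copy so that it parses.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the mind-map XML excerpt. The closing </component> tag is
# an assumption added here so that the fragment is well-formed.
SAMPLE = """
<component name="Radiation">
  <parameter name="RadiativeTimeStep" choice="keyboard">
    <value format="numerical" name="time step" units="time units"/>
  </parameter>
  <parametergroup name="Longwave">
    <parameter name="SchemeType" choice="XOR">
      <value name="Wide-band model"/>
      <value name="K-correlated"/>
      <value name="other"/>
    </parameter>
    <parameter name="NumberOfSpectralIntervals" choice="keyboard">
      <value format="numerical" name=""/>
    </parameter>
  </parametergroup>
</component>
"""


def _describe(group_name, param):
    """Build one (group, parameter, choice, values) tuple."""
    values = [v.get("name") for v in param.findall("value")]
    return (group_name, param.get("name"), param.get("choice"), values)


def list_parameters(xml_text):
    """Return (group, parameter, choice, values) tuples for one component."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    # Parameters directly under the component have no group.
    for param in root.findall("parameter"):
        rows.append(_describe(None, param))
    # Parameters nested one level down inside a parametergroup.
    for group in root.findall("parametergroup"):
        for param in group.findall("parameter"):
            rows.append(_describe(group.get("name"), param))
    return rows


if __name__ == "__main__":
    for row in list_parameters(SAMPLE):
        print(row)
```

A parameter with choice="XOR" carries its allowed values as a list, while a choice="keyboard" parameter carries a single free-entry value description.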
Web Forms
Web forms generate content in CIM XML format: http://q.cmip5.ceda.ac.uk/
CIM Viewer
http://zonda5.badc.rl.ac.uk/site/public/tools/viewer/integrated/1.5/en/73c59aba-dc6d-11df-a442-00163e9152a5/1
Chemical Tagger
                                                         http://chemicaltagger.ch.cam.ac.uk/
ChemicalTagger is an open-source tool that uses OSCAR4 and NLP techniques for tagging and
parsing experimental sections in the chemistry literature.
Chemical Tagger
   https://bitbucket.org/wwmm/chemicaltagger & https://bitbucket.org/wwmm/acpgeo
• Java project developed by the Peter Murray-Rust group, Cambridge.
  Online demo: http://chemicaltagger.ch.cam.ac.uk/
• Adapted for use with ACP Abstracts (Lezan Hawizy and
  Hannah Barjat).
   –   Modification by use of dictionaries and changes to grammar.
   –   First use case outside of laboratory chemistry.
   –   Still with a significant chemistry component.
   –   Wider physical science.
• Open Source NLP tool for processing chemical text
• Combines Chemical Entity Recognition (OSCAR) with NLP techniques
• Extendible and Reconfigurable Taggers and Parsers generated using ANTLR
  (ANother Tool for Language Recognition)
Chemical Tagger & PIMMS

• To extend chemical tagger to be more suited to
  climate modelling.
     – Specifically:
         • Palaeoclimate modelling, and how the process of text mining
           might differ from the development of a controlled vocabulary.
         • Highlighting of text for comparison with CIM documents.
         • Initially only using XML Abstracts e.g. from EGU’s
           Geoscientific Model Development and Climate of the Past.
         – Brief look at PDF to Text.




Paleoclimate Language
• Time periods and climatic events
   – Includes named Ages, Epochs, Eras etc. [Including all those in a mind map produced
     for the PIMMS project at Bristol].
   – Context of proper nouns, e.g. with words such as ‘period’, ‘era’, ‘epoch’.
   – Numbers with appropriate units, e.g. Mya, yr BP.
   – Likely date numbers, e.g. 1750 AD.
   – Known acronyms, e.g. ‘LGM’ [acronyms in context have not been investigated].
   – Related adjectives e.g.
     seasonal, decadal, glacial, interglacial, stadial, interstadial, maximum, minimum
     where used as proper nouns.

• Palaeoclimate Models
   – Can guess model names from context
       • e.g. proper noun or acronym followed by ‘model’
       • e.g. ‘reconstruction / simulation with XXX’
   – Can develop/use glossary of model names.

• Palaeoclimate Acronyms
   – Time periods and models.
   – Theories, techniques, physical and chemical parameters?
   – Can develop/use glossary of acronyms – problem area: often not unique even
     within subject.
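
The time-period cues listed above can be sketched with a few regular expressions. This is only an illustration in Python, not the ANTLR grammar used in ChemicalTagger / ACPgeo. The period names, context words and unit list below are a deliberately small, assumed glossary, and the names find_time_phrases, PERIOD_NAMES and CONTEXT_WORDS are hypothetical.

```python
import re

# Assumed, deliberately tiny glossary; the full list of period names
# mentioned in the slides is not reproduced here.
PERIOD_NAMES = ["Holocene", "Pleistocene", "Eemian"]
CONTEXT_WORDS = ["period", "era", "epoch", "age"]

# Number followed by a palaeo time unit, e.g. "8 kyr BP" or "3.3 Mya".
NUMBER_WITH_UNIT = re.compile(r"\b\d+(?:\.\d+)?\s?(?:kyr BP|yr BP|Mya)\b")
# Likely calendar date, e.g. "1750 AD".
CALENDAR_DATE = re.compile(r"\b\d{3,4}\s?(?:AD|BC)\b")
# Known period name, optionally followed by a context word.
NAMED_PERIOD = re.compile(
    r"\b(?:%s)(?:\s(?:%s))?\b" % ("|".join(PERIOD_NAMES), "|".join(CONTEXT_WORDS))
)


def find_time_phrases(text):
    """Return (start, matched text, kind) for each candidate, in text order."""
    hits = []
    for kind, pattern in (
        ("quantity", NUMBER_WITH_UNIT),
        ("date", CALENDAR_DATE),
        ("named", NAMED_PERIOD),
    ):
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(0), kind))
    return sorted(hits)


if __name__ == "__main__":
    sample = "During the Holocene epoch, around 8 kyr BP, and again after 1750 AD."
    for hit in find_time_phrases(sample):
        print(hit)
```

Patterns like these only propose candidates; telling a study date from a citation year still needs the surrounding context.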
Natural Language vs CV

• Quick compilation of proper nouns used for time periods
  (primarily from Wikipedia) contains 185 words.
     – Use of these words together with adjectives / dates / details of
       events would produce a very large number of phrases.

• Controlled Vocabulary from Bristol contains around 24 of
  these.
     • Use of these words together with other proper nouns /
       adjectives / dates gives only 44 phrases within the Bristol CV.

• Map natural language to CV?
     – Straightforward for most dates?
     – Understanding of context important
         • Does context refer to main emphasis of paper?
         • Is an event/time period described unambiguously? e.g. “Last Glacial
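
One simple way to picture mapping natural language onto a CV is a synonym lookup that also reports what it could not map. The sketch below is an assumption-laden illustration: the CV terms and synonyms are hypothetical stand-ins (the Bristol vocabulary is not reproduced in these slides), and map_to_cv and normalise are invented names.

```python
# Hypothetical CV terms and synonyms, for illustration only.
CV_SYNONYMS = {
    "Last Glacial Maximum": ["last glacial maximum", "lgm"],
    "Holocene": ["holocene", "holocene epoch"],
}

# Invert to a lookup from normalised phrase to CV term.
LOOKUP = {syn: term for term, syns in CV_SYNONYMS.items() for syn in syns}


def normalise(phrase):
    """Lower-case and collapse whitespace so lookups ignore layout."""
    return " ".join(phrase.lower().split())


def map_to_cv(phrases):
    """Split tagged phrases into (mapped, unmapped)."""
    mapped, unmapped = {}, []
    for phrase in phrases:
        term = LOOKUP.get(normalise(phrase))
        if term is None:
            unmapped.append(phrase)
        else:
            mapped[phrase] = term
    return mapped, unmapped


if __name__ == "__main__":
    print(map_to_cv(["LGM", "Holocene  epoch", "Younger Dryas"]))
```

A flat lookup like this ignores context, so it cannot tell whether a phrase is the main emphasis of a paper, and it would mis-handle acronyms that are not unique within the subject.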
Preliminary Results (from 68 files)

    Tag / Tags           Example                           Comment
    <timePhrase>         (i) Holocene, (ii) 8 kyr BP
    <PALAEOTIME>         (iii)

    <referencePhrase>    (i) (Otto et al. 2009b)           Important to distinguish the year
                         (ii) Giraudeau et al. 2000        pattern from dates relevant to the
                                                           study.

    <locationPhrase>     (i) around Lake Kotokel,          False positives, e.g. “from
                         (ii) over Tibetan Plateau         Sphagnum”.

    <LOCATION>           (i) 52°47´ N, 108°07´ E,          Cannot currently do degrees from
                         458 m a.s.l (ii) London.          PDF text.

    <TempPhrase>                                           ‘warm’ and ‘cool’ are verbs in
                                                           synthetic chemistry, unlike in
                                                           environmental chemistry.

    Tag / Tags           Example                           Comment
    <CAMPAIGN>           (i) PMIP, (ii) PANASH             Less relevant here than to ACP in
                                                           general.

    <MODEL>              (i) REVEALS model,
                         (ii) ECBILT-CLIO intermediate
                         complexity climate model

    <acronymPhrase>      (i) Modern Analogues              May pick up campaigns / models
                         Technique ( MAT )                 where the phrases above have
                         (ii) REVEALS ( Regional           failed.
                         Estimates of VEgetation
                         Abundance from Large Sites )

    <QUANTITY>           (i) 10 ppm (ii) 0.53 mm/day       Units dictionary could be more
                                                           extensive.

    <MOLECULE>           (i) CO2, (ii) calcium             Many false positives, as molecules
                         carbonate                         are what Chemical Tagger was
                                                           designed to find.
Chemical Tagger
                        Rendering of PALEOTIME
XML rendered with CSS      http://www.clim-past.net/2/205/2006/cp-2-205-2006.html




GMD Journal Article
http://www.geosci-model-dev.net/4/1035/2011/gmd-4-1035-2011.html
CIM Document Viewer

• The acronym / name MIROC4 is not explained, so reproduce the sentence.
• The description is just the first few sentences after the appearance of <MODEL>.
CIM Document Viewer
     http://zonda5.badc.rl.ac.uk/site/public/tools/viewer




• Makes use of existing chemical tagging.
CIM Document Viewer
      http://zonda5.badc.rl.ac.uk/site/public/repository




• Number of spectral intervals was not found! The CIM document has no place for “not found”.
Climate Models –
           General Constraints
• Unless the paper is specifically about the model, we
  are unlikely to find much Metafor-type CV in
  the abstract
  – Look at experimental / methods sections
     • model name
     • model resolution
     • model schemes
  – Problem with PDF -> text.
  – Only certain elements easy to extract (e.g.
    resolution)
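
Resolution is one of the easier elements because it tends to follow a few fixed patterns. As a rough sketch (assumed patterns only, not the ACPgeo grammar), the Python below looks for a horizontal grid spacing and a number of vertical levels in a sentence. The function name extract_resolution and the dictionary keys are hypothetical.

```python
import re

# Horizontal grid spacing such as "2.5° x 3.75°" or "1 degree by 1 degree".
HORIZONTAL = re.compile(
    r"(\d+(?:\.\d+)?)\s?(?:°|degrees?)\s?(?:x|×|by)\s?(\d+(?:\.\d+)?)\s?(?:°|degrees?)"
)
# Vertical resolution such as "19 levels" or "40 vertical layers".
VERTICAL = re.compile(r"(\d+)\s(?:vertical\s)?(?:levels|layers)")


def extract_resolution(sentence):
    """Return a dict holding whichever resolution elements were found."""
    found = {}
    horizontal = HORIZONTAL.search(sentence)
    if horizontal:
        found["horizontal_degrees"] = (
            float(horizontal.group(1)),
            float(horizontal.group(2)),
        )
    vertical = VERTICAL.search(sentence)
    if vertical:
        found["vertical_levels"] = int(vertical.group(1))
    return found


if __name__ == "__main__":
    print(extract_resolution(
        "The atmosphere has a resolution of 2.5° x 3.75° with 19 vertical levels."
    ))
```

An empty result means only that these patterns did not match; it says nothing about the model itself. Degree symbols may also be lost in PDF-to-text conversion, which would defeat the first pattern.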
Refine ACPgeo Output

• Add a few more phrases e.g. specific phrases to
  look for model resolution, using expected
  vocabulary (e.g. grid, levels, resolution, directions
  etc).
• Refine output of ACPgeo to look for specific CV
  terms.
• Try to put CV terms in context:
     – Look for proximity of CV terms to other phrases:
         • Within phrase; within sentence or within a number of
           sentences
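
The proximity idea can be sketched as a sentence-window search: find the sentences containing an anchor phrase (for example a tagged model name) and report CV terms that occur within a given number of sentences of it. This is a minimal Python illustration under stated assumptions: the sentence splitter is naive, matching is by case-insensitive substring, and the names split_sentences and cv_terms_near are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter on '.', '!' or '?' followed by whitespace (an assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cv_terms_near(text, anchor, cv_terms, window=1):
    """Return CV terms occurring within `window` sentences of `anchor`.

    The result maps each term to its smallest sentence distance from the
    anchor; 0 means the term shares a sentence with the anchor.
    """
    sentences = [s.lower() for s in split_sentences(text)]
    anchor_idx = [i for i, s in enumerate(sentences) if anchor.lower() in s]
    near = {}
    for term in cv_terms:
        for i, sentence in enumerate(sentences):
            if term.lower() not in sentence:
                continue
            for a in anchor_idx:
                dist = abs(i - a)
                if dist <= window and dist < near.get(term, window + 1):
                    near[term] = dist
    return near


if __name__ == "__main__":
    # Invented example text; only the CV values come from the slides.
    text = (
        "We use the MIROC4 model. The longwave scheme is K-correlated. "
        "Results are shown in Section 3. A two stream method was tested."
    )
    print(cv_terms_near(text, "MIROC4", ["K-correlated", "Two stream"], window=1))
```

Widening the window trades precision for recall: a larger value finds more CV terms but with weaker evidence that they describe the anchor.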


<MOLECULE>

– Chemical Tagger was designed to be used primarily with
  chemistry.
    • Unsurprising that there is a tendency to assign acronyms,
         hyphenated words, and words with common chemical
         endings as molecules.
     –   It is possible to filter some of these wrongly assigned words by
         probability.
– There are still conflicts e.g. C3 and C4 could refer to
  hydrocarbons or plants.
    • Extensive testing and modifying / machine learning might
         reduce these.
– Better to get right first time if important!
Harvested Metadata vs
                          Documented Metadata
                                                            http://proj.badc.rl.ac.uk/pimms/blog/
CIM was designed to be populated by modellers with the (probably over simplistic) assumption
that if something isn't in the CIM document then it either isn't in the model or isn't relevant. But
CIM documents created by harvesting information from papers will naturally not cover
everything about a model, so missing info doesn't mean that those things weren't
included/aren't relevant.

PIMMS will need to describe different protocols for interpreting CIM documents depending on
how they were created, but we will also want to ensure that CIM accounts for missing data
more intelligently in future releases.
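
One way to picture such a protocol is to give every property an explicit status, so that "not found by harvesting" is kept apart from "not in the model". The Python below is a hypothetical sketch only: it is not part of the CIM schema, and the names Status, PropertyRecord, interpret_missing and the "modeller" / "harvested" labels are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DOCUMENTED = "documented"      # a value was supplied
    NOT_IN_MODEL = "not in model"  # explicitly absent or not relevant
    NOT_FOUND = "not found"        # harvesting found nothing; implies nothing about the model


@dataclass
class PropertyRecord:
    name: str
    status: Status
    value: Optional[str] = None


def interpret_missing(source):
    """Status to assume for a property that is absent from a document.

    `source` is "modeller" for hand-completed documents or "harvested" for
    documents built by text mining; both labels are hypothetical.
    """
    if source == "modeller":
        return Status.NOT_IN_MODEL
    if source == "harvested":
        return Status.NOT_FOUND
    raise ValueError("unknown source: %r" % source)


if __name__ == "__main__":
    record = PropertyRecord("NumberOfSpectralIntervals", interpret_missing("harvested"))
    print(record)
```

With a status field of this kind, a viewer could show "not found" for a harvested document instead of silently implying that the property is absent from the model.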

In essence the difference between journal article descriptions and metadata documentation is
Narrative. Journal articles need to tell a story so the information they include is only that which
is relevant to the narrative, whereas metadata documentation is an attempt to include as much
as possible across the board. The general nature of metadata documentation is probably why it
has historically been perceived as such a boring task to complete.

PIMMS will make metadata documentation more fun by bringing back the Narrative. Once
PIMMS is established at an institution, users will be able to create generalised metadata having
described only those things that are relevant to the story of their experiment.

Cpascoe pimms or2012_

  • 1. The PIMMS project and Natural Language Processing for Climate Science Extending the Chemical Tagger natural language processing tool with climate science controlled vocabularies Charlotte Pascoe, Hannah Barjat Peter Murray-Rust and Gerry Devine June 9th 2012, Open Repositories 2012
  • 2. Portable Infrastructure for the Metafor Metadata System http://proj.badc.rl.ac.uk/pimms/
  • 3. Common Information Model. Data: We can talk about DataObjects collected together in any number of ways, stored in a particular medium. Software: We can talk about hierarchical ModelComponents with ModelProperties, some of which can be coupled together. Shared: Some concepts are shared. ISO: We reuse various ISO classes. Quality: We can record the quality of things. Activity: A particular Activity uses a particular SoftwareComponent. We can talk about Simulations run in support of Experiments. Experiments consist of Requirements; Simulations conform to Requirements. Grids: We can define a GridSpec or some other geometry.
  • 5. Mind Maps Mind maps are used to capture information requirements from domain experts and build a controlled vocabulary.
  • 6. Python Parser The python parser processes the XML files generated by the mind maps <component name="Radiation"> <definition status="missing">Definition of component type Radiation required</definition> <parameter name="RadiativeTimeStep" choice="keyboard"> <definition status="missing">Definition of property name RadiativeTimeStep required</definition> <value format="numerical" name="time step" units="time units"/> </parameter> <parametergroup name="Longwave"> <parameter name="SchemeType" choice="XOR"> <definition status="missing">Definition of property name SchemeType required</definition> <value name="Wide-band model"/> <value name="Wide-band (Morcrette)"/> <value name="K-correlated"/> <value name="K-correlated (RRTM)"/> <value name="other"/> </parameter> <parameter name="Method" choice="XOR"> <definition status="missing">Definition of property name Method required</definition> <value name="Two stream"/> <value name="Layer interaction"/> <value name="other"/> </parameter> <parameter name="NumberOfSpectralIntervals" choice="keyboard"> <definition status="missing">Definition of property name NumberOfSpectralIntervals required</definition> <value format="numerical" name=""/> </parameter> </parametergroup>
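As a rough illustration of what processing this XML might involve, the sketch below walks a shortened copy of the Radiation component with Python's standard `xml.etree.ElementTree` module. It is not the PIMMS parser itself: the function names `list_parameters` and `_row` are hypothetical, and a closing `</component>` tag has been added so that the excerpt is well-formed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the mind-map XML from the slide. The closing </component>
# tag is an assumption added here so that the fragment parses.
SAMPLE = """
<component name="Radiation">
  <parameter name="RadiativeTimeStep" choice="keyboard">
    <value format="numerical" name="time step" units="time units"/>
  </parameter>
  <parametergroup name="Longwave">
    <parameter name="SchemeType" choice="XOR">
      <value name="Wide-band model"/>
      <value name="K-correlated"/>
      <value name="other"/>
    </parameter>
    <parameter name="NumberOfSpectralIntervals" choice="keyboard">
      <value format="numerical" name=""/>
    </parameter>
  </parametergroup>
</component>
"""


def _row(group_name, param):
    """Build one (group, parameter, choice, allowed_values) tuple."""
    # Skip values whose name attribute is empty (free-text numerical entries).
    values = [v.get("name") for v in param.findall("value") if v.get("name")]
    return (group_name, param.get("name"), param.get("choice"), values)


def list_parameters(xml_text):
    """List every parameter of a component, with its group if it has one."""
    root = ET.fromstring(xml_text)
    rows = []
    # Parameters directly under the component belong to no group.
    for param in root.findall("parameter"):
        rows.append(_row(None, param))
    # Parameters nested one level down inside a parametergroup.
    for group in root.findall("parametergroup"):
        for param in group.findall("parameter"):
            rows.append(_row(group.get("name"), param))
    return rows


if __name__ == "__main__":
    for row in list_parameters(SAMPLE):
        print(row)
```

A flat list like this could feed a web-form generator: `XOR` parameters become single-choice selectors and `keyboard` parameters become free-text fields.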
  • 7. Web Forms Web forms generate content in CIM xml format http://q.cmip5.ceda.ac.uk/
  • 9. Chemical Tagger http://chemicaltagger.ch.cam.ac.uk/ ChemicalTagger is an open-source tool that uses OSCAR4 and NLP techniques for tagging and parsing experimental sections in the chemistry literature.
  • 10. Chemical Tagger https://bitbucket.org/wwmm/chemicaltagger & https://bitbucket.org/wwmm/acpgeo • Java project developed by the Peter Murray-Rust group, Cambridge. Online demo: http://chemicaltagger.ch.cam.ac.uk/ • Adapted for use with ACP Abstracts (Lezan Hawizy and Hannah Barjat). – Modification by use of dictionaries and changes to grammar. – First use case outside of laboratory chemistry. – Still with a significant chemistry component. – Wider physical science. • Open Source NLP tool for processing chemical text • Combines Chemical Entity Recognition (OSCAR) with NLP techniques • Extendible and Reconfigurable Taggers and Parsers generated using ANTLR (ANother Tool for Language Recognition)
  • 11. Chemical Tagger & PIMMS • To extend Chemical Tagger to be more suited to climate modelling. – Specifically: • Palaeoclimate modelling and how the process of text mining might differ from development of a controlled vocabulary. • Highlighting of text for comparison with CIM documents. • Initially only using XML Abstracts e.g. from EGU’s Geoscientific Model Development and Climate of the Past. – Brief look at PDF to Text.
  • 12. Palaeoclimate Language • Time periods and climatic events – Includes named Ages, Epochs, Eras etc. [Including all those in a mind map produced for the PIMMS project at Bristol]. – Context of proper nouns e.g. with words such as ‘period’, ‘era’, ‘epoch’ – Numbers with appropriate units e.g. Mya, yr BP – Likely date numbers e.g. 1750 AD. – Acronyms – known e.g. ‘LGM’ [ACRONYMS in context have not been investigated] – Related adjectives e.g. seasonal, decadal, glacial, interglacial, stadial, interstadial, maximum, minimum where used as proper nouns. • Palaeoclimate Models – Can guess model names from context • e.g. proper noun or acronym followed by model • e.g. reconstruction / simulation with XXX – Can develop/use glossary of model names. • Palaeoclimate Acronyms – Time periods and models. – Theories, techniques, physical and chemical parameters? – Can develop/use glossary of acronyms – problem area: often not unique even within subject.
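The time-expression cues listed above (numbers with units, likely calendar dates, proper nouns followed by ‘period’, ‘era’ or ‘epoch’) can be approximated with regular expressions. The sketch below is only an illustration in Python: ChemicalTagger itself is a Java project with ANTLR-generated grammars, and the patterns, labels and the function name `find_time_phrases` are assumptions made for this example, not the project's rules.

```python
import re

# Hypothetical patterns; the unit list is illustrative and far from complete.
PATTERNS = [
    # Numbers with an age unit, e.g. "8 kyr BP", "3 Mya".
    ("age", re.compile(r"\b\d+(?:\.\d+)?\s?(?:Mya|Ma|kyr BP|ka BP|yr BP)\b")),
    # Likely calendar dates, e.g. "1750 AD".
    ("calendar_date", re.compile(r"\b\d{3,4}\s(?:AD|BC)\b")),
    # One or more capitalised words followed by a context word.
    ("named_period", re.compile(r"\b(?:[A-Z][a-z]+\s)+(?:period|era|epoch)\b")),
]


def find_time_phrases(text):
    """Return (label, matched_text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group(0)))
    hits.sort()  # order by position in the text
    return [(label, phrase) for _, label, phrase in hits]


if __name__ == "__main__":
    sample = "Simulations of the Holocene epoch at 8 kyr BP were compared with 1750 AD."
    for label, phrase in find_time_phrases(sample):
        print(label, "->", phrase)
```

Patterns like these miss named periods that appear without a context word (a bare "Holocene"), which is where a dictionary of Ages, Epochs and Eras would be needed.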
  • 13. Natural Language vs CV • Quick compilation of proper nouns used for time periods (primarily from Wikipedia) contains 185 words. – Use of these words together with adjectives / dates / details of events would produce a very large number of phrases. • Controlled Vocabulary from Bristol contains around 24 of these. • Use of these words together with other proper nouns / adjectives / dates gives only 44 phrases within the Bristol CV. • Map natural language to CV? – Straightforward for most dates? – Understanding of context important • Does context refer to main emphasis of paper? • Is an event/time period described unambiguously? e.g. “Last Glacial
  • 14. Preliminary Results (from 68 files)
      Tag / Tags | Example | Comment
      – <timePhrase>, <PALAEOTIME> | (i) Holocene, (ii) 8 kyr BP, (iii) |
      – <referencePhrase> | (i) (Otto et al. 2009b), (ii) Giraudeau et al. 2000 | Important to distinguish year pattern from dates relevant to the study.
      – <locationPhrase> | (i) around Lake Kotokel, (ii) over Tibetan Plateau | False positives: e.g. “from Sphagnum”
      – <LOCATION> | (i) 52°47´ N, 108°07´ E, 458 m a.s.l, (ii) London. | Cannot currently do degrees from pdf-text.
      – <TempPhrase> | | ‘warm’ and ‘cool’: verbs in synthetic chem unlike env. chem.
  • 15. Preliminary Results (continued)
      Tag / Tags | Example | Numbers found
      – <CAMPAIGN> | (i) PMIP, (ii) PANASH | Less relevant here than to ACP in general
      – <MODEL> | (i) REVEALS model, (ii) ECBILT-CLIO intermediate complexity climate model |
      – <acronymPhrase> | (i) Modern Analogues Technique ( MAT ), (ii) REVEALS ( Regional Estimates of VEgetation Abundance from Large Sites ) | May pick up campaigns / models where phrases above have failed.
      – <QUANTITY> | (i) 10 ppm, (ii) 0.53 mm/day | units dictionary could be more extensive
      – <MOLECULE> | (i) CO2, (ii) calcium carbonate | Many false positives as what chemical tagger was designed for.
  • 16. Chemical Tagger Rendering of PALEOTIME XML rendered with CSS http://www.clim-past.net/2/205/2006/cp-2-205-2006.html
  • 18. CIM Document Viewer The acronym / name MIROC4 is not explained, so the sentence is reproduced. The description is just the first few sentences after the appearance of <MODEL>.
  • 19. CIM Document Viewer http://zonda5.badc.rl.ac.uk/site/public/tools/viewer Makes use of existing chemical tagging.
  • 20. CIM Document Viewer http://zonda5.badc.rl.ac.uk/site/public/repository The number of spectral intervals was not found! No place for “not found”.
  • 21. Climate Models – General Constraints • Unless the paper is specifically about the model we are unlikely to find much Metafor-type CV in the abstract – Look at experimental / methods sections • model name • model resolution • model schemes – Problem with PDF -> text. – Only certain elements easy to extract (e.g. resolution)
  • 22. Refine ACPgeo Output • Add a few more phrases e.g. specific phrases to look for model resolution, using expected vocabulary (e.g. grid, levels, resolution, directions etc). • Refine output of ACPgeo to look for specific CV terms. • Try to put CV terms in context: – Look for proximity of CV terms to other phrases: • Within phrase; within sentence or within a number of sentences
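The proximity idea could be prototyped along the following lines. This is a minimal Python sketch working on plain text rather than on ACPgeo's tagged XML output; the sentence splitter is deliberately naive, the sample text is invented, and the names `split_sentences` and `cooccurrences` are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrences(text, cv_terms, anchor, window=1):
    """Find CV terms within `window` sentences of a sentence containing `anchor`.

    Returns (term, distance) pairs, where distance is the number of sentences
    between the term and the nearest mention of the anchor (0 = same sentence).
    """
    sentences = [s.lower() for s in split_sentences(text)]
    anchor_positions = [i for i, s in enumerate(sentences) if anchor.lower() in s]
    results = []
    for term in cv_terms:
        for i, sentence in enumerate(sentences):
            if term.lower() not in sentence:
                continue
            distances = [abs(i - a) for a in anchor_positions]
            if distances and min(distances) <= window:
                results.append((term, min(distances)))
    return results


if __name__ == "__main__":
    # Invented sample text; "XXX" stands in for a model name, as on the slides.
    sample = (
        "We ran a simulation with the XXX model. "
        "The horizontal resolution is coarse. "
        "Results are discussed later. "
        "The grid is regular."
    )
    print(cooccurrences(sample, ["resolution", "grid"], "XXX model", window=1))
```

Widening `window` trades precision for recall: a CV term three sentences away from the model name is more likely to describe something else.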
  • 23. <MOLECULE> – Chemical Tagger was designed to be used primarily with chemistry. • Unsurprising that there is a tendency to assign acronyms, hyphenated words, and words with common chemical endings as molecules. – It is possible to filter some of these wrongly assigned words by probability. – There are still conflicts e.g. C3 and C4 could refer to hydrocarbons or plants. • Extensive testing and modifying / machine learning might reduce these. – Better to get right first time if important!
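Filtering by probability might look something like the Python sketch below. It assumes that each candidate molecule arrives as a (token, probability) pair; how ChemicalTagger or OSCAR actually exposes such scores is not shown on the slides, so the input format, the threshold value and the function name `filter_molecules` are all assumptions, and the probabilities in the usage example are invented.

```python
# Tokens the slide flags as ambiguous: C3 and C4 may be hydrocarbons or plants.
AMBIGUOUS = {"C3", "C4"}


def filter_molecules(candidates, threshold=0.5):
    """Split candidates into accepted molecules and tokens needing review.

    candidates: iterable of (token, probability) pairs (hypothetical format).
    Ambiguous tokens are never accepted automatically, whatever their score.
    """
    kept, review = [], []
    for token, probability in candidates:
        if token in AMBIGUOUS:
            review.append(token)
        elif probability >= threshold:
            kept.append(token)
    return kept, review


if __name__ == "__main__":
    # Invented scores, for illustration only.
    tagged = [("CO2", 0.95), ("PMIP", 0.20), ("C4", 0.80)]
    print(filter_molecules(tagged))
```

Routing ambiguous tokens to a review list rather than dropping them reflects the slide's point that some conflicts cannot be settled by score alone.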
  • 24. Harvested Metadata vs Documented Metadata http://proj.badc.rl.ac.uk/pimms/blog/ CIM was designed to be populated by modellers with the (probably over-simplistic) assumption that if something isn't in the CIM document then it either isn't in the model or isn't relevant. But CIM documents created by harvesting information from papers will naturally not cover everything about a model, so missing information doesn't mean that those things weren't included or aren't relevant. PIMMS will need to describe different protocols for interpreting CIM documents depending on how they were created, but we will also want to ensure that the CIM accounts for missing data more intelligently in future releases. In essence the difference between journal article descriptions and metadata documentation is Narrative. Journal articles need to tell a story, so the information they include is only that which is relevant to the narrative, whereas metadata documentation is an attempt to include as much as possible across the board. The general nature of metadata documentation is probably why it has historically been perceived as such a boring task to complete. PIMMS will make metadata documentation more fun by bringing back the Narrative. Once PIMMS is established at an institution, users will be able to create generalised metadata having only described those things that are relevant to the story of their experiment.
