Using MongoDB for
Materials Discovery
   Michael Kocher and Dan Gunter
   Lawrence Berkeley National Lab
Energy Mission at LBNL
•   Li-ion Batteries

•   Photovoltaic (Solar Cells)

•   Thermoelectrics

•   Biofuels

•   New Computational Tools

•   Cutting-edge Spectroscopic Tools (Advanced Light Source)

               http://carboncycle2.lbl.gov/
Current Material Design
    model is Slow


18 years: the average time from
new materials discovery to
commercialization


  Bringing New Materials to the Market: Eagar, T.W.
        Technology Review Feb 1995, 98, 42.
Materials Genome Initiative:
  A Renaissance of American Manufacturing

      “To help businesses discover, develop, and deploy new
     materials twice as fast, we're launching what we call the
       Materials Genome Initiative. The invention of silicon
   circuits and lithium-ion batteries made computers and iPods
        and iPads possible -- but it took years to get those
     technologies from the drawing board to the marketplace.

          We can do it faster.”
     - President Obama at Carnegie Mellon
              University 6/24/2011
What is a Material?
Examples: NaCl, Silicon, LiCoO2 (Li, O, Co)
What can we Compute using
  quantum mechanics?
•   volume
•   density
•   total energy
•   formation energy
•   metallic?
•   etc...

No empirical parameters!
MaterialsProject.org
 "The Google of Materials Science Data"

  MIT and LBNL collaboration
Inverting the Problem
Detailed Properties
Machine Learning
[Diagram: materials.bson feeds a learning algorithm, which outputs a list of candidate new materials (Structure 1 to Structure 6). Example questions: How often can you substitute Mg for Ca? What about Na, V, P, O?]


         Prof. Gerbrand Ceder (DOI: 10.1103/PhysRevLett.91.135503)
Materials Project:
 A Play in Three Acts

I. Data generation using HTC
II. Data storage
III. Data analysis/logging
Act I: Managing
       Calculations
• Centralized distributed model is the only
  way to go
• Hub is at LBNL
• Store the state in the database
• Overview of running many MPI jobs at
  many different HPC centers
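The idea of keeping job state in the database can be sketched as follows. This is a minimal illustration, not the project's actual code: the in-memory list stands in for a queue collection, and the field names (`state`, `host`, `started`) and state values are assumptions. With MongoDB and pymongo, the claim step would typically be a single atomic `find_one_and_update` call so that two managers cannot claim the same job.

```python
from datetime import datetime, timezone

# Hypothetical job documents; field names and states are illustrative only.
master_queue = [
    {"_id": 1, "formula": "LiCoO2", "state": "READY", "host": None},
    {"_id": 2, "formula": "NaCl", "state": "READY", "host": None},
]


def claim_next(queue, host):
    """Mark the first READY job as RUNNING for `host` and return it, or None."""
    for job in queue:
        if job["state"] == "READY":
            job.update(state="RUNNING", host=host,
                       started=datetime.now(timezone.utc))
            return job
    return None


def finish(job, ok):
    """Record the outcome of a job in its document."""
    job["state"] = "DONE" if ok else "FAILED"


first = claim_next(master_queue, "hopper")
second = claim_next(master_queue, "carver")
third = claim_next(master_queue, "lr1")  # nothing left to claim
finish(first, ok=True)
```

Because every transition is written to the job document, any manager or monitoring tool can reconstruct what is running where by reading the queue alone.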
MasterQueue
[Diagram: builder.x ("The Brain") and master_queue.bson at the center, with the labels "create a new engine, add to queue" and "pull crystal". Five manager.x processes run on HPC systems: Franklin, Hopper, and Carver at NERSC (Oakland); lr1 and lr2 on Lawrencium (Berkeley).]
Centralized Logging and Management
Example
[Diagram: a central MongoDB instance connected to eight manager.x processes, one per machine: O1 and Cathode (MIT); Hopper, Franklin, and Carver (NERSC, Oakland); lr1 and lr2 (LBNL); DLX (Kentucky).]

 query = {'elements': {'$all': ['Li', 'O']}, 'nelectrons': {'$lte': 200}}
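To make the meaning of this query concrete, here is a small pure-Python sketch that evaluates just the two operators it uses, `$all` and `$lte`, against sample documents. The `matches` helper and the sample records (including their `nelectrons` values) are made up for illustration; in practice the query dictionary would be passed to a MongoDB driver rather than evaluated by hand.

```python
query = {"elements": {"$all": ["Li", "O"]}, "nelectrons": {"$lte": 200}}


def matches(doc, query):
    """Evaluate a tiny subset of MongoDB query operators ($all, $lte)."""
    for field, conditions in query.items():
        value = doc.get(field)
        for op, arg in conditions.items():
            if op == "$all":
                # The array field must contain every listed value.
                if not isinstance(value, list) or not all(x in value for x in arg):
                    return False
            elif op == "$lte":
                if value is None or not value <= arg:
                    return False
            else:
                raise ValueError(f"operator not covered by this sketch: {op}")
    return True


# Illustrative documents only; the numbers are not real database values.
materials = [
    {"formula": "Li2O", "elements": ["Li", "O"], "nelectrons": 14},
    {"formula": "LiCoO2", "elements": ["Li", "Co", "O"], "nelectrons": 46},
    {"formula": "NaCl", "elements": ["Na", "Cl"], "nelectrons": 28},
    {"formula": "large Li-O cell", "elements": ["Li", "O"], "nelectrons": 320},
]

hits = [m["formula"] for m in materials if matches(m, query)]
```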
Act II :
Core Data storage
Very Complex Documents
Powerful Querying
Every crystal that has (Li or Na or K), (Mn), (O or S or F or Si)
plus one other element except (Zn or Ni or Fe or Cu or Co)

{
    "lattice.volume": {"$lt": 500},
    "elements": {"$all": ["Mn"], "$size": 4, "$nin": ["Zn", "Ni", "Fe", "Cu", "Co"]},
    "atoms": {"$elemMatch": {"oxidation_state": 3, "symbol": "Mn"}},
    "$where": "match_all(this.element_names, ['Li', 'Na', 'K'], ['Mn'], ['O', 'S', 'F', 'Si'])"
}
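The slide does not show the body of `match_all`, so the following Python sketch is only a guess at its semantics: every group must contribute at least one element to the crystal. The `is_candidate` wrapper is a hypothetical helper that combines this with the element-count and exclusion conditions from the query above.

```python
def match_all(element_names, *groups):
    """True if each group shares at least one element with element_names."""
    present = set(element_names)
    return all(present & set(group) for group in groups)


def is_candidate(element_names):
    """Four elements, none excluded, and one from each required group."""
    excluded = {"Zn", "Ni", "Fe", "Cu", "Co"}
    return (
        len(element_names) == 4
        and not (excluded & set(element_names))
        and match_all(
            element_names,
            ["Li", "Na", "K"],
            ["Mn"],
            ["O", "S", "F", "Si"],
        )
    )


examples = {
    "Li-Mn-O-P": is_candidate(["Li", "Mn", "O", "P"]),
    "Li-Mn-O-Fe": is_candidate(["Li", "Mn", "O", "Fe"]),
    "Li-Mn-O": is_candidate(["Li", "Mn", "O"]),
    "Mg-Mn-O-P": is_candidate(["Mg", "Mn", "O", "P"]),
}
```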
pre-MongoDB :(
((SELECT structure.structureid
    FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry
   WHERE structureid IN
         ((SELECT structure.structureid
             FROM structure NATURAL INNER JOIN elemententry
            WHERE elemententry.symbol = 'Li'
           INTERSECT
           SELECT structure.structureid
             FROM structure NATURAL INNER JOIN elemententry
            WHERE elemententry.symbol = 'O')
          INTERSECT
          SELECT structure.structureid
            FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry
           WHERE database.title = 'ICSD'))
  EXCEPT
  (SELECT structure.structureid
     FROM structure
    WHERE structure.entryid IN (SELECT duplicateentry.entryid FROM duplicateentry)))
 EXCEPT
 (SELECT structure.structureid
    FROM structure
   WHERE structure.entryid IN (SELECT entryid FROM removals))

Search for materials with Li and O, excluding duplicates
Map/Reduce
[Diagram: Calculations 12 to 15 in tasks.bson, with Calculation 13 marked ✓, pass through a map/reduce step (MR) into materials.bson.]
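The reduction from many calculations to one material record can be sketched in plain Python. The slides do not say how the checked calculation is chosen, so the rule used here (keep the lowest-energy completed calculation) is an assumption, as are the field names and the energy values.

```python
from collections import defaultdict

# Illustrative task documents; fields and numbers are made up.
tasks = [
    {"task_id": 12, "formula": "LiCoO2", "state": "DONE", "energy": -22.1},
    {"task_id": 13, "formula": "LiCoO2", "state": "DONE", "energy": -22.9},
    {"task_id": 14, "formula": "LiCoO2", "state": "FAILED", "energy": None},
    {"task_id": 15, "formula": "LiCoO2", "state": "DONE", "energy": -22.4},
]


def map_task(task):
    """Emit (key, value) pairs; only completed calculations are kept."""
    if task["state"] == "DONE":
        yield task["formula"], task


def reduce_tasks(formula, candidates):
    """Collapse all calculations for one formula into one material record."""
    best = min(candidates, key=lambda t: t["energy"])
    return {"formula": formula, "task_id": best["task_id"], "energy": best["energy"]}


def map_reduce(tasks):
    grouped = defaultdict(list)
    for task in tasks:
        for key, value in map_task(task):
            grouped[key].append(value)
    return [reduce_tasks(key, values) for key, values in grouped.items()]


materials = map_reduce(tasks)
```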
Every App uses MongoDB


                 structure_predictors.bson
                 candidate_materials.bson
                 diffraction_patterns.bson




 by G. Hautier
Structure Predictor
Diffraction Pattern
Act III:
Analytics and Logging
Rich Error Analysis




[Figure: Experimental vs. Calculated]
Integrated logging just
     makes sense
• Semi-structured data easily stored
• Can correlate with all other data
• Automation Layer: Failed tasks
• Web/App Layer
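One way to picture integrated logging is a handler that turns each log record into a document carrying the same keys as the task data, so failed tasks can be found with an ordinary query. The sketch below uses only Python's standard `logging` module and appends documents to a list; the `DocumentHandler` class and the `task_id` and `host` fields are assumptions, and a real deployment would insert each document into a database collection instead.

```python
import logging


class DocumentHandler(logging.Handler):
    """Convert each log record into a dict and append it to a sink."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        self.sink.append({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Extra fields are optional, which suits semi-structured data.
            "task_id": getattr(record, "task_id", None),
            "host": getattr(record, "host", None),
        })


log_docs = []
logger = logging.getLogger("manager")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(DocumentHandler(log_docs))

logger.info("job started", extra={"task_id": 13, "host": "hopper"})
logger.error("job failed: walltime exceeded", extra={"task_id": 14, "host": "carver"})

# Correlate logs with tasks: which task ids reported an error?
failed_task_ids = [d["task_id"] for d in log_docs if d["level"] == "ERROR"]
```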
Conclusions
• MongoDB is a very versatile tool
• Used in several different use cases
• Elegant query syntax
• Very useful for scientific data storage
• A lot of exciting future ideas
Acknowledgements
Thanks!

MaterialsProject.org

More Related Content

What's hot

Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Anubhav Jain
 
NANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designNANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designUniversity of California, San Diego
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Anubhav Jain
 
Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Anubhav Jain
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...Anubhav Jain
 
High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...Anubhav Jain
 
DuraMat Data Analytics
DuraMat Data AnalyticsDuraMat Data Analytics
DuraMat Data AnalyticsAnubhav Jain
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAnubhav Jain
 
Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectAnubhav Jain
 
Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?Anubhav Jain
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignAnubhav Jain
 
The Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureThe Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureAnubhav Jain
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Anubhav Jain
 
Accelerated Materials Discovery & Characterization with Classical, Quantum an...
Accelerated Materials Discovery & Characterization with Classical, Quantum an...Accelerated Materials Discovery & Characterization with Classical, Quantum an...
Accelerated Materials Discovery & Characterization with Classical, Quantum an...KAMAL CHOUDHARY
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science researchAnubhav Jain
 
Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Anubhav Jain
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...Anubhav Jain
 

What's hot (20)

Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...
 
NANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designNANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials design
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
 
Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...
 
High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...
 
DuraMat Data Analytics
DuraMat Data AnalyticsDuraMat Data Analytics
DuraMat Data Analytics
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
 
Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials Project
 
Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 
The Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureThe Materials Project: overview and infrastructure
The Materials Project: overview and infrastructure
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...
 
Accelerated Materials Discovery & Characterization with Classical, Quantum an...
Accelerated Materials Discovery & Characterization with Classical, Quantum an...Accelerated Materials Discovery & Characterization with Classical, Quantum an...
Accelerated Materials Discovery & Characterization with Classical, Quantum an...
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...
 

Viewers also liked

酷米移動傳媒簡報(1209城邦)
酷米移動傳媒簡報(1209城邦)酷米移動傳媒簡報(1209城邦)
酷米移動傳媒簡報(1209城邦)idfamily chen
 
An overview of Mentor NW 200715 v2
An overview of Mentor NW 200715 v2An overview of Mentor NW 200715 v2
An overview of Mentor NW 200715 v2Mike Gerighty
 
D'Amato Shop | Olio extravergine, vino e conserve dal Salento
D'Amato Shop | Olio extravergine, vino e conserve dal SalentoD'Amato Shop | Olio extravergine, vino e conserve dal Salento
D'Amato Shop | Olio extravergine, vino e conserve dal SalentoD'Amato Shop
 
Internet i els drets fonamentals
Internet i els drets fonamentalsInternet i els drets fonamentals
Internet i els drets fonamentalsGrup8
 
Next Exit Tokyo: Mobile in Japan
Next Exit Tokyo: Mobile in JapanNext Exit Tokyo: Mobile in Japan
Next Exit Tokyo: Mobile in JapanChristopher Billich
 
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition R2 ...
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition  R2 ...First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition  R2 ...
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition R2 ...JerryDorn
 
e-learning worldwide
e-learning worldwidee-learning worldwide
e-learning worldwidelaurenball
 
Food service supervisor perfomance appraisal 2
Food service supervisor perfomance appraisal 2Food service supervisor perfomance appraisal 2
Food service supervisor perfomance appraisal 2tonychoper4104
 

Viewers also liked (15)

酷米移動傳媒簡報(1209城邦)
酷米移動傳媒簡報(1209城邦)酷米移動傳媒簡報(1209城邦)
酷米移動傳媒簡報(1209城邦)
 
ICE 2009 H1N1
ICE 2009 H1N1ICE 2009 H1N1
ICE 2009 H1N1
 
Wordbench nagoya
Wordbench nagoyaWordbench nagoya
Wordbench nagoya
 
An overview of Mentor NW 200715 v2
An overview of Mentor NW 200715 v2An overview of Mentor NW 200715 v2
An overview of Mentor NW 200715 v2
 
Spinning Top
Spinning TopSpinning Top
Spinning Top
 
Sales Presentation
Sales PresentationSales Presentation
Sales Presentation
 
D'Amato Shop | Olio extravergine, vino e conserve dal Salento
D'Amato Shop | Olio extravergine, vino e conserve dal SalentoD'Amato Shop | Olio extravergine, vino e conserve dal Salento
D'Amato Shop | Olio extravergine, vino e conserve dal Salento
 
Internet i els drets fonamentals
Internet i els drets fonamentalsInternet i els drets fonamentals
Internet i els drets fonamentals
 
35419
3541935419
35419
 
Next Exit Tokyo: Mobile in Japan
Next Exit Tokyo: Mobile in JapanNext Exit Tokyo: Mobile in Japan
Next Exit Tokyo: Mobile in Japan
 
Hustopeče
HustopečeHustopeče
Hustopeče
 
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition R2 ...
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition  R2 ...First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition  R2 ...
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition R2 ...
 
e-learning worldwide
e-learning worldwidee-learning worldwide
e-learning worldwide
 
eshgh
eshgheshgh
eshgh
 
Food service supervisor perfomance appraisal 2
Food service supervisor perfomance appraisal 2Food service supervisor perfomance appraisal 2
Food service supervisor perfomance appraisal 2
 

Similar to Using MongoDB for Materials Discovery

Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applicationsaimsnist
 
MongoDB San Francisco 2013: MongoDB for Collaborative Science presented by D...
MongoDB San Francisco 2013:  MongoDB for Collaborative Science presented by D...MongoDB San Francisco 2013:  MongoDB for Collaborative Science presented by D...
MongoDB San Francisco 2013: MongoDB for Collaborative Science presented by D...MongoDB
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Anubhav Jain
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningAnubhav Jain
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Anubhav Jain
 
大強子計算網格與OSS
大強子計算網格與OSS大強子計算網格與OSS
大強子計算網格與OSSYuan CHAO
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsAnubhav Jain
 
And Then There Are Algorithms
And Then There Are AlgorithmsAnd Then There Are Algorithms
And Then There Are AlgorithmsInfluxData
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Anubhav Jain
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignAnubhav Jain
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Anubhav Jain
 
Implementing a neural network potential for exascale molecular dynamics
Implementing a neural network potential for exascale molecular dynamicsImplementing a neural network potential for exascale molecular dynamics
Implementing a neural network potential for exascale molecular dynamicsPFHub PFHub
 
Introduction to active learning
Introduction to active learningIntroduction to active learning
Introduction to active learningAlexey Voropaev
 
Overview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing ProjectOverview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing Projectinside-BigData.com
 
Belak_ICME_June02015
Belak_ICME_June02015Belak_ICME_June02015
Belak_ICME_June02015Jim Belak
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...BrianDeCost
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...Anubhav Jain
 

Similar to Using MongoDB for Materials Discovery (20)

Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
MongoDB San Francisco 2013: MongoDB for Collaborative Science presented by D...
MongoDB San Francisco 2013:  MongoDB for Collaborative Science presented by D...MongoDB San Francisco 2013:  MongoDB for Collaborative Science presented by D...
MongoDB San Francisco 2013: MongoDB for Collaborative Science presented by D...
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
 
大強子計算網格與OSS
大強子計算網格與OSS大強子計算網格與OSS
大強子計算網格與OSS
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
ICME Workshop Jul 2014 - The Materials Project
ICME Workshop Jul 2014 - The Materials ProjectICME Workshop Jul 2014 - The Materials Project
ICME Workshop Jul 2014 - The Materials Project
 
And Then There Are Algorithms
And Then There Are AlgorithmsAnd Then There Are Algorithms
And Then There Are Algorithms
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials Design
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...
 
Implementing a neural network potential for exascale molecular dynamics
Implementing a neural network potential for exascale molecular dynamicsImplementing a neural network potential for exascale molecular dynamics
Implementing a neural network potential for exascale molecular dynamics
 
Introduction to active learning
Introduction to active learningIntroduction to active learning
Introduction to active learning
 
Overview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing ProjectOverview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing Project
 
Belak_ICME_June02015
Belak_ICME_June02015Belak_ICME_June02015
Belak_ICME_June02015
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
 

Using MongoDB for Materials Discovery

  • 1. Using MongoDB for Materials Discovery Michael Kocher and Dan Gunter Lawrence Berkeley National Lab
  • 2. Energy Mission at LBNL • Li-ion Batteries • Photovoltaic (Solar Cells) • Thermoelectrics • Biofuels • New Computational Tools • Cutting edge Spectroscopic Tools (Advanced Light Source) http://carboncycle2.lbl.gov/
  • 3. Current Material Design model is Slow 18 Years... from the average new materials discovery to commercialization Bringing New Materials to the Market: Eagar, T.W. Technology Review Feb 1995, 98, 42.
  • 4. Materials Genome Initiative: A Renaissance of American Manufacturing “To help businesses discover, develop, and deploy new materials twice as fast, we're launching what we call the Materials Genome Initiative. The invention of silicon circuits and lithium-ion batteries made computers and iPods and iPads possible -- but it took years to get those technologies from the drawing board to the marketplace. We can do it faster.” - President Obama at Carnegie Mellon University 6/24/2011
  • 5. What is a Material?
  • 6. NaCl Silicon
  • 7. LiCoO2 Li O Co
  • 8. What can we Compute using quantum mechanics? Volume, density, total energy, formation energy, metallic?, etc... No empirical parameters!
  • 9. MaterialsProject.org: “The Google of Material Science Data”. MIT and LBNL collaboration
  • 12. Machine Learning: How often can you substitute Mg for Ca? What about Na, V, P, O? Diagram labels: Structure 1 to Structure 6; materials.bson; Learning Algorithm; (new materials). Prof. Gerbrand Ceder (DOI: 10.1103/PhysRevLett.91.135503)
  • 13. Materials Project: A Play in Three Acts I. Data generation using HTC II. Data storage III. Data analysis/logging
  • 14. Act I: Managing Calculations • Centralized distributed model is the only way to go • Hub is at LBNL • Store the state in the database • Overview of running many MPI jobs at many different HPC centers
  • 15. MasterQueue. Diagram labels: create a new engine, add to queue; pull crystal; builder.x; master_queue.bson (‘The Brain’); manager.x (five instances); HPC systems: Franklin, Hopper, Carver at NERSC (Oakland); lr1, lr2 on Lawrencium (Berkeley)
  • 16. Centralized Logging Example: MongoDB and Management. Diagram labels: manager.x (eight instances); O1, Cathode (MIT); Hopper, Franklin, Carver (NERSC, Oakland); lr1, lr2 (LBNL); DLX (Kentucky). query = {'elements': {'$all': ['Li', 'O']}, 'nelectrons': {'$lte': 200}}
  • 17. Act II : Core Data storage
  • 19. Powerful Querying. Every crystal that has (Li or Na or K), (Mn), (O or S or F or Si) plus one other element except (Zn or Ni or Fe or Cu or Co): { "lattice.volume" : { "$lt" : 500 }, "elements" : { "$all" : ["Mn"], "$size" : 4, "$nin" : ["Zn", "Ni", "Fe", "Cu", "Co"] }, "atoms" : { "$elemMatch" : { "oxidation_state" : 3, "symbol" : "Mn" } }, "$where" : "match_all( this.element_names, ['Li', 'Na', 'K'], ['Mn'], ['O', 'S', 'F', 'Si'])" }
  • 20. pre-MongoDB :( ((SELECT structure.structureid FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry WHERE structureid IN ((select structure.structureid from structure NATURAL INNER JOIN elemententry where elemententry.symbol='Li' INTERSECT select structure.structureid from structure NATURAL INNER JOIN elemententry where elemententry.symbol='O') INTERSECT select structure.structureid from structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry where database.title='ICSD')) EXCEPT (SELECT structure.structureid FROM structure where structure.entryid IN (select duplicateentry.entryid from duplicateentry))) EXCEPT (SELECT structure.structureid FROM structure where structure.entryid IN (select entryid from removals)) Search for materials with Li and O, excluding duplicates
  • 21. Map/Reduce. Diagram labels: Calculation 12, Calculation 13 ✓, Calculation 14, Calculation 15; MR; tasks.bson; materials.bson
  • 22. Every App uses MongoDB: structure_predictors.bson, candidate_materials.bson, diffraction_patterns.bson, by G. Hautier
  • 26. Rich Error Analysis: Experimental vs. Calculated
  • 27. Integrated logging just makes sense • Semi-structured data easily stored • Can correlate with all other data • Automation Layer: Failed tasks • Web/App Layer
  • 28. Conclusions • MongoDB is a very versatile tool • Used in several different cases • Elegant query syntax • Very useful for scientific data storage • A lot of exciting future ideas
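
The query operators shown on slides 16 and 19 can be tried out without a database. The sketch below is a small pure-Python stand-in for how a MongoDB-style filter selects documents; it is not the MongoDB server or driver, and it covers only `$all`, `$size`, `$nin`, `$lt` and `$lte`. The helper names (`get_path`, `field_matches`, `find`), the `formula` field, and every numeric value in the sample documents are assumptions made up for illustration, not Materials Project data. The slide 16 query is written here with balanced braces and quotes, and the slide 19 query is reduced to the operators the sketch supports.

```python
from typing import Any


def get_path(doc: dict, dotted: str) -> Any:
    """Follow a dotted field name such as 'lattice.volume'; None if absent."""
    current: Any = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def field_matches(value: Any, cond: Any) -> bool:
    """Check one field value against one condition (simplified semantics)."""
    if not isinstance(cond, dict):
        return value == cond  # plain equality
    for op, arg in cond.items():
        if op == "$all":      # array must contain every listed value
            ok = isinstance(value, list) and all(x in value for x in arg)
        elif op == "$size":   # array must have exactly this length
            ok = isinstance(value, list) and len(value) == arg
        elif op == "$nin":    # no element may appear in the excluded list
            values = value if isinstance(value, list) else [value]
            ok = not any(x in arg for x in values)
        elif op == "$lt":
            ok = value is not None and value < arg
        elif op == "$lte":
            ok = value is not None and value <= arg
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
        if not ok:
            return False
    return True


def find(docs: list, query: dict) -> list:
    """Return the 'formula' of every document that satisfies all conditions."""
    return [
        d["formula"]
        for d in docs
        if all(field_matches(get_path(d, f), c) for f, c in query.items())
    ]


# Toy documents: the numbers are invented placeholders, not computed results.
materials = [
    {"formula": "LiCoO2", "elements": ["Li", "Co", "O"],
     "nelectrons": 100, "lattice": {"volume": 100.0}},
    {"formula": "NaCl", "elements": ["Na", "Cl"],
     "nelectrons": 50, "lattice": {"volume": 180.0}},
    {"formula": "LiMnPO4", "elements": ["Li", "Mn", "P", "O"],
     "nelectrons": 150, "lattice": {"volume": 300.0}},
    {"formula": "LiMnFeO4", "elements": ["Li", "Mn", "Fe", "O"],
     "nelectrons": 250, "lattice": {"volume": 320.0}},
]

# Slide 16: contains both Li and O, with at most 200 electrons.
slide16_query = {"elements": {"$all": ["Li", "O"]}, "nelectrons": {"$lte": 200}}

# Slide 19, reduced: four elements including Mn, none of the excluded metals.
slide19_subset = {
    "lattice.volume": {"$lt": 500},
    "elements": {"$all": ["Mn"], "$size": 4,
                 "$nin": ["Zn", "Ni", "Fe", "Cu", "Co"]},
}

print(find(materials, slide16_query))
print(find(materials, slide19_subset))
```

With a running MongoDB instance, the same filter dictionaries could be passed unchanged to a driver's find call; the `$elemMatch` and `$where` clauses from slide 19 are left out here because they need per-atom data and a server-side JavaScript function that the slides do not show.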