Scalable Data Mining and Archiving in the
    Era of the Square Kilometre Array
                                Chris A. Mattmann
 Senior Computer Scientist, JPL/Caltech; Adjunct Assistant Professor, USC; Member,
                            Apache Software Foundation
Agenda
• JPL Radio Array Initiative: Scalable Data
  Archiving
• Case Study: KAT-7 Archiving System/U.S.
  Replicated Archive
• Case Study: NRAO Experience

• Future Work

20-Sep-12           HPCUF-MATTMANN-KN         2
And you are?
                                             • Senior Computer Scientist at
                                               NASA JPL in Pasadena, CA
                                               USA
                                             • Software
                                               Architecture/Engineering
                                               Prof at Univ. of Southern
                                               California




            • Apache Member involved in
               – OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC),
                 SIS (Mentor), Gora (PMC), Airavata (Mentor), cTAKES (Mentor),
                 Any23 (Mentor)
Square Kilometre Array (SKA)
• What is it?
   – Next generation radio
     astronomy instrument
     that will be built jointly
     by South Africa and
     Australia to image the
     sky like never before
• What’s the status?
   – Currently in design
     phase, will be built over
     the next decade
• Why do you care?
                                       Credit: http://scienceray.com/astronomy/stunning-square-kilometer-array/
Key Science for the SKA
                  SKA: Key Science
            (a.k.a. m- and cm-λ astronomy)

                 Emerging from the Dark Ages &
                  the Epoch of Reionization
                 Strong-field Tests of Gravity
                  with Pulsars and Black Holes

                  Galaxy Evolution, Cosmology, &
                                     Dark Energy

                 The Cradle of Life & Astrobiology

             Origin & Evolution of
             Cosmic Magnetism
                                        Credit: Joe Lazio, JPL
Radio Initiative: Archiving
•   Initiative Lead: Dayton Jones; Champion: Robert Preston
•   We will define the necessary data services and underlying
    substrate to position JPL to compete for and lead “big data”
    management efforts in astronomy, specifically, SKA, HERA,
    SKA precursors, and NRAO.

•   Perform prototyping and deployment to demonstrate JPL’s
    leadership in the “big data” and astronomy space.

•   Collaborate on Data Products and Algorithms from Adaptive
    Data Processing task

•   Establish partnerships with major SKA potential sites and
    precursor efforts (South Africa, Australia)
JPL “Big Data” Initiative
• The Big Picture
    • Astronomy, Earth science, planetary science, life/physical
      science all drowning in data
    • Fundamental technologies and emerging techniques in
      archiving and data science
        •Largely center around open source communities and

• Research challenges (adapted from NSF)
    •   More data is being collected than we can store
    •   Many data sets are too large to download
    •   Many data sets are too poorly organized to be useful
    •   Many data sets are heterogeneous in type, structure
    •   Data utility is limited by our ability to use it


• My Focus: Big Data Archiving
    • Research methods for integrating intelligent
      algorithms for data triage, subsetting, summarization
    • Construct technologies for smart data movement
    • Evaluate cloud computing for storage/processing
    • Construct data/metadata translators “Babel Fish”
Some “Big Data” Grand Challenges
       • How do we handle 700 TB/sec of data coming off the wire when we
         actually have to keep it around?
              – Required by the Square Kilometre Array

       • Joe scientist says I’ve got an IDL or Matlab algorithm that I will not
         change and I need to run it on 10 years of data from the Colorado
         River Basin and store and disseminate the output products
              – Required by the Western Snow Hydrology project

       • How do we compare petabytes of climate model output data in a
         variety of formats (HDF, NetCDF, Grib, etc.) with petabytes of remote
         sensing data to improve climate models for the next IPCC assessment?
              – Required by the 5th IPCC assessment and the Earth System Grid and NASA

       • How do we catalog all of NASA’s current planetary science data?
              – Required by the NASA Planetary Data System
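
To put the 700 TB/sec figure in perspective, here is a back-of-the-envelope sketch of the volume that accumulates if that rate is sustained. Only the rate comes from the slide; the decimal unit conversions and the assumption of a constant, uninterrupted stream are illustrative.

```python
# Rough volumes for a sustained 700 TB/sec stream (rate quoted on the slide).
# Assumptions for illustration: constant rate, decimal units
# (1 PB = 1,000 TB, 1 EB = 1,000 PB, 1 ZB = 1,000 EB).
RATE_TB_PER_SEC = 700
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def volume_tb(seconds, rate_tb_per_sec=RATE_TB_PER_SEC):
    """Total terabytes accumulated over `seconds` at a constant rate."""
    return seconds * rate_tb_per_sec


per_hour_pb = volume_tb(SECONDS_PER_HOUR) / 1_000
per_day_eb = volume_tb(SECONDS_PER_DAY) / 1_000_000
per_year_zb = volume_tb(365 * SECONDS_PER_DAY) / 1_000_000_000

print(f"per hour: {per_hour_pb:,.0f} PB")   # 2,520 PB
print(f"per day:  {per_day_eb:,.2f} EB")    # 60.48 EB
print(f"per year: {per_year_zb:,.2f} ZB")   # 22.08 ZB
```

At roughly 60 EB per day, keeping everything is not an option, which is why the triage, subsetting, and summarization work listed earlier matters.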

Image Credit: http://www.jpl.nasa.gov/news/news.cfm?release=2011-295
Copyright 2012. Jet Propulsion Laboratory, California Institute of Technology. US Government Sponsorship Acknowledged.
Technologies/Thrusts
– Framework for rapidly and unobtrusively
  integrating science algorithms

– Research study identifying set of
  appropriate/inappropriate data movement
  technologies for
  big data systems

– Cloud computing research study for
  spaceborne and airborne missions and
  ground based sensors

– Extensions to JPL-led Apache Tika and Apache
  OODT technologies to handle data formats used by
  modern big data systems
Apache OODT
•     Entered “incubation” at the Apache
      Software Foundation in 2010
•     Selected as a top level Apache Software
      Foundation project in January 2011
•     Developed by a community of participants
      from many companies, universities, and
      organizations
•     Used for a diverse set of science data
      system activities in planetary science,
      earth science, radio astronomy,
      biomedicine, astrophysics, and more
                                                       http://oodt.apache.org


OODT Development & user community includes:



Apache OODT Press

   NASA To Host Open Source Summit
   The agency plans to bring experts together March 29-30 to discuss open source policy and how
   NASA can better support the community.

   By Elizabeth Montalbano, InformationWeek
   March 15, 2011 11:37 AM

   NASA will continue its support of the open source community by hosting its
   first-ever summit around the technology.

   Source: http://www.informationweek.com/news/government/enterprise-app...
Why Apache and OODT?
• OODT is meant to be a set of tools to
  help build data systems
      – It’s not meant to be “turn key”
      – It tries to strike a balance between
        bringing in capability and being
        overly rigid for science
      – Each discipline/project extends

• Apache is the elite open source
  community for software developers
      – Fewer than 100 projects have been
        promoted to top level (e.g., Apache HTTP
        Server, Tomcat, Solr, Hadoop)
      – Differs from other open source
        communities; it provides a
        governance and management
        structure

Governance Model: Merit




• NASA and other government
  agencies require foundation/
  structure
OODT Core Components




•    All Core components implemented as web services
Getting back to SKA…
• U.S. National Science Foundation does not
  highly prioritize SKA in responding to the Astro 2010
  Decadal Survey: New Worlds,
  New Horizons

• Taking a “wait 5 years and see”
  approach
• How do we work with South Africa
  and other countries on SKA?
Case Study: KAT-7
• SKA Banff
  Meeting, 2011
• Jasper Horrell and
  Simon Ratcliffe
  express need for US
  Archive for MeerKAT
• Joe Lazio and
  Chris Mattmann
  decide it’s a good fit for
  NSF SI^2 – establish
  U.S.-based archive
                                                 Credit: Jasper Horrell, Simon Ratcliffe


MeerKAT: the Science

MeerKAT International GigaHertz Tiered Extragalactic Exploration (MIGHTEE) Survey:
This project aims to construct radio luminosity functions, by conducting deep radio continuum
observations, in order to track over cosmic time the contribution of accretion (onto active
galactic nuclei [AGN]) versus fusion (from star formation) to galaxy luminosities.

Table 1. Mapping between the Science Frontier Areas of the NWNH and MeerKAT Large Projects.

Science Frontier Area | Question or Discovery Area | MeerKAT Large Project
Discovery | Gravitational wave astronomy | Key Science Project on Radio Pulsar Timing
Discovery | Time-domain astronomy | TRAPUM, ThunderKAT
Discovery | Astrometry | VLBI with MeerKAT
Discovery | Epoch of Reionization | MESMER
Origins | What were the first objects to light up the Universe and when did they do it? | MESMER
Origins | What is the fossil record of galaxy assembly and evolution from the first stars to the present? | Deep H I Field, MHONGOOSE, H I Survey of Fornax
Understanding the Cosmic Order | How do baryons cycle in and out of galaxies … ? | Deep H I Field, MHONGOOSE, H I Survey of Fornax, Absorption Line Survey
Understanding the Cosmic Order | How do black holes work and influence their surroundings? | MIGHTEE
Understanding the Cosmic Order | How do massive stars end their lives? | ThunderKAT, TRAPUM
Understanding the Cosmic Order | How do rotation and magnetic fields affect stars? | Key Science Project on Radio Pulsar Timing, TRAPUM
Understanding the Cosmic Order | What controls the mass-energy-chemical cycles within galaxies? | MeerGAL
Frontiers of Knowledge | What controls the masses, spins, and radii of compact stellar remnants? | Key Science Project on Radio Pulsar Timing, TRAPUM

U.S. Based MeerKAT projects and their relationship to Astro 2010 area
Planned nominal archive
                             Credit: Jasper Horrell, Simon Ratcliffe




Establishing a U.S. archive is
                   good because…
• Bandwidth limitations in South Africa will make
  sharing the MeerKAT data with U.S. PIs difficult
• JPL team has significant expertise in studying, evaluating,
  and selecting the best data movement technologies for
  dissemination
      – MSST 2006, Mattmann Dissertation, IEEE IT Pro 2011, SECLOUD 2011
• JPL team leads the first NASA data management
  technology (OODT) to be stewarded at Apache, making
  software co-development with South Africa
  straightforward and tech transfer easy
      – South Africans (Bennett) already embracing OODT
Early prototype: KAT-7
Figure 3). After successfully writing the initial HDF-5 file, it is augmented with relevant sensor
data that is collected from the KAT sensor data store.
Once the augmentation process is complete, the HDF-5 file is made available to the OODT
Crawler daemon. The OODT Crawler is a high powered XML-RPC-based component that uses
Apache Tika [C. Mattmann and Zitting, 2011] and its MIME detection capabilities to identify the
HDF-5 file in the staging directory. MIME detection can be based on regular expressions; digital




                 Figure 3. Prototype deployment of Apache OODT for the KAT-7 array.


                       Credit: Tom Bennett, SKA; Chris Mattmann, JPL
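
The crawl step described above can be sketched as follows. This is a minimal illustration of signature-based ("magic byte") detection, not Apache Tika's or the OODT Crawler's actual API. The 8-byte HDF5 format signature is standard; the MIME label, the function names, and the file names are assumptions made for this sketch.

```python
import tempfile
from pathlib import Path
from typing import List

# The 8-byte HDF5 format signature that opens the superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# MIME labels used in this sketch only (assumed, not read from Tika's registry).
HDF5_MIME = "application/x-hdf5"
FALLBACK_MIME = "application/octet-stream"


def detect_mime(path: Path) -> str:
    """Classify a file by its leading bytes.

    Only offset 0 is checked; HDF5 also permits the signature at later
    offsets (512, 1024, ...) when a user block is present.
    """
    with path.open("rb") as fh:
        head = fh.read(len(HDF5_SIGNATURE))
    return HDF5_MIME if head == HDF5_SIGNATURE else FALLBACK_MIME


def crawl(staging_dir: Path) -> List[Path]:
    """Return the files in the staging directory that look like HDF5."""
    return sorted(
        p for p in staging_dir.iterdir()
        if p.is_file() and detect_mime(p) == HDF5_MIME
    )


# Demo with two made-up files in a temporary staging directory.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp)
    (staging / "obs.h5").write_bytes(HDF5_SIGNATURE + b"\x00" * 16)
    (staging / "notes.txt").write_text("not an HDF-5 file")
    found = [p.name for p in crawl(staging)]

print(found)  # ['obs.h5']
```

Checking content rather than the file extension means a misnamed product in the staging area is still picked up, and a stray text file is ignored.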
Establishing the U.S. archive
              Credit: Tom Bennett, SKA; Chris Mattmann, JPL




• Deploy OODT at JPL
• Data movement from Cape Town to the U.S. (researching data movement
  approaches)
• Rapidly and easily stand up archive technology in the US and disseminate
• Collaborate on updates with Cape Town
Synergistic efforts in Astronomy
•   November 2010: JPL hosts Peter Quinn and Andreas Wicenec
      – Presents data management strategy to ICRAR and discusses potential
        opportunities for collaboration
•   December 2010: JPL presents to Duncall from SKA Program Office on data
    management work packages
•   April 2011: JPL works with NRAO EVLA to develop OODT prototype (Bryan
    Butler) – next portion of talk will highlight this
•   September 2011: JPL presents at NRAO ALMA Science Software Leads
    workshop on OODT: NRAO desires to leverage OODT in pipeline
•   January 2012: JPL Co-PI on MIT Haystack Observatory NSF MRI proposal to
    support reconfigurable, adaptable array (RAPID): Colin Lonsdale
•   February 2012: JPL reaches out to Murchison Widefield Array (MWA)
    scientists: Melanie Johnston-Hollitt and Lisa Harvey-Smith
•   February 2012: JPL re-engages NRAO: NRAO EVLA collaboration continues
U.S. National Radio Astronomy
            Observatory (NRAO)
• Explore JPL data system expertise
      – Leverage Apache OODT
      – Leverage architecture experience
      – Build on NRAO Socorro F2F given in April 2011 and
        Innovations in Data-Intensive Astronomy meeting in
        May 2011
• Define achievable prototype
      – Focus on EVLA summer school pipeline
            • Heavy focus on CASApy, simple pipelining, metadata
              extraction, archiving of directory-based products
            • Ideal for OODT system

Architecture
[Figure: prototype architecture hosted on ska-dc.jpl.nasa.gov. Labels recovered from the diagram: EVLA; WWW; dataset day2_TDEM0003_10s_norx; Staging Area; Crawler; Curator; Browser; FM; WM; W Monitor; Cub; CAS Data Services; PCS Services; rep; cat; proc status; products, metadata; system status; Science; Data System Operator; evlascube event. Legend: Apache OODT, data flow, control flow, Disk Area (data/met).]
Demonstration Use Case
• Run EVLA Spectral Line Cube generation
      – First step is to ingest EVLARawDataOutput from Joe
      – Then fire off evlascube event
      – Workflow manager writes CASApy script dynamically
            • Via CAS-PGE
      – CAS-PGE starts CASApy
      – CASApy generates Cal tables and 2 Spectral Line Cube
        Images
      – CAS-PGE ingests them into the File Manager
• Gravy: UIs, Cmd Line Tools, Services
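
The steps above can be sketched as a small event-driven pipeline. This is a toy model of the control flow only. The class names, the script template, and the stub runner are hypothetical and are not the Apache OODT, CAS-PGE, or CASA APIs. The product type, event name, and dataset name are taken from the slides.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Callable, Dict, List


@dataclass
class FileManager:
    """In-memory stand-in for a product catalog (hypothetical)."""
    catalog: Dict[str, List[str]] = field(default_factory=dict)

    def ingest(self, product_type: str, name: str) -> None:
        self.catalog.setdefault(product_type, []).append(name)


# Placeholder script body; a real script would contain CASA tasks.
SCRIPT_TEMPLATE = Template(
    "# generated script\n"
    "input_dataset = '$dataset'\n"
    "output_prefix = '$prefix'\n"
)


def write_script(dataset: str) -> str:
    """Fill in the script template for one dataset (the 'dynamic' step)."""
    return SCRIPT_TEMPLATE.substitute(dataset=dataset, prefix=dataset + "_cube")


def run_script_stub(script: str, dataset: str) -> Dict[str, List[str]]:
    """Pretend to run the script and report the products it would create.

    One calibration table is an arbitrary choice for this sketch; the two
    cube images match the count given on the slide.
    """
    return {
        "CalTable": [f"{dataset}.cal"],
        "SpectralLineCube": [f"{dataset}_cube_{i}.image" for i in (1, 2)],
    }


class WorkflowManager:
    """Maps event names to handlers, here only the evlascube event."""

    def __init__(self, fm: FileManager) -> None:
        self.fm = fm
        self.handlers: Dict[str, Callable[[str], None]] = {
            "evlascube": self._evlascube,
        }

    def fire(self, event: str, dataset: str) -> None:
        self.handlers[event](dataset)

    def _evlascube(self, dataset: str) -> None:
        script = write_script(dataset)                # write script dynamically
        products = run_script_stub(script, dataset)   # run it (stubbed)
        for product_type, names in products.items():  # ingest the outputs
            for name in names:
                self.fm.ingest(product_type, name)


fm = FileManager()
fm.ingest("EVLARawDataOutput", "day2_TDEM0003_10s_norx")  # step 1: ingest raw data
WorkflowManager(fm).fire("evlascube", "day2_TDEM0003_10s_norx")
print(sorted(fm.catalog))
```

Keeping the script generation separate from the runner mirrors the split on the slide: the workflow layer decides what to run, and the wrapped science code stays unchanged.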

Results: Workflow Monitor




NRAO and EVLA
• Extended Very Large Array has deployed
  Apache OODT for its data reduction pipeline

• Working to enable more portions of Apache
  OODT (currently only using Workflow
  Manager)

• Evaluating Apache OODT for data ingestion
  and archiving
Wrap-up
• JPL’s efforts in Scalable Data Mining for Big
  Data and the SKA

• Successful collaboration with SKA South Africa
  and NRAO

• Open source big data management
  framework from Apache (“Apache OODT”)
Thanks!
• Questions, more information:
• @chrismattmann
• Email: skadc-dev@jpl.nasa.gov





More Related Content

What's hot

LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...Larry Smarr
 
Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Pro...
Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Pro...Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Pro...
Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Pro...Larry Smarr
 
A California-Wide Cyberinfrastructure for Data-Intensive Research
A California-Wide Cyberinfrastructure for Data-Intensive ResearchA California-Wide Cyberinfrastructure for Data-Intensive Research
A California-Wide Cyberinfrastructure for Data-Intensive ResearchLarry Smarr
 
The Pacific Research Platform: Building a Distributed Big Data Machine Learni...
The Pacific Research Platform: Building a Distributed Big Data Machine Learni...The Pacific Research Platform: Building a Distributed Big Data Machine Learni...
The Pacific Research Platform: Building a Distributed Big Data Machine Learni...Larry Smarr
 
Peering The Pacific Research Platform With The Great Plains Network
Peering The Pacific Research Platform With The Great Plains NetworkPeering The Pacific Research Platform With The Great Plains Network
Peering The Pacific Research Platform With The Great Plains NetworkLarry Smarr
 
Calit2-a Persistent UCSD/UCI Framework for Collaboration
Calit2-a Persistent UCSD/UCI Framework for CollaborationCalit2-a Persistent UCSD/UCI Framework for Collaboration
Calit2-a Persistent UCSD/UCI Framework for CollaborationLarry Smarr
 
Toward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data AnalysisToward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data AnalysisLarry Smarr
 
Positioning University of California Information Technology for the Future: S...
Positioning University of California Information Technology for the Future: S...Positioning University of California Information Technology for the Future: S...
Positioning University of California Information Technology for the Future: S...Larry Smarr
 
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...inside-BigData.com
 
Digital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyDigital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyJose Enrique Ruiz
 
Mexico talk foster march 2012
Mexico talk foster march 2012Mexico talk foster march 2012
Mexico talk foster march 2012Ian Foster
 
Building the Pacific Research Platform: Supernetworks for Big Data Science
Building the Pacific Research Platform: Supernetworks for Big Data ScienceBuilding the Pacific Research Platform: Supernetworks for Big Data Science
Building the Pacific Research Platform: Supernetworks for Big Data ScienceLarry Smarr
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Robert Grossman
 
What to Expect of the LSST Archive: The LSST Science Platform
What to Expect of the LSST Archive: The LSST Science PlatformWhat to Expect of the LSST Archive: The LSST Science Platform
What to Expect of the LSST Archive: The LSST Science PlatformMario Juric
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Robert Grossman
 
Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Ian Foster
 
Workflows to access and massage VOData
Workflows to access and massage VODataWorkflows to access and massage VOData
Workflows to access and massage VODataJose Enrique Ruiz
 
Global Research Platforms: Past, Present, Future
Global Research Platforms: Past, Present, FutureGlobal Research Platforms: Past, Present, Future
Global Research Platforms: Past, Present, FutureLarry Smarr
 
PRP, CHASE-CI, TNRP and OSG
PRP, CHASE-CI, TNRP and OSGPRP, CHASE-CI, TNRP and OSG
PRP, CHASE-CI, TNRP and OSGLarry Smarr
 
Digital Science: Towards the executable paper
Digital Science: Towards the executable paperDigital Science: Towards the executable paper
Digital Science: Towards the executable paperJose Enrique Ruiz
 

What's hot (20)

LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
LambdaGrids--Earth and Planetary Sciences Driving High Performance Networks a...
 
Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Pro...
Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Pro...Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Pro...
Analyzing Large Earth Data Sets: New Tools from the OptiPuter and LOOKING Pro...
 
A California-Wide Cyberinfrastructure for Data-Intensive Research
A California-Wide Cyberinfrastructure for Data-Intensive ResearchA California-Wide Cyberinfrastructure for Data-Intensive Research
A California-Wide Cyberinfrastructure for Data-Intensive Research
 
The Pacific Research Platform: Building a Distributed Big Data Machine Learni...
The Pacific Research Platform: Building a Distributed Big Data Machine Learni...The Pacific Research Platform: Building a Distributed Big Data Machine Learni...
The Pacific Research Platform: Building a Distributed Big Data Machine Learni...
 
Peering The Pacific Research Platform With The Great Plains Network
Peering The Pacific Research Platform With The Great Plains NetworkPeering The Pacific Research Platform With The Great Plains Network
Peering The Pacific Research Platform With The Great Plains Network
 
Calit2-a Persistent UCSD/UCI Framework for Collaboration
Calit2-a Persistent UCSD/UCI Framework for CollaborationCalit2-a Persistent UCSD/UCI Framework for Collaboration
Calit2-a Persistent UCSD/UCI Framework for Collaboration
 
Toward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data AnalysisToward a Global Research Platform for Big Data Analysis
Toward a Global Research Platform for Big Data Analysis
 
Positioning University of California Information Technology for the Future: S...
Positioning University of California Information Technology for the Future: S...Positioning University of California Information Technology for the Future: S...
Positioning University of California Information Technology for the Future: S...
 
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
 
Digital Science: Reproducibility and Visibility in Astronomy
Scalable Data Mining and Archiving in the Era of the Square Kilometre Array

  • 1. Scalable Data Mining and Archiving in the Era of the Square Kilometre Array Chris A. Mattmann Senior Computer Scientist, JPL/Caltech; Adjunct Assistant Professor, USC; Member, Apache Software Foundation
  • 2. Agenda • JPL Radio Array Initiative: Scalable Data Archiving • Case Study: KAT-7 Archiving System/U.S. Replicated Archive • Case Study: NRAO Experience • Future Work 20-Sep-12 HPCUF-MATTMANN-KN 2
  • 3. And you are? • Senior Computer Scientist at NASA JPL in Pasadena, CA, USA • Software Architecture/Engineering Prof at Univ. of Southern California • Apache Member involved in – OODT (VP, PMC), Tika (VP, PMC), Nutch (PMC), Incubator (PMC), SIS (Mentor), Gora (PMC), Airavata (Mentor), cTAKES (Mentor), Any23 (Mentor)
  • 4. Square Kilometre Array (SKA) • What is it? – Next-generation radio astronomy instrument that will be built jointly by South Africa and Australia to image the sky like never before • What’s the status? – Currently in design phase, will be built over the next decade • Why do you care? Credit: http://scienceray.com/astronomy/stunning-square-kilometer-array/
  • 5. Key Science for the SKA (a.k.a. m- and cm-λ astronomy) • Emerging from the Dark Ages & the Epoch of Reionization • Strong-field Tests of Gravity with Pulsars and Black Holes • Galaxy Evolution, Cosmology, & Dark Energy • The Cradle of Life & Astrobiology • Origin & Evolution of Cosmic Magnetism Credit: Joe Lazio, JPL
  • 6. Radio Initiative: Archiving • Initiative Lead: Dayton Jones; Champion: Robert Preston • We will define the necessary data services and underlying substrate to position JPL to compete for and lead “big data” management efforts in astronomy, specifically SKA, HERA, SKA precursors, and NRAO. • Perform prototyping and deployment to demonstrate JPL’s leadership in the “big data” and astronomy space. • Collaborate on Data Products and Algorithms from Adaptive Data Processing task • Establish partnerships with major SKA potential sites and precursor efforts (South Africa, Australia)
  • 7. JPL “Big Data” Initiative • The Big Picture – Astronomy, Earth science, planetary science, life/physical science all drowning in data – Fundamental technologies and emerging techniques in archiving and data science – Largely center around open source communities and • Research challenges (adapted from NSF) – More data is being collected than we can store – Many data sets are too large to download – Many data sets are too poorly organized to be useful – Many data sets are heterogeneous in type, structure – Data utility is limited by our ability to use it • My Focus: Big Data Archiving – Research methods for integrating intelligent algorithms for data triage, subsetting, summarization – Construct technologies for smart data movement – Evaluate cloud computing for storage/processing – Construct data/metadata translators (“Babel Fish”)
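The "data triage, subsetting, summarization" focus on the slide above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `summarize` and `triage`, the peak-over-threshold rule, and the toy sample values are hypothetical and are not taken from the talk or from any JPL or Apache OODT code.

```python
from statistics import mean

def summarize(chunk):
    """Reduce one chunk of samples to a small summary record."""
    return {"n": len(chunk), "mean": mean(chunk), "peak": max(chunk)}

def triage(chunks, threshold):
    """Summarize every chunk; mark for full retention only those
    whose peak value exceeds the (hypothetical) threshold."""
    kept, summaries = [], []
    for index, chunk in enumerate(chunks):
        summary = summarize(chunk)
        summaries.append(summary)
        if summary["peak"] > threshold:
            kept.append(index)  # index of a chunk worth keeping whole
    return kept, summaries

# Toy input: three chunks, only the second contains a strong signal.
chunks = [[0.1, 0.2, 0.1], [0.2, 5.0, 0.3], [0.0, 0.1, 0.2]]
kept, summaries = triage(chunks, threshold=1.0)
print("chunks kept in full:", kept)
print("summaries stored for all chunks:", len(summaries))
```

The design point is only that a cheap summary is stored for everything while the full data is retained for a small subset, which is one way to cope when more data is collected than can be stored.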
  • 8. Some “Big Data” Grand Challenges • How do we handle 700 TB/sec of data coming off the wire when we actually have to keep it around? – Required by the Square Kilometre Array • Joe Scientist says, “I’ve got an IDL or MATLAB algorithm that I will not change, and I need to run it on 10 years of data from the Colorado River Basin and store and disseminate the output products” – Required by the Western Snow Hydrology project • How do we compare petabytes of climate model output data in a variety of formats (HDF, NetCDF, GRIB, etc.) with petabytes of remote sensing data to improve climate models for the next IPCC assessment? – Required by the 5th IPCC assessment and the Earth System Grid and NASA • How do we catalog all of NASA’s current planetary science data? – Required by the NASA Planetary Data System Image Credit: http://www.jpl.nasa.gov/news/news.cfm?release=2011-295 Copyright 2012. Jet Propulsion Laboratory, California Institute of Technology. US Government Sponsorship Acknowledged.
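To put the 700 TB/sec figure from the slide above in perspective, here is a back-of-the-envelope calculation. It assumes decimal (SI) units and a sustained rate around the clock, neither of which the slide states; the function name `daily_volume_tb` is made up for this sketch.

```python
# Assumption: SI units, so 1 EB = 1e6 TB; rate sustained for a full day.
RATE_TB_PER_S = 700            # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60
TB_PER_EB = 1e6

def daily_volume_tb(rate_tb_per_s):
    """Data volume in TB accumulated over one day at a constant rate."""
    return rate_tb_per_s * SECONDS_PER_DAY

tb_per_day = daily_volume_tb(RATE_TB_PER_S)
eb_per_day = tb_per_day / TB_PER_EB
print(f"{tb_per_day:,} TB/day = {eb_per_day:.2f} EB/day")
```

Under those assumptions the raw stream amounts to roughly 60 exabytes per day, which is why keeping it all around is framed as a grand challenge.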
  • 9. Technologies/Thrusts – Framework for rapidly and unobtrusively integrating science algorithms – Research study identifying the set of appropriate/inappropriate data movement technologies for big data systems – Cloud computing research study for spaceborne and airborne missions and ground-based sensors – Extensions to JPL-led Apache Tika and Apache OODT technologies to handle data formats used by modern big data systems
  • 10. Apache OODT • Entered “incubation” at the Apache Software Foundation in 2010 • Selected as a top-level Apache Software Foundation project in January 2011 • Developed by a community of participants from many companies, universities, and organizations • Used for a diverse set of science data system activities in planetary science, earth science, radio astronomy, biomedicine, astrophysics, and more http://oodt.apache.org OODT Development & user community includes:
  • 11. Apache OODT Press • InformationWeek: “NASA To Host Open Source Summit” (http://www.informationweek.com/news/government/enterprise-app...) • The agency plans to bring experts together March 29-30 to discuss open source policy and how NASA can better support the community. • By Elizabeth Montalbano, InformationWeek, March 15, 2011 11:37 AM • NASA will continue its support of the open source community by hosting its first-ever summit around the technology.
  • 12. Why Apache and OODT? • OODT is meant to be a set of tools to help build data systems – It’s not meant to be “turn key” – It attempts to exploit the boundary between bringing in capability vs. being overly rigid in science – Each discipline/project extends • Apache is the elite open source community for software developers – Fewer than 100 projects have been promoted to top level (Apache Web Server, Tomcat, Solr, Hadoop) – Differs from other open source communities; it provides a governance and management structure
  • 13. Governance Model: Merit • NASA and other government agencies require foundation/structure
  • 14. OODT Core Components • All Core components implemented as web services
  • 15. Getting back to SKA… • In responding to the Astro 2010 Decadal Survey (New Worlds, New Horizons), the U.S. National Science Foundation does not highly prioritize SKA • Taking a “wait 5 years and see” approach • How do we work with South Africa and other countries on SKA?
  • 16. Case Study: KAT-7 • SKA Banff Meeting, 2011 • Jasper Horrell and Simon Ratcliffe express need for a U.S. archive for MeerKAT • Joe Lazio and Chris Mattmann decide it’s a good fit for NSF SI^2 – establish U.S.-based archive • Credit: Jasper Horrell, Simon Ratcliffe
  • 17. MeerKAT: the Science • MeerKAT International GigaHertz Tiered Extragalactic Exploration (MIGHTEE) Survey: This project aims to construct radio luminosity functions, by conducting deep radio continuum observations, in order to track over cosmic time the contribution of accretion (onto active galactic nuclei [AGN]) versus fusion (from star formation) to galaxy luminosities. • Table 1. Mapping between the Science Frontier Areas of the NWNH and MeerKAT Large Projects.
      Science Frontier Area | Question or Discovery Area | MeerKAT Large Project
      Discovery | Gravitational wave astronomy | Key Science Project on Radio Pulsar Timing
      Discovery | Time-domain astronomy | TRAPUM, ThunderKAT
      Discovery | Astrometry | VLBI with MeerKAT
      Discovery | Epoch of Reionization | MESMER
      Origins | What were the first objects to light up the Universe and when did they do it? | MESMER
      Origins | What is the fossil record of galaxy assembly and evolution from the first stars to the present | Deep H I Field, MHONGOOSE, H I Survey of Fornax
      Understanding the Cosmic Order | How do baryons cycle in and out of galaxies … ? | Deep H I Field, MHONGOOSE, H I Survey of Fornax, Absorption Line Survey
      Understanding the Cosmic Order | How do black holes work and influence their surroundings? | MIGHTEE
      Understanding the Cosmic Order | How do massive stars end their lives? | ThunderKAT, TRAPUM
      Understanding the Cosmic Order | How do rotation and magnetic fields affect stars? | Key Science Project on Radio Pulsar Timing, TRAPUM
      Understanding the Cosmic Order | What controls the mass-energy-chemical cycles within galaxies? | MeerGAL
      Frontiers of Knowledge | What controls the masses, spins, and radii of compact stellar remnants? | Key Science Project on Radio Pulsar Timing, TRAPUM
    • U.S.-based MeerKAT projects and their relationship to Astro 2010 areas
  • 18. Planned nominal archive • Credit: Jasper Horrell, Simon Ratcliffe
  • 19. Establishing a U.S. archive is good because… • Bandwidth limitations in South Africa will make sharing the MeerKAT data with U.S. PIs difficult • JPL team has significant expertise in studying, evaluating, and selecting data movement technologies for dissemination – MSST 2006, Mattmann Dissertation, IEEE IT Pro 2011, SECLOUD 2011 • JPL team leads the first NASA data management technology (OODT) to be stewarded at Apache, making software co-development with South Africa amenable and tech transfer easy – South Africans (Bennett) already embracing OODT
  • 20. Early prototype: KAT-7 • Excerpt: “Figure 3). After successfully writing the initial HDF-5 file, it is augmented with relevant sensor data that is collected from the KAT sensor data store. Once the augmentation process is complete, the HDF-5 file is made available to the OODT Crawler daemon. The OODT Crawler is a high powered XML-RPC-based component that uses Apache Tika [C. Mattmann and Zitting, 2011] and its MIME detection capabilities to identify the HDF-5 file in the staging directory. MIME detection can be based on regular expressions; digital” • Figure 3. Prototype deployment of Apache OODT for the KAT-7 array. • Credit: Tom Bennett, SKA; Chris Mattmann, JPL
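The MIME detection step in the excerpt on slide 20 can be illustrated with a small sketch. This is not Apache Tika's or OODT's API; it is a standalone Python approximation that combines the two ideas the excerpt points to: a check of the file's leading bytes and a regular expression on the file name. The 8-byte HDF5 signature is from the HDF5 file format; the MIME label, the file-name pattern, and the function names are assumptions made for the example.

```python
import re
from pathlib import Path

# HDF5 files begin with this 8-byte signature (when no user block is present).
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"
# Hypothetical file-name rule; a real deployment would configure its own.
HDF5_NAME = re.compile(r"\.(h5|hdf5)$", re.IGNORECASE)
# Assumed MIME label for this sketch.
HDF5_MIME = "application/x-hdf"
DEFAULT_MIME = "application/octet-stream"


def detect_mime(path) -> str:
    """Classify a file by its leading bytes, falling back to its name."""
    path = Path(path)
    with path.open("rb") as fh:
        header = fh.read(len(HDF5_MAGIC))
    if header == HDF5_MAGIC:
        return HDF5_MIME
    # Name-based fallback: weaker evidence than the signature check.
    if HDF5_NAME.search(path.name):
        return HDF5_MIME
    return DEFAULT_MIME


def crawl(staging_dir) -> list:
    """Return the files in a staging directory identified as HDF5."""
    return [
        p for p in sorted(Path(staging_dir).iterdir())
        if p.is_file() and detect_mime(p) == HDF5_MIME
    ]
```

A crawler daemon would run something like `crawl()` periodically and hand each identified file on for metadata extraction and ingestion; that hand-off is left out here.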
  • 21. Establishing the U.S. archive • Deploy OODT at JPL • Data movement from Cape Town to the U.S. (researching data movement approaches) • Rapidly and easily stand up archive technology in the U.S. and disseminate • Collaborate on updates with Cape Town • Credit: Tom Bennett, SKA; Chris Mattmann, JPL
  • 22. Synergistic efforts in Astronomy • November 2010: JPL hosts Peter Quinn and Andreas Wicenec – Presents data management strategy to ICRAR and discusses potential opportunities for collaboration • December 2010: JPL presents to Duncan Hall from SKA Program Office on data management work packages • April 2011: JPL works with NRAO EVLA to develop OODT prototype (Bryan Butler) – next portion of talk will highlight this • September 2011: JPL presents at NRAO ALMA Science Software Leads workshop on OODT: NRAO desires to leverage OODT in pipeline • January 2012: JPL Co-PI on MIT Haystack Observatory NSF MRI proposal to support reconfigurable, adaptable array (RAPID): Colin Lonsdale • February 2012: JPL reaches out to Murchison Widefield Array (MWA) scientists: Melanie Johnston-Hollitt and Lisa Harvey-Smith • February 2012: JPL re-engages NRAO: NRAO EVLA collaboration continues
  • 23. U.S. National Radio Astronomy Observatory (NRAO) • Explore JPL data system expertise – Leverage Apache OODT – Leverage architecture experience – Build on NRAO Socorro F2F given in April 2011 and Innovations in Data-Intensive Astronomy meeting in May 2011 • Define achievable prototype – Focus on EVLA summer school pipeline • Heavy focus on CASApy, simple pipelining, metadata extraction, archiving of directory-based products • Ideal for OODT system
  • 24. Architecture • [Diagram of the Apache OODT deployment on ska-dc.jpl.nasa.gov. Labelled elements: EVLA; dataset day2_TDEM0003_10s_norx; Staging Area; Crawler; FM (rep, cat); WM; PCS; Curator; CAS Data Services; W Monitor; Disk Area (data/met); WWW Browser; Science Data System Operator; evlascube event; products, metadata, system status, proc status. Legend: data flow, control flow, Apache OODT.]
  • 25. Demonstration Use Case • Run EVLA Spectral Line Cube generation – First step is to ingest EVLARawDataOutput from Joe – Then fire off evlascube event – Workflow manager writes CASApy script dynamically • Via CAS-PGE – CAS-PGE starts CASApy – CASApy generates Cal tables and 2 Spectral Line Cube Images – CAS-PGE ingests them into the File Manager • Gravy: UIs, Cmd Line Tools, Services
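The sequence on slide 25 (event fires, a script is written dynamically, the script's products are ingested) can be sketched in plain Python. Nothing below uses OODT, CAS-PGE, or CASA calls: the template text, product file names, and the list standing in for the File Manager catalog are all hypothetical, and the step that would actually launch CASApy is deliberately omitted.

```python
from string import Template

# Hypothetical script template; a real one would contain the CASApy
# calibration and imaging commands, which are not shown on the slides.
SCRIPT_TEMPLATE = Template(
    "# generated for event: $event\n"
    "input_dataset = '$dataset'\n"
    "output_prefix = '$prefix'\n"
    "# ... calibration and imaging steps would go here ...\n"
)


def render_script(event: str, dataset: str, prefix: str) -> str:
    """Fill the template with per-run values (the 'dynamic' script)."""
    return SCRIPT_TEMPLATE.substitute(event=event, dataset=dataset, prefix=prefix)


def expected_products(prefix: str, n_cubes: int = 2) -> list:
    """Illustrative product names: one cal table entry plus the cube images."""
    return [f"{prefix}.cal"] + [f"{prefix}.cube{i}.image" for i in range(1, n_cubes + 1)]


def handle_event(event: str, dataset: str, catalog: list):
    """React to an event: write the script, then record its products."""
    if event != "evlascube":
        raise ValueError(f"unsupported event: {event}")
    prefix = f"{dataset}_cube"
    script = render_script(event, dataset, prefix)
    products = expected_products(prefix)
    catalog.extend(products)  # stand-in for ingestion into a file catalog
    return script, products
```

For example, `handle_event("evlascube", "day2_TDEM0003_10s_norx", [])` returns the rendered script text and three product names. Generating the script from a template is what lets an unmodified science algorithm be wrapped without changing it.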
  • 26. Results: Workflow Monitor
  • 27. NRAO and EVLA • Extended Very Large Array has deployed Apache OODT for its data reduction pipeline • Working to enable more portions of Apache OODT (currently only using Workflow Manager) • Evaluating Apache OODT for data ingestion and archiving
  • 28. Wrap-up • JPL’s efforts in Scalable Data Mining for Big Data and the SKA • Successful collaboration with SKA South Africa and NRAO • Open source big data management framework from Apache (“Apache OODT”)
  • 29. Thanks! • Questions, more information: • @chrismattmann • Email: skadc-dev@jpl.nasa.gov
