SlideShare a Scribd company logo
1 of 27
Download to read offline
Scientific Applications of
   Data Distribution Service
    Svetlana Shasharina#, Nanbor Wang,
Rooparani Pundaleeka, James Matykiewicz and
              Steve Goldhaber     http://www.txcorp.com
                # sveta@txcorp.com
Outline

• Introduction
   – Tech-X Corporation
   – Some driving scientific applications
   – Common requirements
• QuIDS project
   • Workflow management
   • Security experiments
   • Python DDS implementation
• Summary
http://www.txcorp.com

       Tech-X Corporation
• Founded in 1994, located in Boulder CO
• 65 people (mostly computational physics, app math,
  applied computer science)
• Have been merging CS (C++, CORBA, GRID, MPI, GPU,
  complex via and data management) and physics and
  looking at DDS
• Funded by DOE, NASA, DOD and sales
• Applications in
  – Plasma modeling (accelerators, lasers, fusion devices,
    semiconductors) and beam physics
  – Nanotechnology
  – Data analysis
Large Synoptic Survey
                         Telescope (LSST)
•   On ground digital camera to build in
    Chile to start in 2020 (?). Funded
    by DOE, NASA, university, private
    sector
•   Up to 2000 images/day +
    calibration data -> 30 TB/day
•   Processed locally and reprocessed
    and archived in Illinois (National
    Center for Supercomputing
    Applications)
•   Uses OpenSplice for control
    software
•   Can we help with data
    management: orchestration of
    steps and monitoring of data
    processing?
NoVA
• NoVA: NuMI Off-axis ve (electron
neutrino) Appearance experiment
• Will generate neutrino beams at
FNAL and send it to a detector in
Ash River, Minnesota (500 mile in < 2 ms)
• DOE funded (many labs and universities)
• RMS (Responsive Messaging System) is DDS-based
  system to pass control and status messages in the NoVA
  data acquisition system (two types of topics but has many
  actual topics to implement point-to-point communications)
• Will eventually need to go over WAN, and provide 10 Hz
  status transmissions between ~100 applications
• Simplifies OpenSplice using traits (like simd-cxx) to
  minimize the amount of data types and mapping topics to
  strings
SciDAC-II LQCD
• LQCD: Lattice Quantum Chromodynamics
  (computational version of QCD: a theory of strong
  interaction involving quarks and gluons making up
  hadrons like protons and neutrons)
• DDS is used to perform monitoring of clusters doing
  LQCD calculations (detect job failures, evaluate nodes
  loads and performance, start/kill etc)
• Topics for monitoring and controls of jobs and
  resources
• Use OpenSplice
Common themes for scientific
               apps and DDS
• RT issues are not well estimated
• Common usability needs
   – Support for scientific data formats and data products (from and
     out of topics): domain schemas and data transformation tools
   – Control and monitoring topics (can we come up with reusable
     schema?)
   – Simple APIs corresponding to expectations of scientists
   – Ease of modification (evolving systems not just production
     systems)
   – QoS cookbook (how to get correct combinations)
   – General education
      • Is DDS good for point-to-point (Bill talks only to Pete)
      • How one uses DDS without killing the system (memory etc)
• Other requirements
   – Site and community specific security
   – WAN operation (Chicago and Berkeley, for example)
Common extra expectations
• Automated test harness:
   – How one tests for correct behavior and QoS
   – Avoid regression in rapidly evolving system modified by a
      team
• Interacting with databases (all data should be archived and
  allow queries)
• Can we do everything using DDS to minimize external
  dependencies?
   – Efficient bulk data transfer (usually point-to-point and BIG
      triangles :-)
   – Workflow engine (workflow: loosely coupled applications
      often through files and can be distributed, while
      simulations is typically tightly coupled on a HPC resource)
• Interacting with Web interfaces and Web Services
QuIDS: to address some issues
• QuIDS: Quality Information Distribution System
• Helping the applications above through Phase II SBIR from
  DOE (HEP office)
• Collaboration of Tech-X and Fermilab
• Goals (we will talk about the ones in red in rest of this talk):
   – Implement a DDS-based system to monitor distributed
     processing of astronomical data
      • Simplifying C++(done with simd-cxx?) and Python APIs
      • Support for FITS and monitoring and control data, images,
        histograms, spectra
      • Security
      • WAN
      • Testing harness
   – Investigate of of DDS for workflow management
QuIDS, bird eye view (courtesy of
             FNAL)
QuIDS at FNAL computational
                      domain
 MCTopic        SciTopic




                                       Monitor
                             MCTopic
W W W      R R                          R      R      W
CampaignManager

                                             MCTopic
                           MCTopic
                                                          Workflow
                                     Application(s)       of apps
             SciTopic
                                      R     W W
Computa-onal	
  Domain
Generic workflows: do we need all?
• Workflow is something outside of HPC (loosely coupled and
  can tolerate even WAN, while simulation is something that
  goes to qsub…)
• Kepler (de-facto workflow engine expected for DOE
  applications):
   –   Support for complex workflows
   –   Java based
   –   Heavy and hard to learn
   –   Not portable to future platforms (DOE supercomputers might not have
       Java at all)
• Real workflows in astronomy are simple (do not
  expressivness of full programming language or pi-algebra)
   – Pipelines
   – DAGs
• How one implements such workflows using DDS?
Parallel pipeline: most of
                 astronomy workflows

Worker(0)       Task(0)         Task(1)         Task(N-1)

Initialize        Task(0)     Task(1)         Task(N-1)     Finalize


Worker(2)       Task(0)          Task(1)        Task(N-1)

      Tasks can be continued by different
      working processes: data can be passed
      between them (the Worker(1)
      performs Task(1) using data from
      Worker (0))
ddspipe: current implementation of
                    workflow engine
•   Parallel pipeline job consist of
     – Initialization phase (splitting data into manageable pieces) running on
        one node
     – Parallel worker processes doing data processing tasks (possible not
        sharing the same address space)
     – Finalization step (merging data into a final image or movie)
•   There is an expected order in tasks, so that tasks can be numbered and
    output data of a previous step as input to next
•   Design decisions for now:
     – Workers do not communicate to each other
     – Workers are given work by a mediating entity: tuple space manager
        (no self-organization)
     – No scheduling for now (round-robin: tasks and workers are queued in
        the server)
     – Workers can get data coming from a task completed by a different
        worker (do not address the problem of data transfer now)
•   All communication is via DDS topics while data to work on is available
    through local files to all workers
GDS = Tuple Space but we
                    want more (?)
•   Tickets:
     – Task ticket (id, indata, out data, status)
     – Task status: standby, ready, running, finished
     – Worker ticket (id, task ticket, status)
     – Worker status: ready, busy
•   Classic tuple space = set of task ticket and we could use only them
    but… instead of dealing with a self-organized (wild) system, we
    would like to implement
     – Workflow: M sequences of tasks with matching in and out data
     – Scheduling (based on policies, resources, location of workers)
     – Fault-tolerance (detecting and rescheduling of incomplete tasks)
•   Hence: we decided to have a class TupleSpaceServer to address
    these (currently just pipeline and queues and no FT)
ddspipe classes:
• Initializer
   – Splits data, possibly creates workers and workspaces, publishes (for
     all initial work tickets with correct specification of the workflow
• TupleSpaceServer
   – Changes status in task tickets in accordance with the workflow order:
     once a worker reports that task n done, a ticket for task n-1 with <n-1
     in-data> = <n out-data> is changed to ready and worker topic is
     published (with the worker id next in the queue). Once a worker
     reports that is doing this work, the status is changed to running etc.
• Workers
   – Publish their status
   – Listen to task assignment (matching its id to the one in the worker
     ticket)
• Finalizer
   – Whatever to finish up (merge data and clean)
States of Tasks in Tuple Space
                              Initial              jobStatus,Run                                              Eventual
                              ticket                                                                             ticket
                              states                                                                            states
 Taks executed sequentially



                                                                           Tuplespace
                                         Task0                     Task0    internal    Task0      Task0
                                         Standby                   Ready                Running   Completed
                                                                           scheduling



                                        taskStatus,Task0,Completed




                                                                           Tuplespace
                                         Task1                     Task1    internal    Task1      Task1
                                         Standby                   Ready                Running   Completed
                                                                           scheduling




                                        taskStatus,Taskn-1,Completed




                                                                           Tuplespace
                                         Taskn                     Taskn    internal    Taskn      Taskn
                                         Standby                   Ready                Running   Completed
                                                                           scheduling




Sequences proceed independently in parallel
Tuple Space Manager Maintains
              Tasks Tickets and Schedules Tasks

                                  Tuple Space
                   ticket          Manager                     jobStatus
   Job                                                                       Job
                                Job    task   seq   status

Initializer
                 jobStatus      ID     ID     ID                           Finalizer


                workerStatus                                                    jobStatus
                workTicket
  Worker
                   ticket             Idle Workers
                                                                            Compl      Dispos
                                                                 Run
                                                                             eted      able




       Worker          Worker          Worker                Worker        Worker
Status and next steps of
         ddspipe (beyond what bash can
                     do :-)
• Prototype working
   – Although we do have some issues with memory and bugs
   – I would like to experiment with no queuing: next task open for
     grabs if one of the tasks of the previous stage is finished
• Next steps
   – User definition of workflow
   – Multiple jobs
   – Separation of worker manager from task manager?
   – Implementing workers doing slit-spectroscopy based on
     running R
   – DAG support
   – Some scheduling (balance between data transfer and mixing
     data between slow and fast workers?)
   – Data transfer implementation and in-memory data exchange
Security for scientific projects:
              from nothing to everything
• OpenSplice enterprise edition provides Secure Networking
  Service:
   – Security parameters (i.e., data encryption and authentication) are
     d fined in Security Profiles in the OpenSplice configuration file.
   – Node authentication and access control are also specified in the
     configuration profile.
   – Security Profiles are attached to partitions which set the security
     parameters in use for that partition.
   – Secure networking is activated as a domain Service
• Scientists are not used to pay 
• Used to:
   – Authenticate and authorize on connection
   – Rules are in admin area of a virtual organization (DDS
     should consult)
Providing security in community
                edition of OpenSplice
• Tried to replace the the lower networking layer of
  OpenSplice with OpenSSL and see how one can provide
  authentication, authorization and encryption
• OpenSSL is an open source toolkit:
   – Secure Sockets Layer and Transport Layer Security (new
     standard, replacing SSL)
   – General purpose cryptography library
   – Public Key Infrastructure (PKI): e.g., certificate creation, signing
     and checking, CA management
Switching to OpenSSL++ in the
           networking layer of community
                    OpenSplice
         Applications                    Applications

 Data Centric Publish/Subscibe   Data Centric Publish/Subscibe

  Real Time Publish/Subscribe    Real Time Publish/Subscribe

    RT Net              DDSi        RT Net          DDSi

             UDP / IP                     OpenSSL

• The UDP/IP layer handles the interface to the operating-
  system’s socket interface
• Switching to OpenSSL allowed us to establish secure
  tunnel between two sites
• But the configuration should be done per each two nodes!
Future Directions in Security
                  Work

• Waiting for new development and will be happy to use to
  implement what is expected by DOE labs (our
  collaborators) security
• Explore user data etc fields to address applications
  specifics if this is not addressed by the security profile
PyDDS: Python bindings for
           DDS communications
• Started with SWIG for wrapping generated bindings:
  works fine but needed manual wrapping of multiple
  generated classes
• Next worked with Boost.Python for wrapping of
  communication layer (set of classes that are used in
  the IDLPP generated code to call into OpenSplice for
  communication) so that there will be no need to wrap
  generated bindings
• Problem: need to take care of inheritance manually
  and deal with several handlers that are unexposed C
  structs used in forward declarations
Status and next steps for
                         PyDDS
• Hand-wrapping of C++ bindings using SWIG works:
#!/usr/bin/python
import time
import TxddsPy
qos = TxddsPy.Qos()
qos.setTopicReliable()
qos.setWriterReliable()
writer = TxddsPy.RawImageWriter(”rawImage")
writer.setTopicQos(qos)
writer.setWriterQos(qos)
data = TxddsPy.rawImage()
writer.writeData(data)

• Next:
    – Investigate wrapping communication classes so that we
      expose minimum for Boost.Python
    – Or develop tools for generating glue code needed to
      Boost.Python using a string list of topics
QuIDS summary and future
                      directions
• We have prototyped DDS-bases solutions for astronomy data
  processing applications
   –   Tools for bringing FITS data into DDS
   –   Simple QoS scenarios (getting prepublished data)
   –   Parallel pipeline workflows
   –   Security studies
   –   Python APIs
• Possible next steps
   –   Language to describe workflows for user input into the system
   –   More complex workflows and concrete implementations
   –   Implementing FNAL security requirements
   –   WAN communication between FNAL and LBNL going through firewall
   –   Streamlining glue code generation for Python
   –   Testing harness
   –   Bulk data transfers
   –   Archiving data into databases
   –   Web interfaces
Acknowledgements

• Ground Data System (GDS) team from Fermi National
  Accelerator Laboratory
• PrismTech
• Nikolay Malitsky (BNL)
• OpenSplice mailing list and its contributors
• US Department of Energy, Office of High Energy
  Physics

More Related Content

What's hot

KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.Kyong-Ha Lee
 
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Rafael Ferreira da Silva
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaSpark Summit
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyKyong-Ha Lee
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsLeila panahi
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureKyong-Ha Lee
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReducePietro Michiardi
 
Tokyo Webmining Talk1
Tokyo Webmining Talk1Tokyo Webmining Talk1
Tokyo Webmining Talk1Kenta Oono
 
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...Naoki Shibata
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution ISSGC Summer School
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comsoftwarequery
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingEd Kohlwey
 
MapReduce: Distributed Computing for Machine Learning
MapReduce: Distributed Computing for Machine LearningMapReduce: Distributed Computing for Machine Learning
MapReduce: Distributed Computing for Machine Learningbutest
 
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...Naoki Shibata
 
benchmarks-sigmod09
benchmarks-sigmod09benchmarks-sigmod09
benchmarks-sigmod09Hiroshi Ono
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA Taiwan
 
Google Spanner : our understanding of concepts and implications
Google Spanner : our understanding of concepts and implicationsGoogle Spanner : our understanding of concepts and implications
Google Spanner : our understanding of concepts and implicationsHarisankar H
 

What's hot (20)

KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
 
Parallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A SurveyParallel Data Processing with MapReduce: A Survey
Parallel Data Processing with MapReduce: A Survey
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling Algorithms
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing Architecture
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduce
 
Tokyo Webmining Talk1
Tokyo Webmining Talk1Tokyo Webmining Talk1
Tokyo Webmining Talk1
 
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...(Slides) Task scheduling algorithm for multicore processor system for minimiz...
(Slides) Task scheduling algorithm for multicore processor system for minimiz...
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution Session 46 - Principles of workflow management and execution
Session 46 - Principles of workflow management and execution
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.com
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel Processing
 
MapReduce: Distributed Computing for Machine Learning
MapReduce: Distributed Computing for Machine LearningMapReduce: Distributed Computing for Machine Learning
MapReduce: Distributed Computing for Machine Learning
 
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012
 
benchmarks-sigmod09
benchmarks-sigmod09benchmarks-sigmod09
benchmarks-sigmod09
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
Google Spanner : our understanding of concepts and implications
Google Spanner : our understanding of concepts and implicationsGoogle Spanner : our understanding of concepts and implications
Google Spanner : our understanding of concepts and implications
 

Viewers also liked

MV open innovation cluster conference handout
MV open innovation cluster conference handoutMV open innovation cluster conference handout
MV open innovation cluster conference handoutJohn Michitson
 
Pleno municipal infantil 2012
Pleno municipal infantil 2012Pleno municipal infantil 2012
Pleno municipal infantil 2012XXX XXX
 
Pleno municipal infantil 2013
Pleno municipal infantil 2013Pleno municipal infantil 2013
Pleno municipal infantil 2013XXX XXX
 
Cyberpolitics 2009 W11
Cyberpolitics 2009 W11Cyberpolitics 2009 W11
Cyberpolitics 2009 W11oiwan
 
Borderland.Reading Is Thinking.Sept2015
Borderland.Reading Is Thinking.Sept2015Borderland.Reading Is Thinking.Sept2015
Borderland.Reading Is Thinking.Sept2015Faye Brownlie
 
Premios dia del libro 2016
Premios dia del libro 2016Premios dia del libro 2016
Premios dia del libro 2016XXX XXX
 
OpenSplice DDS Goes Open Source
OpenSplice DDS Goes Open SourceOpenSplice DDS Goes Open Source
OpenSplice DDS Goes Open SourceAngelo Corsaro
 
Search Party - Internet & Social Media Search Tricks that Will Improve the Wa...
Search Party - Internet & Social Media Search Tricks that Will Improve the Wa...Search Party - Internet & Social Media Search Tricks that Will Improve the Wa...
Search Party - Internet & Social Media Search Tricks that Will Improve the Wa...Marian Madonia, CSP
 
Inside a Computer
Inside a ComputerInside a Computer
Inside a ComputerSMumford
 
PCI Compliance: What You Need to Know
PCI Compliance: What You Need to KnowPCI Compliance: What You Need to Know
PCI Compliance: What You Need to KnowSasha Nunke
 
Week 6 cyberpolitics
Week 6 cyberpoliticsWeek 6 cyberpolitics
Week 6 cyberpoliticsoiwan
 
Web Application Security For Small and Medium Businesses
Web Application Security For Small and Medium BusinessesWeb Application Security For Small and Medium Businesses
Web Application Security For Small and Medium BusinessesSasha Nunke
 

Viewers also liked (20)

MV open innovation cluster conference handout
MV open innovation cluster conference handoutMV open innovation cluster conference handout
MV open innovation cluster conference handout
 
Good thoughts
Good thoughtsGood thoughts
Good thoughts
 
Peqoud
PeqoudPeqoud
Peqoud
 
Pleno municipal infantil 2012
Pleno municipal infantil 2012Pleno municipal infantil 2012
Pleno municipal infantil 2012
 
ikp321-02
ikp321-02ikp321-02
ikp321-02
 
Pleno municipal infantil 2013
Pleno municipal infantil 2013Pleno municipal infantil 2013
Pleno municipal infantil 2013
 
Cyberpolitics 2009 W11
Cyberpolitics 2009 W11Cyberpolitics 2009 W11
Cyberpolitics 2009 W11
 
Borderland.Reading Is Thinking.Sept2015
Borderland.Reading Is Thinking.Sept2015Borderland.Reading Is Thinking.Sept2015
Borderland.Reading Is Thinking.Sept2015
 
Premios dia del libro 2016
Premios dia del libro 2016Premios dia del libro 2016
Premios dia del libro 2016
 
OpenSplice DDS Goes Open Source
OpenSplice DDS Goes Open SourceOpenSplice DDS Goes Open Source
OpenSplice DDS Goes Open Source
 
Search Party - Internet & Social Media Search Tricks that Will Improve the Wa...
Search Party - Internet & Social Media Search Tricks that Will Improve the Wa...Search Party - Internet & Social Media Search Tricks that Will Improve the Wa...
Search Party - Internet & Social Media Search Tricks that Will Improve the Wa...
 
Inside a Computer
Inside a ComputerInside a Computer
Inside a Computer
 
PCI Compliance: What You Need to Know
PCI Compliance: What You Need to KnowPCI Compliance: What You Need to Know
PCI Compliance: What You Need to Know
 
Sph 106 Ch 2
Sph 106 Ch 2Sph 106 Ch 2
Sph 106 Ch 2
 
ikp321-04
ikp321-04ikp321-04
ikp321-04
 
Week 6 cyberpolitics
Week 6 cyberpoliticsWeek 6 cyberpolitics
Week 6 cyberpolitics
 
Excellent Roth IRA Alternative
Excellent Roth IRA  AlternativeExcellent Roth IRA  Alternative
Excellent Roth IRA Alternative
 
Sph 106 Ch 15
Sph 106 Ch 15Sph 106 Ch 15
Sph 106 Ch 15
 
Egipto
EgiptoEgipto
Egipto
 
Web Application Security For Small and Medium Businesses
Web Application Security For Small and Medium BusinessesWeb Application Security For Small and Medium Businesses
Web Application Security For Small and Medium Businesses
 

Similar to Scientific Applications of The Data Distribution Service

Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionSubhas Kumar Ghosh
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?inside-BigData.com
 
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...InfluxData
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Anubhav Jain
 
Juniper Innovation Contest
Juniper Innovation ContestJuniper Innovation Contest
Juniper Innovation ContestAMIT BORUDE
 
Hanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aHanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aSchubert Zhang
 
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...DevOps.com
 
Common Design Elements for Data Movement Eli Dart
Common Design Elements for Data Movement Eli DartCommon Design Elements for Data Movement Eli Dart
Common Design Elements for Data Movement Eli DartEd Dodds
 
Extending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitExtending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitMilind Bhandarkar
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance IssuesAntonios Katsarakis
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCHimanshu Bedi
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabadsreehari orienit
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingTeddy Choi
 

Similar to Scientific Applications of The Data Distribution Service (20)

Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
Juniper Innovation Contest
Juniper Innovation ContestJuniper Innovation Contest
Juniper Innovation Contest
 
Hanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221aHanborq optimizations on hadoop map reduce 20120221a
Hanborq optimizations on hadoop map reduce 20120221a
 
try
trytry
try
 
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...
 
Common Design Elements for Data Movement Eli Dart
Common Design Elements for Data Movement Eli DartCommon Design Elements for Data Movement Eli Dart
Common Design Elements for Data Movement Eli Dart
 
Extending Hadoop for Fun & Profit
Extending Hadoop for Fun & ProfitExtending Hadoop for Fun & Profit
Extending Hadoop for Fun & Profit
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARC
 
lecture1.ppt
lecture1.pptlecture1.ppt
lecture1.ppt
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 

More from Angelo Corsaro

zenoh: The Edge Data Fabric
zenoh: The Edge Data Fabriczenoh: The Edge Data Fabric
zenoh: The Edge Data FabricAngelo Corsaro
 
Data Decentralisation: Efficiency, Privacy and Fair Monetisation
Data Decentralisation: Efficiency, Privacy and Fair MonetisationData Decentralisation: Efficiency, Privacy and Fair Monetisation
Data Decentralisation: Efficiency, Privacy and Fair MonetisationAngelo Corsaro
 
zenoh: zero overhead pub/sub store/query compute
zenoh: zero overhead pub/sub store/query computezenoh: zero overhead pub/sub store/query compute
zenoh: zero overhead pub/sub store/query computeAngelo Corsaro
 
zenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocolzenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocolAngelo Corsaro
 
zenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocolzenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocolAngelo Corsaro
 
Breaking the Edge -- A Journey Through Cloud, Edge and Fog Computing
Breaking the Edge -- A Journey Through Cloud, Edge and Fog ComputingBreaking the Edge -- A Journey Through Cloud, Edge and Fog Computing
Breaking the Edge -- A Journey Through Cloud, Edge and Fog ComputingAngelo Corsaro
 
fog05: The Fog Computing Infrastructure
fog05: The Fog Computing Infrastructurefog05: The Fog Computing Infrastructure
fog05: The Fog Computing InfrastructureAngelo Corsaro
 
Cyclone DDS: Sharing Data in the IoT Age
Cyclone DDS: Sharing Data in the IoT AgeCyclone DDS: Sharing Data in the IoT Age
Cyclone DDS: Sharing Data in the IoT AgeAngelo Corsaro
 
fog05: The Fog Computing Platform
fog05: The Fog Computing Platformfog05: The Fog Computing Platform
fog05: The Fog Computing PlatformAngelo Corsaro
 
Programming in Scala - Lecture Four
Programming in Scala - Lecture FourProgramming in Scala - Lecture Four
Programming in Scala - Lecture FourAngelo Corsaro
 
Programming in Scala - Lecture Three
Programming in Scala - Lecture ThreeProgramming in Scala - Lecture Three
Programming in Scala - Lecture ThreeAngelo Corsaro
 
Programming in Scala - Lecture Two
Programming in Scala - Lecture TwoProgramming in Scala - Lecture Two
Programming in Scala - Lecture TwoAngelo Corsaro
 
Programming in Scala - Lecture One
Programming in Scala - Lecture OneProgramming in Scala - Lecture One
Programming in Scala - Lecture OneAngelo Corsaro
 
Data Sharing in Extremely Resource Constrained Envionrments
Data Sharing in Extremely Resource Constrained EnvionrmentsData Sharing in Extremely Resource Constrained Envionrments
Data Sharing in Extremely Resource Constrained EnvionrmentsAngelo Corsaro
 
The DDS Security Standard
The DDS Security StandardThe DDS Security Standard
The DDS Security StandardAngelo Corsaro
 
The Data Distribution Service
The Data Distribution ServiceThe Data Distribution Service
The Data Distribution ServiceAngelo Corsaro
 
RUSTing -- Partially Ordered Rust Programming Ruminations
RUSTing -- Partially Ordered Rust Programming RuminationsRUSTing -- Partially Ordered Rust Programming Ruminations
RUSTing -- Partially Ordered Rust Programming RuminationsAngelo Corsaro
 

More from Angelo Corsaro (20)

Zenoh: The Genesis
Zenoh: The GenesisZenoh: The Genesis
Zenoh: The Genesis
 
zenoh: The Edge Data Fabric
zenoh: The Edge Data Fabriczenoh: The Edge Data Fabric
zenoh: The Edge Data Fabric
 
Zenoh Tutorial
Zenoh TutorialZenoh Tutorial
Zenoh Tutorial
 
Data Decentralisation: Efficiency, Privacy and Fair Monetisation
Data Decentralisation: Efficiency, Privacy and Fair MonetisationData Decentralisation: Efficiency, Privacy and Fair Monetisation
Data Decentralisation: Efficiency, Privacy and Fair Monetisation
 
zenoh: zero overhead pub/sub store/query compute
zenoh: zero overhead pub/sub store/query computezenoh: zero overhead pub/sub store/query compute
zenoh: zero overhead pub/sub store/query compute
 
zenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocolzenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocol
 
zenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocolzenoh -- the ZEro Network OverHead protocol
zenoh -- the ZEro Network OverHead protocol
 
Breaking the Edge -- A Journey Through Cloud, Edge and Fog Computing
Breaking the Edge -- A Journey Through Cloud, Edge and Fog ComputingBreaking the Edge -- A Journey Through Cloud, Edge and Fog Computing
Breaking the Edge -- A Journey Through Cloud, Edge and Fog Computing
 
Eastern Sicily
Eastern SicilyEastern Sicily
Eastern Sicily
 
fog05: The Fog Computing Infrastructure
fog05: The Fog Computing Infrastructurefog05: The Fog Computing Infrastructure
fog05: The Fog Computing Infrastructure
 
Cyclone DDS: Sharing Data in the IoT Age
Cyclone DDS: Sharing Data in the IoT AgeCyclone DDS: Sharing Data in the IoT Age
Cyclone DDS: Sharing Data in the IoT Age
 
fog05: The Fog Computing Platform
fog05: The Fog Computing Platformfog05: The Fog Computing Platform
fog05: The Fog Computing Platform
 
Programming in Scala - Lecture Four
Programming in Scala - Lecture FourProgramming in Scala - Lecture Four
Programming in Scala - Lecture Four
 
Programming in Scala - Lecture Three
Programming in Scala - Lecture ThreeProgramming in Scala - Lecture Three
Programming in Scala - Lecture Three
 
Programming in Scala - Lecture Two
Programming in Scala - Lecture TwoProgramming in Scala - Lecture Two
Programming in Scala - Lecture Two
 
Programming in Scala - Lecture One
Programming in Scala - Lecture OneProgramming in Scala - Lecture One
Programming in Scala - Lecture One
 
Data Sharing in Extremely Resource Constrained Envionrments
Data Sharing in Extremely Resource Constrained EnvionrmentsData Sharing in Extremely Resource Constrained Envionrments
Data Sharing in Extremely Resource Constrained Envionrments
 
The DDS Security Standard
The DDS Security StandardThe DDS Security Standard
The DDS Security Standard
 
The Data Distribution Service
The Data Distribution ServiceThe Data Distribution Service
The Data Distribution Service
 
RUSTing -- Partially Ordered Rust Programming Ruminations
RUSTing -- Partially Ordered Rust Programming RuminationsRUSTing -- Partially Ordered Rust Programming Ruminations
RUSTing -- Partially Ordered Rust Programming Ruminations
 

Recently uploaded

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

Scientific Applications of The Data Distribution Service

  • 1. Scientific Applications of Data Distribution Service Svetlana Shasharina#, Nanbor Wang, Rooparani Pundaleeka, James Matykiewicz and Steve Goldhaber http://www.txcorp.com # sveta@txcorp.com
  • 2. Outline • Introduction – Tech-X Corporation – Some driving scientific applications – Common requirements • QuIDS project • Workflow management • Security experiments • Python DDS implementation • Summary
  • 3. http://www.txcorp.com Tech-X Corporation • Founded in 1994, located in Boulder CO • 65 people (mostly computational physics, app math, applied computer science) • Have been merging CS (C++, CORBA, GRID, MPI, GPU, complex via and data management) and physics and looking at DDS • Funded by DOE, NASA, DOD and sales • Applications in – Plasma modeling (accelerators, lasers, fusion devices, semiconductors) and beam physics – Nanotechnology – Data analysis
  • 4. Large Synoptic Survey Telescope (LSST) • On ground digital camera to build in Chile to start in 2020 (?). Funded by DOE, NASA, university, private sector • Up to 2000 images/day + calibration data -> 30 TB/day • Processed locally and reprocessed and archived in Illinois (National Center for Supercomputing Applications) • Uses OpenSplice for control software • Can we help with data management: orchestration of steps and monitoring of data processing?
  • 5. NoVA • NoVA: NuMI Off-axis ve (electron neutrino) Appearance experiment • Will generate neutrino beams at FNAL and send it to a detector in Ash River, Minnesota (500 mile in < 2 ms) • DOE funded (many labs and universities) • RMS (Responsive Messaging System) is DDS-based system to pass control and status messages in the NoVA data acquisition system (two types of topics but has many actual topics to implement point-to-point communications) • Will eventually need to go over WAN, and provide 10 Hz status transmissions between ~100 applications • Simplifies OpenSplice using traits (like simd-cxx) to minimize the amount of data types and mapping topics to strings
  • 6. SciDAC-II LQCD • LQCD: Lattice Quantum Chromodynamics (computational version of QCD: a theory of strong interaction involving quarks and gluons making up hadrons like protons and neutrons) • DDS is used to perform monitoring of clusters doing LQCD calculations (detect job failures, evaluate nodes loads and performance, start/kill etc) • Topics for monitoring and controls of jobs and resources • Use OpenSplice
  • 7. Common themes for scientific apps and DDS • RT issues are not well estimated • Common usability needs – Support for scientific data formats and data products (from and out of topics): domain schemas and data transformation tools – Control and monitoring topics (can we come up with reusable schema?) – Simple APIs corresponding to expectations of scientists – Ease of modification (evolving systems not just production systems) – QoS cookbook (how to get correct combinations) – General education • Is DDS good for point-to-point (Bill talks only to Pete) • How one uses DDS without killing the system (memory etc) • Other requirements – Site and community specific security – WAN operation (Chicago and Berkeley, for example)
  • 8. Common extra expectations • Automated test harness: – How one tests for correct behavior and QoS – Avoid regression in rapidly evolving system modified by a team • Interacting with databases (all data should be archived and allow queries) • Can we do everything using DDS to minimize external dependencies? – Efficient bulk data transfer (usually point-to-point and BIG triangles :-) – Workflow engine (workflow: loosely coupled applications often through files and can be distributed, while simulations is typically tightly coupled on a HPC resource) • Interacting with Web interfaces and Web Services
  • 9. QuIDS: to address some issues • QuIDS: Quality Information Distribution System • Helping the applications above through Phase II SBIR from DOE (HEP office) • Collaboration of Tech-X and Fermilab • Goals (we will talk about the ones in red in rest of this talk): – Implement a DDS-based system to monitor distributed processing of astronomical data • Simplifying C++(done with simd-cxx?) and Python APIs • Support for FITS and monitoring and control data, images, histograms, spectra • Security • WAN • Testing harness – Investigate of of DDS for workflow management
  • 10. QuIDS, bird eye view (courtesy of FNAL)
  • 11. QuIDS at FNAL computational domain MCTopic SciTopic Monitor MCTopic W W W R R R R W CampaignManager MCTopic MCTopic Workflow Application(s) of apps SciTopic R W W Computa-onal  Domain
  • 12. Generic workflows: do we need all? • Workflow is something outside of HPC (loosely coupled and can tolerate even WAN, while simulation is something that goes to qsub…) • Kepler (de-facto workflow engine expected for DOE applications): – Support for complex workflows – Java based – Heavy and hard to learn – Not portable to future platforms (DOE supercomputers might not have Java at all) • Real workflows in astronomy are simple (do not expressivness of full programming language or pi-algebra) – Pipelines – DAGs • How one implements such workflows using DDS?
  • 13. Parallel pipeline: most of astronomy workflows Worker(0) Task(0) Task(1) Task(N-1) Initialize Task(0) Task(1) Task(N-1) Finalize Worker(2) Task(0) Task(1) Task(N-1) Tasks can be continued by different working processes: data can be passed between them (the Worker(1) performs Task(1) using data from Worker (0))
  • 14. ddspipe: current implementation of workflow engine • Parallel pipeline job consist of – Initialization phase (splitting data into manageable pieces) running on one node – Parallel worker processes doing data processing tasks (possible not sharing the same address space) – Finalization step (merging data into a final image or movie) • There is an expected order in tasks, so that tasks can be numbered and output data of a previous step as input to next • Design decisions for now: – Workers do not communicate to each other – Workers are given work by a mediating entity: tuple space manager (no self-organization) – No scheduling for now (round-robin: tasks and workers are queued in the server) – Workers can get data coming from a task completed by a different worker (do not address the problem of data transfer now) • All communication is via DDS topics while data to work on is available through local files to all workers
  • 15. GDS = Tuple Space but we want more (?) • Tickets: – Task ticket (id, indata, out data, status) – Task status: standby, ready, running, finished – Worker ticket (id, task ticket, status) – Worker status: ready, busy • Classic tuple space = set of task ticket and we could use only them but… instead of dealing with a self-organized (wild) system, we would like to implement – Workflow: M sequences of tasks with matching in and out data – Scheduling (based on policies, resources, location of workers) – Fault-tolerance (detecting and rescheduling of incomplete tasks) • Hence: we decided to have a class TupleSpaceServer to address these (currently just pipeline and queues and no FT)
  • 16. ddspipe classes: • Initializer – Splits data, possibly creates workers and workspaces, publishes (for all initial work tickets with correct specification of the workflow • TupleSpaceServer – Changes status in task tickets in accordance with the workflow order: once a worker reports that task n done, a ticket for task n-1 with <n-1 in-data> = <n out-data> is changed to ready and worker topic is published (with the worker id next in the queue). Once a worker reports that is doing this work, the status is changed to running etc. • Workers – Publish their status – Listen to task assignment (matching its id to the one in the worker ticket) • Finalizer – Whatever to finish up (merge data and clean)
  • 17. States of Tasks in Tuple Space Initial jobStatus,Run Eventual ticket ticket states states Taks executed sequentially Tuplespace Task0 Task0 internal Task0 Task0 Standby Ready Running Completed scheduling taskStatus,Task0,Completed Tuplespace Task1 Task1 internal Task1 Task1 Standby Ready Running Completed scheduling taskStatus,Taskn-1,Completed Tuplespace Taskn Taskn internal Taskn Taskn Standby Ready Running Completed scheduling Sequences proceed independently in parallel
  • 18. Tuple Space Manager Maintains Tasks Tickets and Schedules Tasks Tuple Space ticket Manager jobStatus Job Job Job task seq status Initializer jobStatus ID ID ID Finalizer workerStatus jobStatus workTicket Worker ticket Idle Workers Compl Dispos Run eted able Worker Worker Worker Worker Worker
  • 19. Status and next steps of ddspipe (beyond what bash can do :-) • Prototype working – Although we do have some issues with memory and bugs – I would like to experiment with no queuing: next task open for grabs if one of the tasks of the previous stage is finished • Next steps – User definition of workflow – Multiple jobs – Separation of worker manager from task manager? – Implementing workers doing slit-spectroscopy based on running R – DAG support – Some scheduling (balance between data transfer and mixing data between slow and fast workers?) – Data transfer implementation and in-memory data exchange
  • 20. Security for scientific projects: from nothing to everything • OpenSplice enterprise edition provides Secure Networking Service: – Security parameters (i.e., data encryption and authentication) are d fined in Security Profiles in the OpenSplice configuration file. – Node authentication and access control are also specified in the configuration profile. – Security Profiles are attached to partitions which set the security parameters in use for that partition. – Secure networking is activated as a domain Service • Scientists are not used to pay  • Used to: – Authenticate and authorize on connection – Rules are in admin area of a virtual organization (DDS should consult)
  • 21. Providing security in community edition of OpenSplice • Tried to replace the the lower networking layer of OpenSplice with OpenSSL and see how one can provide authentication, authorization and encryption • OpenSSL is an open source toolkit: – Secure Sockets Layer and Transport Layer Security (new standard, replacing SSL) – General purpose cryptography library – Public Key Infrastructure (PKI): e.g., certificate creation, signing and checking, CA management
  • 22. Switching to OpenSSL++ in the networking layer of community OpenSplice Applications Applications Data Centric Publish/Subscibe Data Centric Publish/Subscibe Real Time Publish/Subscribe Real Time Publish/Subscribe RT Net DDSi RT Net DDSi UDP / IP OpenSSL • The UDP/IP layer handles the interface to the operating- system’s socket interface • Switching to OpenSSL allowed us to establish secure tunnel between two sites • But the configuration should be done per each two nodes!
  • 23. Future Directions in Security Work • Waiting for new development and will be happy to use to implement what is expected by DOE labs (our collaborators) security • Explore user data etc fields to address applications specifics if this is not addressed by the security profile
  • 24. PyDDS: Python bindings for DDS communications • Started with SWIG for wrapping generated bindings: works fine but needed manual wrapping of multiple generated classes • Next worked with Boost.Python for wrapping of communication layer (set of classes that are used in the IDLPP generated code to call into OpenSplice for communication) so that there will be no need to wrap generated bindings • Problem: need to take care of inheritance manually and deal with several handlers that are unexposed C structs used in forward declarations
  • 25. Status and next steps for PyDDS • Hand-wrapping of C++ bindings using SWIG works: #!/usr/bin/python import time import TxddsPy qos = TxddsPy.Qos() qos.setTopicReliable() qos.setWriterReliable() writer = TxddsPy.RawImageWriter(”rawImage") writer.setTopicQos(qos) writer.setWriterQos(qos) data = TxddsPy.rawImage() writer.writeData(data) • Next: – Investigate wrapping communication classes so that we expose minimum for Boost.Python – Or develop tools for generating glue code needed to Boost.Python using a string list of topics
  • 26. QuIDS summary and future directions • We have prototyped DDS-bases solutions for astronomy data processing applications – Tools for bringing FITS data into DDS – Simple QoS scenarios (getting prepublished data) – Parallel pipeline workflows – Security studies – Python APIs • Possible next steps – Language to describe workflows for user input into the system – More complex workflows and concrete implementations – Implementing FNAL security requirements – WAN communication between FNAL and LBNL going through firewall – Streamlining glue code generation for Python – Testing harness – Bulk data transfers – Archiving data into databases – Web interfaces
  • 27. Acknowledgements • Ground Data System (GDS) team from Fermi National Accelerator Laboratory • PrismTech • Nikolay Malitsky (BNL) • OpenSplice mailing list and its contributors • US Department of Energy, Office of High Energy Physics