Wengines, Workflows, and 2 years of advanced data processing in Apache OODT

         Chris A. Mattmann
Senior Computer Scientist, NASA JPL
  Adjunct Assistant Professor, USC
Member, Apache Software Foundation
Agenda
• Apache OODT
• Workflow Support (Workflow1)
• Wengine features (NPP others)
• History and Status
• Where we're at




28-Feb-2013        ACNA2013-Mattmann   2
And you are?
                                             • Senior Computer Scientist
                                               at NASA JPL in
                                               Pasadena, CA USA
                                             • Software
                                               Architecture/Engineering
                                               Prof at Univ. of Southern
                                               California



              • Apache Executive Officer and Member involved in
                – OODT (PMC), Tika (PMC), Nutch (PMC), Incubator
                  (PMC), SIS (PMC), Gora (PMC), Airavata (PMC),
                  cTAKES (Mentor), lots of other projects
History of Apache OODT




“Oldies but goodies”: information integration, 1st generation CAS, 1999-2003
“Hard man”: 2nd generation “better CAS”, 2003-2005
“Matt man and Crew”: next generation CAS and open source @TheASF, 2005-present


Context
http://oodt.apache.org/components/maven/workflow/development/developer.html




Workflow Manager: some terminology




“The Beginning of Workflow”
Chris and Paul learn about workflows - 2004
                        Raj Buyya, “A Taxonomy of Workflow Management
                        Systems for Grid Computing”


                         Workflow Patterns


                        http://workflowpatterns.com

“The Beginning: More”
Paul is initially more interested in workflows than
 Chris


Chris becomes interested in workflows b/c of this
 mission - http://oco.jpl.nasa.gov/




2005 – Oh No, a “mission!”
Was “forced” (signed up) to be the “Lead Process
 Control System (PCS) developer” for OCO


Was worried b/c existing CAS couldn't support
 OCO


“Schemed” (brainstormed) with Paul about what to
 do

What is Workflow Management?
Modeling, executing and monitoring groups of
 one or more Workflow Tasks
Tasks could be
     A script file
     A java process
     An external command
     A call to a web service
     Many more…


Workflow
Workflow has many definitions
     It's typically represented as a graph
     [Figure: example workflow graph with nodes Task A, Task B, Task C, Task D, and Task E]




     In traditional science data pipeline systems, this graph is constrained to be
        a sequential set of process nodes
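
A minimal Python sketch of why the graph form matters. The edges between Task A..E are an assumption (the slide only names the nodes), and `parallel_stages` is a hypothetical helper, not part of OODT. It groups tasks into stages whose members could run in parallel; a sequential pipeline degenerates to one task per stage.

```python
from collections import defaultdict


def parallel_stages(edges, tasks):
    """Group tasks into stages; every task in a stage has all its parents done."""
    indegree = {task: 0 for task in tasks}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    stages = []
    ready = sorted(task for task in tasks if indegree[task] == 0)
    while ready:
        stages.append(ready)
        next_ready = []
        for task in ready:
            for child in children[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if sum(len(stage) for stage in stages) != len(tasks):
        raise ValueError("workflow graph contains a cycle")
    return stages


# Assumed edges for the five tasks named on the slide.
tasks = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
stages = parallel_stages(edges, tasks)

# A traditional sequential pipeline: one task per stage, no parallelism.
pipeline = parallel_stages([("A", "B"), ("B", "C")], ["A", "B", "C"])
```

With these assumed edges, Task B and Task C land in the same stage, which is the parallelism a purely sequential pipeline cannot express.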




The State of Things
The existing CAS was able to handle sequential science data pipelines
  very well
     It handles them as a set of individual tasks that are mapped to a product
         type
     Tasks are kicked off on ingestion of a product
          Or by other tasks
However, the approach and process to executing pipelines and tasks
  was ad-hoc
     Task can kick off another task, but by communicating directly with the
       database to insert its “id” in the “next task” table
     Tasks are only grouped by product type, so you need to have a product type
       to have a group of associated tasks
Additionally, the approach didn't allow for parallel execution of tasks
     Tasks were put into a global queue
Also tasks from different “workflows” can compete against one another
   because the queue is global
Also, control patterns are ad-hoc and do not support standard control flow
New Requirements and Drivers
Workflow should be represented as a graph. This will allow
 for true parallelism.
Workflow Management should support identified workflow
 patterns especially control-flow.
   The current level of support for control-flow has to a large extent
     been relegated to tasks. A collection of tasks is associated with a
     product ingestion and there is only a priority to sort out the order
     of execution.
Data-flow should be captured.
The workflow should be able to minimally hook together
  input and output streams between tasks.
Workflow need not have any interaction with a database
     What if I want to persist a workflow in XML?
     Or as a flat file, or some other lightweight format
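
A minimal sketch of the "persist a workflow in XML" idea, with no database involved. The element and attribute names below are hypothetical, chosen for illustration only; they are not claimed to match OODT's actual workflow XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow description; names and layout are illustrative only.
WORKFLOW_XML = """
<workflow id="urn:example:TestWorkflow" name="Test Workflow">
  <tasks>
    <task id="urn:example:TaskA" />
    <task id="urn:example:TaskB" />
  </tasks>
</workflow>
"""


def load_workflow(xml_text):
    """Parse an abstract workflow description into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "name": root.get("name"),
        # Document order of <task> elements is kept as the task order.
        "tasks": [task.get("id") for task in root.iter("task")],
    }


workflow = load_workflow(WORKFLOW_XML)
```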
Architectural Implications
Workflow Repositories
   Places to go and fetch an “abstract” workflow
     description from
Workflow Execution Engines
   Give it an abstract workflow, and let it rip
      Turns an abstract workflow into a “Workflow Instance”
   Should allow monitoring of the workflow instance
System interface
     Associate abstract workflows with “events”
     This way, workflows can be tied to things other than just
         product ingestion
How is this different from the existing CAS?
The Workflow Repository need not be a relational Database
     It could be a flat file
     A (set of) XML file(s)
     An object database
     Factories create Workflow Repositories, which create Workflows
Tasks are associated with “Workflows”, not “Product Types”
     This decouples workflow from the File Management aspects of the
       CAS
Conditions can be pre, or post
     As opposed to the existing CAS where “Rules” are effectively pre-
       conditions on a task, and there is no concept of a post condition


How is this different from the existing CAS?
Workflows are interfaces
     They could be backed by a directed graph, by an iterator (i.e., a
       sequential pipeline), or by a HashMap
Workflow Tasks have clearly separated out dynamic and
 static metadata, and they can share metadata
     Dynamic metadata is passed via the Workflow Engine between all
       the tasks in a workflow
          They can all read/write to it
     Static metadata is associated with each workflow task
Workflow Events are captured and delivered via Workflow
 Listeners, which are interfaces
     Many different backend implementations of Workflow Listeners
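
A rough Python sketch of the static vs. dynamic metadata split and of listeners. All class and function names here are hypothetical stand-ins, not the OODT interfaces. Static metadata stays with its task; the dynamic context is one shared dictionary the engine passes to every task in turn.

```python
class Task:
    def __init__(self, name, static_metadata, run):
        self.name = name
        self.static_metadata = dict(static_metadata)  # fixed, per task
        self.run = run  # callable(static, dynamic) -> None


def run_workflow(tasks, listeners=()):
    dynamic = {}  # shared context: every task can read and write it
    for task in tasks:
        task.run(task.static_metadata, dynamic)
        for listener in listeners:
            # Listeners get the task name and a snapshot of the context.
            listener(task.name, dict(dynamic))
    return dynamic


def produce(static, dynamic):
    dynamic["OutputFiles"] = [static["OutputName"]]


def consume(static, dynamic):
    # Reads what the previous task wrote into the shared context.
    dynamic["InputFiles"] = list(dynamic["OutputFiles"])


events = []
context = run_workflow(
    [Task("T1", {"OutputName": "File2.txt"}, produce), Task("T2", {}, consume)],
    listeners=[lambda name, snapshot: events.append(name)],
)
```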
Workflow Execution
Once you've got a Workflow, how do you
 execute it and turn it into a Workflow Instance?
You hand it off to a Workflow Engine




What does the Workflow Engine do?
Workflow Engine manages:
     A configurable, extensible thread pool
          “Worker Threads” are used to process the Workflow Instance
            they are each handed
     A queue of worker threads if there aren't any available
       workers in the thread pool to process a Workflow
     Monitoring which Workers are handling which Workflow
      Instances, and the state and status of each Workflow
      Instance
Workflow Engines execute instances of Workflows
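
A small sketch of the engine's three jobs (pool, queue, status tracking), written with Python's `concurrent.futures` rather than the Java classes the real engine uses. `ThreadPoolEngine` and the state names `QUEUED`, `STARTED`, `FINISHED` are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


class ThreadPoolEngine:
    def __init__(self, max_workers):
        # The executor queues submissions when all workers are busy.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self.status = {}  # workflow instance id -> state

    def _run_instance(self, instance_id, tasks):
        self.status[instance_id] = "STARTED"
        # One worker handles the whole instance, task after task.
        results = [task() for task in tasks]
        self.status[instance_id] = "FINISHED"
        return results

    def submit(self, instance_id, tasks):
        self.status[instance_id] = "QUEUED"
        return self._pool.submit(self._run_instance, instance_id, tasks)

    def shutdown(self):
        self._pool.shutdown(wait=True)


engine = ThreadPoolEngine(max_workers=2)
futures = {
    name: engine.submit(
        name, [lambda n=name: n + ":step1", lambda n=name: n + ":step2"]
    )
    for name in ("wf1", "wf2", "wf3")
}
results = {name: future.result() for name, future in futures.items()}
engine.shutdown()
```

Three instances on two workers means one instance waits in the queue until a worker frees up.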

What's the external interface to the system?
Event-based
     Event names come into the Workflow Manager
     The Workflow Manager looks up any Workflows
      associated with the event name
     The Workflow Manager then calls the Workflow
      Repository to obtain representations of the Workflow
     The Workflow Manager then hands off Workflow
      representations to the Workflow Engine for execution
Current implementation uses XML-RPC, but it's an
 interface, so it could use REST/HTTP/SOAP/etc.

The Workflow Manager
So, how do we put all of these things together?
Well, something like:
     A Workflow Manager has
          One or more Workflow Repositories to obtain abstract
           Workflow descriptions from
          One or more Workflow Engines to execute Workflows on
          One or more external interfaces
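
Putting it together as a sketch: an event name comes in, the manager asks the repository for matching workflows, and hands each to the engine. Every class below is a hypothetical in-memory stand-in (no XML-RPC, one repository, one engine), meant only to show the wiring.

```python
class InMemoryRepository:
    def __init__(self, workflows_by_event):
        self._by_event = workflows_by_event

    def workflows_for_event(self, event_name):
        return self._by_event.get(event_name, [])


class SequentialEngine:
    def __init__(self):
        self.instances = []

    def start(self, workflow, metadata):
        # Turn the abstract workflow into a "workflow instance".
        instance = {
            "workflow": workflow["name"],
            "metadata": dict(metadata),
            "completed": [],
        }
        for task in workflow["tasks"]:
            instance["completed"].append(task)  # stand-in for running the task
        self.instances.append(instance)
        return instance


class WorkflowManager:
    def __init__(self, repository, engine):
        self.repository = repository
        self.engine = engine

    def handle_event(self, event_name, metadata):
        workflows = self.repository.workflows_for_event(event_name)
        return [self.engine.start(workflow, metadata) for workflow in workflows]


repo = InMemoryRepository(
    {"ingest": [{"name": "L1Processing", "tasks": ["calibrate", "geolocate"]}]}
)
manager = WorkflowManager(repo, SequentialEngine())
started = manager.handle_event("ingest", {"Filename": "granule.dat"})
ignored = manager.handle_event("unknown-event", {})
```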




We called this “Workflow1”
Worked great for OCO




Properties of Workflow1
ThreadPool Workflow Engine
  1 Thread per entire workflow instance
  Worked very well for routine production
  pipeline processing – we know that we will run
  A <= X <= B jobs per day where
     A is a good minimal bound on the max threads per JVM – totally OS dependent (256 is a large number)
     B is the maximal number of threads that doesn't bound the JVM
ThreadPool was
http://svn.apache.org/repos/asf/oodt/trunk/workflow/src/main/resources/workflow.properties
Based on java.util.concurrent.ThreadPoolExecutor
Easily configurable
If you ran out of threads, scale horizontally and
   add more JVMs


Portion of workflow config for ThreadPool Executor




Other Workflow1 Stuff
Branch and bounds was supported implicitly
    You want branch and bounds?
    1. Define N>1 Workflows that are mapped to an event name
         1a. Define an N+1 workflow to be the “reducer”
    2. They will be executed in parallel, hence the branch
    3. The bounds is handled by a pre-condition on the N+1 task
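
The recipe above, as a loose Python sketch. `run_event`, the branch names, and the pre-condition are all hypothetical; the real mechanism goes through event names and the Workflow Manager, which this example collapses into one function.

```python
from concurrent.futures import ThreadPoolExecutor


def run_event(branches, reducer, precondition):
    """Run N branch workflows in parallel, then the reducer if its pre-condition holds."""
    outputs = {}
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(fn) for name, fn in branches.items()}
        for name, future in futures.items():
            outputs[name] = future.result()

    # The "bounds": the N+1 workflow only runs once its pre-condition passes.
    if not precondition(outputs):
        raise RuntimeError("reducer pre-condition not met")
    return reducer(outputs)


# Three hypothetical workflows mapped to the same event name.
branches = {"wf1": lambda: 1, "wf2": lambda: 2, "wf3": lambda: 3}


def all_branches_done(outputs):
    return set(outputs) == set(branches)


total = run_event(branches, lambda outputs: sum(outputs.values()), all_branches_done)
```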
Metadata context keys

              Task T1     Task T2                  Task T3   Task T4




                        Workflow Instance "Shared Metadata
                                     Context"




                                     Task 1:
                               InputFiles: File1.txt
                               OutputFiles: File2.txt

                                     Task 2:
                               InputFiles: foobar.txt
                           OutputFiles: foo2.txt, foo1.txt

                                     Task 3:
                               OrbitNumber:900041
                                     Task 4:
Problems with keys
Key naming collision
    Tasks needed to handle this explicitly in
    “production rules”
No grouping of keys
    Grouping was achieved using “_” key naming
    scheme
    PCS_InputFiles
    PCS_CrawlForDirs

Enter this guy

                               Not the one on the
                               left, that's my son


                           Brian Foster
                                - now at Google,
                               curses!



And this mission
http://npp.gsfc.nasa.gov


NPOESS Preparatory Project (NPP) now called
 Suomi NPP
    Sounder PEATE Testbed Element




They told Brian this
A little different than the OCO use case


So... the next THREE years' worth of jobs, we'd
 like to submit today…
    and then have your “workflow manager”
    manage the jobs for the next 3 years


This effectively blew up our thread pool workflow
 engine
Random David Woollard sighting
                     David Woollard and Brian
                     Foster had to figure out how
                     to solve the NPP problem


                     Decided we need a new
                     workflow manager


                     …branch/fork/sigh

Not their fault
Paul R. and I and others didn't have time to fully
 watch this, and other OODT PMC members
 weren't really vested in those particular
 components


Brian was learning and doing great and we
  decided in the end that going off into a branch
  and not destroying Workflow1 users in the
  trunk was better than having to integrate
  everything…so we punted
NPP Pipeline – more SCF than ops system
[Figure: NPP pipeline diagram. Inputs: MetOpA IASI L1C, MetOpA AMSU-A L1B, MetOpA MHS L1B. Tasks: Orbit; IASI, AMSU-A, and MHS GPolygon; IASI, AMSU-A, and MHS Map. Outputs: MetOpA Orbit Boundary File; GPolygon Files and Granule Map Files for IASI L1C, AMSU-A L1B, and MHS L1B.]
Enter “Workflow2” or “Wengine”
What sucks about Workflow1?
    Can't explicitly model branch and bounds
       Fixed through “sequential” and “parallel”
    processors – Paul R.'s idea OODT-70
    No global level workflow conditions
              Added them OODT-205
     Really only pre conditions in Workflow1
              Add post conditions OODT-502
More improvements
Condition timeouts
    OK it's timed out waiting for a file, run anyways
    OODT-207
Optional or required
     Allowing boolean OR based conditionals (test
    this and report its success, but don't block) –
    OODT-208
Better failure state reporting and checkpointing
OODT-206
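
A sketch of how condition timeouts and optional conditions could behave, based only on the descriptions above. `wait_for_condition` and its parameters are hypothetical, not the OODT condition API. A required condition blocks until it passes or times out (then the task runs anyway); an optional one is tested once and never blocks.

```python
import time


def wait_for_condition(check, timeout_s=None, optional=False, poll_s=0.01):
    """Block until the task may run; return whether the condition actually held."""
    if optional:
        # Test once and report the result, but do not block the task.
        return bool(check())

    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while not check():
        if deadline is not None and time.monotonic() >= deadline:
            return False  # timed out: caller runs the task anyway
        time.sleep(poll_s)
    return True


def file_arrived():
    return False  # stand-in for "is the input file there yet?"


timed_out = wait_for_condition(file_arrived, timeout_s=0.05)
optional_result = wait_for_condition(file_arrived, optional=True)
ready = wait_for_condition(lambda: True)
```

With `timeout_s=None` a required condition waits indefinitely, which is the old behavior the timeout was added to escape.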
Yes more improvements
Workflow Metadata keys
 https://oodt.jpl.nasa.gov/jira/browse/OODT-303
 (internal JPL JIRA -- was already fixed in ASF
 JIRA in 0.1-incubating)
   By Group, e.g.,
     PCS/InputFilesGroup/InputFiles
     PCS/Output/MetFileWriter
     PCS/FileManagerUrl
     Task1/SomeKey1
Collect all keys for a group
   wmet.search("PCS") -> all keys, can interrogate for values
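
A toy version of grouped keys, assuming "/" separates groups as in the examples above. `GroupedMetadata` is a hypothetical class, the file manager URL is a made-up value, and only the `search` call name is taken from the slide.

```python
class GroupedMetadata:
    def __init__(self):
        self._values = {}

    def put(self, key, value):
        self._values[key] = value

    def get(self, key):
        return self._values[key]

    def search(self, group):
        """Return all keys under a group, e.g. every key starting with 'PCS/'."""
        prefix = group.rstrip("/") + "/"
        return sorted(key for key in self._values if key.startswith(prefix))


wmet = GroupedMetadata()
wmet.put("PCS/InputFilesGroup/InputFiles", ["File1.txt"])
wmet.put("PCS/FileManagerUrl", "http://localhost:9000")  # hypothetical URL
wmet.put("Task1/SomeKey1", "value")

pcs_keys = wmet.search("PCS")
input_files = wmet.get("PCS/InputFilesGroup/InputFiles")
```

Matching on the prefix `PCS/` rather than `PCS` avoids picking up an unrelated group that merely starts with the same letters.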
And more…
Workflow Lifecycle Management
    State-driven execution – inversion of control




What this literally means – in PCS stat and in
 PCS OPSUI you see more states
Runner Framework
Workflow1 had facilities to submit jobs to
 Resource Manager or to run them on its own
 locally
     Was a hack inside of
    IterativeWorkflowProcessorThread
Brian F. turned this into an explicit interface
Could hook Workflow directly to e.g., Hadoop
     I'm not convinced this was the right way to do
    this, but I applaud the clean up of my code
Sub Workflows
Workflows whose sub-tasks can be other
 workflows (OODT-211)


Yes, this is recursive, and mind blowing


[Figure: Task T1, Task T3, and Task T4, with a nested workflow as one of the sub-tasks]
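
The recursion in a few lines of Python. The data layout (a task is a string, a sub-workflow is a dict with a `tasks` list) and the task names `T2a`/`T2b` are assumptions for illustration.

```python
def run(node, log):
    """Run a task, or recurse into a sub-workflow and run its tasks in order."""
    if isinstance(node, str):
        log.append(node)  # stand-in for executing the task
        return
    for child in node["tasks"]:
        run(child, log)


workflow = {
    "name": "outer",
    "tasks": ["T1", {"name": "inner", "tasks": ["T2a", "T2b"]}, "T3", "T4"],
}
order = []
run(workflow, order)
```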



“Dynamic Workflows”
This is one of my favorites OODT-209


% ./wmgr-client --url http://localhost:9001 --operation --dynWorkflow --taskIds id1,id2,id3

     Task id1         Task id2             Task id3
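
What the server side of a dynamic workflow might do with that comma-separated list, sketched in Python. The registry, `parse_task_ids`, and `dynamic_workflow` are hypothetical helpers; only the idea of building a workflow on the fly from known task ids comes from the slide.

```python
def parse_task_ids(arg):
    """Split a comma-separated --taskIds value into a list of ids."""
    return [task_id.strip() for task_id in arg.split(",") if task_id.strip()]


def dynamic_workflow(task_ids, registry):
    """Build an ad-hoc sequential workflow from already-registered tasks."""
    missing = [task_id for task_id in task_ids if task_id not in registry]
    if missing:
        raise KeyError(f"unknown task ids: {missing}")
    return [registry[task_id] for task_id in task_ids]


# Hypothetical registry of tasks the manager already knows about.
registry = {
    "id1": lambda log: log.append("id1 ran"),
    "id2": lambda log: log.append("id2 ran"),
    "id3": lambda log: log.append("id3 ran"),
}

task_ids = parse_task_ids("id1,id2,id3")
log = []
for task in dynamic_workflow(task_ids, registry):
    task(log)
```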




Enough, how can I use all this stuff?
Brian's code existed as forked and un-supported
  (by community) in NPP repo at JPL
Brian, by his own awesomeness, realizes before
  he leaves me for Google in 2011 that we need
  to push it to Apache
http://svn.apache.org/repos/asf/oodt/branches/wengine-branch - last working PEATE version



Chris spends 2 years figuring out what Brian did
OODT-215
     My initial “god” issue to solve everything in
    JIRA, tried to break the problem down into
    manageable steps
     Still took me 2 years – help from Paul R. and
    from Brian (even though he left for Google he
    still works on Apache OODT muwahahah)
OODT-491
     “Finish line tasks for Wengine”
Wengine support in trunk first appears
In Apache OODT 0.4
    But was largely a work in progress, and
    well…didn't fully work
Apache OODT 0.5 happens
    back compat restored for “Workflow1” style
    engines
     Chris and Brian clean up a ton of the branch
    stuff, and finish most of OODT-491
Apache OODT 0.6 we finish for real real real
Who will use Wengine?
PEATE uses it today
      Their job processing requirements as an
    SCF are quite large


U.S. National Climate Assessment (NCA)
 project, “Snow Hydrology for the Western US
 and Alaska”
    will tell you about this on the next slides

Talk Part #2

Doing stuff with Wengine and why you should care
JPL Snow Server
http://snow.jpl.nasa.gov
Full bore processing and
  delivery system
     Near real time and
      historical processing
     Dust forcing and snow
      covered area products
     Tower data
     GIS interfaces
     CSV, JSON, GeoTIFF
      data format download
MODIS Snow Covered Area and Grain Size (MODSCAG)
 JPL MODSCAG algorithm
 (Painter et al 2009)
 Spectral mixture analysis
 of MODIS Surface
 Reflectance products

 Daily 500 m coverage in
 late morning and early
 afternoon from NASA
 satellites Terra and Aqua
Credit: Tom Painter




Upper Colorado River Basin, March 9, 2009
MODSCAG Processing: Two Products/Two Inputs
MODIS tiles are defined by their horizontal and vertical tile IDs (the 2 characters
  after the h and the v respectively)
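
A quick Python sketch of that naming rule: pull the two-digit horizontal and vertical indices out of a tile ID. `parse_tile` is a hypothetical helper; the tile IDs are the ones listed for the Western United States.

```python
import re

# "h" + two digits, then "v" + two digits, e.g. h08v04.
TILE_PATTERN = re.compile(r"^h(\d{2})v(\d{2})$")


def parse_tile(tile_id):
    """Return (horizontal, vertical) tile indices as integers."""
    match = TILE_PATTERN.match(tile_id)
    if match is None:
        raise ValueError(f"not a MODIS tile id: {tile_id!r}")
    return int(match.group(1)), int(match.group(2))


tiles = ["h08v04", "h08v05", "h09v05", "h09v04", "h10v04"]
parsed = [parse_tile(tile) for tile in tiles]
```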


Historical Tiles over the Western United States (LPDAAC)
     Time Range: 2000 - Present
     h08v04, h08v05, h09v05, h09v04, h10v04
     LPDAAC is NASA Land Processes data center located at the USGS Earth
       Resources Observation and Science (EROS) Center in Sioux Falls, South Dakota


MODIS Near Real-Time Products (LANCE MODIS NRT)
     Time Range: Dec 2011 - Present
     Western United States
     High Asia


Credit: Cameron Goodale




Credit: Cameron Goodale




Dust Radiative Forcing




[Figure: Dust Radiative Forcing, color scale 0 to 300 W/m2]




MODDRFS
Dust Radiative Forcing in Snow from MODIS
Painter and Bryant, 2012; 17 May 2009
Now, what have I cooked up for today?
I have an Orion SkyQuest XT8 Classic
   Dobsonian Telescope


I also have an iPhone 5




I had a few days of time for some great lunar science




As it turns out those images have metadata




Add metadata
Geocoding, WGS84 lat, lng


Planetary met, TARGET=MOON, etc.




Found Hugin




Wanted to do something cool with it
Discovered enshape




Figured out how to make it combine images




Getting started
Workflow2 Quick Start on OODT Wiki
    https://cwiki.apache.org/OODT/workflow2-quick-start-guide.html


OODT documentation sucks! Check the wiki, it's
 better there




Will now show you some workflow stuff
Dreams of moon images, died


Will illustrate dynWorkflows




What's left?
Supporting looking up workflows by category
 (needed to say “give me all workflows that
 aren't ‘done’”) OODT-517


Fix the resource manager runner OODT-518


Fix all the wall clock and per task timing OODT-519

Want to help?
dev@oodt.apache.org


OODT-215 and OODT-491 homework


Get a beer with me or Brian


I bribe you?

Questions
Thanks!


Chris Mattmann
@chrismattmann
mattmann@apache.org





More Related Content

What's hot

Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfan
 
PL/SQL Interview Questions
PL/SQL Interview QuestionsPL/SQL Interview Questions
PL/SQL Interview QuestionsSrinimf-Slides
 
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...Lucas Jellema
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop GuideSimplilearn
 
Big Data - Hadoop Ecosystem
Big Data -  Hadoop Ecosystem Big Data -  Hadoop Ecosystem
Big Data - Hadoop Ecosystem nuriadelasheras
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 
PLSQL Standards and Best Practices
PLSQL Standards and Best PracticesPLSQL Standards and Best Practices
PLSQL Standards and Best PracticesAlwyn D'Souza
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresCCG
 
Oracle Exadata Interview Questions and Answers
Oracle Exadata Interview Questions and AnswersOracle Exadata Interview Questions and Answers
Oracle Exadata Interview Questions and AnswersExadatadba
 
Oracle vs. MS SQL Server
Oracle vs. MS SQL ServerOracle vs. MS SQL Server
Oracle vs. MS SQL ServerTeresa Rothaar
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalystTakuya UESHIN
 
Bringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopBringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopDataWorks Summit
 
Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Muthu Natarajan
 
Oracle restful api & data live charting by Oracle Apex - داشبورد آنلاین (داده...
Oracle restful api & data live charting by Oracle Apex - داشبورد آنلاین (داده...Oracle restful api & data live charting by Oracle Apex - داشبورد آنلاین (داده...
Oracle restful api & data live charting by Oracle Apex - داشبورد آنلاین (داده...mahdi ahmadi
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle databaseSamar Prasad
 

What's hot (20)

Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -
 
Oracle archi ppt
Oracle archi pptOracle archi ppt
Oracle archi ppt
 
PL/SQL Interview Questions
PL/SQL Interview QuestionsPL/SQL Interview Questions
PL/SQL Interview Questions
 
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
The Evolution of the Oracle Database - Then, Now and Later (Fontys Hogeschool...
 
Oracle Complete Interview Questions
Oracle Complete Interview QuestionsOracle Complete Interview Questions
Oracle Complete Interview Questions
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
Big Data - Hadoop Ecosystem
Big Data -  Hadoop Ecosystem Big Data -  Hadoop Ecosystem
Big Data - Hadoop Ecosystem
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
PLSQL Standards and Best Practices
PLSQL Standards and Best PracticesPLSQL Standards and Best Practices
PLSQL Standards and Best Practices
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure Infrastructures
 
Oracle Exadata Interview Questions and Answers
Oracle Exadata Interview Questions and AnswersOracle Exadata Interview Questions and Answers
Oracle Exadata Interview Questions and Answers
 
Oracle vs. MS SQL Server
Oracle vs. MS SQL ServerOracle vs. MS SQL Server
Oracle vs. MS SQL Server
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalyst
 
Oracle DB
Oracle DBOracle DB
Oracle DB
 
Bringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on HadoopBringing OLTP woth OLAP: Lumos on Hadoop
Bringing OLTP woth OLAP: Lumos on Hadoop
 
Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.
 
Oracle restful api & data live charting by Oracle Apex - داشبورد آنلاین (داده...
Oracle restful api & data live charting by Oracle Apex - داشبورد آنلاین (داده...Oracle restful api & data live charting by Oracle Apex - داشبورد آنلاین (داده...
Oracle restful api & data live charting by Oracle Apex - داشبورد آنلاین (داده...
 
Oracle database introduction
Oracle database introductionOracle database introduction
Oracle database introduction
 
Introduction to Oracle
Introduction to OracleIntroduction to Oracle
Introduction to Oracle
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle database
 

Viewers also liked

绩效管理 1
绩效管理 1绩效管理 1
绩效管理 120004
 
改变从心态开始
改变从心态开始改变从心态开始
改变从心态开始20004
 
培育团队精神
培育团队精神培育团队精神
培育团队精神20004
 
Swot分析與生涯規劃
Swot分析與生涯規劃Swot分析與生涯規劃
Swot分析與生涯規劃20004
 
Emotional and Psychological Impact Of The Recession On Consumer Behavior Ge...
Emotional and Psychological Impact Of The Recession On Consumer Behavior   Ge...Emotional and Psychological Impact Of The Recession On Consumer Behavior   Ge...
Emotional and Psychological Impact Of The Recession On Consumer Behavior Ge...Val Srinivas
 
Not-So-Hidden Disability: Building Community Through Fashionable Technology
Not-So-Hidden Disability: Building Community Through Fashionable TechnologyNot-So-Hidden Disability: Building Community Through Fashionable Technology
Not-So-Hidden Disability: Building Community Through Fashionable Technologyflobotic
 
进阶策略销售培训
进阶策略销售培训进阶策略销售培训
进阶策略销售培训20004
 
应聘人员综合素质测试题(Pdf 9)
应聘人员综合素质测试题(Pdf 9)应聘人员综合素质测试题(Pdf 9)
应聘人员综合素质测试题(Pdf 9)20004
 
Nlp致胜行销学
Nlp致胜行销学Nlp致胜行销学
Nlp致胜行销学20004
 
Ccmt企业教练管理工作坊(下)
Ccmt企业教练管理工作坊(下)Ccmt企业教练管理工作坊(下)
Ccmt企业教练管理工作坊(下)20004
 
Christmas Carols
Christmas CarolsChristmas Carols
Christmas Carolsgymnasio
 
Prospering Patch Exhib Presentation
Prospering Patch Exhib PresentationProspering Patch Exhib Presentation
Prospering Patch Exhib Presentationprosperingpatch
 
七天学会时间管理
七天学会时间管理七天学会时间管理
七天学会时间管理20004
 
积极心态培训
积极心态培训积极心态培训
积极心态培训20004
 

Viewers also liked (20)

Mtv
MtvMtv
Mtv
 
Chiodo Watch Web Presentation
Chiodo Watch Web PresentationChiodo Watch Web Presentation
Chiodo Watch Web Presentation
 
绩效管理 1
绩效管理 1绩效管理 1
绩效管理 1
 
改变从心态开始
改变从心态开始改变从心态开始
改变从心态开始
 
培育团队精神
培育团队精神培育团队精神
培育团队精神
 
Swot分析與生涯規劃
Swot分析與生涯規劃Swot分析與生涯規劃
Swot分析與生涯規劃
 
Emotional and Psychological Impact Of The Recession On Consumer Behavior Ge...
Emotional and Psychological Impact Of The Recession On Consumer Behavior   Ge...Emotional and Psychological Impact Of The Recession On Consumer Behavior   Ge...
Emotional and Psychological Impact Of The Recession On Consumer Behavior Ge...
 
Not-So-Hidden Disability: Building Community Through Fashionable Technology
Not-So-Hidden Disability: Building Community Through Fashionable TechnologyNot-So-Hidden Disability: Building Community Through Fashionable Technology
Not-So-Hidden Disability: Building Community Through Fashionable Technology
 
Web Presen
Web PresenWeb Presen
Web Presen
 
进阶策略销售培训
进阶策略销售培训进阶策略销售培训
进阶策略销售培训
 
应聘人员综合素质测试题(Pdf 9)
应聘人员综合素质测试题(Pdf 9)应聘人员综合素质测试题(Pdf 9)
应聘人员综合素质测试题(Pdf 9)
 
Nlp致胜行销学
Nlp致胜行销学Nlp致胜行销学
Nlp致胜行销学
 
Ccmt企业教练管理工作坊(下)
Ccmt企业教练管理工作坊(下)Ccmt企业教练管理工作坊(下)
Ccmt企业教练管理工作坊(下)
 
Web Presen
Web PresenWeb Presen
Web Presen
 
Christmas Carols
Christmas CarolsChristmas Carols
Christmas Carols
 
Prospering Patch Exhib Presentation
Prospering Patch Exhib PresentationProspering Patch Exhib Presentation
Prospering Patch Exhib Presentation
 
6
66
6
 
七天学会时间管理
七天学会时间管理七天学会时间管理
七天学会时间管理
 
积极心态培训
积极心态培训积极心态培训
积极心态培训
 
Recycle
RecycleRecycle
Recycle
 

Wengines, Workflows, and 2 years of advanced data processing in Apache OODT

  • 1. Wengines, Workflows, and 2 years of advanced data processing in Apache OODT
      Chris A. Mattmann
      Senior Computer Scientist, NASA JPL
      Adjunct Assistant Professor, USC
      Member, Apache Software Foundation
  • 2. Agenda
      • Apache OODT
      • Workflow Support (Workflow1)
      • Wengine features (NPP others)
      • History and Status
      • Where we're at
  • 3. And you are?
      • Senior Computer Scientist at NASA JPL in Pasadena, CA USA
      • Software Architecture/Engineering Prof at Univ. of Southern California
      • Apache Executive Officer and Member involved in
          – OODT (PMC), Tika (PMC), Nutch (PMC), Incubator (PMC), SIS (PMC), Gora (PMC), Airavata (PMC), cTAKES (Mentor), lots of other projects
  • 4. History of Apache OODT
      • "Oldies but goodies": information integration, 1st generation CAS, 1999-2003
      • "Hard man": 2nd generation "better CAS", 2003-2005
      • "Matt man and Crew": Next generation CAS and open source@TheASF, 2005-present
  • 6. Workflow Manager: some terminology
  • 7. "The Beginning of Workflow"
      • Chris and Paul learn about workflows - 2004
      • Raj Buyya, "A Taxonomy of Workflow Management Systems for Grid Computing"
      • Workflow Patterns: http://workflowpatterns.com
  • 8. "The Beginning: More"
      • Paul is initially more interested in workflows than Chris
      • Chris becomes interested in workflows b/c of this mission - http://oco.jpl.nasa.gov/
  • 9. 2005 – Oh No, a "mission!"
      • Was forced (signed up) to be the "Lead Process Control System (PCS) developer" for OCO
      • Was worried b/c existing CAS couldn't support OCO
      • Schemed (brainstormed) with Paul about what to do
  • 10. What is Workflow Management?
      • Modeling, executing and monitoring groups of one or more Workflow Tasks
      • Tasks could be
          – A script file
          – A Java process
          – An external command
          – A call to a web service
          – Many more…
  • 11. Workflow
      • Workflow has many definitions
      • It's typically represented as a graph
      • [Figure: graph of Task A, Task B, Task C, Task D and Task E]
      • In traditional science data pipeline systems, this graph is constrained to be a sequential set of process nodes
  • 12. The State of Things
      • The existing CAS was able to handle sequential science data pipelines very well
          – It handles them as a set of individual tasks that are mapped to a product type
          – Tasks are kicked off on ingestion of a product
          – Or by other tasks
      • However, the approach and process for executing pipelines and tasks was ad-hoc
          – A task can kick off another task, but only by communicating directly with the database to insert its "id" in the "next task" table
          – Tasks are only grouped by product type, so you need to have a product type to have a group of associated tasks
      • Additionally, the approach didn't allow for parallel execution of tasks
          – Tasks were put into a global queue
          – Also, tasks from different "workflows" can compete against one another because the queue is global
      • Also, control patterns are ad-hoc; standard control flow is not supported
  • 13. New Requirements and Drivers
      • Workflow should be represented as a graph. This will allow for true parallelism.
      • Workflow Management should support identified workflow patterns, especially control-flow.
          – The current level of support for control-flow has to a large extent been relegated to tasks. A collection of tasks is associated with a product ingestion and there is only a priority to sort out the order of execution.
      • Data-flow should be captured.
          – The workflow should be able to minimally hook together input and output streams between tasks.
      • Workflow need not have any interaction with a database
          – What if I want to persist a workflow in XML?
          – Or as a flat file, or some other lightweight format
  • 14. Architectural Implications
      • Workflow Repositories
          – Places to go and fetch an "abstract" workflow description from
      • Workflow Execution Engines
          – Give it an abstract workflow, and let it rip
          – Turns an abstract workflow into a "Workflow Instance"
          – Should allow monitoring of the workflow instance
      • System interface
          – Associate abstract workflows with "events"
          – This way, workflows can be tied to things other than just product ingestion
  • 15. How is this different from the existing CAS?
      • The Workflow Repository need not be a relational database
          – It could be a flat file
          – A (set of) XML file(s)
          – An object database
          – Factories create Workflow Repositories, which create Workflows
      • Tasks are associated with "Workflows", not "Product Types"
          – This decouples workflow from the File Management aspects of the CAS
      • Conditions can be pre, or post
          – As opposed to the existing CAS where "Rules" are effectively pre-conditions on a task, and there is no concept of a post condition
  • 16. How is this different from the existing CAS?
      • Workflows are interfaces
          – They could be backed by a (directed) graph, by an iterator (i.e., a sequential pipeline) or by a HashMap
      • Workflow Tasks have clearly separated dynamic and static metadata, and they can share metadata
          – Dynamic metadata is passed via the Workflow Engine between all the tasks in a workflow
          – They can all read/write to it
          – Static metadata is associated with each workflow task
      • Workflow Events are captured and delivered via Workflow Listeners, which are interfaces
          – Many different backend implementations of Workflow Listeners
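The separation described on this slide can be sketched in Java roughly as follows. This is a minimal illustration only: every type and method name here (`Task`, `Workflow`, `Listener`, `ListWorkflow`, `execute`) is hypothetical and chosen for the example; it is not the OODT API. It shows a workflow hidden behind an interface, per-task static metadata, one dynamic metadata map shared by all tasks, and events delivered through a listener interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of this is the actual OODT API.
class WorkflowInterfacesSketch {

    interface Task {
        // staticMet belongs to this task alone; dynamicMet is shared by every task.
        void run(Map<String, String> staticMet, Map<String, String> dynamicMet);
    }

    interface Workflow {
        // Callers only see an ordered view; the backing store is an implementation detail.
        Iterable<Task> tasks();
    }

    interface Listener {
        void onEvent(String event);
    }

    // One possible backing: a plain list, i.e. a sequential pipeline.
    static class ListWorkflow implements Workflow {
        private final List<Task> tasks;

        ListWorkflow(Task... tasks) {
            this.tasks = Arrays.asList(tasks);
        }

        public Iterable<Task> tasks() {
            return tasks;
        }
    }

    // Plays the role of the engine: passes one dynamic context through all tasks.
    static Map<String, String> execute(Workflow wf, Listener listener) {
        Map<String, String> dynamicMet = new HashMap<>();
        int index = 0;
        for (Task task : wf.tasks()) {
            Map<String, String> staticMet = new HashMap<>();
            staticMet.put("TaskIndex", String.valueOf(index));
            task.run(staticMet, dynamicMet);
            listener.onEvent("finished task " + index);
            index++;
        }
        return dynamicMet;
    }

    // Two tasks: the second reads what the first wrote into the shared context.
    static Map<String, String> demo(List<String> events) {
        Workflow wf = new ListWorkflow(
            (s, d) -> d.put("InputFiles", "File1.txt"),
            (s, d) -> d.put("OutputFiles", d.get("InputFiles") + ".out"));
        return execute(wf, events::add);
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        System.out.println(demo(events) + " " + events);
    }
}
```

Swapping `ListWorkflow` for a graph-backed implementation would not change `execute`, which is the point of making the workflow an interface.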
  • 17. Workflow Execution
      • Once you've got a Workflow, how do you execute it and turn it into a Workflow Instance?
      • You hand it off to a Workflow Engine
  • 18. What does the Workflow Engine do?
      • Workflow Engine manages:
          – A configurable, extensible thread pool
          – "Worker Threads" are used to process the Workflow Instance they are each handed
          – A queue of worker threads, used if there aren't any available workers in the thread pool to process a Workflow
          – Monitoring which Workers are handling which Workflow Instances, and the state and status of each Workflow Instance
      • Workflow Engines execute instances of Workflows
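A rough Java sketch of such an engine is shown below, assuming a fixed pool of two workers and an unbounded queue. The class name `ThreadPoolEngineSketch`, the status strings and the pool sizes are assumptions made for illustration; only `java.util.concurrent.ThreadPoolExecutor` itself is a real API. Each submitted workflow instance occupies one worker for its whole run, and a status map tracks what each instance is doing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical engine; pool sizes and status names are illustrative assumptions.
class ThreadPoolEngineSketch {

    // Two workers; instances beyond that wait in the queue until a worker is free.
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
        2, 2, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

    // Monitoring: instance id -> current status.
    private final Map<String, String> status = new ConcurrentHashMap<>();

    // One worker thread handles the entire workflow instance.
    void submit(String instanceId, Runnable wholeWorkflow) {
        status.put(instanceId, "QUEUED");
        pool.execute(() -> {
            status.put(instanceId, "RUNNING");
            try {
                wholeWorkflow.run();
                status.put(instanceId, "FINISHED");
            } catch (RuntimeException e) {
                status.put(instanceId, "FAILED");
            }
        });
    }

    // Waits for queued work to drain, then returns the final status of every instance.
    Map<String, String> shutdownAndReport() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return status;
    }

    static Map<String, String> demo() {
        ThreadPoolEngineSketch engine = new ThreadPoolEngineSketch();
        for (int i = 0; i < 5; i++) {
            engine.submit("wf-" + i, () -> { });
        }
        engine.submit("wf-bad", () -> {
            throw new IllegalStateException("boom");
        });
        return engine.shutdownAndReport();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```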
  • 19. What's the external interface to the system?
      • Event-based
          – Event names come into the Workflow Manager
          – The Workflow Manager looks up any Workflows associated with the event name
          – The Workflow Manager then calls the Workflow Repository to obtain representations of the Workflow
          – The Workflow Manager then hands off Workflow representations to the Workflow Engine for execution
      • Current implementation uses XML-RPC, but it's an interface, so it could use REST/HTTP/SOAP/etc.
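The event-driven hand-off can be sketched as below. This is a simplified stand-in, not the XML-RPC implementation: `EventDispatchSketch`, `associate`, `handleEvent` and the event and workflow names are all hypothetical, the repository is a plain map, and the engine is reduced to a list that records what it was asked to run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; the transport (XML-RPC, REST, ...) is deliberately not modeled.
class EventDispatchSketch {

    // Stand-in for the Workflow Repository: event name -> workflow descriptions.
    private final Map<String, List<String>> repository = new HashMap<>();

    // Stand-in for the Workflow Engine: records what it was asked to execute.
    private final List<String> engineInbox = new ArrayList<>();

    void associate(String eventName, String... workflows) {
        repository.put(eventName, Arrays.asList(workflows));
    }

    // The external interface: an event name comes in, matching workflows go to the engine.
    int handleEvent(String eventName) {
        List<String> workflows =
            repository.getOrDefault(eventName, Collections.emptyList());
        engineInbox.addAll(workflows);
        return workflows.size();
    }

    List<String> engineInbox() {
        return engineInbox;
    }

    public static void main(String[] args) {
        EventDispatchSketch manager = new EventDispatchSketch();
        manager.associate("productIngested", "L1Processing", "Browse");
        System.out.println(manager.handleEvent("productIngested") + " " + manager.engineInbox());
    }
}
```

Because callers only ever send an event name, the same lookup works whether the event comes from product ingestion or from anything else.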
  • 20. The Workflow Manager
      • So, how do we put all of these things together?
      • Well, something like: a Workflow Manager has
          – One or more Workflow Repositories to obtain abstract Workflow descriptions from
          – One or more Workflow Engines to execute Workflows on
          – One or more external interfaces
  • 21. We called this "Workflow1"
      • Worked great for OCO
  • 22. Properties of Workflow1
      • ThreadPool Workflow Engine
      • 1 Thread per entire workflow instance
      • Worked very well for routine production pipeline processing – we know that we will run A <= X <= B jobs per day, where
          – A is a good minimal bound on the max threads per JVM – totally OS dependent (256 is a large number)
          – B is the maximal number of threads that doesn't bound the JVM
  • 23. ThreadPool was
      • http://svn.apache.org/repos/asf/oodt/trunk/workflow/src/main/resources/workflow.properties
      • Based on java.util.concurrent ThreadPoolExecutor
      • Easily configurable
      • If you ran out of threads, scale horizontally and add more JVMs
  • 24. Portion of workflow config for ThreadPool Executor
  • 25. Other Workflow1 Stuff
      • Branch and bounds was supported implicitly
      • You want branch and bounds?
          – 1. Define N > 1 Workflows that are mapped to an event name
          – 1a. Define workflow N+1 to be the "reducer"
          – 2. They will be executed in parallel, hence the branch
          – 3. The bounds is handled by a pre-condition on the N+1 task
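The implicit branch-and-bounds recipe above can be sketched in Java as follows. This is an illustration under assumptions, not Workflow1 code: the class `BranchAndBoundsSketch`, the product names and the 5 ms polling interval are invented for the example. N branch workflows run in parallel and each records a product; workflow N+1, the "reducer", is gated by a pre-condition that polls until all N products exist.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "N parallel workflows plus a gated reducer".
class BranchAndBoundsSketch {

    // Returns the number of branch products the reducer saw when it finally ran.
    static int run(int n) {
        Set<String> produced = ConcurrentHashMap.newKeySet();
        // n branches plus the reducer, so nothing has to wait for a free thread.
        ExecutorService pool = Executors.newFixedThreadPool(n + 1);
        try {
            // Workflow N+1: its pre-condition polls until every branch has reported.
            Future<Integer> reducer = pool.submit(() -> {
                while (produced.size() < n) {
                    Thread.sleep(5);
                }
                return produced.size();
            });
            // The N workflows mapped to the same event run in parallel: the "branch".
            for (int i = 0; i < n; i++) {
                final int id = i;
                pool.execute(() -> produced.add("branch-" + id + ".out"));
            }
            return reducer.get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("reducer did not complete", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(3));
    }
}
```

Nothing in the sketch states the join explicitly; it exists only because the reducer's pre-condition happens to wait for all branches, which is the limitation the later "sequential" and "parallel" processors address.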
  • 26. Metadata context keys
      • [Figure: Tasks T1, T2, T3 and T4 in a Workflow Instance, all attached to a "Shared Metadata Context"]
      • Task 1: InputFiles: File1.txt; OutputFiles: File2.txt
      • Task 2: InputFiles: foobar.txt; OutputFiles: foo2.txt, foo1.txt
      • Task 3: OrbitNumber: 900041
      • Task 4:
  • 27. Problems with keys
      • Key naming collision
          – Tasks needed to handle this explicitly in "production rules"
      • No grouping of keys
          – Grouping was achieved using "_" key naming scheme
          – PCS_InputFiles
          – PCS_CrawlForDirs
  • 28. Enter this guy
      • Not the one on the left, that's my son
      • Brian Foster - now at Google, curses!
  • 29. And this mission
      • http://npp.gsfc.nasa.gov
      • NPOESS Preparatory Project (NPP), now called Suomi NPP
      • Sounder PEATE Testbed Element
  • 30. They told Brian this
      • A little different than the OCO use case
      • So... the next THREE years' worth of jobs, we'd like to submit today… and then have your "workflow manager" manage the jobs for the next 3 years
      • This effectively blew up our thread pool workflow engine
  • 31. Random David Woollard sighting
      • David Woollard and Brian Foster had to figure out how to solve the NPP problem
      • Decided we need a new workflow manager
      • …branch/fork/sigh
  • 32. Not their fault
      • Paul R. and I and others didn't have time to fully watch this, and other OODT PMC members weren't really vested in those particular components
      • Brian was learning and doing great, and we decided in the end that going off into a branch and not destroying Workflow1 users in the trunk was better than having to integrate everything… so we punted
  • 33. NPP Pipeline – more SCF than ops system [Diagram labels: MetOpA IASI L1C granule, map file, GPolygon file; MetOpA AMSU-A L1B granule, map file, GPolygon file; MetOpA MHS L1B granule, map file, GPolygon file; MetOpA orbit boundary file.]
  • 34. Enter “Workflow2” or “Wengine”: What sucks about Workflow1? Can't explicitly model branch and bounds: fixed through “sequential” and “parallel” processors, Paul R.'s idea (OODT-70). No global level workflow conditions: added them (OODT-205). Really only pre-conditions in Workflow1: add post-conditions (OODT-502).
  • 35. More improvements: Condition timeouts: OK, it's timed out waiting for a file, run anyway (OODT-207). Optional or required: allowing boolean OR based conditionals (test this and report its success, but don't block) (OODT-208). Better failure state reporting and checkpointing (OODT-206).
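One way to picture the timeout and optional-condition behaviour described on this slide is the sketch below. It is an assumption-laden illustration, not the Wengine implementation: `ConditionResult`, `evaluate_condition`, and the polling interval are invented for the example.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionResult:
    satisfied: bool  # did the check pass?
    timed_out: bool  # did we give up waiting for it?


def evaluate_condition(check, timeout_s=None, optional=False, poll_s=0.01):
    """Return once the task is allowed to run; the result says why.

    optional=True  -> test once, report success or failure, never block.
    timeout_s=None -> required condition with no timeout: block until it passes.
    timeout_s=x    -> required condition that gives up after x seconds (run anyway).
    """
    if optional:
        return ConditionResult(satisfied=bool(check()), timed_out=False)
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while True:
        if check():
            return ConditionResult(satisfied=True, timed_out=False)
        if deadline is not None and time.monotonic() >= deadline:
            return ConditionResult(satisfied=False, timed_out=True)
        time.sleep(poll_s)


# A file that never shows up: the required condition times out, the optional one just reports.
file_arrived = lambda: False
print(evaluate_condition(file_arrived, timeout_s=0.05))
print(evaluate_condition(file_arrived, optional=True))
```

In both cases the caller goes on to run the task; the returned record is what would feed the improved state reporting.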
  • 36. Yes more improvements: Workflow metadata keys: https://oodt.jpl.nasa.gov/jira/browse/OODT-303 (internal JPL JIRA -- was already fixed in ASF JIRA in 0.1-incubating). By group, e.g., PCS/InputFilesGroup/InputFiles, PCS/Output/MetFileWriter, PCS/FileManagerUrl, Task1/SomeKey1. Collect all keys for a group: wmet.search(“PCS”) -> all keys, can interrogate for values.
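A grouped key store along these lines might look like the sketch below. Only the key paths and the `search("PCS")` call shape come from the slide; the `GroupedMetadata` class, its `put` and `get` methods, and the example values are hypothetical stand-ins, not the actual OODT metadata API.

```python
class GroupedMetadata:
    """Toy metadata container whose keys are '/'-separated group paths."""

    def __init__(self):
        self._values = {}

    def put(self, key, value):
        """Append a value under a full key path such as 'PCS/FileManagerUrl'."""
        self._values.setdefault(key, []).append(value)

    def get(self, key):
        """Return all values stored under an exact key path."""
        return list(self._values.get(key, []))

    def search(self, group):
        """Return every key path that lives under the given group, sorted."""
        prefix = group.rstrip("/") + "/"
        return sorted(key for key in self._values if key.startswith(prefix))


wmet = GroupedMetadata()
wmet.put("PCS/InputFilesGroup/InputFiles", "File1.txt")
wmet.put("PCS/Output/MetFileWriter", "example-writer")      # placeholder value
wmet.put("PCS/FileManagerUrl", "http://localhost:9000")     # placeholder value
wmet.put("Task1/SomeKey1", "value1")

for key in wmet.search("PCS"):
    print(key, wmet.get(key))
```

Matching on `group + "/"` instead of a bare prefix keeps a group named `PCS` from accidentally picking up keys in a differently named group that merely starts with the same letters, which is the kind of collision the old `_` naming scheme left to each task.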
  • 37. And more...: Workflow Lifecycle Management. State-driven execution – inversion of control. What this literally means: in PCS stat and in PCS OPSUI you see more states.
  • 38. Runner Framework: Workflow1 had facilities to submit jobs to Resource Manager or to run them on its own locally. Was a hack inside of IterativeWorkflowProcessorThread. Brian F. turned this into an explicit interface. Could hook Workflow directly to, e.g., Hadoop. I'm not convinced this was the right way to do this, but I applaud the clean up of my code.
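The idea of pulling "where does a task run" out into an explicit interface can be sketched as below. Every name here (`Runner`, `LocalRunner`, `QueueRunner`, `run_workflow`) is hypothetical and chosen for illustration; the real Wengine runner interface is Java and is not reproduced here.

```python
from abc import ABC, abstractmethod


class Runner(ABC):
    """Explicit seam between the workflow engine and wherever tasks execute."""

    @abstractmethod
    def execute(self, task_id, task_fn, metadata):
        """Run (or submit) one task; the engine does not care which."""


class LocalRunner(Runner):
    """Run the task in-process and return its result."""

    def execute(self, task_id, task_fn, metadata):
        return task_fn(metadata)


class QueueRunner(Runner):
    """Stand-in for remote submission: records the job instead of running it."""

    def __init__(self):
        self.submitted = []

    def execute(self, task_id, task_fn, metadata):
        self.submitted.append((task_id, dict(metadata)))
        return None


def run_workflow(tasks, runner, metadata):
    """Hand every (task_id, task_fn) pair to the runner, in order."""
    return [runner.execute(task_id, task_fn, metadata) for task_id, task_fn in tasks]


tasks = [("t1", lambda met: met["n"] + 1), ("t2", lambda met: met["n"] * 2)]
print(run_workflow(tasks, LocalRunner(), {"n": 20}))
```

Swapping `LocalRunner()` for `QueueRunner()` changes where the work goes without touching `run_workflow`, which is the benefit of replacing a hack inside the processor thread with an interface.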
  • 39. Sub Workflows: Workflows whose sub-tasks can be other workflows (OODT-211). Yes, this is recursive, and mind blowing. [Diagram: Task T1, Task T3, Task T4, and a nested workflow.]
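The recursion is easier to see in a few lines of code. This is a sketch under assumptions: the `Task` and `Workflow` classes are invented for the example and only handle sequential execution, with a nested workflow standing where a second task would otherwise go.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    task_id: str

    def run(self, log):
        """A leaf: 'running' it just records its ID."""
        log.append(self.task_id)


@dataclass
class Workflow:
    name: str
    children: List[Union["Task", "Workflow"]] = field(default_factory=list)

    def run(self, log):
        """Run children in order; a child may itself be a Workflow (the recursion)."""
        for child in self.children:
            child.run(log)
        return log


inner = Workflow("inner", [Task("T2a"), Task("T2b")])
outer = Workflow("outer", [Task("T1"), inner, Task("T3"), Task("T4")])
print(outer.run([]))
```

Because `Task` and `Workflow` expose the same `run` method, the outer workflow never needs to know whether a child is a single task or an entire workflow.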
  • 40. “Dynamic Workflows”: This is one of my favorites (OODT-209). % ./wmgr-client --url http://localhost:9001 --operation --dynWorkflow --taskIds id1,id2,id3 [Diagram: Task id1, Task id2, Task id3.]
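Conceptually, a dynamic workflow is one assembled on the fly from a list of task IDs instead of being defined ahead of time. The sketch below illustrates that idea only; `build_dynamic_workflow` and the registry are hypothetical, and the comma-separated argument simply mirrors the `--taskIds id1,id2,id3` form shown on the slide.

```python
def build_dynamic_workflow(task_ids_arg, registry):
    """Turn 'id1,id2,id3' into an ordered list of (task_id, task_fn) pairs.

    registry is a hypothetical dict of known task IDs to callables.
    """
    task_ids = [part.strip() for part in task_ids_arg.split(",") if part.strip()]
    if not task_ids:
        raise ValueError("at least one task ID is required")
    unknown = [task_id for task_id in task_ids if task_id not in registry]
    if unknown:
        raise KeyError("unknown task IDs: " + ", ".join(unknown))
    return [(task_id, registry[task_id]) for task_id in task_ids]


registry = {
    "id1": lambda met: met.append("id1 ran"),
    "id2": lambda met: met.append("id2 ran"),
    "id3": lambda met: met.append("id3 ran"),
}

trace = []
for task_id, task_fn in build_dynamic_workflow("id1,id2,id3", registry):
    task_fn(trace)
print(trace)
```

Validating the IDs before running anything means a typo on the command line fails fast instead of halfway through the workflow.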
  • 41. Enough, how can I use all this stuff? Brian's code existed as forked and un-supported (by community) in NPP repo at JPL. Brian, by his own awesomeness, realizes before he leaves me for Google in 2011 that we need to push it to Apache: http://svn.apache.org/repos/asf/oodt/branches/wengine-branch - last working PEATE version.
  • 42. Chris spends 2 years figuring out what Brian did: OODT-215: my initial “god” issue to solve everything in JIRA, tried to break the problem down into manageable steps. Still took me 2 years, with help from Paul R. and from Brian (even though he left for Google he still works on Apache OODT muwahahah). OODT-491: “Finish line tasks for Wengine”.
  • 43. Wengine support in trunk first appears: In Apache OODT 0.4, but it was largely a work in progress, and well... didn't fully work. Apache OODT 0.5 happens: back compat restored for “Workflow1” style engines; Chris and Brian clean up a ton of the branch stuff, and finish most of OODT-491. Apache OODT 0.6: we finish for real real real.
  • 44. Who will use Wengine? PEATE uses it today: their job processing requirements as an SCF are quite large. U.S. National Climate Assessment (NCA) project, “Snow Hydrology for the Western US and Alaska”: will tell you about this on the next slides.
  • 45. Talk Part #2 Doing stuff with Wengine and why you should care
  • 46. JPL Snow Server: http://snow.jpl.nasa.gov Full bore processing and delivery system. Near real time and historical processing. Dust forcing and snow covered area products. Tower data. GIS interfaces. CSV, JSON, GeoTIFF data format download.
  • 47. MODIS Snow Covered Area and Grain Size (MODSCAG): JPL MODSCAG algorithm (Painter et al 2009). Spectral mixture analysis of MODIS Surface Reflectance products. Daily 500 m coverage in late morning and early afternoon from NASA satellites Terra and Aqua. [Image: Upper Colorado River Basin, March 9, 2009. Credit: Tom Painter]
  • 48. MODSCAG Processing: Two Products / Two Inputs. MODIS tiles are defined by their horizontal and vertical tile IDs (the 2 characters after the h and the v respectively). Historical tiles over the Western United States (LPDAAC): time range 2000 - present; tiles h08v04, h08v05, h09v05, h09v04, h10v04. LPDAAC is NASA's Land Processes data center, located at the USGS Earth Resources Observation and Science (EROS) Center in Sioux Falls, South Dakota. MODIS Near Real-Time Products (LANCE MODIS NRT): time range Dec 2011 - present; Western United States, High Asia.
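The tile naming rule on this slide (two digits after the `h`, two after the `v`) is simple enough to check in code. The helper below is a small illustrative sketch; `parse_tile` and `WESTERN_US_TILES` are names made up for the example, and the tile list is the one given on the slide.

```python
import re

# 'h' + two-digit horizontal index, 'v' + two-digit vertical index, e.g. h08v04.
TILE_RE = re.compile(r"^h(\d{2})v(\d{2})$")


def parse_tile(tile_id):
    """Split a MODIS tile ID such as 'h08v04' into (horizontal, vertical) integers."""
    match = TILE_RE.match(tile_id)
    if match is None:
        raise ValueError(f"not a MODIS tile ID: {tile_id!r}")
    return int(match.group(1)), int(match.group(2))


WESTERN_US_TILES = ["h08v04", "h08v05", "h09v05", "h09v04", "h10v04"]

for tile in WESTERN_US_TILES:
    print(tile, parse_tile(tile))
```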
  • 49. Credit: Cameron Goodale
  • 50. Credit: Cameron Goodale
  • 51. MODDRFS: Dust Radiative Forcing in Snow from MODIS. [Figure: map of dust radiative forcing (W/m2), color scale 0 to 300, 17 May 2009.] Painter and Bryant, 2012.
  • 52. Now, what have I cooked up for today? I have an Orion SkyQuest XT8 Classic Dobsonian Telescope. I also have an iPhone 5.
  • 53. I had a few days of time for some great lunar science
  • 54. As it turns out those images have metadata
  • 55. Add metadata: Geocoding, WGS84 lat, lng. Planetary met, TARGET=MOON, etc.
  • 56. Found Hugin
  • 57. Wanted to do something cool with it: Discovered enshape. Figured out how to make it combine images.
  • 58. Getting started: Workflow2 Quick Start on OODT Wiki: https://cwiki.apache.org/OODT/workflow2-quick-start-guide.html OODT documentation sucks! Check the wiki, it's better there.
  • 59. Will now show you some workflow stuff: Dreams of moon images, died. Will illustrate dynWorkflows.
  • 60. What's left? Supporting looking up workflows by category (needed to say “give me all workflows that aren't 'done'”) (OODT-517). Fix the resource manager runner (OODT-518). Fix all the wall clock and per task timing (OODT-519).
  • 61. Want to help? dev@oodt.apache.org. OODT-215 and OODT-491 homework. Get a beer with me or Brian. I bribe you?