
Stream Execution with Clojure and Fork/join



One of the greatest benefits of Clojure is its ability to create simple, powerful abstractions that operate at the level of the problem while also operating at the level of the language.
This talk discusses a query processing engine built in Clojure that leverages this abstraction power to combine streams of data for efficient concurrent execution.
* Representing processing trees as s-expressions
* Streams as sequences of data
* Optimizing processing trees by manipulating s-expressions
* Direct execution of s-expression trees
* Compilation of s-expressions into nodes and pipes
* Concurrent processing nodes and pipes using a fork/join pool

Published in: Technology


  1. Stream Execution with Clojure and Fork/Join. Alex Miller (@puredanger), Revelytix
  2. Contents
     • Query execution - the problem
     • Plan representation - plans in our program
     • Processing components - building blocks
     • Processing execution - executing plans
  3. Query Execution
  4. Relational Data & Queries
     Table PERSON: NAME = Joe, AGE = 30
     Query: SELECT NAME FROM PERSON WHERE AGE > 20
  5. RDF
     "Resource Description Framework" - a fine-grained graph representation of data
     Subject          Predicate         Object
     http://data/Joe  http://demo/age   30
     http://data/Joe  http://demo/name  "Joe"
  6. SPARQL queries
     SPARQL is a query language for RDF
     PREFIX demo: <http://demo/>
     SELECT ?name
     WHERE {
       ?person demo:age ?age.
       ?person demo:name ?name.
       FILTER (?age > 20)
     }
     Each line such as "?person demo:age ?age." is a "triple pattern"; the two patterns form a natural join on ?person.
  7. Relational-to-RDF
     • W3C R2RML mappings define how to virtually map a relational db into RDF
     SQL: SELECT NAME FROM PERSON WHERE AGE > 20
     SPARQL: PREFIX demo: <http://demo/> SELECT ?name WHERE { ?person demo:age ?age. ?person demo:name ?name. FILTER (?age > 20) }
     The row (NAME = Joe, AGE = 30) corresponds to the triples (http://data/Joe, http://demo/age, 30) and (http://data/Joe, http://demo/name, "Joe")
  8. Enterprise federation
     • Model domain at enterprise level
     • Map into data sources
     • Federate across the enterprise (and beyond)
     [Diagram labels: SPARQL, Enterprise, SPARQL (x3), SQL (x3)]
  9. Query pipeline
     • How does a query engine work?
     [Diagram: SQL → Parse → (AST) → Plan → (Plan) → Resolve → (Plan) → Optimize → (Plan) → Process → Results!; Metadata also appears as an input]
  10. Trees!
      [Same pipeline diagram, with the AST and the intermediate Plans called out as trees]
  11. Plan Representation
  12. SQL query plans
      SELECT Name, DeptName FROM Person, Dept WHERE Person.DeptID = Dept.DeptID AND Age > 20
      [Plan diagram: Person (Name, Age, DeptID) and Dept (DeptID, DeptName) → join on DeptID → filter Age > 20 → project Name, DeptName]
  13. SPARQL query plans
      SELECT ?Name WHERE { ?Person :Name ?Name . ?Person :Age ?Age . FILTER (?Age > 20) }
      [Plan diagram: TP1 { ?Person :Name ?Name } and TP2 { ?Person :Age ?Age } → join on ?Person → filter ?Age > 20 → project ?Name]
  14. Common model
      Streams of tuples flowing through a network of processing nodes
  15. What kind of nodes?
      • Tuple generators (leaves)
        – In SQL: a table or view
        – In SPARQL: a triple pattern
      • Combinations (multiple children)
        – Join
        – Union
      • Transformations
        – Filter
        – Project
        – Dup removal
        – Slice (limit / offset)
        – Sort
        – Grouping
        – etc
  16. Representation
      Tree data structure with nodes and attributes (Java)
      PlanNode: childNodes
      TableNode: Table
      JoinNode: joinType, joinCriteria
      FilterNode: criteria
      ProjectNode: projectExpressions
      SliceNode: limit, offset
  17. s-expressions
      Tree data structure with nodes and attributes
      (* (+ 2 3) (- 6 5))
  18. List representation
      Tree data structure with nodes and attributes
      (project+ [Name DeptName]
        (filter+ (> Age 20)
          (join+ (table+ Empl [Name Age DeptID])
                 (table+ Dept [DeptID DeptName]))))
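Because the plan is ordinary Clojure data, it can be quoted and inspected with core sequence functions. The following is a minimal sketch, assuming the operator names shown on the slide; `plan-ops` is a hypothetical helper and not part of the engine described in the talk.

```clojure
(require '[clojure.string :as str])

;; The plan from the slide, quoted so that it stays data instead of being evaluated.
(def plan
  '(project+ [Name DeptName]
     (filter+ (> Age 20)
       (join+ (table+ Empl [Name Age DeptID])
              (table+ Dept [DeptID DeptName])))))

;; Hypothetical helper: list the plan operators in depth-first order.
;; Plan operators are recognised by the trailing "+" naming convention.
(defn plan-ops [plan]
  (->> (tree-seq seq? seq plan)   ; walk every nested list in the plan
       (filter seq?)              ; keep only the list nodes
       (map first)                ; take the head symbol of each list
       (filter #(and (symbol? %) (str/ends-with? (name %) "+")))))

(plan-ops plan)
;; => (project+ filter+ join+ table+ table+)
```

The predicate `(> Age 20)` is also a list, which is why the helper filters on the `+` suffix instead of treating every list head as an operator.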
  19. Query optimization
      Example - pushing criteria down
      (project+ [Name DeptName]
        (filter+ (> Age 20)
          (join+ (project+ [Name Age DeptID]
                   (bind+ [Age (- (now) Birth)]
                     (table+ Empl [Name Birth DeptID])))
                 (table+ Dept [DeptID DeptName]))))
  20. Query optimization
      Example - rewritten
      (project+ [Name DeptName]
        (join+ (project+ [Name DeptID]
                 (filter+ (> (- (now) Birth) 20)
                   (table+ Empl [Name Birth DeptID])))
               (table+ Dept [DeptID DeptName])))
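An optimization of this kind can be written as a plain function from one plan to another. The sketch below covers a simpler case than the slides: it pushes a filter beneath a join when every column used by the predicate comes from the left input, and it does not handle `bind+` or `project+`. The helpers `columns`, `pred-vars` and `push-filter-down` are hypothetical names, not the talk's actual optimizer.

```clojure
(require '[clojure.set :as set])

;; Assumption: leaves look like (table+ name [cols...]) as on the slides.
(defn columns
  "Columns produced by a plan node (only table+, filter+ and join+ are handled)."
  [[op & args]]
  (case op
    table+  (set (second args))
    filter+ (columns (second args))
    join+   (set/union (columns (first args)) (columns (second args)))))

(defn pred-vars
  "Column symbols used directly by a flat predicate such as (> Age 20)."
  [[_ & operands]]
  (set (filter symbol? operands)))

(defn push-filter-down
  "Rewrites (filter+ p (join+ l r)) to (join+ (filter+ p l) r) when every
   column used by p comes from l. Any other plan is returned unchanged."
  [plan]
  (if (and (seq? plan) (= 'filter+ (first plan)))
    (let [[_ pred child] plan]
      (if (and (seq? child) (= 'join+ (first child)))
        (let [[_ left right] child]
          (if (set/subset? (pred-vars pred) (columns left))
            (list 'join+ (list 'filter+ pred left) right)
            plan))
        plan))
    plan))

(push-filter-down
  '(filter+ (> Age 20)
     (join+ (table+ Empl [Name Age DeptID])
            (table+ Dept [DeptID DeptName]))))
;; => (join+ (filter+ (> Age 20) (table+ Empl [Name Age DeptID]))
;;           (table+ Dept [DeptID DeptName]))
```

Filtering before the join means fewer tuples reach the join, which is the point of pushing criteria down.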
  21. Hash join conversion
      [Diagram: join+ over a left tree and a right tree is converted so that the left tree feeds preduce+ (hash-tuples) and first+ to produce hashes, which let+ binds; the right tree feeds mapcat (tuple-matches)]
  22. Hash join conversion
      (join+ _left _right) is rewritten to:
      (let+ [hashes (first+ (preduce+ (hash-tuple join-vars {} #(merge-with concat %1 %2)) _left))]
        (mapcat (fn [tuple] (tuple-matches hashes join-vars tuple)) _right))
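The same build-then-probe idea can be tried on plain Clojure sequences of maps. This is a minimal, single-threaded sketch: the names `hash-tuples` and `tuple-matches` are borrowed from the slides, but their signatures and bodies here are assumptions, and `hash-join` is a hypothetical wrapper.

```clojure
;; Build phase: index the left tuples by the values of their join variables.
(defn hash-tuples [join-vars tuples]
  (group-by #(select-keys % join-vars) tuples))

;; Probe phase: merge one right tuple with every left tuple that shares its key.
(defn tuple-matches [hashes join-vars tuple]
  (map #(merge % tuple)
       (get hashes (select-keys tuple join-vars))))

;; Hypothetical wrapper: hash the left side once, then stream the right side.
(defn hash-join [join-vars left right]
  (let [hashes (hash-tuples join-vars left)]
    (mapcat #(tuple-matches hashes join-vars %) right)))

(hash-join [:person]
           [{:person "Joe" :name "Joe"} {:person "Ann" :name "Ann"}]
           [{:person "Joe" :age 30}])
;; => ({:person "Joe", :name "Joe", :age 30})
```

Only the left input has to be held in memory; the right input can be consumed lazily, one tuple at a time.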
  23. Processing trees
      • Compile abstract nodes into more concrete stream operations:
        map+, pmap+, number+, mapcat+, pmapcat+, reorder+, filter+, pfilter+, rechunk+, preduce+, first+, pmap-chunk+, mux+, preduce-chunk+, let+, let-stream+
  24. Summary
      • SPARQL and SQL query plans have essentially the same underlying algebra
      • Model is a tree of nodes where tuples flow from leaves to the root
      • A natural representation of this tree in Clojure is as a tree of s-expressions, just like our code
      • We can manipulate this tree to provide
        – Optimizations
        – Differing levels of abstraction
  25. Processing Components
  26. Pipes
      Pipes are streams of data
      Producer → Pipe → Consumer
      Producer side: (enqueue pipe item), (enqueue-all pipe items), (close pipe), (error pipe exception)
      Consumer side: (dequeue pipe item), (dequeue-all pipe items), (closed? pipe), (error? pipe)
  32. Pipe callbacks
      Events on the pipe trigger callbacks which are executed on the caller's thread
      1. (add-callback pipe callback-fn)
      2. (enqueue pipe "foo")
      3. (callback-fn "foo") ;; during enqueue
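A pipe with callbacks can be approximated in a few lines. The sketch below is not the talk's implementation: it keeps the pipe state in a single atom for brevity (the talk describes refs managed by STM), covers only a subset of the listed operations, and assumes a one-argument `dequeue`. It needs Clojure 1.9 or later for `swap-vals!`.

```clojure
(defn make-pipe []
  (atom {:queue     clojure.lang.PersistentQueue/EMPTY
         :closed?   false
         :callbacks []}))

(defn add-callback [pipe f]
  (swap! pipe update :callbacks conj f)
  pipe)

(defn enqueue [pipe item]
  (let [state (swap! pipe update :queue conj item)]
    ;; Callbacks run here, on the thread that called enqueue.
    (doseq [f (:callbacks state)]
      (f item))
    pipe))

(defn dequeue [pipe]
  ;; swap-vals! returns [old new]; the head of the old queue is the item taken.
  ;; Returns nil when the pipe is empty.
  (let [[old _] (swap-vals! pipe update :queue pop)]
    (peek (:queue old))))

(defn close [pipe]
  (swap! pipe assoc :closed? true)
  pipe)

(defn closed? [pipe]
  (:closed? @pipe))

;; The three steps from the slide:
(def seen (atom []))
(def p (make-pipe))
(add-callback p #(swap! seen conj %))   ; 1. register the callback
(enqueue p "foo")                        ; 2. enqueue, 3. callback fires during enqueue
@seen
;; => ["foo"]
```

Running the callback outside the `swap!` matters: a function passed to `swap!` may be retried, so side effects belong after the state change has been committed.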
  34. Pipes
      Pipes are thread-safe functional data structures
  35. Batched tuples
      • To a pipe, data is just data. We actually pass data in batches through the pipe for efficiency.
      [ {:Name "Alex" :Eyes "Blue"} {:Name "Jeff" :Eyes "Brown"} {:Name "Eric" :Eyes "Hazel"}
        {:Name "Joe" :Eyes "Blue"} {:Name "Lisa" :Eyes "Blue"} {:Name "Glen" :Eyes "Brown"} ]
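Batching can be illustrated with core functions alone. In this sketch the batch size of 4 and the helper `blue-eyed` are arbitrary choices for illustration; the talk does not state the batch sizes it uses.

```clojure
(def tuples
  [{:Name "Alex" :Eyes "Blue"}  {:Name "Jeff" :Eyes "Brown"}
   {:Name "Eric" :Eyes "Hazel"} {:Name "Joe"  :Eyes "Blue"}
   {:Name "Lisa" :Eyes "Blue"}  {:Name "Glen" :Eyes "Brown"}])

;; Split the stream into batches of at most 4 tuples; the last batch may be shorter.
(def batches (partition-all 4 tuples))

;; A per-batch operation: one call handles a whole batch instead of one tuple.
(defn blue-eyed [batch]
  (filterv #(= "Blue" (:Eyes %)) batch))

(map count batches)
;; => (4 2)
(map count (map blue-eyed batches))
;; => (2 1)
```

Passing batches amortizes the per-item cost of enqueueing, callbacks and task scheduling over many tuples.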
  36. Pipe multiplexer
      Compose multiple pipes into one
  37. Pipe tee
      Send output to multiple destinations
  38. Nodes
      • A node transforms tuples from its input pipe and puts the results on its output pipe.
      Input Pipe → Node (fn) → Output Pipe
      Node attributes: input-pipe, output-pipe, task-fn, state, concurrency
  39. Processing Trees
      • Tree of nodes and pipes
      [Diagram: fn nodes connected by pipes, with an arrow labelled "Data flow"]
  40. SPARQL query example
      SELECT ?Name WHERE { ?Person :Name ?Name . ?Person :Age ?Age . FILTER (?Age > 20) }
      [Plan diagram: TP1 { ?Person :Name ?Name } and TP2 { ?Person :Age ?Age } → join on ?Person → filter ?Age > 20 → project ?Name]
      (project+ [?Name]
        (filter+ (> ?Age 20)
          (join+ [?Person]
            (triple+ [?Person :Name ?Name])
            (triple+ [?Person :Age ?Age]))))
  41. Processing tree
      [Diagram: TP1 { ?Person :Name ?Name } → preduce+ (hash-tuples) → first+ → hashes; TP2 { ?Person :Age ?Age } → mapcat (tuple-matches) inside let+ → filter ?Age > 20 → project ?Name]
  42. Mapping to nodes
      • An obvious mapping to nodes and pipes
      [Diagram: each of triple pattern, preduce+, first+, let+, filter+ and project+ becomes its own fn node]
  43. Mapping to nodes
      • Choosing between compilation and evaluation
      [Diagram: triple pattern, preduce+, first+ and let+ remain fn nodes; filter ?Age > 20 and project ?Name are merged into a single eval]
  44. Compile vs eval
      • We can evaluate our expressions
        – Directly on streams of Clojure data using Clojure
        – Indirectly via pipes and nodes (more on that next)
      • The final step before processing makes the decision
        – Plan nodes that combine data are real nodes
        – Plan nodes that allow parallelism (p*) are real nodes
        – Most other plan nodes can be merged into a single eval
        – Many leaf nodes are actually rolled up and sent to a database
        – Lots more work to do on where these splits occur
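Direct evaluation on Clojure data can be sketched as a small interpreter over the plan tree. Everything here is an assumption made for illustration: `run-plan` is a hypothetical function, tuples are maps keyed by column symbols, predicates are flat `(op column constant)` forms, and only three operators are handled.

```clojure
(defn run-plan
  "Evaluates a plan directly against in-memory tables, returning a lazy seq of tuples."
  [tables [op & args]]
  (case op
    table+   (get tables (first args))
    filter+  (let [[[cmp col v] child] args
                   cmp-fn ({'> > '< < '= =} cmp)]   ; map the operator symbol to a function
               (filter #(cmp-fn (get % col) v) (run-plan tables child)))
    project+ (let [[cols child] args]
               (map #(select-keys % cols) (run-plan tables child)))))

(def tables
  {'Person [{'Name "Joe" 'Age 30}
            {'Name "Ann" 'Age 18}]})

(run-plan tables
          '(project+ [Name]
             (filter+ (> Age 20)
               (table+ Person [Name Age]))))
;; => ({Name "Joe"})
```

Since `filter` and `map` are lazy, a chain of such operators runs as one fused pass over the data, which matches the idea of merging several plan nodes into a single eval.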
  45. Processing Execution
  46. Execution requirements
      • Parallelism
        – Across plans
        – Across nodes in a plan
        – Within a parallelizable node in a plan
      • Memory management
        – Allow arbitrary intermediate result sets without OutOfMemoryError
      • Ops
        – Cancellation
        – Timeouts
        – Monitoring
  47. Event-driven processing
      • Dedicated I/O thread pools stream data into the plan
      [Diagram labels: I/O threads, fn nodes, Compute threads]
  48. Task creation
      1. Callback fires when data is added to the input pipe
      2. Callback takes the fn associated with the node and bundles it into a task
      3. Task is scheduled with the compute thread pool
  49. Fork/join vs Executors
      • Fork/join thread pool vs classic Executors
        – Optimized for finer-grained tasks
        – Optimized for larger numbers of tasks
        – Optimized for more cores
        – Works well on tasks with dependencies
        – No contention on a single queue
        – Work stealing for load balancing
  50. Task execution
      1. Pull next chunk from input pipe
      2. Execute task function with access to the node's state
      3. Optionally, output one or more chunks to output pipe - this triggers the upstream callback
      4. If data is still available, schedule a new task, simulating a new callback on the current node
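The pull, execute, output, reschedule cycle can be sketched with a `java.util.concurrent.ForkJoinPool` driven from Clojure. This is an illustration under stated assumptions, not the talk's code: `ConcurrentLinkedQueue` stands in for the pipes, `run-node` is a hypothetical task body, and a promise signals when the input is drained.

```clojure
(import '(java.util.concurrent ForkJoinPool ConcurrentLinkedQueue))

(defn run-node
  "Takes one chunk from `in`, applies `task-fn`, puts the result on `out`,
   then schedules itself again. Delivers `done` once the input is empty."
  [^ForkJoinPool pool ^ConcurrentLinkedQueue in ^ConcurrentLinkedQueue out task-fn done]
  (if-let [chunk (.poll in)]
    (do
      (.add out (task-fn chunk))
      ;; Reschedule as a fresh task instead of looping, so the worker thread
      ;; is handed back to the pool between chunks.
      (let [^Runnable next-task #(run-node pool in out task-fn done)]
        (.execute pool next-task)))
    (deliver done :drained)))

(def pool (ForkJoinPool.))
(def in   (ConcurrentLinkedQueue. [[1 2] [3 4] [5 6]]))
(def out  (ConcurrentLinkedQueue.))
(def done (promise))

(let [^Runnable first-task #(run-node pool in out (fn [chunk] (mapv inc chunk)) done)]
  (.execute pool first-task))

(deref done 5000 :timeout)
;; => :drained
(vec out)
;; => [[2 3] [4 5] [6 7]]
```

Each task does a bounded amount of work and never blocks, which is what lets a small pool of compute threads serve many nodes.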
  51. Concurrency
      • Delicate balance between Clojure refs and STM and Java concurrency primitives
      • Clojure refs - managed by STM
        – Input pipe
        – Output pipe
        – Node state
      • Java concurrency
        – Semaphore - "permits" to limit tasks per node
        – Per-node scheduling lock
      • Key integration constraint
        – Clojure transactions can fail and retry!
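The integration constraint can be made concrete: because a `dosync` body may run more than once, semaphore operations and other side effects have to stay outside it. The sketch below assumes a limit of two permits per node; `run-limited` and `node-state` are hypothetical names.

```clojure
(import '(java.util.concurrent Semaphore))

(def ^Semaphore permits (Semaphore. 2))   ; assumed limit: two concurrent tasks per node
(def node-state (ref {:processed 0}))

(defn run-limited
  "Runs f if a permit is available, otherwise returns :rejected."
  [f]
  (if (.tryAcquire permits)
    (try
      (let [result (f)]
        ;; Only ref updates go inside the transaction, because it may be retried.
        (dosync (alter node-state update :processed inc))
        result)
      ;; The permit is acquired and released outside any transaction,
      ;; so a retry can never acquire or release it twice.
      (finally (.release permits)))
    :rejected))

(run-limited (constantly 42))
;; => 42
(:processed @node-state)
;; => 1
```

Had `.tryAcquire` been called inside the `dosync`, a retried transaction could consume several permits for a single task.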
  52. Concurrency mechanisms
      [Flowchart of process-input, run-task and close-output: acquire semaphore, dequeue input, create and invoke task, enqueue data on output pipe, handle close messages, release semaphores. Legend: blue outline = Java lock; green outline = Clojure transaction; blue shading = Clojure atom]
  53. Memory management
      • Pipes are all on the heap
      • How do we avoid OutOfMemory?
  54. Buffered pipes
      • When heap space is low, store pipe data on disk
      • Data is serialized / deserialized to/from disk
      • Memory-mapped files are used to improve I/O
  55. Memory monitoring
      • JMX memory beans
        – To detect when memory is tight -> writing to disk
          • Use memory pool threshold notifications
        – To detect when memory is ok -> write to memory
          • Use polling (no notification on decrease)
      • Composite pipes
        – Build a logical pipe out of many segments
        – As memory conditions go up and down, each segment is written to the fastest place. We never move data.
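The JMX memory beans are reachable from Clojure through `java.lang.management`. The sketch below shows only the threshold and polling half; registering a notification listener is left out. The 80% fraction and the function names are assumptions, not values from the talk.

```clojure
(import '(java.lang.management ManagementFactory MemoryType MemoryPoolMXBean))

(defn heap-pools-with-threshold
  "Heap memory pools that support a usage threshold."
  []
  (filter (fn [^MemoryPoolMXBean p]
            (and (= MemoryType/HEAP (.getType p))
                 (.isUsageThresholdSupported p)))
          (ManagementFactory/getMemoryPoolMXBeans)))

(defn set-threshold!
  "Sets the usage threshold to `fraction` of the pool's maximum size.
   Returns nil and leaves the pool untouched when the maximum is undefined."
  [^MemoryPoolMXBean pool fraction]
  (let [usage     (.getUsage pool)
        max-bytes (if usage (.getMax usage) -1)]
    (when (pos? max-bytes)
      (.setUsageThreshold pool (long (* fraction max-bytes)))
      true)))

(defn memory-tight?
  "Polls the pool: true while usage is at or above the threshold."
  [^MemoryPoolMXBean pool]
  (.isUsageThresholdExceeded pool))

(doseq [p (heap-pools-with-threshold)]
  (set-threshold! p 0.8)
  (println (.getName ^MemoryPoolMXBean p) "tight?" (memory-tight? p)))
```

Polling `memory-tight?` is one way to notice that usage has dropped again, since the threshold notification fires only when usage crosses the threshold on the way up.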
  56. Cancellation
      • Pool keeps track of which nodes belong to which plan
      • All nodes check for cancellation during execution
      • Cancellation can be caused by:
        – Error during execution
        – User intervention from admin UI
        – Timeout from query settings
  57. Summary
      • Data flow architecture
        – Event-driven by arrival of data
        – Compute threads never block
        – Fork/join to handle scheduling of work
      • Clojure as abstraction tool
        – Expression tree lets us express plans concisely
        – Also lets us manipulate them with tools in Clojure
        – Lines of code
          • Fork/join pool, nodes, pipes - 1200
          • Buffer, serialization, memory monitor - 970
          • Processor, compiler, eval - 1900
      • Open source? Hmmmmmmmmmmm…….
  58. Thanks... Alex Miller, @puredanger, Revelytix, Inc.