This document describes gascheduler, a library for executing distributed tasks across Erlang nodes. It provides a scheduler that distributes tasks to worker nodes with available capacity. The scheduler aims to be generic, simple, and operations friendly. It executes callbacks asynchronously, sending status messages to the client. Tasks should be side-effect free or idempotent for consistency. The scheduler separates business logic from infrastructure code and has few dependencies, allowing for reuse. Potential improvements include multi-master support and allowing clients to stop cleanly.
2. Introduction
What are we trying to do?
● parallel and distributed computation
How do we do it?
● that is what this talk is about
3. Parallel Execution in Erlang
> lists:foreach(fun(N) ->
      spawn(fun() ->
          io:format("hello world ~p~n", [N])
      end)
  end, lists:seq(1,10)).
hello world 1
hello world 2
hello world 3
hello world 4
hello world 5
hello world 6
hello world 7
hello world 8
hello world 9
hello world 10
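The spawned processes above only print. Note that the order in which independent processes print is not guaranteed in general. To get values back from them, a common pattern is a small parallel map. The following is a minimal sketch, not part of gascheduler; the module name pmap_sketch and the function pmap/2 are hypothetical names chosen for illustration.

```erlang
-module(pmap_sketch).
-export([pmap/2]).

%% Apply F to every element of List in its own process and
%% return the results in the original order.
pmap(F, List) ->
    Parent = self(),
    %% Tag each worker with a unique reference so replies can be matched.
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    %% Selective receive on each reference restores the input order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

This sketch blocks forever if F crashes in a worker, and it places no bound on the number of concurrent processes. Those are the kinds of gaps a scheduler is meant to fill.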
4. gascheduler
A generic library for executing distributed tasks
[Diagram: a client, the scheduler (pending queue, running queue), and worker nodes 1 to n. Labels: execute(callback), add_worker_node(node), spawn(callback), stats; results ok and error (node down, max retries); worker outcomes ok and retry.]
5. Why our own scheduler?
● task manager rather than process manager
o we start the execution later
● asynchronous
o task status sent to client via messages
● multiple node distributed execution
● bounds on concurrent tasks
7. Scheduling
execute callback
● node with least running tasks
pending queue
● unbounded
running queue per node
● bounded
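The node choice described above (least running tasks, with a bound per node) might be sketched as follows. This is an illustration under assumptions, not gascheduler's actual implementation: the module pick_node_sketch, the function pick_node/2, and the map-based bookkeeping are all hypothetical.

```erlang
-module(pick_node_sketch).
-export([pick_node/2]).

%% Running maps each node to the number of tasks currently running on it.
%% Returns {ok, Node} for the least-loaded node with spare capacity,
%% or 'full' when every node is at MaxWorkers (the task stays pending).
pick_node(Running, MaxWorkers) ->
    Candidates = [{Count, Node} || {Node, Count} <- maps:to_list(Running),
                                   Count < MaxWorkers],
    %% Sorting {Count, Node} tuples puts the lowest count first.
    case lists:sort(Candidates) of
        [{_Count, Node} | _] -> {ok, Node};
        [] -> full
    end.
```

For example, pick_node_sketch:pick_node(#{a => 3, b => 1, c => 2}, 3) skips node a because it is at capacity and picks b.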
worker retries on exception
● except for permanent failure
● possibly an unlimited number of times
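The retry behaviour might look roughly like the sketch below. The module retry_sketch and function run/2 are hypothetical, and the convention that a task signals a permanent failure by throwing the atom permanent_failure is an assumption made for this example; the slides do not say how gascheduler detects it.

```erlang
-module(retry_sketch).
-export([run/2]).

%% Run {Mod, Fun, Args}, retrying on exceptions up to MaxRetries times.
run(MFA, MaxRetries) ->
    run(MFA, MaxRetries, 0).

run({Mod, Fun, Args} = MFA, MaxRetries, Attempt) ->
    try apply(Mod, Fun, Args) of
        Result -> {ok, Result}
    catch
        %% Assumed convention: throwing permanent_failure stops retrying.
        throw:permanent_failure ->
            {error, permanent_failure};
        %% Any other exception is retried while attempts remain.
        _Class:_Reason when Attempt < MaxRetries ->
            run(MFA, MaxRetries, Attempt + 1);
        _Class:_Reason ->
            {error, max_retries}
    end.
```

The two error atoms match the failure reasons shown in the status messages later in the talk.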
8. Tasks
● The scheduler executes a callback
o like the map step of MapReduce
o e.g. count the occurrences of a word in a string
● What are the requirements of a task?
o Ideally the function should be side-effect free
o Or idempotent
f(f(state)) = f(state)
o Otherwise consistency must be handled externally
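To make the two requirements concrete, here is a small sketch of a side-effect-free task and an idempotent one. The module task_sketch and both functions are hypothetical examples, not code from gascheduler.

```erlang
-module(task_sketch).
-export([count_word/2, normalise/1]).

%% Side-effect free: the result depends only on the arguments,
%% so running the task again after a retry is harmless.
count_word(Word, String) ->
    length([W || W <- string:lexemes(String, " "), W =:= Word]).

%% Idempotent: normalise(normalise(S)) equals normalise(S).
normalise(String) ->
    string:lowercase(string:trim(String)).
```

Either kind of task can be retried or run twice on different nodes without changing the outcome, which is why the scheduler can retry freely.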
9. Starting tasks
Name = test, %% Each gascheduler has its own name. There can be multiple gaschedulers.
Nodes = [...], %% A list of nodes to execute work on. See also erlang:nodes().
MaxWorkers = 10, %% Maximum number of workers per node.
MaxRetries = 10, %% Maximum number of retries for a worker, i.e. it throws some exception.
Client = self(), %% Where to send scheduler status messages to.
%% Start the scheduler.
{ok, _} = gascheduler:start_link(Name, Nodes, Client, MaxWorkers, MaxRetries),
%% Execute hello:world(1) asynchronously. The hello module defines world(N) -> N.
ok = gascheduler:execute(Name, {hello, world, [1]}),
.....
10. Handling task status
%% Receive a single status message from a particular scheduler.
receive
    {Name, {ok, Result}, Node, _MFA = {_Mod, _Fun, _Args}} ->
        io:format("hello world ~p from ~p~n", [Result, Node]);
    {Name, {error, Reason}, Node, MFA = {_Mod, _Fun, _Args}} ->
        io:format("task ~p failed on ~p because ~p~n", [MFA, Node, Reason])
end
%% Task completed successfully.
hello world 1 from slave1@worker1
%% Task failed.
task {hello, world, [1]} failed on slave1@worker1 because max_retries
task {hello, world, [1]} failed on slave1@worker1 because permanent_failure
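A client that has queued several tasks usually needs to wait for all of their status messages. The sketch below assumes the message shape shown above, {Name, Status, Node, MFA}; the module collect_sketch and function collect/2 are hypothetical helpers, not part of gascheduler.

```erlang
-module(collect_sketch).
-export([collect/2]).

%% Wait for Count status messages from the scheduler called Name and
%% return {Successes, Failures}, each in arrival order.
collect(Name, Count) ->
    collect(Name, Count, [], []).

collect(_Name, 0, Oks, Errors) ->
    {lists:reverse(Oks), lists:reverse(Errors)};
collect(Name, Count, Oks, Errors) ->
    receive
        {Name, {ok, Result}, _Node, MFA} ->
            collect(Name, Count - 1, [{MFA, Result} | Oks], Errors);
        {Name, {error, Reason}, _Node, MFA} ->
            collect(Name, Count - 1, Oks, [{MFA, Reason} | Errors])
    end.
```

Matching on Name means messages from other schedulers stay in the mailbox. The sketch has no timeout, so it waits indefinitely if fewer than Count messages arrive.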
12. Possible Improvements
multi master
● distributed consensus required
allow client to stop cleanly in a generic way
● each client currently implements its own clean stop