This talk was given at ScalaIO 2019.
It explains how you can manage errors systematically in your applications, and shows how we did it in Rudder with the functional library ZIO.
It presents four big principles that guide my work as a developer:
- 1/ Our work as developers is to discover and assess failure modes.
- 2/ ERRORS are a SOCIAL construction to give AGENCY to the receiver of the error.
- 3/ An application always has at least 3 kinds of users: users, devs, and ops. Don’t forget any of them.
- 4/ It’s YOUR job to choose the SEMANTICS (what is a nominal case, what is an error) and to KEEP your PROMISES.
The talk gives 5 guidelines to help you implement these principles. It also offers a brief glimpse of systems thinking, which you can explore in more detail in the related article "Understand things as interacting systems": https://medium.com/@fanf42/understand-things-as-interacting-systems-b273bdba5dec
If you have any questions, please ask: there are several ways to contact me at the end of the deck (slide 87)!
6. Developer?
● Model the world into code
○ Try to make it useful
● Nominal case necessary (of course)
● But not sufficient (models are false)
○ Bugs
○ Misunderstanding of needs
○ Open world
○ Damn users using your app
■ often “me, 3 days in the future”
7. This talk
● Systematic management of errors
● Caveat emptor:
○ I’m mainly a Scala dev
■ expect Scala terminology
■ statically typed language with union types, interfaces
○ application, not library
■ closed world (genericity is not the main goal)
8. This talk
● It’s an important talk for me
● Much harder to do than expected
○ based on lots of deeply rooted, fuzzy, empirical knowledge
● Please, please, I beg you: if anything is unclear, come chat with me / ask questions (whatever the medium)
15.
Assess failure modes.
Give agency to your users and don’t forget any of them.
You are responsible for keeping the promises you make.
16.
1. Pure, total functions
2. Explicit error channel
3. Failures vs Errors
4. Program to strict interfaces and protocols
5. Composition and tooling
17. These points are also important and can be translated to the architecture / UX / team / ecosystem levels.
But let’s keep it simple with code.
21. Don’t lie!
Divide by zero?
divide(a: Int, b: Int): Int
● Non-total functions are a lie
○ your promises are unsound
○ your users can’t react appropriately
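To make the lie concrete, here is a minimal sketch using only the Scala standard library (`DivideByZero` and `safeDivide` are hypothetical names for illustration, not Rudder code). The first signature promises an `Int` for every pair of inputs, yet on the JVM integer division by zero throws `java.lang.ArithmeticException`; the total version moves that case into the return type.

```scala
object DivideExample {
  // Lies: the signature promises an Int for every (a, b),
  // but b == 0 throws java.lang.ArithmeticException at runtime.
  def divide(a: Int, b: Int): Int = a / b

  // Hypothetical error type, used only for this sketch.
  final case class DivideByZero(dividend: Int)

  // Total: every input maps to a value, and the failure mode is visible in the type.
  def safeDivide(a: Int, b: Int): Either[DivideByZero, Int] =
    if (b == 0) Left(DivideByZero(a)) else Right(a / b)
}
```

With the first version, a caller can only discover the exception by reading the implementation; with the second, `safeDivide(1, 0)` returns `Left(DivideByZero(1))` and the compiler makes the case visible at every call site.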
25. Don’t lie!
No such user? (non-total)
DB connection error?
getUserFromDB(id: UserId): User
● Non-pure functions are a lie
○ your promises are unsound
○ your users can’t react appropriately
26. Sound promises
● Use total functions
○ or make them total with a union return type
● Use pure functions
○ or make them pure with an IO monad
● Don’t lie to your users,
● allow them to react efficiently:
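A minimal sketch of both fixes applied to `getUserFromDB` from slide 25, assuming nothing beyond the Scala standard library. `MiniIO` is a deliberately tiny, hypothetical stand-in for an IO monad such as ZIO's `IO[E, A]` (it is not ZIO's API), and the in-memory table stands in for the database. The two failure modes become explicit error cases (totality), and calling `getUserFromDB` only builds a description of the effect (purity): the "database" is touched when `unsafeRun()` is called.

```scala
object SoundPromises {
  final case class UserId(value: Int)
  final case class User(id: UserId, name: String)

  // Hypothetical error cases: both failure modes are now part of the promise.
  sealed trait UserError
  final case class NoSuchUser(id: UserId) extends UserError
  final case class DbConnectionError(msg: String) extends UserError

  // Hypothetical toy IO (NOT ZIO): a value that only *describes* an effect.
  // Nothing runs until unsafeRun() is called, so building one is pure.
  final case class MiniIO[+E, +A](unsafeRun: () => Either[E, A]) {
    def map[B](f: A => B): MiniIO[E, B] = MiniIO(() => unsafeRun().map(f))
    def flatMap[E1 >: E, B](f: A => MiniIO[E1, B]): MiniIO[E1, B] =
      MiniIO(() => unsafeRun().flatMap(a => f(a).unsafeRun()))
  }

  // In-memory stand-in for the database; `queries` counts real accesses.
  private val table = Map(UserId(1) -> User(UserId(1), "alice"))
  var dbUp = true
  var queries = 0

  // Total (errors are in the type) and pure (the DB is only hit when run).
  def getUserFromDB(id: UserId): MiniIO[UserError, User] = MiniIO { () =>
    queries += 1
    if (!dbUp) Left(DbConnectionError("connection refused"))
    else table.get(id).toRight(NoSuchUser(id))
  }
}
```

Composition stays pure too: `getUserFromDB(a).flatMap(_ => getUserFromDB(b))` is still only a description until it is run.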
29.
● Don’t assume what’s obvious
● It’s an open world out there
● Don’t force users to reverse-engineer possible cases
It’s a signal: make it unambiguous, give agency
30. Which intent is less ambiguous?
blobzurg(a: Int, b: Int): Option[Int]
blobzurg(a: Int, b: Int): PureResult[DivideByZero, Int]
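A minimal sketch of why the second signature is the clearer signal, using only the standard library. `Either` stands in for the slide's `PureResult[DivideByZero, Int]`, and `DivideByZero` is a hypothetical case class. With `Option`, the caller only learns that there is no value; with a typed error, the reason and its context travel with the signal, and the compiler warns if a pattern match forgets a case.

```scala
object SignalExample {
  // Hypothetical error type carrying the reason and the offending input.
  final case class DivideByZero(dividend: Int)

  // Same computation, two signatures (names kept deliberately meaningless).
  def blobzurgOpt(a: Int, b: Int): Option[Int] =
    if (b == 0) None else Some(a / b)

  def blobzurg(a: Int, b: Int): Either[DivideByZero, Int] =
    if (b == 0) Left(DivideByZero(a)) else Right(a / b)

  // What a caller can say about the non-nominal case in each version.
  def explainOpt(a: Int, b: Int): String = blobzurgOpt(a, b) match {
    case Some(n) => s"result: $n"
    case None    => "no result (missing? invalid? not found? the caller must guess)"
  }

  def explain(a: Int, b: Int): String = blobzurg(a, b) match {
    case Right(n)              => s"result: $n"
    case Left(DivideByZero(d)) => s"cannot divide $d by zero"
  }
}
```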
32.
“A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.”
Benjamin Pierce
● Use the type system to automate the classification of errors?
It’s a signal: make it unambiguous, give agency, automate it
34.
By definition, a type system automatically categorizes results
⟹ need for a dedicated error channel + a common error trait
trait MyAppError // common properties of errors
type PureResult[A] = Either[MyAppError, A]
def divide(a: Int, b: Int): PureResult[Int]
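A minimal sketch of what the common trait buys, with `Either` from the standard library as on the slide (Scala 2.13+ for `toIntOption`). `DivideByZero`, `NotANumber` and `parse` are hypothetical, and the `msg` member stands for the "common properties of errors". Because each concrete error extends `MyAppError`, functions returning different errors compose in a single for-comprehension with no manual error conversion.

```scala
object CommonErrorTrait {
  // Common parent for every error of the application (as on the slide).
  trait MyAppError { def msg: String }
  type PureResult[A] = Either[MyAppError, A]

  // Hypothetical concrete errors for this sketch.
  final case class DivideByZero(msg: String) extends MyAppError
  final case class NotANumber(msg: String) extends MyAppError

  def parse(s: String): PureResult[Int] =
    s.toIntOption.toRight(NotANumber(s"'$s' is not an integer"))

  def divide(a: Int, b: Int): PureResult[Int] =
    if (b == 0) Left(DivideByZero(s"cannot divide $a by zero")) else Right(a / b)

  // No error mapping needed: every step returns PureResult, so the
  // for-comprehension type-checks as PureResult[Int] and stops at the first error.
  def parseAndDivide(a: String, b: String): PureResult[Int] =
    for {
      x <- parse(a)
      y <- parse(b)
      r <- divide(x, y)
    } yield r
}
```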
36. Same for effectful functions!
trait MyAppError // common properties of errors
type IOResult[A] = IO[MyAppError, A]
def getUser(id: UserId): IOResult[User]
37.
● Use a dedicated error channel
○ ~ Either[E, A] for pure code,
○ else ~ IO[E, A] monad
● Use a parent trait for common error properties…
● …and for automatic categorization of errors by the compiler
43. Systems? Need for a systematic approach to error management
A school of systems
○ BOUNDED group of things
○ with a NAME
○ Interacting with other systems
46. Errors vs Failures
Errors
● expected non-nominal case
● signal for users
● social construction: you choose alternative or error
● reflected in types
Failures
● unexpected case: by definition, the application is in an unknown state
● the only choice is to stop as cleanly as possible
● not reflected in types
50. Horizon limit is your choice, by definition
In Rudder, we have a JS engine (JS from users):
execScript(js: String): IOResult[String]
java.lang.SecurityException?
⟹ SecurityException is an expected error case here
… but nowhere else in Rudder. By our choice.
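A minimal sketch of drawing that horizon in code, assuming a synchronous `Either` instead of the slide's `IOResult` so it runs with the standard library alone. `Engine`, `ScriptSecurityError` and the sample engines are hypothetical, not Rudder's JS engine. Only `execScript` turns `SecurityException` into an error value; any other exception thrown by the engine is deliberately left uncaught, so it remains a failure.

```scala
object HorizonExample {
  trait MyAppError { def msg: String }

  // Hypothetical error: a user script tried something forbidden.
  final case class ScriptSecurityError(msg: String) extends MyAppError

  // Hypothetical stand-in for the real script engine.
  type Engine = String => String

  // Only here is SecurityException part of the promise: it becomes an error value.
  // Anything else thrown by the engine is NOT caught: it stays a failure.
  def execScript(engine: Engine)(js: String): Either[MyAppError, String] =
    try Right(engine(js))
    catch {
      case e: SecurityException => Left(ScriptSecurityError(s"script rejected: ${e.getMessage}"))
    }
}
```

Elsewhere in the application, the same `SecurityException` would not be matched, so it would surface as a failure rather than as a typed error.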
53. A bit more about systems
Need for a systematic approach to error management
A school of systems
○ BOUNDED group of things
○ with a NAME
○ Interacting with other systems
■ via INTERFACES
■ by a PROTOCOL
○ And PROMISING to have a behavior
56. Example?
Typical web application. How to keep contradictory promises?
Promises to third parties about REST behaviour
Promises to business and developers about code manageability
60. Make promises, keep them
● Systems allow you to bound responsibilities
Business Core sub-system:
● own ADT / logic (mostly pure)
● lifecycle bound to the developers’ understanding of needs (rapid changes)
Pattern: “A pure heart (core) surrounded by side effects”*
* works better in French: “un coeur pur encerclé par les effets de bords”
62. Make promises, keep them
● Systems allow you to bound responsibilities
Business Core sub-system:
● own ADT / logic (mostly pure)
● lifecycle bound to the developers’ understanding of needs (rapid changes)
REST sub-system:
● own ADT / logic (mostly effects)
● lifecycle bound to the REST contract: strict versioning, changes are breaking changes
Users of the API want stability and to know what errors can happen
64. Make promises, keep them
● Systems allow you to bound responsibilities
Business Core sub-system:
● own ADT / logic (mostly pure)
● lifecycle bound to the developers’ understanding of needs (rapid changes)
REST sub-system:
● own ADT / logic (mostly effects)
● lifecycle bound to the REST contract: strict versioning, changes are breaking changes
Stable API: interface, strict protocol & promises (nominal cases + errors)
Users of the API have agency (able to react efficiently)
Translation between sub-systems: API: interface, protocol & promises!
65. Make promises, keep them
● Systems allow you to bound responsibilities
● Translate errors between sub-systems
○ make errors relevant to their users
● It’s a model, it’s false
○ there is NO definitive answer
○ discuss, share, iterate
● The bigger the promises, the stricter the API
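A minimal sketch of translating errors at the boundary between the business core and the REST sub-system. All names, HTTP statuses and error codes are hypothetical choices for illustration. The core error ADT can evolve with the developers' understanding, while the REST error is part of the versioned contract; the translation function is the API between them. Note that the internal cause of `StorageDown` is not exposed: it matters to ops (logs), not to API users.

```scala
object TranslateErrors {
  // Business core errors: free to change with the developers' understanding.
  sealed trait CoreError
  final case class UnknownUser(id: String) extends CoreError
  final case class QuotaExceeded(id: String, limit: Int) extends CoreError
  final case class StorageDown(cause: String) extends CoreError

  // REST errors: part of the versioned contract, so kept small and stable.
  final case class RestError(status: Int, code: String, message: String)

  // The translation is the API between the two sub-systems. CoreError is sealed,
  // so adding a new case makes the compiler warn that this match is not exhaustive.
  def toRest(e: CoreError): RestError = e match {
    case UnknownUser(id)          => RestError(404, "user.not_found", s"No user '$id'")
    case QuotaExceeded(id, limit) => RestError(429, "user.quota", s"User '$id' is limited to $limit requests")
    case StorageDown(_)           => RestError(503, "service.unavailable", "Please retry later")
  }
}
```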
70. What’s missing for good error management in code?
● The signal must be unambiguous
○ exceptions are a pile of ambiguity
● Exceptions are A PAIN to use
○ no tooling, no inference, nothing
■ you need to be able to manipulate errors like normal code
■ where are our higher-order functions like map, fold, etc.?
○ no composition
■ you lose referential transparency*
* the single biggest win regarding code comprehension
71. Make it a joy!
● Managing errors should be enjoyable!
○ automatic (in for-comprehensions + inference)
○ or as expressive as the nominal case!
● Safely, easily managing errors should be the default!
○ composition (referential transparency…)
○ higher-level resource management: bracket, etc.
● Make the code extremely readable
○ add all the combinators you need!
○ it’s cheap with pure, total functions
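A minimal, synchronous sketch of such a combinator over `Either`. `bracket` here is a hypothetical helper named after ZIO's operator (it is not ZIO's implementation), and `ReadError` is illustrative. It shows how cheap a new combinator is when code is built from pure, total functions: releasing the resource becomes a property of the combinator instead of something each call site must remember.

```scala
object BracketExample {
  trait MyAppError { def msg: String }
  type PureResult[A] = Either[MyAppError, A]
  final case class ReadError(msg: String) extends MyAppError

  // Hypothetical combinator in the spirit of ZIO's `bracket`: acquire, use, and
  // always release once acquisition succeeded, whatever `use` returns or throws.
  // If acquisition fails, there is nothing to release and the error is returned.
  def bracket[R, A](acquire: => PureResult[R])(release: R => Unit)(use: R => PureResult[A]): PureResult[A] =
    acquire.flatMap { r =>
      try use(r)
      finally release(r)
    }
}
```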
74. Why ZIO?
● You still have to think in systems by yourself
● Then ZIO provides:
○ effect management
○ with an explicit error channel: IO[+E, +A]
val pureCode = IO.effect(effectfulCode)
75. Why ZIO?
● You still have to think in systems by yourself
● Then ZIO provides:
○ debuggable failures
[Figures: complex error composition; async code trace]
76. Why ZIO?
● You still have to think in systems by yourself
● Then ZIO provides:
○ tons of conveniences to manipulate errors
■ create: from Option, Either, value...
■ transform: mapError, fold, foldM, ...
■ recover: total, partial, or else
○ composable effects
■ .bracket / Managed, async queues, STM, etc.
● safe, composable resource management
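ZIO provides these operators on `IO[E, A]` (for example `mapError`, `fold` and `foldM`, as listed above). To keep a runnable sketch dependency-free, here are the same moves on the standard library's `Either`, which has direct counterparts: `Option.toRight` to create, `left.map` to transform the error channel, `orElse` to recover, and `fold` to collapse both channels (Scala 2.13+). This is an analogy, not ZIO's API; `Unexpected`, `Chained` and the config map are hypothetical.

```scala
object ErrorConveniences {
  trait MyAppError { def msg: String }
  final case class Unexpected(msg: String) extends MyAppError
  final case class Chained(msg: String, cause: MyAppError) extends MyAppError
  type PureResult[A] = Either[MyAppError, A]

  private val config = Map("port" -> "8080", "retries" -> "many")

  // create: from an Option (a missing key becomes an explicit error)
  def lookup(key: String): PureResult[String] =
    config.get(key).toRight(Unexpected(s"missing key '$key'"))

  private def asInt(s: String): PureResult[Int] =
    s.toIntOption.toRight(Unexpected(s"'$s' is not an integer"))

  // transform: map the error channel (like mapError) to add context
  def intSetting(key: String): PureResult[Int] =
    lookup(key).flatMap(asInt).left.map(e => Chained(s"cannot read setting '$key'", e))

  // recover: fall back to a default value when the setting is unusable
  def timeout: PureResult[Int] = intSetting("timeout").orElse(Right(30))

  // fold: collapse both channels into one value, e.g. for a log line
  def describe[A](r: PureResult[A]): String =
    r.fold(e => s"error: ${e.msg}", a => s"ok: $a")
}
```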
79. Why ZIO?
● You still have to think in systems by yourself
● Then ZIO provides:
○ effect management
○ with an explicit error channel
○ debuggable failures
○ tons of conveniences to manipulate errors
○ composability
● Everything works in parallel, concurrent code too!
● Inference just works!
Lots of details: “Error Management: Future vs ZIO”
https://www.slideshare.net/jdegoes/error-management-future-vs-zio
85. Full example
● Inference just works
● Each sub-system adds relevant information
Error contextualisation between systems:
(None, msg) => Unexpected(msg)
PureResult[A] => IOResult[A]
(err: RudderError[A], msg) => Chained(msg, err)
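A minimal sketch of this contextualisation with the standard library, assuming a synchronous `Either` in place of `IOResult` (so the `PureResult[A] => IOResult[A]` lift is left out). `Unexpected` and `Chained` follow the names on the slide, and `notOptional` is mentioned in the Q&A, but these definitions, `chainError`, `fullMsg` and the two sub-system functions are hypothetical, not Rudder's actual code. Each sub-system adds its own hint while keeping the cause.

```scala
object ContextualisationExample {
  trait MyAppError {
    def msg: String
    def fullMsg: String = msg
  }
  final case class Unexpected(msg: String) extends MyAppError
  final case class Chained(hint: String, cause: MyAppError) extends MyAppError {
    def msg: String = hint
    override def fullMsg: String = s"$hint; cause was: ${cause.fullMsg}"
  }
  type PureResult[A] = Either[MyAppError, A]

  // Hypothetical syntax mirroring the arrows on the slide.
  implicit class OptionOps[A](opt: Option[A]) {
    // (None, msg) => Unexpected(msg)
    def notOptional(msg: String): PureResult[A] = opt.toRight(Unexpected(msg))
  }
  implicit class ResultOps[A](res: PureResult[A]) {
    // (err, msg) => Chained(msg, err)
    def chainError(msg: String): PureResult[A] = res.left.map(err => Chained(msg, err))
  }

  // Hypothetical "repository" sub-system.
  private val nodes = Map("node1" -> "10.0.0.1")
  def findNodeIp(id: String): PureResult[String] =
    nodes.get(id).notOptional(s"node '$id' not found")

  // Hypothetical "deployment" sub-system: adds its own context, keeps the cause.
  def deploy(id: String): PureResult[String] =
    findNodeIp(id).map(ip => s"deployed to $ip").chainError(s"deployment of node '$id' failed")
}
```

For an unknown node, `deploy("node2")` yields a `Chained` error whose `fullMsg` reads "deployment of node 'node2' failed; cause was: node 'node2' not found".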
86.
1. Pure, total functions: don’t lie about your promises
2. Explicit error channel: make it unambiguous in your types
3. Failures vs Errors: models are false by construction
4. Program to strict interfaces and protocols: use systems to materialize promises
5. Composition and tooling: make it extremely convenient to use
Assess failure modes.
Give agency to your users and don’t forget any of them.
You are responsible for keeping the promises you make.
87. Questions?
Contact me / Chat with me!
https://twitter.com/fanf42
https://github.com/fanf
https://keybase.io/fanf42
irc/freenode: fanf
francois@rudder.io
Resources
○ Error Management: Future vs ZIO
A much more detailed presentation of ZIO error management capabilities
https://www.slideshare.net/jdegoes/error-management-future-vs-zio
○ Understand Things As Interacting Systems
More insights on systems.
https://medium.com/@fanf42/understand-things-as-interacting-systems-b273bdba5dec
○ Stay Up!
Journey of a Free Software Company. One decade in search for a sustainable model
https://medium.com/@fanf42/stay-up-5b780511109d
88. Some questions asked after the talk
● Is SystemError used to catch / materialize failures?
○ No, SystemError is here to translate errors that need to be dealt with (like a connection error to the DB, FS-related problems, etc.) but are encoded in Java as Exceptions. SystemError is not used to catch Java “OutOfMemoryError”: these exceptions kill Rudder. We use the JVM’s Thread.setDefaultUncaughtExceptionHandler to try to give more information to devs/ops and to clean things up before killing the app.
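A minimal sketch of that split, assuming plain `Either` rather than Rudder's `IOResult`; `attempt`, `installFailureHandler` and this `SystemError` definition are hypothetical. `scala.util.control.NonFatal` draws the line: non-fatal exceptions become `SystemError` values, while fatal throwables such as `OutOfMemoryError` are not matched, keep propagating, and end up in the JVM-wide handler installed with `Thread.setDefaultUncaughtExceptionHandler`.

```scala
import scala.util.control.NonFatal

object SystemErrorExample {
  trait MyAppError { def msg: String }

  // Expected errors that Java reports as exceptions (DB, file system...).
  final case class SystemError(msg: String, cause: Throwable) extends MyAppError

  // Translate non-fatal exceptions into an error value. Fatal ones such as
  // OutOfMemoryError or StackOverflowError are not matched by NonFatal.
  def attempt[A](msg: String)(effect: => A): Either[MyAppError, A] =
    try Right(effect)
    catch { case NonFatal(e) => Left(SystemError(msg, e)) }

  // Last line of defense for failures: inform devs/ops, then clean up and stop.
  def installFailureHandler(log: String => Unit, shutdown: () => Unit): Unit =
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      def uncaughtException(t: Thread, e: Throwable): Unit = {
        log(s"Fatal failure in thread '${t.getName}': $e")
        shutdown()
      }
    })
}
```

In a real application, `shutdown` would release resources and then exit, for example with `sys.exit(1)`.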
89. Some questions asked after the talk
● You have only one parent type for errors. Don’t you lose a lot of detail, with all the special errors in sub-systems losing their specificities when they are seen as RudderError?
○ This is a very pertinent question, and we spent a lot of time weighing the current design against one where each sub-system would have its own error type (with no common supertype). In the end, we settled on the current design because:
■ no common supertype means no automatic inference. You need to guide it with transformers, and even if ZIO provides tooling to map errors, that means a lot of useless boilerplate that pollutes the readability of your code.
■ there is common tooling that you really want to have for all errors (Chained, SystemError, but also “notOptional”, etc.). You don’t want to rewrite it. Yes, type classes could be a solution, but you still have to write them, for no clear gain here.
■ you end up fighting the automatic categorization done by the compiler instead of leveraging it.
■ The gain (detailed errors) is actually almost never needed. When we switched to “only one super class for all errors”, we saw that “Chained” is sufficient to deal with general cross-system cases, and in some very, very rare cases, you can build ad-hoc combinators when needed; it’s cheap.
○ So all in all, the wins in convenience, and the joy of just having everything work without boilerplate, clearly outweighed the unclear gain of having different error hierarchies.
○ The problem would have been different if Rudder were not one monolithic app, but services needing separate compilation. I think we would have made an “error” lib in that case.
90. Some questions asked after the talk
● We use Future[Either[E,A]] + MTL, why should we switch to ZIO?
○ Well, the decision to switch is yours, and I don’t know the specific context of your company well enough to give advice on that. Nonetheless, here is my personal opinion:
■ The ZIO stack seems simpler (fewer concepts) and works perfectly with inference. Thus it may be simpler to teach to new people, and to maintain. YMMV.
■ ZIO performance is excellent, especially for concurrent code. Fibers are a very nice abstraction to work with.
■ ZIO enforces pure code, which is generally simpler to compose/refactor.
■ ZIO tooling and related constructs (Managed resources, async queues, STM, etc.) are a joy to code with. They remove a lot of pain from tedious, boring, complicated tasks (closing resources correctly, synchronizing concurrent access, etc.)
■ Pertinent stack traces in concurrent code are a major win
● But at the end of the day, you decide!
91. Some questions asked after the talk
● How long did it take to port Rudder to ZIO?
○ It’s complicated :). 1 month of part-time work (me), plus a lot more time for teaching, refactoring, understanding the new paradigm’s limits, etc.
■ 1/ we didn’t start from nowhere. We were using Box from liftweb, and a lot of the code in Rudder was already “shaped” to deal with errors as explained in the talk (see https://issues.rudder.io/issues/14870 for context)
■ 2/ we didn’t port all of Rudder to ZIO. I estimate that we ported ~40% of the code (60k-70k lines?).
■ 3/ we did some major refactoring along the way, using new combinators and higher-level structures (like async queues)
■ 4/ we started at the end of 2018, when ZIO code was still moving a lot, and we switched to new things when they became available (ZIO 1.0.0 is around the corner and it has been quite stable for months now)
■ we spent quite some time looking for the best choice for errors between sub-systems (see other question)