Supervision

Build fault-tolerant systems with OTP-style supervision trees.

What is Supervision?

Supervision is a strategy for handling failures in actor systems. Instead of crashing the entire application when something goes wrong, supervisors can restart failed actors, isolating failures to small parts of the system.

cineyma's supervision is inspired by Erlang/OTP's "let it crash" philosophy:

  • Actors form parent-child hierarchies
  • Parents supervise their children
  • Failures are isolated and can trigger restarts
  • Panics never crash the runtime

Supervisor Strategies

cineyma provides three supervision strategies:

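The choice is easiest to picture as a strategy value passed in when a child is spawned. The enum below is only an illustrative shape (the name SupervisorStrategy, its variants, and its fields are assumptions, not cineyma's confirmed API):

  use std::time::Duration;

  // Illustrative shape only; cineyma's real strategy type may differ.
  enum SupervisorStrategy {
      /// Stop the failed actor permanently (the default).
      Stop,
      /// Restart the failed actor, at most `max_restarts` times within `window`.
      Restart { max_restarts: usize, window: Duration },
      /// Propagate the failure to the parent supervisor.
      Escalate,
  }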

Strategy Details

Stop (Default)

When an actor fails, it stops permanently. The parent receives a Terminated message.

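For example, a parent might spawn a worker with the default strategy roughly as follows (a sketch only: spawn_supervised, Worker, and ctx are assumed names, not cineyma's confirmed API):

  // Hypothetical API, for illustration only.
  use cineyma::SupervisorStrategy;

  // Stop is the default: if the Worker panics, it is not replaced.
  // The parent that spawned it receives a Terminated message instead.
  let child = ctx.spawn_supervised(|| Worker::new(), SupervisorStrategy::Stop);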

Restart

Restart the actor up to N times within a time window. If the limit is exceeded, the actor stops permanently.

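A sketch of what configuring a restart window might look like (the spawn_supervised call and the Restart fields are assumptions; only the factory closure and the limit-within-a-window behaviour come from the surrounding text):

  // Hypothetical API, for illustration only.
  use std::time::Duration;
  use cineyma::SupervisorStrategy;

  // Restart the Worker at most 3 times within any 60-second window.
  // Beyond that limit, the actor stops permanently.
  let child = ctx.spawn_supervised(
      || Worker::new(),
      SupervisorStrategy::Restart {
          max_restarts: 3,
          window: Duration::from_secs(60),
      },
  );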

The factory function || Worker::new() is called on each restart to create a fresh actor instance with clean state.

Escalate

Propagate the failure to the parent supervisor, which then handles it according to its own strategy. This creates fault-tolerance hierarchies.

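Escalation is configured the same way, but the failure is handled one level up (again a sketch with assumed names):

  // Hypothetical API, for illustration only.
  use cineyma::SupervisorStrategy;

  // A Worker failure is not handled here; it propagates to this actor's
  // own supervisor, which applies its own strategy. Chaining Escalate
  // through several levels builds a fault-tolerance hierarchy.
  let child = ctx.spawn_supervised(|| Worker::new(), SupervisorStrategy::Escalate);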

Creating Supervision Trees

Build hierarchical supervision structures:

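A minimal supervision-tree sketch, using the RootSupervisor / OrderDepartment / Worker hierarchy described in the example further down. Everything here (the Actor trait, the started hook, spawn_supervised) is an assumed API shown only to illustrate the structure, not cineyma's confirmed interface:

  // Hypothetical API, for illustration only; all names are assumptions.
  use std::time::Duration;
  use cineyma::{Actor, Context, SupervisorStrategy};

  struct Worker;
  struct OrderDepartment;
  struct RootSupervisor;

  impl Actor for Worker {}

  impl Actor for OrderDepartment {
      // Whenever the department starts (or is restarted), it spawns a
      // fresh pool of workers. Worker failures escalate to the department.
      fn started(&mut self, ctx: &mut Context<Self>) {
          for _ in 0..4 {
              ctx.spawn_supervised(|| Worker, SupervisorStrategy::Escalate);
          }
      }
  }

  impl Actor for RootSupervisor {
      // The root restarts the whole department when it fails, whether the
      // failure started in the department itself or escalated from a worker.
      fn started(&mut self, ctx: &mut Context<Self>) {
          ctx.spawn_supervised(
              || OrderDepartment,
              SupervisorStrategy::Restart {
                  max_restarts: 5,
                  window: Duration::from_secs(60),
              },
          );
      }
  }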

Panic Handling

cineyma catches all panics at actor boundaries. A panic in a handler triggers the supervision strategy:

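For instance, a handler that divides by zero will panic, but the panic stays inside the actor (the Handler trait shape below is an assumption; only the panic-isolation behaviour comes from the text above):

  // Hypothetical API, for illustration only; trait and method names are assumptions.
  use cineyma::{Context, Handler};

  struct Divide { num: i32, den: i32 }
  struct Calculator;

  impl Handler<Divide> for Calculator {
      type Result = i32;

      fn handle(&mut self, msg: Divide, _ctx: &mut Context<Self>) -> i32 {
          // Dividing by zero panics. The panic is caught at the actor
          // boundary, the Calculator's supervision strategy decides what
          // happens next, and the rest of the runtime keeps running.
          msg.num / msg.den
      }
  }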

When an actor panics during a send() call, the caller receives a MailboxError::MailboxClosed error. Design your system to handle this gracefully.
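
On the calling side that failure can be matched on explicitly. Only the MailboxError::MailboxClosed name comes from the behaviour described above; the rest of this snippet (send, the Divide message, the extra error arm) is an assumed shape:

  // Hypothetical caller-side handling, for illustration only.
  use cineyma::MailboxError;

  match calculator.send(Divide { num: 1, den: 0 }).await {
      Ok(result) => println!("result: {result}"),
      // The actor panicked (or stopped) before replying, so its mailbox
      // is closed. Fall back, retry, or report the failure upstream.
      Err(MailboxError::MailboxClosed) => eprintln!("calculator is down"),
      Err(other) => eprintln!("send failed: {other:?}"),
  }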

Best Practices

Use supervision for I/O actors

Actors that interact with external systems (databases, APIs) should have restart strategies to recover from transient failures.

Keep state minimal in restartable actors

Actors that might restart should minimize in-memory state. Store important state externally or reconstruct it on startup.

Use Escalate for critical failures

When a child failure indicates a broader problem, use Escalate to let the parent decide how to handle it.

Set appropriate restart limits

Too many restarts might indicate a persistent problem. Set limits that make sense for your failure modes.

Supervision Tree Example

Supervision hierarchy: Workers escalate to OrderDepartment, which is restarted by RootSupervisor

When a Worker panics:

  1. Worker stops
  2. OrderDepartment receives Terminated
  3. If the Workers use Escalate, OrderDepartment itself is treated as failed
  4. RootSupervisor restarts OrderDepartment
  5. Fresh OrderDepartment spawns new Workers

Next Steps