# Introduction

This book describes an approach to building heavily asynchronous applications based on the actor model. The major part of the book is about the elfo crate and is illustrated with best practices of its usage.

The second part of the book covers corporate best practices for the architecture of async applications. Most of this knowledge can't be applied right away; you should develop your own solution suitable for your task.

# Installation

Add to your Cargo.toml:

```toml
[dependencies]
elfo = "0.1.0"
```


# Dumping

## Introduction

Dumping is the process of storing incoming and outgoing messages for every actor, including ones from mailboxes and all other sources like timers, streams, and so on. The primary purpose is future tracing, but it also can be used for regression testing.

Dumping has a lot in common with logging, but it's more efficient and optimized for storing a large number of messages. Thus, it has high throughput requirements instead of the low latency required for logging, where records are supposed to be delivered as soon as possible, especially warnings and errors.

## Usage

### Enable dumping

Dumping is disabled by default unless the topology contains the system.dumpers group.

Elfo provides a default implementation for such a group; it's available with the full feature and exported as elfo::dumper:

```rust
let topology = elfo::Topology::empty();
let dumpers = topology.local("system.dumpers");

// ...

dumpers.mount(elfo::dumper::new());
```


Besides this, the path to a dump file must be specified in the config:

```toml
[system.dumpers]
path = "path/to/dump/file.dump"
```


### Disable dumping for the entire actor group

Put system.dumping.disabled = true under the section for the corresponding actor group. Note that the settings can be changed and applied on the fly:

```toml
[some_actor_group]
system.dumping.disabled = true
```


### Disable dumping for a message

Simply add the message(dumping = "disabled") attribute to the message. The other (and default) value of the attribute is "full".

```rust
#[message(dumping = "disabled")]
pub struct SomethingHappened {
    // ...
}
```


### Short messages

Sometimes the content of messages is too large, for instance, when writing a backend for graph plotting, where every response can contain thousands of points. We don't want to lose additional information about responses, but saving whole messages is very expensive in this case.

For this situation, elfo provides a helper to hide specified fields during serialization, but only in the dumping context. So, these messages will still be sent properly over the network, where serialization is also used.

```rust
#[message]
pub struct ChunkArrived {
    #[serde(serialize_with = "elfo::dumping::hide")]
    pub points: Vec<(f64, f64)>,
    pub is_end: bool,
}
```


Such messages cannot be deserialized properly; that's ok unless they are used as input for regression testing.

## Local storage

The default implementation of dumpers writes all dumps to a file on the local file system.

Even consumer SSDs can achieve 3GiB/s in 2021, which should be more than enough to avoid a bottleneck here.

Dumps are stored uncompressed, so they can take a lot of space. Thus, it's essential to rotate the dump file timely and delete outdated ones.

Note that message ordering between actor groups (and even inside the same actor) can be easily violated because of implementation details. Therefore, in the case of reading from local dump files, you should sort rows by the timestamp field.
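For instance, a post-processing step can sort lines by the ts field before analysis. Below is a minimal stdlib-only sketch (not an elfo utility; it naively extracts the ts field with string matching, while a real tool should use a JSON parser):

```rust
// Extract the numeric "ts" field from a dump line; returns 0 if absent.
// Assumes the field appears as `"ts":<integer>` (no spaces), which matches
// compact NDJSON output.
fn ts_of(line: &str) -> u64 {
    let start = match line.find("\"ts\":") {
        Some(i) => i + 5,
        None => return 0,
    };
    line[start..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect::<String>()
        .parse()
        .unwrap_or(0)
}

fn main() {
    let mut lines = vec![
        r#"{"g":"b","ts":1621500001,"d":"Out"}"#,
        r#"{"g":"a","ts":1621500000,"d":"In"}"#,
    ];
    // Restore the global time order violated by sharded writing.
    lines.sort_by_key(|line| ts_of(line));
    assert!(ts_of(lines[0]) <= ts_of(lines[1]));
}
```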

## The structure of dump files

Dump files contain messages in the newline-delimited JSON format. Each line is an object containing the following properties:

• g — an actor group's name
• k — an optional actor's key
• n — node_no
• s — sequence_no, unique inside an actor group
• t — trace_id
• ts — a timestamp
• d — direction, "In" or "Out"
• cl — an optional class
• mn — a message's name
• mp — a message's protocol, usually a crate, which contains the message
• mk — a message's kind, "Regular", "Request" or "Response"
• m — a nullable message's body
• c — an optional correlation id, which links requests with corresponding responses
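For illustration, a single dump line following this scheme might look like the sample below (all values here are invented, and the exact timestamp format is an assumption):

```json
{"g":"some_group","k":"42","n":1,"s":100,"t":8533144261809589875,"ts":1621500000,"d":"In","cl":null,"mn":"SomethingHappened","mp":"my-protocol","mk":"Regular","m":{"field":42},"c":null}
```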

## Dump file rotation

elfo::dumper doesn't use any kind of partitioning and relies on an external file rotation mechanism instead. It means some additional configuration is required, but it provides more flexibility and simplifies the dumper.

The dumper listens to the SIGHUP signal to reopen the active dump file. Besides this, the dumper accepts the elfo::dumper::ReopenDumpFile command.

The most popular solution for file rotation is, of course, logrotate.

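A minimal logrotate configuration for this setup might look like the sketch below (the dump path matches the config example above; the postrotate command and the process name are assumptions, since it must deliver SIGHUP to your particular app):

```
path/to/dump/file.dump {
    daily
    rotate 7
    missingok
    notifempty
    postrotate
        # Hypothetical: signal your app so elfo::dumper reopens the file.
        pkill -HUP -f my_app || true
    endscript
}
```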

Tip: if dumps are not supposed to be delivered to a DB, use hard links to save the dump file for later discovery and avoid deletion.

## Remote storage

Depending on your goals, you may or may not want to send your dump files to remote storage. It can greatly improve search capabilities (primarily because of indexing trace_id) and space usage, but it requires additional infrastructure for dumping. Usually, it's ok for many services to use only local storage. Dumps are stored in a time-ordered way and, thanks to the structure of trace_id, can be used for good enough search. However, elfo doesn't provide a utility to search these files for now.

Ok, so you want to store dumps in a DB. The right choice, if you can afford it. What should you do?

The common schema looks like this: a delivery agent on each machine tails the dump files and ships new lines to the remote storage, where they are indexed at least by trace_id and timestamp.

## Implementation details

At a top level, dumping is separated into two parts: the dumping subsystem and the dumper.

The dumping subsystem is based on sharded in-memory storage containing a limited queue of messages. We use a predefined number of shards for now, but we will likely use the number of available cores in the future. Every thread writes to its dedicated shard. Such an approach reduces contention and false sharing between threads.

The dumper sequentially, in a round-robin way, replaces the shard's queue with a spare one, then reads and serializes all messages and writes them to the dump file. All this work happens on a timer tick. Firstly, it's one of the simplest ways to get appropriate batching. Secondly, the dumper uses tokio::task::spawn_blocking with blocking writes inside, which is more effective than using async tokio::fs directly. The timer approach allows us to reduce the impact on the tokio executor. However, this behavior is going to be improved for environments with io_uring in the future.

The dumper holds the lock only for a small amount of time to replace the queue inside a shard with another one, which was drained by the dumper on the previous tick. Thus, the actual number of queues is one more than the number of shards.
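The swap-based sharding described above can be sketched in plain Rust. This is a simplified illustration, not elfo's actual code; in particular, real shard selection uses the current thread, while here the shard index is passed explicitly:

```rust
use std::sync::Mutex;

// A fixed number of shards, as described in the text.
const SHARDS: usize = 8;

struct Storage {
    // Each thread writes to its dedicated shard to reduce contention.
    shards: Vec<Mutex<Vec<String>>>,
}

impl Storage {
    fn new() -> Self {
        Self {
            shards: (0..SHARDS).map(|_| Mutex::new(Vec::new())).collect(),
        }
    }

    // Called on the hot path: push a serialized dump into "our" shard.
    fn push(&self, thread_idx: usize, dump: String) {
        self.shards[thread_idx % SHARDS].lock().unwrap().push(dump);
    }

    // Called by the dumper on a timer tick: swap the shard's queue with a
    // spare (already drained) one, holding the lock only for the swap.
    fn replace_queue(&self, shard_idx: usize, spare: Vec<String>) -> Vec<String> {
        std::mem::replace(&mut *self.shards[shard_idx].lock().unwrap(), spare)
    }
}

fn main() {
    let storage = Storage::new();
    storage.push(0, "dump-1".to_string());
    storage.push(0, "dump-2".to_string());
    let drained = storage.replace_queue(0, Vec::new());
    assert_eq!(drained, vec!["dump-1".to_string(), "dump-2".to_string()]);
}
```

Note how the drained vector can be handed back on the next tick, which is why the total number of queues is one more than the number of shards.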

All actors in a group share the same handler with some common things like a sequence_no generator and a rate limiter. Whenever an actor sends or receives a message, the handler pushes the message to the shard corresponding to the current thread. Thus, messages produced by the same actor can be reordered if the actor is migrated to another thread by the scheduler.

Yep, we could restore the order in the dumper, but we don't do it now because the remote DB does it anyway. However, we can add the corresponding option in the future. It's not trivial, though.

# Pattern: ID Generation

It should be noted that the ID generation approach chosen in your app affects your app's architecture and overall performance. Here we discuss how to design a good ID and show a couple of great ID designs.

## Choosing the domain for your IDs

Despite the wide spread of 64-bit computer architectures, several popular technologies still don't fully support 64-bit unsigned integers. For example, several popular versions of Java have only the i64 type, so you can't have positive integers bigger than $$2^{63} - 1$$ without excess overhead. Therefore, a lot of Java-powered apps (e.g. Graylog) have limitations on the ID domain.

With some external apps, you can run into problems because of IDs less than 1; e.g. the value 0 may be explicitly forbidden. Avoiding the 0 value in IDs simplifies some popular memory optimizations and allows you to use zero values instead of Nullable columns, which can improve performance in some databases (e.g. ClickHouse).

This leads us to the following integer numbers' domain for IDs:

```
MIN = 1
MAX = 2^63 - 1 =
    = 9_223_372_036_854_775_807 =
    = 0x7fff_ffff_ffff_ffff ≈
    ≈ 9.2e18
```


Probably you don't need to be compatible with such domain-limiting systems today, but in most cases such limitations are very easy to withstand, and there's no reason for your project not to be ready for integration with such obsolete systems in the future. This domain allows you to store $$2^{63} - 1 \approx 9.2 \cdot 10^{18}$$ entities, which is enough to uniquely identify entities in almost every practical task. You shouldn't forget, though, that a good ID generation scheme is just a trade-off between dense domain usage and a robust sharded algorithm for generating unique IDs.

That's why we recommend using 63-bit positive integers as IDs.

### Systems with ECMAScript

In theory, JSON allows numbers of any length, but in a bunch of scenarios it's useful to process or display your data using ECMAScript (JavaScript, widespread in web browsers), which has only the f64 primitive type for numbers, so you can't have positive integers bigger than $$2^{53} - 1$$ without excess overhead. This leads us to the following integer domain for IDs in systems with ECMAScript:

```
MIN = 1
MAX = 2^53 - 1 =
    = 9_007_199_254_740_991 =
    = 0x1f_ffff_ffff_ffff ≈
    ≈ 9.0e15
```


On the other hand, you can add an extra serialization step and store your IDs in JSON as strings, which is a decent workaround for this problem.
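The $$2^{53} - 1$$ boundary can be demonstrated directly. The sketch below (illustrative, Rust standing in for ECMAScript's f64 numbers) shows that one step above this boundary, adjacent integers collapse into the same floating-point value:

```rust
fn main() {
    // Number.MAX_SAFE_INTEGER in ECMAScript:
    let max_safe = (1u64 << 53) - 1;
    assert_eq!(max_safe, 9_007_199_254_740_991);

    // Below the boundary, adjacent integers stay distinct as f64:
    assert!((max_safe as f64) != ((max_safe - 1) as f64));

    // Just above it, 2^53 and 2^53 + 1 round to the same f64 value,
    // i.e. an ID silently loses precision:
    assert_eq!((max_safe + 1) as f64, (max_safe + 2) as f64);
}
```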

## Properties of your ID generation algorithm

We call a generated ID sequence monotonic if every subsequent ID is bigger than the previous one as a number.

### Why monotonic IDs are so great

The app should obtain parameters for the ID generation algorithm after every restart, taking into account previously issued IDs. Monotonic ID generation allows you to take into account only the most recently generated IDs or not to bother about existing entries at all.

Most modern storages allow you to gain a notable performance advantage if your data is laid down in an ordered or partially ordered form. The storage lays down the data according to a sorting key, which may differ from the primary key (ID). To perform time series queries on partially ordered data very quickly, one can use the cheapest form of data skipping indices — the minmax index.

Partially-monotonic data can be compressed using specialized codecs very effectively, thanks to storing deltas instead of absolute values. Effective data compression is able to lower storage consumption and increase DB throughput significantly.

This means that you should have a good reason to use non-monotonic IDs.

### Level of monotonicity

Depending on the length of monotonic segments in the generated ID sequence, we can divide ID generation algorithms into:

• Fully monotonic — the generated ID sequence is monotonic during the whole lifetime of the system (see also IDs produced by DB).
• Periodically monotonic — the ID generation algorithm reaches the maximal ID value several times during the lifetime of the system and starts counting from the beginning of the domain. This means that the storage would contain a couple of monotonic segments, e.g. each covering several months of data (see also TraceId).
• With sharded monotonicity — the app generates IDs monotonically within several segments, frequently switching between these segments. E.g. every actor generates only monotonic IDs, but because of the concurrent nature of actors' work, the storage should handle a lot of intersecting inserts of consecutive IDs (see also DecimalId).

### Monotonicity source

There are several ways to obtain monotonic IDs. Your choice affects the application restart strategy:

### The reasons for non-monotonic IDs

If the ID generation algorithm shouldn't look predictable because of information security concerns, that's a good reason to use multiplicative hashing or random permutations to generate your IDs.

It's usually a great idea to apply randomness tests to your IDs in such scenarios.

Please note that really secure IDs require cryptographic algorithms (or at least some versions of UUID). The algorithms mentioned above are for IDs that should just look random.
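As an illustration of the multiplicative-hashing idea, here is a hedged Rust sketch (not from elfo): any odd multiplier is a bijection modulo $$2^{64}$$, and its modular inverse restores the original ID. Note that it permutes the full u64 range; confining the result to the 63-bit domain would need the same trick modulo $$2^{63}$$.

```rust
// An odd multiplier (here the widely used 64-bit "golden ratio" constant)
// defines a bijection on u64, so sequential IDs map to scattered values.
const MUL: u64 = 0x9E37_79B9_7F4A_7C15;

// Modular inverse of an odd number mod 2^64 via Newton's iteration:
// each step doubles the number of correct low bits, so 6 steps suffice.
fn inv_mod_2_64(a: u64) -> u64 {
    let mut x: u64 = 1;
    for _ in 0..6 {
        x = x.wrapping_mul(2u64.wrapping_sub(a.wrapping_mul(x)));
    }
    x
}

fn obfuscate(id: u64) -> u64 {
    id.wrapping_mul(MUL)
}

fn restore(public_id: u64) -> u64 {
    public_id.wrapping_mul(inv_mod_2_64(MUL))
}

fn main() {
    for id in 1..=5u64 {
        // The permutation is reversible, so it's still a valid unique ID.
        assert_eq!(restore(obfuscate(id)), id);
    }
}
```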

### Common or separate domains for several entities' IDs

Should we use a common ID generator for several entities?

With a common generator, IDs of various entities never intersect, so you have no chance to successfully query entity A by the ID of entity B by mistake, and you essentially get the "fail fast" approach for querying entities.

Unfortunately, a common ID generator for several entities exhausts the ID domain more quickly.

## Great IDs case study

### IDs produced by DB

Probably you have already used the AUTO_INCREMENT SQL DDL instruction for primary keys. The general idea is to never generate IDs ourselves and instead rely on the counter inside the database as a single source of truth. This means you don't know the IDs of entities you'd like to insert before you actually insert them, so you essentially have to support a separate schema just for entities' "construction stubs", and your app's latency is bound by DB latency by design. If you need a cyclic reference between entities you'd like to insert, you're most likely in trouble. If you need database replication, you'll be in trouble too.

An alternative is to use atomic compare-and-swap integrated into several DB engines, for example Cassandra's lightweight transactions. IDs produced by a DB are great for storing entities appearing with relatively low frequency (probably less than 10,000 per second), like users or accounts.

Such an approach gives us fully monotonic IDs and uses the whole 63-bit space to store the ID's positive value right away.

### TraceId

This ID is periodically monotonic and fits great as a primary key for tracing entries. This is accomplished by using a rarely (approx. once a year) wrapping timestamp ID component.

TraceId essentially is:

$\operatorname{trace\_id} = \operatorname{timestamp} \cdot 2^{38} + \operatorname{node\_no} \cdot 2^{22} + \operatorname{thread\_id\_and\_counter}$

This formula's parameters were chosen carefully to leave a good stock of spare space for TraceId's components inside the available set of 63-bit positive integers. You may see the general idea by looking at the bit distribution table. From MSB to LSB:

| Bits | Description | Range | Source |
|------|-------------|-------|--------|
| 1 | Reserved | 0 | - |
| 25 | timestamp in secs | 0..=33_554_431 | Clock at runtime |
| 16 | node_no | 0..=65_535 | Externally specified node (process) configuration |
| 22 | thread_id and counter | 1..=4_194_303 | Not synchronized runtime parameters |

The code generating TraceId can, of course, be optimized by using bit shifts for fast multiplication by a power of two:

```
trace_id = (timestamp << 38) | (node_no << 22) | counter
```


TraceId uses time as its monotonicity source, so timestamp is probably the most important part of the ID. Note that timestamp has a pretty rough resolution — in seconds. How long can we count seconds within 25 bits?

$\operatorname{timestamp}_{max} = \frac{2^{25} - 1}{60 \cdot 60 \cdot 24} = \frac{33554431}{86400} \approx 388 \text{ days}$

Which is almost a year plus 23 days. What happens when this almost-one-year term ends? timestamp starts counting from 0 once again:

```
TIMESTAMP_MAX = (1 << 25) - 1 // 0x1ff_ffff
timestamp = now_s() & TIMESTAMP_MAX
```


This means the primary key is guaranteed to act as a unique identifier for one year. To keep the order of entries between monotonic periods of TraceId, use some fully monotonic (but not necessarily unique) field as a sorting key for your entries (created_at or seq_no / sequential_number).

TraceId with such a timestamp part is well optimized for producing a lot of entities worth querying for only a limited period of time: logs, dumps, or tracing entries. However, you're able to store such entries as long as you like and use data skipping indices for quick time series queries.

To make IDs unique across several instances of your system without any synchronization between them, the node_no ID component should be externally specified in the system deployment configuration.

thread_id shares bit space with counter. At the start of the system, you should determine how many threads your node could possibly have and choose an appropriate $$\operatorname{counter}_{max}$$ accordingly.

The previous components segregate entries produced by separate threads on every instance of the system at different seconds. To create multiple unique IDs during a single second inside a single thread, we use the counter ID component.

Let's calculate how many records per second ($$\operatorname{RPS}_{max}$$) the counter allows us to produce with the bounds we've chosen. Assuming we have 32 threads:

$\operatorname{RPS}_{max} = \frac{2^{22} - 1}{\operatorname{threads\_count}} = \frac{2^{22} - 1}{2^{5}} \approx 2^{17} \approx 1.3 \cdot 10^{5} \text{ } \frac{\text{records}}{\text{second}}$

Which seems more than enough for most applications. Note that counter is limited to be at least 1 to keep the invariant $$\operatorname{id} \geqslant 1$$; every other component of TraceId could be zero.
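The whole layout above can be sketched as follows. This is a simplified illustration, not elfo's actual implementation: a single global atomic counter stands in for the per-thread thread_id-and-counter component.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Layout: 1 reserved bit | 25b timestamp | 16b node_no | 22b counter part.
const TIMESTAMP_MAX: u64 = (1 << 25) - 1;
const COUNTER_MAX: u64 = (1 << 22) - 1;

// Simplification: one global counter instead of per-thread counters.
static COUNTER: AtomicU64 = AtomicU64::new(0);

fn next_trace_id(node_no: u64) -> u64 {
    assert!(node_no <= 0xffff, "node_no must fit in 16 bits");

    // Seconds since the epoch, wrapping roughly once a year.
    let timestamp = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_secs()
        & TIMESTAMP_MAX;

    // The bottom component is kept in 1..=COUNTER_MAX so that id >= 1.
    let counter = COUNTER.fetch_add(1, Ordering::Relaxed) % COUNTER_MAX + 1;

    (timestamp << 38) | (node_no << 22) | counter
}

fn main() {
    let id = next_trace_id(42);
    assert!(id >= 1);
    // The node_no component can be unpacked back out of the ID:
    assert_eq!((id >> 22) & 0xffff, 42);
}
```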

### DecimalId

Probably your application already has some kind of launch log storing information about the release ticket in the issue tracking system, launch time, the user who initiated the launch, etc. If you have such an entity in your system, DecimalId will probably serve you really well. This ID has sharded monotonicity. The primary idea behind DecimalId is to increase the ID's legibility by storing its parts in separate decimal places.

DecimalId essentially is:

$\operatorname{decimal\_id} = \operatorname{counter} \cdot 10^{10} + \operatorname{generator\_id} \cdot 10^{5} + \operatorname{launch\_id}$

This formula corresponds to the following decimal places distribution table:

| Digits | Description | Range | Source |
|--------|-------------|-------|--------|
| 9 | counter | 1..=922_337_202 | Not synchronized runtime parameter |
| 5 | generator_id | 0..=99_999 | Synchronized runtime parameter |
| 5 | launch_id | 1..=99_999, 0 for tests | Externally specified and consistent across the whole system during a single launch |

The code implementing DecimalId generation can't use the bit shift hack we used for TraceId, but IDs generated with this schema are much more legible. You're able to read their decimal representation like this: ccccggggglllll (counter, generator ID, launch ID, respectively). For instance:

```
ID 14150009200065
   ccccggggglllll
   ▲   ▲    ▲
   │   │    └── launch_id (65)
   │   └─────── generator_id (92)
   └─────────── counter (1415)
```

The maximal possible ID value is 922_337_203___68_547___75_807; however, counter is limited by 922_337_202 (please note the decrement). Such a trick prevents limiting generator_id (the next ID component) by 68_547 (now it's limited by 99_999). Note that $$\operatorname{counter}_{min} = 1$$ to keep the invariant $$\operatorname{id} \geqslant 1$$ because every other component of DecimalId could be zero.
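The packing and unpacking can be sketched as follows (function names are illustrative, not from elfo); it reproduces the example above: counter = 1415, generator_id = 92, launch_id = 65.

```rust
// Pack DecimalId components into decimal places: cccc...ggggglllll.
fn encode(counter: u64, generator_id: u64, launch_id: u64) -> u64 {
    assert!((1..=922_337_202).contains(&counter)); // note the decrement trick
    assert!(generator_id <= 99_999);
    assert!(launch_id <= 99_999); // 0 is reserved for tests
    counter * 10u64.pow(10) + generator_id * 10u64.pow(5) + launch_id
}

// Unpack an ID back into (counter, generator_id, launch_id).
fn decode(id: u64) -> (u64, u64, u64) {
    (
        id / 10u64.pow(10),               // counter
        id / 10u64.pow(5) % 10u64.pow(5), // generator_id
        id % 10u64.pow(5),                // launch_id
    )
}

fn main() {
    let id = encode(1415, 92, 65);
    assert_eq!(id, 14_150_009_200_065);
    assert_eq!(decode(id), (1415, 92, 65));
}
```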

Every time the system produces more than $$\operatorname{counter}_{max}$$ elements, it requests the next generator_id from a persistent synchronized counter global for the whole system. This increment happens at every start of the app and only once per $$9.2 \cdot 10^{8}$$ records, so contention on it is negligible.

At every launch of your app, launch_id should be taken from the persistent launch log of the system. To make deployment more robust, we recommend generating launch_id outside of the app, in the deployment configuration.

launch_id = 0 should never appear in production records and may only be used for testing.