State Machines in Go, Elixir, Swift, and Zig

Why a Go retry loop ran forever because the attempt counter lived on the loop instead of the state, and what the runtime guarantees of Elixir, Swift, and Zig change about which state-machine idioms are honest in each.

By Jovani Pink April 27, 2026 13 min — Systems & Complexity Notes

Outcome focus: Reader can pick the right state-machine idiom for their language by recognizing which runtime guarantees the language ships, distinguish a true finite-state machine from unidirectional data flow, and avoid the cross-language mistake of treating one language's idiom as the universal pattern.

Part 4 of 4. Part 1: When the State Chart Pays Off. Part 2: XState, Actors, and What the Stately Argument Actually Buys. Part 3: State Machines in Python: from xstate-python to LangGraph.

A Go service ran payment retries in a loop. The shape was something like for attempt := 0; attempt < 3; attempt++ with a switch on the current state inside. When the payment failed transiently, the code exited the inner switch, returned a sentinel, and the outer caller restarted the whole function. The function declared attempt := 0 again at the top. The attempt counter never reached the limit. The retry loop ran for hours against a degraded payment provider until someone noticed the burn.

The bug was not a Go idiom problem. It was a state-modeling problem expressed through a Go idiom. The retry counter belonged on the state machine, not on the call stack. Any language can produce this mistake; what makes it Go-shaped is that Go encourages writing state machines as nested switches inside loops, and that pattern naturally puts the counter in the wrong place.

This post is about how state machines look in Go, Elixir, Swift, and Zig, and about why each language's runtime guarantees change which idioms are honest. The four languages were not chosen randomly. They span a real range: Go has goroutines but no built-in state machine concept; Elixir's BEAM gives you supervision and actor-style processes for free; Swift has rich UI state primitives and at least one prominent library people think is a state machine library but is not; and Zig has no library worth recommending and forces the discipline into the type system. The same canonical machine looks materially different in each one, and the differences are the lesson.

The canonical machine for this post is a payment workflow with five states: idle, charging, succeeded, failed, refunded. Events: CHARGE, CHARGE_OK, CHARGE_ERR, REFUND. Context: the amount, the attempt counter, the idempotency key. This is a deliberately small machine; the goal is to expose how each language wants to express it, not to ship a production payments service.

Go: Two Libraries and a Switch

Go has no first-party state machine support. The two community libraries worth knowing are looplab/fsm (3.4k stars, callback-table API, version 1.0.x) and qmuntal/stateless (UML-statechart-based, fluent builder API, version 1.8.x). Either is fine; they have different mental models.

The looplab/fsm shape is event-centric: you declare events that move between source and destination states, and you attach callbacks to lifecycle hooks like enter_state or before_event.

package main
 
import (
	"context"
	"fmt"
	"github.com/looplab/fsm"
)
 
func main() {
	payment := fsm.NewFSM(
		"idle",
		fsm.Events{
			{Name: "charge", Src: []string{"idle", "failed"}, Dst: "charging"},
			{Name: "charge_ok", Src: []string{"charging"}, Dst: "succeeded"},
			{Name: "charge_err", Src: []string{"charging"}, Dst: "failed"},
			{Name: "refund", Src: []string{"succeeded"}, Dst: "refunded"},
		},
		fsm.Callbacks{
			"enter_charging": func(_ context.Context, e *fsm.Event) {
				fmt.Println("attempt", e.Args[0])
			},
		},
	)
	_ = payment.Event(context.Background(), "charge", 1)
}

The qmuntal/stateless shape is state-centric: you configure each state and declare which triggers it permits, with optional guards.

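package main

import (
	"context"
	"fmt"

	"github.com/qmuntal/stateless"
)

func main() {
	attempts := 0 // lives with the machine, not on an outer retry loop

	payment := stateless.NewStateMachine("idle")
	payment.Configure("idle").Permit("charge", "charging")
	payment.Configure("charging").
		OnEntry(func(_ context.Context, _ ...any) error {
			attempts++ // count every entry into charging, first try or retry
			return nil
		}).
		Permit("charge_ok", "succeeded").
		Permit("charge_err", "failed")
	payment.Configure("failed").
		// Guards are passed to Permit as extra arguments; the retry is only
		// permitted while attempts remain.
		Permit("charge", "charging", func(_ context.Context, _ ...any) bool {
			return attempts < 3
		})
	payment.Configure("succeeded").Permit("refund", "refunded")

	if err := payment.Fire("charge"); err != nil {
		fmt.Println("charge rejected:", err)
	}
}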

Either library makes the retry-counter bug from the opening hard to write. The natural home for the attempt count is on the machine, either stored with it or captured by its callbacks and guards, rather than on the call stack of an outer loop.

The wrong shape is a hand-rolled switch on a state field inside a for loop, with the loop variable doubling as the attempt counter. The right shape puts the counter on the machine and checks it where the retry is decided. In qmuntal/stateless that is a guard on the charge transition out of failed. In looplab/fsm it is a before_charge callback that cancels the event once the limit is reached.

Neither library forces this. Both, however, make the state explicit, and once the state is explicit the counter has an obvious place to live. The loop-shaped version of the same code does not give it one.

A team that does not want a library can still get the discipline with a struct, an enum-like set of constants, and a Transition(state State, event Event) (State, error) function that names every legal pair. The library buys callbacks, hierarchy (in qmuntal/stateless), and visualization; for a small machine, a struct is enough. The library is not a substitute for the modeling work. It is a place to put the modeling work after it is done.

Elixir: gen_statem and the Runtime Doing the Work

Elixir is the language where the runtime gives you the most for free. Erlang/OTP's :gen_statem behaviour is a state-machine construct built into the platform. Elixir can use it directly, as below, or through wrapper libraries.

Each machine runs in its own BEAM process and supports either one callback per state or a single universal handler. It gets supervision and restart semantics from OTP. It also treats per-state concerns (state timeouts, postponed events, replies to callers) as first-class actions.

The payment machine in Elixir using :gen_statem directly:

defmodule PaymentFsm do
  @behaviour :gen_statem

  def start_link(opts \\ []),
    do: :gen_statem.start_link(__MODULE__, opts, [])

  def charge(pid, amount), do: :gen_statem.call(pid, {:charge, amount})
  def refund(pid), do: :gen_statem.call(pid, :refund)

  @impl :gen_statem
  def callback_mode, do: :state_functions

  @impl :gen_statem
  def init(_opts), do: {:ok, :idle, %{amount: nil, attempt: 0}}

  # The charge itself runs elsewhere and reports back with
  # send(pid, {:charge_result, :ok}) or send(pid, {:charge_result, :err}).
  def idle({:call, from}, {:charge, amount}, data) do
    {:next_state, :charging, %{data | amount: amount, attempt: data.attempt + 1},
     [{:reply, from, :ok}, {:state_timeout, 30_000, :charge_timeout}]}
  end

  def charging(:info, {:charge_result, :ok}, data) do
    {:next_state, :succeeded, data}
  end

  def charging(:info, {:charge_result, :err}, data) do
    {:next_state, :failed, data}
  end

  # Delivered by the runtime if charging has not transitioned within 30 seconds;
  # any state change cancels it automatically.
  def charging(:state_timeout, :charge_timeout, data) do
    {:next_state, :failed, data}
  end

  def failed({:call, from}, {:charge, _amount}, %{attempt: n}) when n >= 3 do
    {:keep_state_and_data, [{:reply, from, {:error, :max_attempts}}]}
  end

  def failed({:call, from}, {:charge, _amount}, data) do
    {:next_state, :charging, %{data | attempt: data.attempt + 1},
     [{:reply, from, :ok}, {:state_timeout, 30_000, :charge_timeout}]}
  end

  def succeeded({:call, from}, :refund, data) do
    {:next_state, :refunded, data, [{:reply, from, :ok}]}
  end

  # Terminal state: reject further calls instead of crashing on a missing clause.
  def refunded({:call, from}, _request, _data) do
    {:keep_state_and_data, [{:reply, from, {:error, :already_refunded}}]}
  end
end

Two things this code does that no other language in this post does for free.

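First, the {:state_timeout, 30_000, :charge_timeout} action returned from the idle handler is a timer that the runtime owns. Suppose the machine is still in charging 30 seconds later. Then gen_statem calls charging/3 with the event type :state_timeout and the content :charge_timeout. If the machine changes state first, the runtime cancels the timer.

The state-machine code does not wire up timers or manage their lifecycle. Nor does it guard against a stale timeout firing in the wrong state; the platform handles that. This removes a whole class of bugs: leaked timers, and timeouts delivered to a state that no longer expects them.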

Second, the machine is a BEAM process. Put it under a supervisor with :transient restart semantics, and a crash mid-charge restarts the process from the initial state that init/1 returns. If init/1 instead loads the last persisted state from a database, the restart resumes from that state. The supervision tree is the durability story; the state machine is the formalism. They compose.
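In Go, the equivalent of that supervision story has to be written by hand. The sketch below is a deliberately crude stand-in, not a port of OTP:
- `memStore` is a hypothetical in-memory substitute for a database;
- a panic plays the role of a crash;
- `maxRestarts` is a plain count rather than OTP's restarts-per-period intensity.

```go
package main

import (
	"errors"
	"fmt"
)

// memStore is a hypothetical stand-in for the database a restarted worker
// reloads from; it keeps only the last saved state.
type memStore struct{ state string }

func (m *memStore) Load() string      { return m.state }
func (m *memStore) Save(state string) { m.state = state }

// worker drives the machine from a starting state and persists each transition.
type worker func(start string, save func(string)) error

// errGaveUp mirrors a supervisor exceeding its restart limit.
var errGaveUp = errors.New("restart limit exceeded")

// runOnce converts a panic (the "crash mid-charge") into a crashed flag.
func runOnce(m *memStore, w worker) (crashed bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			crashed, err = true, fmt.Errorf("crashed: %v", r)
		}
	}()
	return false, w(m.Load(), m.Save)
}

// supervise restarts the worker after a panic, treated here as the only abnormal
// exit, loosely like a :transient child. Each restart resumes from the last
// saved state rather than from the beginning.
func supervise(m *memStore, w worker, maxRestarts int) (int, error) {
	for restarts := 0; ; restarts++ {
		crashed, err := runOnce(m, w)
		if !crashed {
			return restarts, err
		}
		if restarts == maxRestarts {
			return restarts, errGaveUp
		}
	}
}
```

Calling this a supervised process would be generous. It has no supervision tree, no linked processes and no isolation: a panic in another goroutine still takes down the whole program.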

The honest framing for Elixir is that the language hands the team most of the actor model. That is the model XState's documentation gestures at and Carl Hewitt described in 1973. The BEAM is one of the longest-running production implementations of actor-style concurrency, and OTP's process and supervision primitives turn the theoretical guarantees into operational reality.

Some workflows benefit from a state machine and also from process isolation, supervision and timer correctness. A team building one should reach for Elixir or Erlang directly. The runtime guarantees are the product.

An earlier post, on borrowed runtimes for Parquet, treated the BEAM as a serialization boundary on the way to Polars. This post is about the BEAM doing work that other languages have to bolt on. Both framings are honest; they describe different parts of the platform.

Swift: SwiftUI Observable, and Why TCA Is Not a State Machine

Swift is the language where the library most often mistaken for a state machine library is something else. The Composable Architecture (TCA) from Point-Free is well-loved, well-maintained (version 1.25.x as of April 2026), and explicitly inspired by Elm and Redux. It is unidirectional data flow with a reducer, a store, and effects. It is not a finite-state machine in the formal sense.

The distinction matters. A finite-state machine has a finite set of states and a transition function that maps a state and an event to exactly one next state. The transition function is total within the events it accepts, and any event-state pair without a transition is either rejected or explicitly ignored.

A reducer in TCA (or Redux, or Elm) is conceptually a function from (State, Action) to (State, Effect) that always returns a state. In TCA it takes the state inout and returns an Effect<Action>. There is no enforced finite enumeration of states; the state is whatever the struct holds.

Nor is there any compile-time check that "the action Refund is illegal when the state is Idle". The reducer is free to handle that case any way it wants, including silently ignoring it or moving into an "impossible" state.
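The difference is easy to see in a small Go analogue; this is not TCA itself. The names `flags`, `reduce`, `table` and `next` are illustrative. The reducer accepts every action and can reach a state the domain forbids. The table-driven transition rejects any pair it does not list.

```go
package main

import "fmt"

// flags is a struct-shaped state like a reducer's: every combination of fields
// is representable, including ones the payment domain forbids.
type flags struct {
	Charging, Succeeded, Refunded bool
}

// reduce always returns a state. Nothing stops "refund" from being applied
// before any charge succeeded, and an illegal action looks like a legal one.
func reduce(s flags, action string) flags {
	switch action {
	case "charge":
		s.Charging = true
	case "charge_ok":
		s.Charging, s.Succeeded = false, true
	case "refund":
		s.Refunded = true
	}
	return s
}

// table is the FSM version: a finite set of named states and the legal pairs.
var table = map[[2]string]string{
	{"idle", "charge"}:         "charging",
	{"charging", "charge_ok"}:  "succeeded",
	{"charging", "charge_err"}: "failed",
	{"failed", "charge"}:       "charging",
	{"succeeded", "refund"}:    "refunded",
}

// next rejects any (state, event) pair missing from the table instead of
// absorbing it.
func next(state, event string) (string, error) {
	if to, ok := table[[2]string{state, event}]; ok {
		return to, nil
	}
	return state, fmt.Errorf("illegal event %q in state %q", event, state)
}
```

In the reducer, `reduce(flags{}, "refund")` produces a refund with no successful charge. In the FSM, the same request is an error that the caller has to handle.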

In practice, TCA is excellent for the problems it was designed for: SwiftUI applications that need composable, testable feature reducers with effect management. It is not the right tool when the team needs to enforce that certain transitions are illegal.

A team using TCA for a payment flow has to add the transition discipline by hand. Usually that means an exhaustive switch over (state, action) inside the reducer, with no default case. Any unhandled combination is then a compile error, and every illegal one has to be named.

A small TCA reducer for the payment machine, with the discipline added manually:

import ComposableArchitecture
 
@Reducer
struct PaymentFeature {
  @ObservableState
  struct State: Equatable {
    enum Status: Equatable {
      case idle
      case charging(attempt: Int)
      case succeeded
      case failed(attempt: Int)
      case refunded
    }
    var status: Status = .idle
    var amountCents: Int = 0
  }
 
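  // Action is not declared Equatable: Result<Void, ChargeError> cannot be
  // Equatable, because Void is not.
  enum Action {
    case charge
    case chargeResult(Result<Void, ChargeError>)
    case refund
  }

  enum ChargeError: Error, Equatable { case transient, terminal }

  // processCharge() is assumed app code, not part of TCA: an async function
  // returning Result<Void, ChargeError>.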
 
  var body: some Reducer<State, Action> {
    Reduce { state, action in
      switch (state.status, action) {
      case (.idle, .charge):
        state.status = .charging(attempt: 1)
        return .run { send in
          await send(.chargeResult(await processCharge()))
        }
 
      case (.failed(let attempt), .charge) where attempt < 3:
        state.status = .charging(attempt: attempt + 1)
        return .run { send in
          await send(.chargeResult(await processCharge()))
        }
 
      case (.charging, .chargeResult(.success)):
        state.status = .succeeded
        return .none
 
      case (.charging(let attempt), .chargeResult(.failure)):
        state.status = .failed(attempt: attempt)
        return .none
 
      case (.succeeded, .refund):
        state.status = .refunded
        return .none
 
      // Explicit illegal cases. Behaviorally they match a default case that
      // ignores the action, but with no default, adding a Status or Action
      // case makes this switch non-exhaustive and the compiler forces a decision.
      case (.idle, .refund),
           (.idle, .chargeResult),
           (.charging, .charge),
           (.charging, .refund),
           (.succeeded, .charge),
           (.succeeded, .chargeResult),
           (.failed, .charge),  // attempts exhausted: attempt >= 3
           (.failed, .chargeResult),
           (.failed, .refund),
           (.refunded, _):
        return .none
      }
    }
  }
}

The reducer enforces the state-machine discipline by switching exhaustively on the (state, action) tuple. Because the switch has no default case, Swift refuses to compile it if any (Status, Action) combination is unhandled. That approximates the totality property of a real state-machine transition function, and a new Status or Action case forces a decision at every such switch.

The approach is more verbose than :gen_statem and less rigorous than XState's setup({ types }), because the discipline is in the developer's hands, not in a library or the type system.

For Swift teams that need formal state machines, the realistic choice is one of two options:
- write the discipline by hand inside a TCA reducer;
- adopt one of the community FSM libraries, which have small user bases and uncertain maintenance.

I have not found a Swift state-machine library that I would recommend with the confidence I have for :gen_statem in Elixir or qmuntal/stateless in Go. SwiftUI's @Observable macro makes per-property change tracking ergonomic; it does not give you state-machine semantics.

Zig: Tagged Union and Switch

Zig has no state-machine library worth recommending. The idiom is a tagged union (Zig's union with an enum tag) and a switch statement that exhaustively handles every state-event pair.

const std = @import("std");
 
const PaymentState = union(enum) {
    idle,
    charging: struct { attempt: u32 },
    succeeded,
    failed: struct { attempt: u32 },
    refunded,
};
 
const PaymentEvent = enum {
    charge,
    charge_ok,
    charge_err,
    refund,
};
 
const TransitionError = error{ IllegalTransition, MaxAttemptsExceeded };
 
fn transition(state: PaymentState, event: PaymentEvent) TransitionError!PaymentState {
    return switch (state) {
        .idle => switch (event) {
            .charge => PaymentState{ .charging = .{ .attempt = 1 } },
            else => TransitionError.IllegalTransition,
        },
        .charging => |c| switch (event) {
            .charge_ok => .succeeded,
            .charge_err => PaymentState{ .failed = .{ .attempt = c.attempt } },
            else => TransitionError.IllegalTransition,
        },
        .failed => |f| switch (event) {
            .charge => if (f.attempt >= 3)
                TransitionError.MaxAttemptsExceeded
            else
                PaymentState{ .charging = .{ .attempt = f.attempt + 1 } },
            else => TransitionError.IllegalTransition,
        },
        .succeeded => switch (event) {
            .refund => .refunded,
            else => TransitionError.IllegalTransition,
        },
        .refunded => TransitionError.IllegalTransition,
    };
}

Zig's compiler enforces switch exhaustiveness and forces the developer to handle every variant of the tagged union. The IllegalTransition and MaxAttemptsExceeded errors are explicit; nothing silently transitions. The state machine code is the type system. There is no library and no runtime; there is a function from (state, event) to state (or an error), and the compiler refuses to build code that does not handle every case.
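Go's compiler offers no equivalent. A switch over integer constants is never checked for exhaustiveness, although third-party linters can approximate it. One way to recover part of the guarantee is to make the grid of (state, event) pairs enumerable and then test that every pair is either a named transition or a named rejection.

The sketch below assumes the five states and four events of the canonical machine. The names `pair`, `legal`, `rejected` and `uncovered` are illustrative. The trailing `numStates`/`numEvents` constants exist only so that the grid can be enumerated.

```go
package main

import "fmt"

type State int
type Event int

const (
	Idle State = iota
	Charging
	Succeeded
	Failed
	Refunded
	numStates // sentinel: number of states, used only for enumeration
)

const (
	Charge Event = iota
	ChargeOK
	ChargeErr
	Refund
	numEvents // sentinel: number of events
)

type pair struct {
	s State
	e Event
}

// legal maps every permitted (state, event) pair to its next state.
var legal = map[pair]State{
	{Idle, Charge}:        Charging,
	{Charging, ChargeOK}:  Succeeded,
	{Charging, ChargeErr}: Failed,
	{Failed, Charge}:      Charging,
	{Succeeded, Refund}:   Refunded,
}

// rejected names every pair the design deliberately refuses, like the
// explicit illegal cases in the Swift reducer.
var rejected = map[pair]bool{
	{Idle, ChargeOK}: true, {Idle, ChargeErr}: true, {Idle, Refund}: true,
	{Charging, Charge}: true, {Charging, Refund}: true,
	{Succeeded, Charge}: true, {Succeeded, ChargeOK}: true, {Succeeded, ChargeErr}: true,
	{Failed, ChargeOK}: true, {Failed, ChargeErr}: true, {Failed, Refund}: true,
	{Refunded, Charge}: true, {Refunded, ChargeOK}: true, {Refunded, ChargeErr}: true, {Refunded, Refund}: true,
}

// uncovered walks the full state x event grid and reports every pair that is
// neither legal nor rejected, plus any pair listed as both.
func uncovered(legal map[pair]State, rejected map[pair]bool) []string {
	var problems []string
	for s := State(0); s < numStates; s++ {
		for e := Event(0); e < numEvents; e++ {
			p := pair{s, e}
			_, isLegal := legal[p]
			switch {
			case isLegal && rejected[p]:
				problems = append(problems, fmt.Sprintf("state %d, event %d: both legal and rejected", s, e))
			case !isLegal && !rejected[p]:
				problems = append(problems, fmt.Sprintf("state %d, event %d: unhandled", s, e))
			}
		}
	}
	return problems
}
```

Run as a unit test, `uncovered` returning an empty slice plays the role Zig's compiler plays at build time. The check only happens when the tests run, however, and only if the sentinel constants stay last in their blocks.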

This is the smallest, most rigorous expression of a state machine in this post. It is also the one that does the least for you. There are no callbacks, no entry/exit hooks, no parallel regions, no hierarchy, no timers. Zig's discipline is the discipline of the type system; everything else (the side effects, the persistence, the supervision) belongs in code that calls this function. For workflows where the formalism is what matters and the rest is solved elsewhere, the tagged-union shape is a good fit. For workflows that need any of the missing features, Zig is not the right language for the state machine, regardless of how well it fits the rest of the system.

Where the Runtime Guarantees Change the Idiom

The same five-state payment machine looks different in each language because the languages are giving you different guarantees underneath. The honest comparison:

| Language | State machine pattern | What the runtime gives you for free | What you have to bolt on |
|---|---|---|---|
| Go | Library (looplab/fsm or qmuntal/stateless) or struct + switch | Goroutines for concurrency; nothing for state machines | Persistence, supervision, timers, hierarchy (if using looplab/fsm) |
| Elixir | :gen_statem | Process isolation, supervision, restart semantics, timers, message ordering | Persistence (loaded in init/1) |
| Swift | TCA reducer (with manual discipline) or hand-rolled | Property observation via @Observable; reducer composition via TCA | Transition totality, hierarchy, timers, persistence |
| Zig | Tagged union + exhaustive switch | Compile-time exhaustiveness; nothing else | Side effects, persistence, supervision, timers, hierarchy |

The pattern is that the more the language gives you at the runtime level, the simpler the user-space state machine code is. Elixir's :gen_statem is short because the BEAM is doing the heavy lifting. Zig's tagged union is short for a different reason: the compiler is doing the heavy lifting, and everything that is not a transition lives somewhere else.

Go and Swift sit in the middle. Neither the runtime nor the compiler supplies state-machine semantics on its own, so the discipline has to live in a library or in carefully maintained switch statements.

The cross-language mistake to avoid is treating one language's idiom as the universal pattern. A Swift developer who reads about XState's actor model and tries to build a Swift "actor system" with TCA reducers is not building actors; they are building reducers in actor-shaped vocabulary. A Go developer who writes a struct-and-switch state machine and calls it a "supervised process" is not getting supervision. Naming what each language gives you, and naming what each language does not, is the precondition for picking the right tool.

Close

This post closes the four-part series on state machines. Part 1 made the case for the discipline; Part 2 covered XState in TypeScript and React; Part 3 covered the Python ecosystem and the xstate-python contribution roadmap; this post covered four other languages and what their runtime guarantees buy.

The takeaway is the same as at the start of the series: state machines pay for themselves when the workflow has interacting states, conditional transitions, or lifecycle steps that need to be resumable. The library or runtime is the place to put the modeling work after it is done, and the language's runtime guarantees decide how much work the library has to do.

A Go team and an Elixir team facing the same workflow should not write the same code, because the runtimes are not the same. The honest discipline is to pick the formalism for the workflow first, and then pick the language whose runtime makes that formalism cheapest to ship.

For teams shipping in TypeScript or Python, the move this week is the one named in Part 1's decision matrix. Pick one workflow. Draw the chart. Build the machine. The first one is the one that teaches the team where the discipline pays off; the next ten are easier.
