app pleme-fleet

NixOS fleet lifecycle CLI with DAG workflow orchestration

1 unstable release

0.1.0 Mar 1, 2026

MIT license

Fleet

Fleet manages NixOS machines — deploy, build, diff, status, rollback, reboot — with tag-based targeting and composable multi-step workflows defined in YAML.

Install

# Nix flake
nix run github:pleme-io/fleet

# Or add as a flake input
fleet = {
  url = "github:pleme-io/fleet";
  inputs.nixpkgs.follows = "nixpkgs";
};

Quick start

Fleet reads its node registry from the FLEET_NODES environment variable (JSON). The typical setup is a Nix wrapper that injects this at runtime:

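fleet-wrapped = pkgs.writeShellScript "fleet" ''
  # Registry is serialized at evaluation time; the single quotes assume the JSON contains no single quote
  export FLEET_NODES='${builtins.toJSON nodeRegistry}'
  # Use the git toplevel as the flake directory, falling back to the current directory
  export FLEET_FLAKE_DIR="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
  exec ${fleet}/bin/fleet "$@"
'';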

The node registry is a JSON object:

{
  "web1": { "hostname": "10.0.0.1", "ssh_user": "root", "system": "x86_64-linux", "tags": ["production", "k3s"] },
  "web2": { "hostname": "10.0.0.2", "ssh_user": "root", "system": "x86_64-linux", "tags": ["production", "k3s"] },
  "staging": { "hostname": "10.0.0.10", "ssh_user": "root", "system": "x86_64-linux", "tags": ["staging"] }
}

Commands

fleet deploy <targets>     Deploy NixOS configurations (deploy-rs / colmena)
fleet build <targets>      Build without activating
fleet diff <targets>       Show closure diff (current vs. new)
fleet status [targets]     Show generation, uptime, kernel, NixOS version
fleet ping [targets]       Check SSH connectivity
fleet exec <targets> -- <cmd>  Run command on remote nodes
fleet rollback <targets>   Rollback to previous generation
fleet reboot <targets>     Reboot nodes
fleet ssh <node>           Open interactive SSH session
fleet info                 Print node registry
fleet flow list            List defined workflows
fleet flow run <name>      Execute a workflow

Targeting

Targets can be node names, @tag selectors, or --all:

fleet deploy web1                # single node (uses deploy-rs)
fleet deploy web1 web2           # multiple nodes (uses colmena)
fleet deploy @production         # all nodes tagged "production"
fleet status --all               # every node in the registry
fleet exec @k3s -- kubectl get nodes
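
The selector rules above can be sketched in a few lines of Python. This is an illustration only, not fleet's implementation: `resolve_targets` is a hypothetical helper, and both the de-duplication and the error on unknown names are assumptions.

```python
import json

# Registry copied from the Quick start example.
FLEET_NODES = json.loads("""
{
  "web1": {"hostname": "10.0.0.1", "ssh_user": "root", "system": "x86_64-linux", "tags": ["production", "k3s"]},
  "web2": {"hostname": "10.0.0.2", "ssh_user": "root", "system": "x86_64-linux", "tags": ["production", "k3s"]},
  "staging": {"hostname": "10.0.0.10", "ssh_user": "root", "system": "x86_64-linux", "tags": ["staging"]}
}
""")


def resolve_targets(registry, targets, select_all=False):
    """Hypothetical helper: expand node names and @tag selectors into node names."""
    if select_all:
        return list(registry)
    resolved = []
    for target in targets:
        if target.startswith("@"):
            tag = target[1:]
            matches = [name for name, node in registry.items() if tag in node.get("tags", [])]
        elif target in registry:
            matches = [target]
        else:
            # Assumption: an unknown node name is an error rather than a no-op.
            raise KeyError(f"unknown node: {target}")
        # Keep first-seen order and drop duplicates (e.g. "web1 @production").
        for name in matches:
            if name not in resolved:
                resolved.append(name)
    return resolved


print(resolve_targets(FLEET_NODES, ["@production"]))       # ['web1', 'web2']
print(resolve_targets(FLEET_NODES, ["staging", "@k3s"]))   # ['staging', 'web1', 'web2']
print(resolve_targets(FLEET_NODES, [], select_all=True))   # ['web1', 'web2', 'staging']
```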

Configuration

Fleet reads fleet.yaml from FLEET_FLAKE_DIR (or the current directory). All sections are optional.

# Global SSH defaults
ssh:
  connect_timeout: 5
  strict_host_key: accept-new
  options:
    ServerAliveInterval: "60"
    ServerAliveCountMax: "3"

# Deploy defaults
deploy:
  show_trace: false
  magic_rollback: true

# Per-node overrides
nodes:
  bastion:
    ssh:
      connect_timeout: 15
      options:
        ProxyJump: "jump.example.com"

# Lifecycle hooks
hooks:
  deploy:
    pre: 'echo "deploying $FLEET_NODE"'
    post: 'echo "deployed $FLEET_NODE"'
  build:
    pre: "git diff --quiet HEAD || echo 'WARNING: uncommitted changes'"
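
One plausible reading of the per-node overrides is that they are layered on top of the global ssh section. The sketch below assumes deep-merge semantics, so bastion would keep the global ServerAlive options and gain ProxyJump. The README does not state the merge rule, and `deep_merge` and `effective_ssh` are hypothetical names.

```python
import copy

# Values taken from the fleet.yaml example, written as plain dicts
# so the sketch needs no YAML parser.
GLOBAL_SSH = {
    "connect_timeout": 5,
    "strict_host_key": "accept-new",
    "options": {"ServerAliveInterval": "60", "ServerAliveCountMax": "3"},
}
NODE_OVERRIDES = {
    "bastion": {"ssh": {"connect_timeout": 15, "options": {"ProxyJump": "jump.example.com"}}},
}


def deep_merge(base, override):
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_ssh(node_name):
    """Hypothetical helper: SSH settings for one node (ASSUMES deep-merge semantics)."""
    override = NODE_OVERRIDES.get(node_name, {}).get("ssh", {})
    return deep_merge(GLOBAL_SSH, override)


print(effective_ssh("bastion"))
```

Under this assumption, nodes without an entry under nodes simply get the global defaults.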

Hooks

Hooks run shell commands before (pre) and after (post) fleet operations. They are available for deploy, build, diff, exec, rollback, and reboot.

  • Pre-hooks abort the operation on failure.
  • Post-hooks warn but continue.

Environment variables set during hook execution:

Variable     Description
FLEET_NODE   Node name
FLEET_HOST   Hostname
FLEET_USER   SSH user
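
The pre/post rules can be sketched as a small wrapper. This is a minimal sketch rather than fleet's code: `run_hook` and `run_with_hooks` are hypothetical names, and it assumes hooks run through the system shell so that the FLEET_* variables are expanded there.

```python
import os
import subprocess
import sys


def run_hook(command, node, host, user):
    """Run one hook through the shell with the FLEET_* variables set; return its exit code."""
    env = dict(os.environ, FLEET_NODE=node, FLEET_HOST=host, FLEET_USER=user)
    return subprocess.run(command, shell=True, env=env).returncode


def run_with_hooks(operation, node, host, user, pre=None, post=None):
    """Hypothetical wrapper: a failing pre-hook aborts, a failing post-hook only warns."""
    if pre is not None and run_hook(pre, node, host, user) != 0:
        print(f"pre-hook failed for {node}; aborting", file=sys.stderr)
        return False
    operation()
    if post is not None and run_hook(post, node, host, user) != 0:
        print(f"warning: post-hook failed for {node}", file=sys.stderr)
    return True


# Usage: the operation is a stand-in callable, not a real deploy.
run_with_hooks(lambda: print("deploy runs here"), "web1", "10.0.0.1", "root",
               pre='echo "deploying $FLEET_NODE"', post='echo "deployed $FLEET_NODE"')
```

The variable reference sits inside double quotes in the hook string because single quotes would stop the shell from expanding it.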

Flows

Flows are named DAG workflows defined in fleet.yaml. Steps declare dependencies and fleet resolves them into topological execution levels.

flows:
  deploy-cluster:
    description: "Build, diff, then rolling deploy with health checks"
    steps:
      - id: build
        action: { type: build, show_trace: true }
        targets: [server, agent]

      - id: diff
        action: { type: diff }
        targets: [server, agent]
        depends_on: [build]

      - id: confirm
        action:
          type: shell
          command: |
            echo "Review the diff above. Press enter or Ctrl-C to abort."
            read
        depends_on: [diff]

      - id: deploy-server
        action: { type: deploy }
        targets: [server]
        depends_on: [confirm]

      - id: health-check
        action:
          type: shell
          command: "ssh root@server kubectl get nodes >/dev/null && echo 'healthy'"
        depends_on: [deploy-server]

      - id: deploy-agent
        action: { type: deploy }
        targets: [agent]
        depends_on: [health-check]

Action types

Type                  Description
build                 nix build / colmena build
deploy                deploy-rs / colmena apply
diff                  Closure diff (current vs. new)
status                Node status
ping                  SSH connectivity check
exec                  Remote command (command: ["systemctl", "status"])
shell                 Local shell command (command: "echo hello")
rollback              Rollback NixOS generation
reboot                Reboot nodes
darwin-rebuild        Run nix run .#darwin-rebuild
home-manager-rebuild  Run nix run .#home-manager-rebuild
flake-update          Run nix flake update (optional inputs: [...])

Dry-run

Preview the execution plan without running anything:

fleet flow run deploy-cluster --dry-run
Execution plan (dry-run):

  Level 1:
    build [build] targets: server, agent
  Level 2:
    diff [diff] targets: server, agent
      depends_on: build
  Level 3:
    confirm [shell] targets: (inherit CLI targets)
      depends_on: diff
  Level 4:
    deploy-server [deploy] targets: server
      depends_on: confirm
  Level 5:
    health-check [shell] targets: (inherit CLI targets)
      depends_on: deploy-server
  Level 6:
    deploy-agent [deploy] targets: agent
      depends_on: health-check
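
The level grouping in this plan can be reproduced with a short layered topological sort. The sketch below is an assumption about how such levels could be computed, not fleet's actual algorithm; `execution_levels` is a hypothetical name, and the step ids and edges are copied from the deploy-cluster flow.

```python
# Step ids mapped to their depends_on lists, from the deploy-cluster flow.
STEPS = {
    "build": [],
    "diff": ["build"],
    "confirm": ["diff"],
    "deploy-server": ["confirm"],
    "health-check": ["deploy-server"],
    "deploy-agent": ["health-check"],
}


def execution_levels(steps):
    """Group step ids into levels; a step lands one level after its last dependency."""
    for step, deps in steps.items():
        for dep in deps:
            if dep not in steps:
                raise ValueError(f"step {step!r} depends on unknown step {dep!r}")
    remaining = {step: set(deps) for step, deps in steps.items()}
    levels = []
    while remaining:
        # Every step whose dependencies have all been placed is ready now.
        ready = [step for step, deps in remaining.items() if not deps]
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        levels.append(ready)
        for step in ready:
            del remaining[step]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels


for number, level in enumerate(execution_levels(STEPS), start=1):
    print(f"Level {number}: {', '.join(level)}")
```

Because the example flow is a straight chain, each level holds one step. Steps with no path between them would share a level.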

Conditions

Steps can have a condition — a shell command that must succeed for the step to execute. If the condition fails, the step is skipped.

- id: deploy-canary
  action: { type: deploy }
  targets: [canary]
  condition:
    command: "test -f .canary-enabled"
  depends_on: [build]
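
The skip rule amounts to checking the exit status of the condition command. A minimal sketch follows, assuming the command runs in a local shell; `should_run` is a hypothetical name. What happens to steps that depend on a skipped step is not covered here.

```python
import subprocess


def should_run(step):
    """Return True when the step has no condition or its condition command exits with 0."""
    condition = step.get("condition")
    if condition is None:
        return True
    return subprocess.run(condition["command"], shell=True).returncode == 0


# The canary step from the example: runs only if .canary-enabled exists
# in the current directory.
canary = {"id": "deploy-canary", "condition": {"command": "test -f .canary-enabled"}}
print("run" if should_run(canary) else "skip")
```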

Backend tools

Fleet dispatches to existing NixOS deployment tools:

  • Single node: deploy-rs (magic rollback)
  • Multiple nodes: colmena (parallel apply)
  • Diff: nix store diff-closures
  • Build: nix build / colmena build

These must be on PATH when fleet runs. The Nix wrapper typically handles this.
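
The dispatch described above can be sketched as a lookup on the action and the number of resolved nodes. `pick_backend` and `missing_tools` are hypothetical names. Applying the single/multiple split to build is an assumption, since the list only states it for deploy.

```python
import shutil


def pick_backend(action, nodes):
    """Hypothetical dispatch mirroring the backend list above."""
    if action == "diff":
        return "nix store diff-closures"
    if action == "deploy":
        return "deploy-rs" if len(nodes) == 1 else "colmena"
    if action == "build":
        # Assumption: build follows the same single/multiple split as deploy.
        return "nix build" if len(nodes) == 1 else "colmena build"
    raise ValueError(f"no backend mapping for action: {action}")


def missing_tools(executables):
    """Return the executables that cannot be found on PATH."""
    return [name for name in executables if shutil.which(name) is None]


print(pick_backend("deploy", ["web1"]))           # deploy-rs
print(pick_backend("deploy", ["web1", "web2"]))   # colmena
# deploy-rs installs its binary as "deploy"; the result depends on the local PATH.
print(missing_tools(["nix", "deploy", "colmena"]))
```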

License

MIT
