Architecture
This section outlines the primary services, storage layers, and communication paths in a self-hosted Dreadnode stack.
System diagram

                             Internet
                                 │
                                 ▼
                     ┌───────────────────────┐
                     │      Caddy Proxy      │
                     │  /api/* → API         │
                     │  /*     → Frontend    │
                     └───────────┬───────────┘
                        ┌────────┴────────┐
                        ▼                 ▼
    ┌──────────────────────────┐  ┌──────────────────────────┐
    │ Frontend (SvelteKit)     │  │ API Server (FastAPI)     │
    │ packages/frontend/       │  │ packages/api/            │
    │                          │  │                          │
    │                          │  │ ┌──────────────────────┐ │
    │                          │  │ │ Training Worker      │ │
    │                          │  │ │ Evaluation Worker    │ │
    │                          │  │ └──────────────────────┘ │
    └──────────────────────────┘  └─────────────┬────────────┘
                ┌───────────┬─────────┬─────────┴─┐
                ▼           ▼         ▼           ▼
          ┌───────────┐ ┌────────┐ ┌──────┐ ┌───────────┐
          │PostgreSQL │ │ Click- │ │ S3/  │ │  LiteLLM  │
          │           │ │ House  │ │MinIO │ │           │
          │Users, Orgs│ │        │ │      │ │  LLM      │
          │RBAC, Meta │ │Traces  │ │Pkgs  │ │  Proxy    │
          │Jobs       │ │Metrics │ │Artif.│ │           │
          └───────────┘ └────────┘ └──────┘ └───────────┘

       ┌──────────────────────────────────────────────────┐
       │                  Sandbox Proxy                   │
       │  *.sandbox.<domain> → Docker / E2B sandboxes     │
       └──────────────────────────────────────────────────┘

Components
| Component | Purpose | Technology |
|---|---|---|
| Caddy Proxy | Routes requests to API and Frontend | Caddy 2 |
| API | Backend service, business logic, worker host | FastAPI, SQLAlchemy, Pydantic |
| Training Worker | In-process training job executor (config-gated) | Runs inside API process |
| Evaluation Worker | In-process evaluation job executor (config-gated) | Runs inside API process |
| Frontend | Web UI for users | SvelteKit, TypeScript |
| LiteLLM | LLM inference proxy for dn/ model aliases | LiteLLM |
| Sandbox Proxy | Public wildcard proxy for sandboxes | Caddy + AWS ALB/ECS |
| Agent Sandbox | On-demand compute for running AI agents | Docker or E2B |
| PostgreSQL | State data (users, orgs, RBAC, jobs) | Postgres 16 |
| ClickHouse | Event data, OTEL traces, telemetry | ClickHouse 24.x |
| S3/MinIO | Object storage for packages and artifacts | AWS S3 or MinIO |
Data flow
- State Data (PostgreSQL): users, organizations, projects, RBAC metadata, training/evaluation job records.
- Event Data (ClickHouse): OTEL traces, run telemetry, metrics, high-volume logs.
- Object Storage (S3/MinIO): packages, artifacts, file uploads, training checkpoints.
In-process workers
The API server hosts two optional background workers that poll for and execute jobs. These are not separate services; they run as async loops inside the API process.
- Training Worker: Enabled via `TRAINING_IN_PROCESS_WORKER_ENABLED`. Polls for training jobs and dispatches them to a backend (Tinker, Ray). Concurrency and poll interval are configurable.
- Evaluation Worker: Enabled via `EVALUATION_IN_PROCESS_WORKER_ENABLED`. Polls for evaluation jobs, provisions sandboxes, and runs evaluation items. Concurrency and poll interval are configurable.
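
The polling pattern described above can be sketched with plain `asyncio`. This is only an illustration of a config-gated, concurrency-limited poll loop, not the platform's actual worker code: `env_flag`, `run_worker`, `fetch_jobs`, and `handle_job` are hypothetical names, and the sketch assumes `fetch_jobs` claims the jobs it returns so that no job is handed out twice.

```python
import asyncio
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean flag."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


async def run_worker(fetch_jobs, handle_job, *, concurrency, poll_interval, stop):
    """Poll for jobs until `stop` is set, running at most `concurrency` at once.

    `fetch_jobs` and `handle_job` are hypothetical async callables supplied
    by the caller; `fetch_jobs` is assumed to claim the jobs it returns.
    """
    slots = asyncio.Semaphore(concurrency)
    running = set()

    async def run_one(job):
        try:
            await handle_job(job)
        finally:
            slots.release()  # free the slot even if the job failed

    while not stop.is_set():
        for job in await fetch_jobs():
            await slots.acquire()  # wait for a free slot before dispatching
            task = asyncio.create_task(run_one(job))
            running.add(task)
            task.add_done_callback(running.discard)
        try:
            # Sleep for one poll interval, waking early on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=poll_interval)
        except asyncio.TimeoutError:
            pass

    # Let in-flight jobs finish before returning.
    if running:
        await asyncio.gather(*running, return_exceptions=True)
```

In a setup like this, a startup hook could check `env_flag("TRAINING_IN_PROCESS_WORKER_ENABLED")` and only then schedule `run_worker(...)` as a background task on the API's event loop, which is what keeps the worker inside the same process.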
See Configuration for all worker environment variables.
API architecture (DDD)
The API follows a Domain-Driven Design layout. Each domain is isolated under app/[domain]/:

    app/
    ├── api/v1/           # Router aggregation only
    ├── core/             # Foundational infrastructure (no external deps)
    ├── infra/            # External integrations (DB, S3, ClickHouse)
    └── [domain]/         # Business domains (auth, users, projects, etc.)
        ├── models.py     # SQLAlchemy models
        ├── schemas.py    # Pydantic schemas
        ├── service.py    # Business logic
        ├── repository.py # Data access
        └── router_v1.py  # HTTP routes

Package overview
| Package | Purpose | Technology |
|---|---|---|
| packages/api | Backend API server | FastAPI, SQLAlchemy, Pydantic |
| packages/sdk | Python client SDK | httpx, Pydantic |
| packages/frontend | Web application | SvelteKit, TypeScript, Tailwind |
| platform/pulumi | AWS infrastructure | Pulumi (Python) |
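
Within packages/api, the per-domain split described under API architecture (models, service, repository) can be illustrated with a small sketch. Everything here is hypothetical: `Project`, `ProjectRepository`, and `ProjectService` are invented names, and an in-memory dict stands in for the SQLAlchemy-backed data access so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    """Stand-in for a model that would live in models.py."""
    id: int
    name: str


class ProjectRepository:
    """Stand-in for repository.py: data access only, no business rules."""

    def __init__(self) -> None:
        self._rows: dict[int, Project] = {}  # in-memory substitute for a table
        self._next_id = 1

    def add(self, name: str) -> Project:
        project = Project(id=self._next_id, name=name)
        self._rows[project.id] = project
        self._next_id += 1
        return project

    def get_by_name(self, name: str) -> Optional[Project]:
        return next((p for p in self._rows.values() if p.name == name), None)


class ProjectService:
    """Stand-in for service.py: business rules, delegating storage to the repository."""

    def __init__(self, repo: ProjectRepository) -> None:
        self._repo = repo

    def create_project(self, name: str) -> Project:
        name = name.strip()
        if not name:
            raise ValueError("project name must not be empty")
        if self._repo.get_by_name(name) is not None:
            raise ValueError(f"project {name!r} already exists")
        return self._repo.add(name)
```

The point of the split is that a route handler would only call the service, and the service would only reach storage through the repository, so the storage backend can change without touching business logic.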
Communication paths
- The Caddy proxy is the entry point: `/api/*` routes to the API server, everything else to the Frontend.
- The Frontend communicates with the API via HTTP/REST through the Caddy proxy.
- The API reads and writes state data in Postgres, event data in ClickHouse, and objects in S3/MinIO.
- The API calls LiteLLM for `dn/`-prefixed model inference. LiteLLM proxies to upstream LLM providers.
- The Sandbox Proxy routes `*.sandbox.<domain>` wildcard subdomains to sandbox backends (Docker or E2B).
- Agent Sandboxes are provisioned on-demand by the API for runtimes, evaluations, and training jobs.
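
The routing rules in this list can be summarized as a small decision function. This is a sketch for illustration only: the real routing lives in the Caddy and Sandbox Proxy configuration, `route_request` is a hypothetical name, and the two proxies are folded into one function here purely to show which traffic ends up where.

```python
def route_request(host: str, path: str, base_domain: str) -> str:
    """Return which upstream would handle a request: 'sandbox', 'api', or 'frontend'."""
    host = host.lower().split(":", 1)[0]  # ignore any port suffix
    sandbox_suffix = f".sandbox.{base_domain.lower()}"

    # Wildcard rule: any non-empty label in front of .sandbox.<domain>.
    if host.endswith(sandbox_suffix) and len(host) > len(sandbox_suffix):
        return "sandbox"

    # Path rule: /api/* goes to the API server.
    if path.startswith("/api/"):
        return "api"

    # Everything else is served by the Frontend.
    return "frontend"
```

For example, `route_request("example.com", "/api/v1/users", "example.com")` yields `"api"`, while the same host with path `/settings` yields `"frontend"`.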