Aether Desktop · Beta · Apple Silicon
A local AI gateway for Apple Silicon Macs. One OpenAI-compatible endpoint. Local models, cloud providers. Your prompts never leave your machine.
% curl -fsSL https://aether.ufrik.com/desktop/macos/install.sh | sh
macOS 13+ · Apple Silicon · Sparkle auto-updates
/v1/chat/completions, /v1/messages, /v1/responses — any client works immediately
The product
Aether Desktop packages the gateway, companion app, local model catalog, managed Inference Server, and a monitoring dashboard into a native macOS app. No Postgres, no Docker, no config files to edit.
Built in
Test any configured model from the Chat tab without writing code. Works across local and cloud providers from a single interface — useful for prompt engineering, model comparisons, and quick sanity checks before wiring up a client.
Agentic coding
Point opencode at http://127.0.0.1:8181/v1 and run full agentic coding sessions against your local models. File edits, shell commands, multi-step reasoning — no cloud required, no prompts leaving your machine.
One /v1 endpoint that Claude Code, Codex, Cursor, OpenCode, and any other OpenAI-compatible client connects to immediately.
Chat completions, Anthropic Messages, and OpenAI Responses API from a single port. No adapter config needed.
Download curated GGUF models from the app. Aether handles the runtime package, model files, start/stop, and auto-start.
Add local servers or cloud providers by pasting an API key. All exposed through the same endpoint and key.
Token usage by model and key, request latency, error breakdown, and full request history — stored locally in SQLite.
Create, revoke, and copy connection snippets for any client. Keys are stored as SHA-256 hashes. Plaintext shown once.
Dashboard showing backend health, request latency percentiles, Mac hardware specs, and daemon event log.
Surface actual endpoints, Mac power-state warnings that affect inference, and Inference Server logs in real time.
New versions install silently in the background. No package manager, no brew upgrade, no manual DMG downloads.
Inside the app
The Usage tab surfaces token consumption, request timing, cost equivalence, and per-model breakdowns — all stored locally. No cloud dashboard, no data shared.
No terminal setup, no YAML files, no Docker. Download, open, pick a model. Aether handles the rest — runtime packages, key management, updates. If you can install an app, you can run local AI.
Eight live metrics: total tokens, cloud-cost equivalent, request count, avg latency, top model, error count, provider health, and which client sent the most requests — opencode, Claude Code, curl, or anything else.
Every token routed to a local model costs nothing. The Est. Cost counter stays at zero. The cloud equivalent column shows what the same volume would cost on a hosted API — the difference is what stays in your pocket.
Compatibility
Aether speaks three protocol surfaces from one port. Any client that understands OpenAI works immediately — no adapters, no configuration changes.
| Protocol | Tools that use it | Notes |
|---|---|---|
POST /v1/chat/completions |
Claude Code, Cursor, OpenCode, curl, Python openai, OpenWebUI |
Standard OpenAI format. Default for most tools. |
POST /v1/messages |
Anthropic SDK, Claude Code (Anthropic auth) | Native Anthropic Messages format. Translated to chat completions internally. |
POST /v1/responses |
Codex CLI 0.141+, OpenAI Responses SDK |
Stateless. previous_response_id rejected cleanly so clients retry with full context.
|
Coverage
Aether keeps local model adoption frictionless while giving you one place to route cloud models — behind the same API key and endpoint your local models use.
| Provider | Type | Register with |
|---|---|---|
| Aether local models | Managed local | Download from catalog. Aether handles the runtime. |
| llama.cpp | Local |
Base URL of llama-server (without /v1)
|
| Ollama | Local | http://localhost:11434 |
| MLX-LM | Local | Base URL of the MLX server |
| LM Studio | Local | Same engine as llama.cpp |
| vLLM | Local | Base URL of the vLLM server |
| OpenAI | Cloud | API key from platform.openai.com |
| OpenRouter | Cloud | API key; hundreds of hosted models |
| NVIDIA | Cloud | API key; HuggingFace-style model IDs |
| DeepSeek · Kimi · MiniMax | Cloud |
API key; base URL without /v1
|
Getting started
Run the command. Downloads the signed DMG, installs Aether.app into Applications, and launches it.
Pick a local model from the catalog or add an existing provider — a local server URL or a cloud API key.
Aether downloads the Inference Server package if needed and starts it. The system log shows daemon events in real time.
Point any client at http://127.0.0.1:8181/v1. Or open opencode, set the Aether endpoint, and start a full Agent session — code edits, shell commands, multi-step reasoning — powered by your local models. No cloud account needed.
Security by default
"Your data never leaves" is not a marketing claim — it is a consequence of the architecture.
Bearer tokens are stored as SHA-256 hashes. The plaintext is shown once at creation and never again.
Cloud API keys are stored in the macOS Keychain, not in a plaintext config file.
Requests to local models never leave the machine. No Aether server, no usage reporting, no telemetry.
All audit logs, token counts, and request history are written to a local SQLite database. No cloud database.
The gateway binds to 127.0.0.1:8181 by default. Network binding is an explicit opt-in, not a default.
No multi-tenant attack surface. The companion manages one daemon and one key store for the current user.
One command. Downloads the signed DMG, installs into Applications, launches the app.
% curl -fsSL https://aether.ufrik.com/desktop/macos/install.sh | sh
Requires macOS 13+ on Apple Silicon · Sparkle auto-updates · v0.3.1.42
:8181/v1, rewrites them for the selected backend, and streams the response back…