Aether Desktop · Beta · Apple Silicon

Your AI.
Your data.
No middlemen.

A local AI gateway for Apple Silicon Macs. One OpenAI-compatible endpoint. Local models, cloud providers. Your prompts never leave your machine.

Terminal — zsh

% curl -fsSL https://aether.ufrik.com/desktop/macos/install.sh | sh

Copied!

macOS 13+ · Apple Silicon · Sparkle auto-updates

127.0.0.1:8181

Dashboard

Status

Online

Requests

1,482

Latency p50

84 ms

Active models

qwopus-3.6-35blocal

gpt-4ocloud

claude-sonnet-4-6cloud

deepseek-r1cloud

Protocol surfaces — /v1/chat/completions, /v1/messages, /v1/responses — any client works immediately

12+

Providers — local engines and cloud APIs behind a single endpoint and API key

External bytes for local inference — prompts, tokens, and responses stay on your hardware. Air-gap capable.

The product

A desktop control plane for AI.

Aether Desktop packages the gateway, companion app, local model catalog, managed Inference Server, and a monitoring dashboard into a native macOS app. No Postgres, no Docker, no config files to edit.

Built in

Chat

Test any configured model from the Chat tab without writing code. Works across local and cloud providers from a single interface — useful for prompt engineering, model comparisons, and quick sanity checks before wiring up a client.

Agentic coding

Agent

Point opencode at http://127.0.0.1:8181/v1 and run full agentic coding sessions against your local models. File edits, shell commands, multi-step reasoning — no cloud required, no prompts leaving your machine.

OpenAI-compatible gateway

One /v1 endpoint that Claude Code, Codex, Cursor, OpenCode, and any other OpenAI-compatible client connects to immediately.

Three protocol surfaces

Chat completions, Anthropic Messages, and OpenAI Responses API from a single port. No adapter config needed.

Managed local model catalog

Download curated GGUF models from the app. Aether handles the runtime package, model files, start/stop, and auto-start.

Provider routing

Add local servers or cloud providers by pasting an API key. All exposed through the same endpoint and key.

Usage and audit

Token usage by model and key, request latency, error breakdown, and full request history — stored locally in SQLite.

API key management

Create, revoke, and copy connection snippets for any client. Keys are stored as SHA-256 hashes. Plaintext shown once.

Live system status

Dashboard showing backend health, request latency percentiles, Mac hardware specs, and daemon event log.

Desktop safety signals

Surface actual endpoints, Mac power-state warnings that affect inference, and Inference Server logs in real time.

Sparkle auto-updates

New versions install silently in the background. No package manager, no brew upgrade, no manual DMG downloads.

Inside the app

Everything your inference session is doing, at a glance.

The Usage tab surfaces token consumption, request timing, cost equivalence, and per-model breakdowns — all stored locally. No cloud dashboard, no data shared.

Zero configuration

Works out of the box. For everyone.

No terminal setup, no YAML files, no Docker. Download, open, pick a model. Aether handles the rest — runtime packages, key management, updates. If you can install an app, you can run local AI.

Usage analytics

Tokens, requests, latency, top client.

Eight live metrics: total tokens, cloud-cost equivalent, request count, avg latency, top model, error count, provider health, and which client sent the most requests — opencode, Claude Code, curl, or anything else.

Local inference

Millions of tokens. $0.

Every token routed to a local model costs nothing. The Est. Cost counter stays at zero. The cloud equivalent column shows what the same volume would cost on a hosted API — the difference is what stays in your pocket.

Compatibility

Works with the tools you already use.

Aether speaks three protocol surfaces from one port. Any client that understands OpenAI works immediately — no adapters, no configuration changes.

Protocol	Tools that use it	Notes
`POST /v1/chat/completions`	Claude Code, Cursor, OpenCode, curl, Python `openai`, OpenWebUI	Standard OpenAI format. Default for most tools.
`POST /v1/messages`	Anthropic SDK, Claude Code (Anthropic auth)	Native Anthropic Messages format. Translated to chat completions internally.
`POST /v1/responses`	Codex CLI 0.141+, OpenAI Responses SDK	Stateless. `previous_response_id` rejected cleanly so clients retry with full context.

Coverage

Local first. Cloud when it helps.

Aether keeps local model adoption frictionless while giving you one place to route cloud models — behind the same API key and endpoint your local models use.

Provider	Type	Register with
Aether local models	Managed local	Download from catalog. Aether handles the runtime.
llama.cpp	Local	Base URL of `llama-server` (without `/v1`)
Ollama	Local	`http://localhost:11434`
MLX-LM	Local	Base URL of the MLX server
LM Studio	Local	Same engine as llama.cpp
vLLM	Local	Base URL of the vLLM server
OpenAI	Cloud	API key from platform.openai.com
OpenRouter	Cloud	API key; hundreds of hosted models
NVIDIA	Cloud	API key; HuggingFace-style model IDs
DeepSeek · Kimi · MiniMax	Cloud	API key; base URL without `/v1`

Getting started

Install to a running model in four steps.

Install

Run the command. Downloads the signed DMG, installs Aether.app into Applications, and launches it.

Choose

Pick a local model from the catalog or add an existing provider — a local server URL or a cloud API key.

Run

Aether downloads the Inference Server package if needed and starts it. The system log shows daemon events in real time.

Batteries included

Connect or run

Point any client at http://127.0.0.1:8181/v1. Or open opencode, set the Aether endpoint, and start a full Agent session — code edits, shell commands, multi-step reasoning — powered by your local models. No cloud account needed.

Security by default

What stays private, and how.

"Your data never leaves" is not a marketing claim — it is a consequence of the architecture.

API keys stored as hashes

Bearer tokens are stored as SHA-256 hashes. The plaintext is shown once at creation and never again.

Provider credentials in the keychain

Cloud API keys are stored in the macOS Keychain, not in a plaintext config file.

Local inference: zero egress

Requests to local models never leave the machine. No Aether server, no usage reporting, no telemetry.

Local-only storage

All audit logs, token counts, and request history are written to a local SQLite database. No cloud database.

Localhost by default

The gateway binds to 127.0.0.1:8181 by default. Network binding is an explicit opt-in, not a default.

Single-user, single-process

No multi-tenant attack surface. The companion manages one daemon and one key store for the current user.

Install Aether Desktop.

One command. Downloads the signed DMG, installs into Applications, launches the app.

Terminal — zsh

% curl -fsSL https://aether.ufrik.com/desktop/macos/install.sh | sh

Copied!

Requires macOS 13+ on Apple Silicon · Sparkle auto-updates · v0.3.1.42

127.0.0.1:8181 — Chat

Chat

qwopus-3.6-35b · local Explain the proxy architecture in one paragraph.

Assistant Aether Desktop embeds a Rust proxy that accepts requests on :8181/v1, rewrites them for the selected backend, and streams the response back…

Ask anything…

Send

Your AI.Your data.No middlemen.

A desktop control plane for AI.

Chat

Agent

OpenAI-compatible gateway

Three protocol surfaces

Managed local model catalog

Provider routing

Usage and audit

API key management

Live system status

Desktop safety signals

Sparkle auto-updates

Everything your inference session is doing, at a glance.

Works out of the box. For everyone.

Tokens, requests, latency, top client.

Millions of tokens. $0.

Works with the tools you already use.

Local first. Cloud when it helps.

Install to a running model in four steps.

Install

Choose

Run

Connect or run

What stays private, and how.

API keys stored as hashes

Provider credentials in the keychain

Local inference: zero egress

Local-only storage

Localhost by default

Single-user, single-process

Install Aether Desktop.

Your AI.
Your data.
No middlemen.