Aether Desktop · Beta · Apple Silicon

Your AI.
Your data.
No middlemen.

A local AI gateway for Apple Silicon Macs. One OpenAI-compatible endpoint. Local models, cloud providers. Your prompts never leave your machine.

Terminal — zsh
% curl -fsSL https://aether.ufrik.com/desktop/macos/install.sh | sh
Copied!

macOS 13+ · Apple Silicon · Sparkle auto-updates

127.0.0.1:8181
Dashboard
Models
Providers
Usage
Keys
Chat
Dashboard
Status
Online
Requests
1,482
Latency p50
84 ms
Active models
qwopus-3.6-35blocal
gpt-4ocloud
claude-sonnet-4-6cloud
deepseek-r1cloud
3
Protocol surfaces — /v1/chat/completions, /v1/messages, /v1/responses — any client works immediately
12+
Providers — local engines and cloud APIs behind a single endpoint and API key
0
External bytes for local inference — prompts, tokens, and responses stay on your hardware. Air-gap capable.

The product

A desktop control plane for AI.

Aether Desktop packages the gateway, companion app, local model catalog, managed Inference Server, and a monitoring dashboard into a native macOS app. No Postgres, no Docker, no config files to edit.

Built in

Chat

Test any configured model from the Chat tab without writing code. Works across local and cloud providers from a single interface — useful for prompt engineering, model comparisons, and quick sanity checks before wiring up a client.

Agentic coding

Agent

Point opencode at http://127.0.0.1:8181/v1 and run full agentic coding sessions against your local models. File edits, shell commands, multi-step reasoning — no cloud required, no prompts leaving your machine.

OpenAI-compatible gateway

One /v1 endpoint that Claude Code, Codex, Cursor, OpenCode, and any other OpenAI-compatible client connects to immediately.

Three protocol surfaces

Chat completions, Anthropic Messages, and OpenAI Responses API from a single port. No adapter config needed.

Managed local model catalog

Download curated GGUF models from the app. Aether handles the runtime package, model files, start/stop, and auto-start.

Provider routing

Add local servers or cloud providers by pasting an API key. All exposed through the same endpoint and key.

Usage and audit

Token usage by model and key, request latency, error breakdown, and full request history — stored locally in SQLite.

API key management

Create, revoke, and copy connection snippets for any client. Keys are stored as SHA-256 hashes. Plaintext shown once.

Live system status

Dashboard showing backend health, request latency percentiles, Mac hardware specs, and daemon event log.

Desktop safety signals

Surface actual endpoints, Mac power-state warnings that affect inference, and Inference Server logs in real time.

Sparkle auto-updates

New versions install silently in the background. No package manager, no brew upgrade, no manual DMG downloads.

Inside the app

Everything your inference session is doing, at a glance.

The Usage tab surfaces token consumption, request timing, cost equivalence, and per-model breakdowns — all stored locally. No cloud dashboard, no data shared.

Aether Desktop — Usage dashboard
Zero configuration

Works out of the box. For everyone.

No terminal setup, no YAML files, no Docker. Download, open, pick a model. Aether handles the rest — runtime packages, key management, updates. If you can install an app, you can run local AI.

Usage analytics

Tokens, requests, latency, top client.

Eight live metrics: total tokens, cloud-cost equivalent, request count, avg latency, top model, error count, provider health, and which client sent the most requests — opencode, Claude Code, curl, or anything else.

Local inference

Millions of tokens. $0.

Every token routed to a local model costs nothing. The Est. Cost counter stays at zero. The cloud equivalent column shows what the same volume would cost on a hosted API — the difference is what stays in your pocket.

Compatibility

Works with the tools you already use.

Aether speaks three protocol surfaces from one port. Any client that understands OpenAI works immediately — no adapters, no configuration changes.

Protocol Tools that use it Notes
POST /v1/chat/completions Claude Code, Cursor, OpenCode, curl, Python openai, OpenWebUI Standard OpenAI format. Default for most tools.
POST /v1/messages Anthropic SDK, Claude Code (Anthropic auth) Native Anthropic Messages format. Translated to chat completions internally.
POST /v1/responses Codex CLI 0.141+, OpenAI Responses SDK Stateless. previous_response_id rejected cleanly so clients retry with full context.

Coverage

Local first. Cloud when it helps.

Aether keeps local model adoption frictionless while giving you one place to route cloud models — behind the same API key and endpoint your local models use.

Provider Type Register with
Aether local models Managed local Download from catalog. Aether handles the runtime.
llama.cpp Local Base URL of llama-server (without /v1)
Ollama Local http://localhost:11434
MLX-LM Local Base URL of the MLX server
LM Studio Local Same engine as llama.cpp
vLLM Local Base URL of the vLLM server
OpenAI Cloud API key from platform.openai.com
OpenRouter Cloud API key; hundreds of hosted models
NVIDIA Cloud API key; HuggingFace-style model IDs
DeepSeek · Kimi · MiniMax Cloud API key; base URL without /v1

Getting started

Install to a running model in four steps.

01

Install

Run the command. Downloads the signed DMG, installs Aether.app into Applications, and launches it.

02

Choose

Pick a local model from the catalog or add an existing provider — a local server URL or a cloud API key.

03

Run

Aether downloads the Inference Server package if needed and starts it. The system log shows daemon events in real time.

04
Batteries included

Connect or run

Point any client at http://127.0.0.1:8181/v1. Or open opencode, set the Aether endpoint, and start a full Agent session — code edits, shell commands, multi-step reasoning — powered by your local models. No cloud account needed.

Security by default

What stays private, and how.

"Your data never leaves" is not a marketing claim — it is a consequence of the architecture.

API keys stored as hashes

Bearer tokens are stored as SHA-256 hashes. The plaintext is shown once at creation and never again.

Provider credentials in the keychain

Cloud API keys are stored in the macOS Keychain, not in a plaintext config file.

Local inference: zero egress

Requests to local models never leave the machine. No Aether server, no usage reporting, no telemetry.

Local-only storage

All audit logs, token counts, and request history are written to a local SQLite database. No cloud database.

Localhost by default

The gateway binds to 127.0.0.1:8181 by default. Network binding is an explicit opt-in, not a default.

Single-user, single-process

No multi-tenant attack surface. The companion manages one daemon and one key store for the current user.

Install Aether Desktop.

One command. Downloads the signed DMG, installs into Applications, launches the app.

Terminal — zsh
% curl -fsSL https://aether.ufrik.com/desktop/macos/install.sh | sh
Copied!

Requires macOS 13+ on Apple Silicon · Sparkle auto-updates · v0.3.1.42

127.0.0.1:8181 — Chat
Dashboard
Models
Providers
Usage
Keys
Chat
Chat
qwopus-3.6-35b · local Explain the proxy architecture in one paragraph.
Assistant Aether Desktop embeds a Rust proxy that accepts requests on :8181/v1, rewrites them for the selected backend, and streams the response back…
Ask anything…
Send