Now in private beta · macOS 14+

Your Mac's private
AI layer.

Run any open-source model directly on your laptop — or bring your own API key for the jobs that need a bigger brain. Either way, you stay in control.

Apple Silicon native · MLX accelerated · No account required
OnPrem app overview

Simple things, done locally.

OnPrem sits quietly in your menubar. Select text anywhere on your Mac — an email draft, a Notes page, a PDF — and hand it to a model running on your own hardware.

Edit selection

Summarize, rewrite, tighten, or make professional. One hotkey on any selected text, anywhere.

Ask OnPrem

Ask about what you've selected. The model reasons over it and returns an answer — no context copied to the cloud.

Custom shortcuts

Build your own Edit configs — a translation step, a tone check, a code explainer — each bound to its own hotkey.

Select

Works on anything you can highlight.

No extensions, no API keys, no new app to paste into. If your Mac can select it, OnPrem can work on it.

  • Notes, Mail, Safari, Pages, Xcode
  • Preserves selection and replaces in place
  • Undo always available
Selecting text with the OnPrem prompt bar
Ask OnPrem showing hidden reasoning trace
Reason

See the model think — if you want to.

Reasoning traces stream inline. Inspect the model's work, stop it mid-flight, or let it finish and stay out of your way.

  • Hidden reasoning toggle
  • Interrupt anytime
  • Streamed, not batched
Apply

Replaces in place — never out-of-band.

Output lands exactly where your selection was. A quiet toast confirms the change with a one-click undo.

  • In-document replacement
  • Non-destructive, always reversible
  • No floating windows to manage
Make Professional applied toast with Undo

On-device by default. Cloud when you need it.

Pick any MLX-compatible model on the Hugging Face Hub — thousands of them — and run it locally. Or flip a toggle, bring an API key, and route to OpenAI, Anthropic, or any OpenAI-compatible endpoint.

Local · MLX

Anything on Hugging Face. Running on your Mac.

Paste an org prefix — mlx-community/ — and OnPrem surfaces the full catalog. We flag what's compatible, show quantization and size, and load it straight into the runtime.

  • Thousands of MLX-ready models on the Hub
  • Compatibility + quantization detected automatically
  • README and metadata inline, no browser trip
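The Hub discovery described above can be sketched with the public `huggingface_hub` Python API. This is a hedged illustration, not OnPrem's actual detection logic (which isn't public); the tag-based compatibility check is an assumption about how MLX conversions are commonly labeled on the Hub.

```python
# Minimal sketch of browsing the mlx-community catalog on the Hugging Face Hub.
# The tag heuristic below is an assumption; OnPrem's real compatibility
# detection may inspect config files, quantization metadata, and more.
from typing import Iterable


def looks_mlx_compatible(tags: Iterable[str]) -> bool:
    """Heuristic: repos converted for MLX usually carry an 'mlx' tag."""
    return "mlx" in {t.lower() for t in tags}


def list_mlx_models(limit: int = 10) -> list[str]:
    """Return Hub ids under the mlx-community org that look MLX-ready."""
    from huggingface_hub import list_models  # pip install huggingface_hub
    return [
        m.id
        for m in list_models(author="mlx-community", limit=limit)
        if looks_mlx_compatible(m.tags or [])
    ]
```

Calling `list_mlx_models()` requires network access; the filter itself is pure and works on any tag list.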
Add model from Hugging Face Hub
Add a cloud provider with your own API key
BYO key · any provider

Not every job fits on a laptop. That's fine.

When local isn't the right call — bigger context, a frontier model, a custom internal endpoint — drop in an API key and route through it. Your key stays in the Keychain; OnPrem never sees it.

  • OpenAI, Anthropic, Google, Groq, and OpenAI-compatible
  • Custom base URLs for your own deployments
  • Per-shortcut model selection — mix local and cloud
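"OpenAI-compatible" means any endpoint that accepts the standard chat-completions request shape at a base URL you choose. A minimal stdlib sketch of that wire format, with placeholder `base_url`, key, and model names:

```python
import json
import urllib.request

# Hedged sketch: builds the standard OpenAI-style chat-completions request
# against an arbitrary base URL. All values below are placeholders.


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Construct (but do not send) a chat-completions HTTP request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(req) -> str:
    """POST the request and return the first choice's message text."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because only the base URL varies, the same code path can target OpenAI, a proxy, or a self-hosted deployment.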
Runtime

MLX-native. Built for Apple Silicon.

The local runtime uses MLX to run quantized open-source models efficiently on your Mac's GPU, backed by Apple Silicon's unified memory. Warm models stay in memory; cold ones load in seconds.

  • MLX Local Runtime — built in
  • 4-bit / 8-bit quantization supported
  • Hot-swap between local and cloud per shortcut
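The load-and-generate flow resembles the open-source `mlx_lm` Python package, shown below as an illustration only; OnPrem's built-in runtime is its own implementation. The repo-naming helper assumes the common mlx-community convention of suffixing quantization bits (e.g. `-4bit`), which individual repos may not follow.

```python
# Illustrative sketch of local MLX inference via the mlx_lm package
# (pip install mlx-lm; requires Apple Silicon). Not OnPrem's actual runtime.


def quantized_repo(base_name: str, bits: int = 4) -> str:
    """Guess a Hub id for a quantized MLX conversion.

    Assumes the usual mlx-community naming convention; verify on the Hub.
    """
    return f"mlx-community/{base_name}-{bits}bit"


def run_local(repo_id: str, prompt: str, max_tokens: int = 256) -> str:
    """Load a model (cached after first download) and generate a completion."""
    from mlx_lm import load, generate
    model, tokenizer = load(repo_id)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```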
Model inventory inside OnPrem

Bind your most-used actions to a key.

Every shortcut is a saved Edit Configuration — a name, an icon, a system prompt, a hotkey. Build new ones in seconds. No config files, no CLI, no restart.

Summarize · ⌥⌘P
Make professional · ⌃⌘P
Simplify · ⌃⌘S
Bullet points · ⌃⌘B
Translate to English · ⌃⌘T
Explain this code · ⌃⌘E

Edit Configuration

Saved edit actions define the shortcut, instruction, and output behavior used by the Edit runtime.

Icon
System Prompt
This instruction is sent whenever the saved shortcut runs.
Summarize the selected text in 2–3 clear sentences. Preserve the original language and keep only the key points.
Keyboard Shortcut
Choose modifiers plus a single key. Conflicts are detected immediately.
⌘ Command ⌥ Option ⇧ Shift ⌃ Control P
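The pieces above (name, icon, system prompt, hotkey) can be pictured as a small record. This is a hypothetical sketch; OnPrem's on-disk format is not documented, so every field name here is an assumption.

```python
# Hypothetical shape of a saved Edit Configuration. Field names are
# assumptions for illustration; OnPrem's actual schema is not published.

REQUIRED_FIELDS = {"name", "icon", "system_prompt", "hotkey"}

summarize_config = {
    "name": "Summarize",
    "icon": "text.alignleft",  # assumed SF Symbols-style icon name
    "system_prompt": (
        "Summarize the selected text in 2-3 clear sentences. "
        "Preserve the original language and keep only the key points."
    ),
    "hotkey": {"modifiers": ["option", "command"], "key": "P"},
}


def is_valid(config: dict) -> bool:
    """A config is runnable when every required field is present and non-empty."""
    return all(config.get(field) for field in REQUIRED_FIELDS)
```

A record like this is all a shortcut needs, which is why new configurations can go live without a restart.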

Local-first. Always your choice.

Local mode sends nothing to anyone. Switch to a provider and your key lives in the Keychain — OnPrem itself never reads or stores it.

0 · Bytes sent in Local mode
With an on-device model selected, prompts and outputs never leave the machine. Works offline.

BYO · Keys, never ours
API keys are stored in the macOS Keychain. OnPrem has no backend to send them through.

Context, yours
No rate limits on local runs. Run it for hours — it's just your laptop.

OSS · Open weights supported
Any MLX-compatible model on the Hugging Face Hub. Bring your own, fine-tune your own.

Install it. Forget it's there.

Drop OnPrem in your Applications folder and it'll live in your menubar, ready when you select something.

Download OnPrem for Mac Free · macOS 14+

Everything else.

Does it work fully offline?
Yes, completely. Once a model is downloaded, OnPrem never needs the internet — inference, shortcuts, and reasoning all run from your machine's memory and storage.

What hardware do I need?
Any Apple Silicon Mac (M1 or newer) on macOS 14+. Larger models benefit from 16 GB of unified memory; 4-bit quantized models run comfortably on 8 GB machines.

Can I bring my own models?
Yes — any MLX-compatible model. Drop it into the Models tab, set a quantization preset, and it shows up in every shortcut's model picker.

What does OnPrem collect?
Only crash reports, opt-in. No prompts, no selections, no model usage data. You can disable it entirely in Settings.

How do I add a new shortcut?
Open the Edit tab, hit "New Configuration," fill in the name, icon, system prompt, and keyboard shortcut. It's live immediately — no restart, no config files.