Now in private beta · macOS 14+

Your Mac's private
AI layer.

Run any open-source model directly on your laptop — or bring your own API key for the jobs that need a bigger brain. Either way, you stay in control.

Apple Silicon native · MLX accelerated · No account required
OnPrem app overview

Simple things, done locally.

OnPrem sits quietly in your menubar. Select text anywhere on your Mac — an email draft, a Notes page, a PDF — and hand it to a model running on your own hardware.

Edit selection

Summarize, rewrite, tighten, or make professional. One hotkey on any selected text, anywhere.

Ask OnPrem

Ask about what you've selected. The model reasons over it and returns an answer — no context copied to the cloud.

Custom shortcuts

Build your own Edit configs — a translation step, a tone check, a code explainer — each bound to its own hotkey.

Select

Works on anything you can highlight.

No extensions, no API keys, no new app to paste into. If your Mac can select it, OnPrem can work on it.

  • Notes, Mail, Safari, Pages, Xcode
  • Preserves selection and replaces in place
  • Undo always available
Selecting text with the OnPrem prompt bar
Ask OnPrem showing hidden reasoning trace
Reason

See the model think — if you want to.

Reasoning traces stream inline. Inspect the model's work, stop it mid-flight, or let it finish and stay out of your way.

  • Hidden reasoning toggle
  • Interrupt anytime
  • Streamed, not batched
Apply

Replaces in place — never out-of-band.

Output lands exactly where your selection was. A quiet toast confirms the change with a one-click undo.

  • In-document replacement
  • Non-destructive, always reversible
  • No floating windows to manage
Make Professional applied toast with Undo

On-device by default. Cloud when you need it.

Pick any MLX-compatible model on the Hugging Face Hub — thousands of them — and run it locally. Or flip a toggle, bring an API key, and route to OpenAI, Anthropic, or any OpenAI-compatible endpoint.

Local · MLX

Anything on Hugging Face. Running on your Mac.

Paste an org prefix — mlx-community/ — and OnPrem surfaces the full catalog. We flag what's compatible, show quantization and size, and load it straight into the runtime.

  • Thousands of MLX-ready models on the Hub
  • Compatibility + quantization detected automatically
  • README and metadata inline, no browser trip
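The Hub discovery described above can be sketched with the public `huggingface_hub` Python API. This is a hedged illustration, not OnPrem's actual detection logic (which isn't public); the tag-based compatibility check is an assumption about how MLX conversions are commonly labeled on the Hub.

```python
# Minimal sketch of browsing the mlx-community catalog on the Hugging Face Hub.
# The tag heuristic below is an assumption; OnPrem's real compatibility
# detection may inspect config files, quantization metadata, and more.
from typing import Iterable


def looks_mlx_compatible(tags: Iterable[str]) -> bool:
    """Heuristic: repos converted for MLX usually carry an 'mlx' tag."""
    return "mlx" in {t.lower() for t in tags}


def list_mlx_models(limit: int = 10) -> list[str]:
    """Return Hub ids under the mlx-community org that look MLX-ready."""
    from huggingface_hub import list_models  # pip install huggingface_hub
    return [
        m.id
        for m in list_models(author="mlx-community", limit=limit)
        if looks_mlx_compatible(m.tags or [])
    ]
```

Calling `list_mlx_models()` requires network access; the filter itself is pure and works on any tag list.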
Add model from Hugging Face Hub
Add a cloud provider with your own API key
BYO key · any provider

Not every job fits on a laptop. That's fine.

When local isn't the right call — bigger context, a frontier model, a custom internal endpoint — drop in an API key and route through it. Your key stays in the Keychain; OnPrem never sees it.

  • OpenAI, Anthropic, Google, Groq, and OpenAI-compatible
  • Custom base URLs for your own deployments
  • Per-shortcut model selection — mix local and cloud
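"OpenAI-compatible" means any endpoint that accepts the standard chat-completions request shape at a base URL you choose. A minimal stdlib sketch of that wire format, with placeholder `base_url`, key, and model names:

```python
import json
import urllib.request

# Hedged sketch: builds the standard OpenAI-style chat-completions request
# against an arbitrary base URL. All values below are placeholders.


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Construct (but do not send) a chat-completions HTTP request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(req) -> str:
    """POST the request and return the first choice's message text."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because only the base URL varies, the same code path can target OpenAI, a proxy, or a self-hosted deployment.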
Runtime

MLX-native. Built for Apple Silicon.

The local runtime uses MLX to run quantized open-source models efficiently on your Mac's GPU, backed by Apple Silicon's unified memory. Warm models stay in memory; cold ones load in seconds.

  • MLX Local Runtime — built in
  • 4-bit / 8-bit quantization supported
  • Hot-swap between local and cloud per shortcut
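The load-and-generate flow resembles the open-source `mlx_lm` Python package, shown below as an illustration only; OnPrem's built-in runtime is its own implementation. The repo-naming helper assumes the common mlx-community convention of suffixing quantization bits (e.g. `-4bit`), which individual repos may not follow.

```python
# Illustrative sketch of local MLX inference via the mlx_lm package
# (pip install mlx-lm; requires Apple Silicon). Not OnPrem's actual runtime.


def quantized_repo(base_name: str, bits: int = 4) -> str:
    """Guess a Hub id for a quantized MLX conversion.

    Assumes the usual mlx-community naming convention; verify on the Hub.
    """
    return f"mlx-community/{base_name}-{bits}bit"


def run_local(repo_id: str, prompt: str, max_tokens: int = 256) -> str:
    """Load a model (cached after first download) and generate a completion."""
    from mlx_lm import load, generate
    model, tokenizer = load(repo_id)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)
```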
Model inventory inside OnPrem

Bind your most-used actions to a key.

Every shortcut is a saved Edit Configuration — a name, an icon, a system prompt, a hotkey. Build new ones in seconds. No config files, no CLI, no restart.

Summarize · ⌥⌘P
Make professional · ⌃⌘P
Simplify · ⌃⌘S
Bullet points · ⌃⌘B
Translate to English · ⌃⌘T
Explain this code · ⌃⌘E

Edit Configuration

Saved edit actions define the shortcut, instruction, and output behavior used by the Edit runtime.

Icon
System Prompt
This instruction is sent whenever the saved shortcut runs.
Summarize the selected text in 2–3 clear sentences. Preserve the original language and keep only the key points.
Keyboard Shortcut
Choose modifiers plus a single key. Conflicts are detected immediately.
⌘ Command ⌥ Option ⇧ Shift ⌃ Control P
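The pieces above (name, icon, system prompt, hotkey) can be pictured as a small record. This is a hypothetical sketch; OnPrem's on-disk format is not documented, so every field name here is an assumption.

```python
# Hypothetical shape of a saved Edit Configuration. Field names are
# assumptions for illustration; OnPrem's actual schema is not published.

REQUIRED_FIELDS = {"name", "icon", "system_prompt", "hotkey"}

summarize_config = {
    "name": "Summarize",
    "icon": "text.alignleft",  # assumed SF Symbols-style icon name
    "system_prompt": (
        "Summarize the selected text in 2-3 clear sentences. "
        "Preserve the original language and keep only the key points."
    ),
    "hotkey": {"modifiers": ["option", "command"], "key": "P"},
}


def is_valid(config: dict) -> bool:
    """A config is runnable when every required field is present and non-empty."""
    return all(config.get(field) for field in REQUIRED_FIELDS)
```

A record like this is all a shortcut needs, which is why new configurations can go live without a restart.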

Local-first. Always your choice.

Local mode sends nothing to anyone. Switch to a provider and your key lives in the Keychain — OnPrem itself never reads or stores it.

0 · Bytes sent in Local mode
With an on-device model selected, prompts and outputs never leave the machine. Works offline.

BYO · Keys, never ours
API keys are stored in the macOS Keychain. OnPrem has no backend to send them through.

Context, yours
No rate limits on local runs. Run it for hours — it's just your laptop.

OSS · Open weights supported
Any MLX-compatible model on the Hugging Face Hub. Bring your own, fine-tune your own.

Install it. Forget it's there.

Drop OnPrem in your Applications folder and it'll live in your menubar, ready when you select something.

Download OnPrem for Mac Free · macOS 14+

Everything else.

Does it work fully offline?
Yes, completely. Once a model is downloaded, OnPrem never needs the internet — inference, shortcuts, and reasoning all run from your machine's memory and storage.

What hardware do I need?
Any Apple Silicon Mac (M1 or newer) on macOS 14+. Larger models benefit from 16 GB of unified memory; 4-bit quantized models run comfortably on 8 GB machines.

Can I bring my own models?
Yes — any MLX-compatible model. Drop it into the Models tab, set a quantization preset, and it shows up in every shortcut's model picker.

What does OnPrem collect?
Only crash reports, opt-in. No prompts, no selections, no model usage data. You can disable it entirely in Settings.

How do I add a new shortcut?
Open the Edit tab, hit "New Configuration," fill in the name, icon, system prompt, and keyboard shortcut. It's live immediately — no restart, no config files.