Local vs. cloud — and what "local-first" means

If this is your first stop, start here. It explains the one idea that makes Daisy different — local-first — and the one real choice you'll make: whether Daisy turns your audio into text on your own computer or through a cloud service.

What "local-first" means

Most meeting tools work like this: a bot joins your call, your audio is sent to a company's servers, it's processed there, and your transcripts live in an account you don't control.

Daisy flips that around. It's an app on your computer. Your recordings, transcripts, summaries, and notes are saved as files on your disk, in a folder you choose. By default, nothing leaves your machine. You can use Daisy with no internet at all.

That's "local-first": your stuff lives with you first, and anything online is something you opt into — never the default.

Why it's nice

Private — your conversations stay on your device.
Yours — the files are right there; back them up, move them, keep them forever.
Works offline — on a plane, in a basement, wherever.
No mystery cloud — you decide if and when anything goes online.

The trade-off: when the work happens on your own machine, your computer does the heavy lifting. That's where the next choice comes in.

The one choice: how Daisy makes the transcript

Turning speech into text is the heaviest job. Daisy can do it two ways, and both give you a good transcript:

                  Local (on your computer)              Cloud (a paid service)
Cost              Free, always                          A small fraction of a cent per minute
Privacy           Audio stays on your device            Audio sent to the provider
Internet          Not needed                            Required
Speed             A few minutes (per hour of audio)     A few seconds
Your machine      Works harder: more fan and battery    Stays light: work runs elsewhere
Speaker labels    Free, on-device                       Free, on-device

Rough comparison — your real speeds depend on your computer.

On your computer (local) — the default

Daisy runs a speech model right on your machine (it downloads a small one the first time; you can pick a slightly more accurate one later).

Free — no per-minute charges, ever.
Private — your audio never leaves your computer.
Offline — no internet needed.
Uses your computer's power — the machine works harder while it transcribes, so a laptop's fan may spin up and the battery drains a bit faster.
Plenty fast — it processes audio far quicker than real time, so a one-hour meeting finishes in a few minutes on a typical computer.

Through a cloud service (cloud) — optional

Instead, Daisy can send the audio to a paid transcription service. You bring your own API key for a provider such as Deepgram or OpenAI.

Fastest — results come back in seconds.
Easy on your machine — the service's servers do the work, not your CPU.
Costs a little — a small fraction of a cent per minute, paid directly to that provider.
Needs an internet connection and an API key that you set up yourself.
Sends your audio to that third party to process.

A reassurance about "who said what"

Worried that going local means losing speaker labels? It doesn't. Daisy works out who said what on your own machine, for free, no matter which option you pick. That part is always local.

About the numbers

Any speeds or costs mentioned here are rough, order-of-magnitude comparisons — "a few minutes" vs "a few seconds," "free" vs "a fraction of a cent." Your real experience depends on your computer, the length of the meeting, and which models you choose. Treat them as relative guidance, not promises.
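If you like to see the arithmetic, here is a minimal sketch of how such a rough estimate could be worked out. It is not part of Daisy. The function name and both default rates (local transcription running 20 times faster than real time, a cloud price of half a cent per minute) are assumptions chosen only for illustration, so plug in your own figures.

```python
# Rough, illustrative estimate only. The default rates are assumptions made up
# for this sketch, not measured Daisy figures or any provider's actual pricing.

def estimate(meeting_minutes, local_speedup=20.0, cloud_cents_per_minute=0.5):
    """Return (local_processing_minutes, cloud_cost_cents) for one meeting.

    local_speedup: assumed factor by which local transcription beats real time.
    cloud_cents_per_minute: assumed provider price, in cents per audio minute.
    """
    if meeting_minutes < 0 or local_speedup <= 0 or cloud_cents_per_minute < 0:
        raise ValueError("minutes and price must be >= 0, speedup must be > 0")
    local_minutes = meeting_minutes / local_speedup          # time your machine spends
    cloud_cents = meeting_minutes * cloud_cents_per_minute   # what the provider bills
    return local_minutes, cloud_cents


local_minutes, cloud_cents = estimate(60)
print(f"Local: about {local_minutes:.0f} min of processing, no charge")
print(f"Cloud: about {cloud_cents:.0f} cents for the hour")
```

With these assumed rates, a one-hour meeting comes out to about 3 minutes of local processing or about 30 cents in cloud fees. A slower computer or a different provider would shift both numbers.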

So which should I pick?

Most people should start local. It's free, fully private, works offline, and is accurate enough that you likely won't notice the difference in everyday meetings — and you keep speaker labels for free.

Switch to cloud if you want the fastest possible turnaround, you transcribe constantly and don't mind a tiny per-minute cost, or you'd rather keep your laptop cool and quiet by offloading the work.

You can change your mind anytime in Settings — and even mix approaches.
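For readers who think in code, the rule of thumb above can be written as a tiny decision helper. This is only a sketch of the guidance on this page, not how Daisy decides anything internally. The function and parameter names are hypothetical.

```python
# Hypothetical helper that mirrors the advice on this page. Daisy itself does
# not expose this function; you pick the mode yourself in Settings.

def suggest_mode(has_internet, has_provider_key, wants_fastest=False,
                 heavy_user_ok_with_cost=False, wants_cool_laptop=False):
    """Return "local" or "cloud" following the rule of thumb above."""
    # Cloud transcription needs both a connection and a provider key.
    cloud_possible = has_internet and has_provider_key
    # Any one of the three reasons listed above is enough to prefer cloud.
    cloud_wanted = wants_fastest or heavy_user_ok_with_cost or wants_cool_laptop
    # Local is the default whenever cloud is unavailable or not asked for.
    return "cloud" if cloud_possible and cloud_wanted else "local"


print(suggest_mode(has_internet=True, has_provider_key=True))
print(suggest_mode(has_internet=True, has_provider_key=True, wants_fastest=True))
```

The first call suggests local because no reason to switch was given. The second suggests cloud because speed was requested and a connection and key are both available.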

One more thing: summaries (the TL;DR, action items, etc.) are a separate feature that always needs an AI model, either one running locally or a cloud one you connect with your own key, no matter how you transcribed. Your transcription choice doesn't change that.