Setoku
Open source · MCP server · Apache-2.0

Give your AI a read-only window into your company's data.

Setoku is a small self-hosted MCP server. It does two things: it gives your AI a read-only way to query your data, and it remembers what that data means (the metric definitions, the gotchas), getting better the more you use it. No model runs on the server; it works with whatever AI you already have.

Quickstart

Setoku installs as a Claude Code plugin, so setup happens right inside Claude. Add the plugin (no server needed yet), then run onboarding from your main project directory, the codebase you want Setoku to learn from:

# 1 · add the plugin (skills only)
/plugin marketplace add Hedgy-Labs/setoku
/plugin install setoku@setoku

# 2 · from your project directory
/setoku:onboard

Onboarding stands up the server (on a small VPS you provide), connects this Claude to it, wires your database read-only, and generates the first knowledge from your code. You stay in the loop for anything that touches your data. Or just tell Claude "set up setoku."

You'll need somewhere to host it (a small VPS works) and an admin connection URL for the database you want it to learn about.

Or stand up the server by hand

# on a fresh Ubuntu VPS
git clone https://github.com/Hedgy-Labs/setoku /opt/setoku
cd /opt/setoku
SETOKU_ADMIN_USER=you ./deploy/bootstrap.sh

The bootstrap script installs Docker, generates secrets, obtains an HTTPS certificate, and prints the connect command. Then add the plugin and run /setoku:onboard from your project; it will detect the box you just made.

How it works

Setoku gives the AI two kinds of MCP tools, and one rule: look up what the data means before you touch it.

Context tools: find_context · get_metric
The AI reads what your data means first: canonical metric definitions, entity docs, and the gotchas that make a naive query wrong.
Read-only query: get_schema · run_query
Read-only, with a row cap, a statement timeout, a table allow-list, and an append-only audit log. Enforced by the database role, not by parsing SQL.
Human-approved: report_correction
The AI can only propose changes to what Setoku knows. A person accepts them on the admin page, outside the agent loop, so an injected session can't rewrite the brain.
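
To make the propose-then-approve split concrete, here is a minimal in-memory sketch. It is not Setoku's implementation: only the name report_correction comes from the tool list above, and the KnowledgeStore class, its fields, and the approve/reject methods are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Correction:
    key: str               # which piece of knowledge the change targets
    proposed: str          # the new text the agent suggests
    status: str = "pending"


@dataclass
class KnowledgeStore:
    """Hypothetical store: agents can only queue proposals."""
    entries: Dict[str, str] = field(default_factory=dict)
    queue: List[Correction] = field(default_factory=list)

    def report_correction(self, key: str, proposed: str) -> Correction:
        # Agent-facing path: records the proposal and leaves entries untouched.
        correction = Correction(key, proposed)
        self.queue.append(correction)
        return correction

    def approve(self, correction: Correction) -> None:
        # Admin-facing path: the only place stored knowledge changes.
        if correction.status != "pending":
            raise ValueError("correction already resolved")
        self.entries[correction.key] = correction.proposed
        correction.status = "accepted"

    def reject(self, correction: Correction) -> None:
        # Admin-facing path: resolves the proposal without changing entries.
        if correction.status != "pending":
            raise ValueError("correction already resolved")
        correction.status = "rejected"


store = KnowledgeStore(entries={"unique_fans": "count of accounts"})
proposal = store.report_correction(
    "unique_fans", "count of distinct normalized emails"
)
# Nothing has changed yet; the proposal waits for a person.
assert store.entries["unique_fans"] == "count of accounts"
store.approve(proposal)
```

The point of the design is that the agent never holds a handle to approve: in a real deployment that call would sit behind the admin page, outside anything the MCP session can reach.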

What's in the box

You can host this on whatever infra you like. We put it all on one small box (a ~$12/mo VPS): it's simple, and it lets us turn a coding agent loose on the box to add custom integrations. Only the proxy is public; your databases aren't exposed. Queries are read-only, and knowledge changes pass through a person.

[Diagram: you and your AI connect over MCP to the Setoku gateway (context and query tools) on your box, where only the proxy is public. The gateway reads your database read-only (the data is not copied) and proposes changes to the knowledge store, which a person (admin) approves.]

Integrations

Point Setoku at the data you already have. Setup skills are included that help your coding agent wire everything up and add new integrations.

Query live (read-only)

PostgreSQL: your app database
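
As a rough illustration of what "enforced by the database role" can look like on PostgreSQL, the sketch below builds the statements an admin connection might run to create a SELECT-only role. This is an assumption about one reasonable setup, not the statements Setoku's onboarding actually issues; the role, schema, and table names are hypothetical.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_only_role_sql(
    role: str, schema: str, tables: List[str], timeout_ms: int = 5000
) -> List[str]:
    """Build SQL for a role that can only SELECT from an allow-list of tables."""
    r = quote_ident(role)
    s = quote_ident(schema)
    stmts = [
        f"CREATE ROLE {r} LOGIN",
        # Session defaults. A session could change these, so the SELECT-only
        # grants below are what actually keep the role from writing.
        f"ALTER ROLE {r} SET default_transaction_read_only = on",
        # A bare integer for statement_timeout is read as milliseconds.
        f"ALTER ROLE {r} SET statement_timeout = {int(timeout_ms)}",
        f"GRANT USAGE ON SCHEMA {s} TO {r}",
    ]
    # The table allow-list: the role gets SELECT on these tables and nothing else.
    stmts += [f"GRANT SELECT ON {s}.{quote_ident(t)} TO {r}" for t in tables]
    return stmts


# Hypothetical names, for illustration only.
for stmt in read_only_role_sql("setoku_ro", "public", ["tickets", "fans"]):
    print(stmt + ";")
```

Because the role never receives INSERT, UPDATE, or DELETE privileges, a write fails at the database no matter what SQL the agent sends. A row cap is not something a role can express, so it would have to be applied by the gateway when it fetches results; that part is not shown here.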

Lake — ingested logs & events

ClickHouse: logs · events · telemetry

Source connectors

Vercel: deploys & logs
Render: deploys & logs
Slack: messages
Mercury: banking & finance

AI clients (over MCP)

Claude: claude.ai · Claude Code
Any MCP client

No connector for your source yet? The included setup skills give your coding agent the patterns to wire one up itself. See the repo, or open an issue.

Try it on real-ish data

There's a live demo wired to a fictional pro sports club, the Bonita Bulldogs (ticketing, sponsorship, concessions, payroll, broadcast rights).

Connect it: in Claude (or any MCP client) open Settings → Connectors → Add custom connector, and paste this URL. The token rides in the URL, so there's no separate key to enter.

https://demo.setoku.com/i/55e767ea376aa3783cfb4653e2bf81772876b9b5c36339d9

Then ask in plain English:

"How many unique fans do we have?"
71,204: deduped by normalized email, internal accounts excluded. Not the raw 92,118.
"What's our total annual revenue, and how much is media rights?"
$192M: five systems reconciled to the same units. Media rights is the biggest line, ~$90M.
"What was ticket revenue this season?"
$46.8M: cents reconciled to dollars; refunds, exchanges, and comps excluded.
"What's our season-ticket renewal rate?"
81%: across three seasons of ticketing history.
"What's our total merchandise revenue?"
It flags it: most merch is sold via Fanatics, not in this data, so it says so instead of returning a wrong total.

Read-only, audited, synthetic data.
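
The unique-fans answer above shows why a stored definition matters: a raw row count and a deduped count can differ a lot. Here is a toy sketch of that kind of definition. The normalization rule (trim and lowercase), the internal domain, and the sample addresses are all assumptions made up for illustration, not the demo's actual logic or data.

```python
from typing import Iterable, Set

# Hypothetical internal domain; a real definition would list the club's own.
INTERNAL_DOMAINS = {"bulldogs.example"}


def normalize_email(raw: str) -> str:
    """Trim whitespace and lowercase, so case and padding don't split one fan in two."""
    return raw.strip().lower()


def count_unique_fans(
    emails: Iterable[str], internal_domains: Set[str] = INTERNAL_DOMAINS
) -> int:
    """Count distinct normalized emails, skipping internal accounts."""
    seen = set()
    for raw in emails:
        email = normalize_email(raw)
        if "@" not in email:
            continue  # not an email address; leave it out of the count
        domain = email.rsplit("@", 1)[1]
        if domain in internal_domains:
            continue  # internal account, excluded by definition
        seen.add(email)
    return len(seen)


rows = [
    "Ana@Example.com",
    " ana@example.com ",
    "ben@example.com",
    "staff@bulldogs.example",
]
print(len(rows), count_unique_fans(rows))  # prints "4 2"
```

A naive count of the four rows overstates the fan base; the definition collapses the two spellings of one address and drops the staff account, leaving two.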

Why we built it

We're curious.

There are plenty of AI memory stores, and plenty of data gateways. Stapling the two together, and nudging the agent to gather knowledge about the data as it goes, seemed worth trying and fun to tinker with.

We're cheap.

We wanted something that runs on one small box, works on a Pro/Max subscription or a cheap model with no added inference cost, mostly sets itself up (no field engineer to pay for), stays portable between providers, and is open source.

We and some friends wanted the same thing.

  • Hedgy: keep scaling without hiring. Debug from live logs and data, find growth levers, and match candidates and companies better with more data.
  • Baggu: give employees state-of-the-art tools. Faster onboarding, and a safe way to build against real data.
  • Tlon: experimenting with giving agents curated data to work from.
  • Sports analysts: query across data that doesn't usually sit together.
  • Academic labs: think through hypotheses against real papers, data, and drafts.

Setup help

It's open source, so you can self-host it today. If you'd rather not, we're happy to help set it up: we wire it to your data, capture the first knowledge with you, and hand it over. Email hello@setoku.com.