About
The problem with
AI-assisted code review
Copilot, Cursor, and their siblings have made it trivially easy to write code you don't fully understand. The PR description sounds confident. The diff looks clean. But when a reviewer asks “why did you pick this approach?” — the answer is often silence.
What it is
A Turing Test for your own pull request
PRs.md is a micro-SaaS that gives developers a way to prove they actually read and understood their own changes. You paste a GitHub PR link. An LLM — your LLM, your key — reads the diff and writes three targeted questions about the specific code you changed. Not general knowledge. Not trivia. The actual decisions in your actual PR.
You answer in under three minutes. No copy-paste. No AI assist. The same model grades your answers and issues a signed proof badge if you pass.
The badge links to a permanent proof page with your full Q&A. Your reviewers can click through and read what you wrote. It's not magic — it's accountability.
How it works
Under the hood
Diff fetch
PRs.md calls the GitHub API to pull the raw diff for your PR. We request no repo permissions — the diff is fetched via the public API. Only public repositories are supported: even with BYOK, the generated questions and answers are stored on our servers to power the proof page, so private code would be exposed through those records.
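The fetch step can be sketched in TypeScript. `parsePrUrl` and `fetchDiff` are illustrative names, not PRs.md's actual code, but the `application/vnd.github.diff` media type is the standard way to ask the GitHub REST API for a raw unified diff:

```typescript
// Sketch of an unauthenticated diff fetch (public repos only).
// Function names here are assumptions, not the real implementation.
function parsePrUrl(url: string): { owner: string; repo: string; number: number } {
  const m = url.match(/github\.com\/([^/]+)\/([^/]+)\/pull\/(\d+)/);
  if (!m) throw new Error("Not a GitHub pull request URL");
  return { owner: m[1], repo: m[2], number: Number(m[3]) };
}

async function fetchDiff(prUrl: string): Promise<string> {
  const { owner, repo, number } = parsePrUrl(prUrl);
  const res = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/pulls/${number}`,
    // This Accept header makes GitHub return the raw diff instead of JSON.
    { headers: { Accept: "application/vnd.github.diff" } }
  );
  if (!res.ok) throw new Error(`GitHub API error: ${res.status}`);
  return res.text();
}
```

Because no token is sent, the request only works for public repositories, which matches the constraint above.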
Question generation
The diff is sent to your chosen LLM (OpenAI, Anthropic, or Gemini) with a structured prompt asking it to produce three specific, diff-grounded questions — one of which is a hallucination trap, a question about something that doesn't exist in the PR. The call runs on your API key, at your cost; none of it benefits us.
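A structured prompt for this step might look like the sketch below. The wording, the `Question` shape, and the JSON response format are assumptions for illustration, not the real PRs.md prompt:

```typescript
// Assumed response shape: three questions, one flagged as the trap.
interface Question {
  text: string;
  isTrap: boolean;
}

// Illustrative prompt builder — the production prompt will differ.
function buildQuestionPrompt(diff: string): string {
  return [
    "You are quizzing the author of this pull request on their own changes.",
    "Write exactly three questions grounded in the diff below.",
    "Questions 1 and 2: ask about specific decisions visible in the changed code.",
    "Question 3 is a hallucination trap: ask about a change that does NOT",
    "appear in the diff; a correct answer points out that it doesn't exist.",
    'Respond as JSON: [{"text": "...", "isTrap": false}, ...]',
    "",
    "--- DIFF ---",
    diff,
  ].join("\n");
}
```

The trap question is what makes blind AI answering risky: a model asked to answer it cold will often confidently describe code that was never changed.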
Timed quiz
You get three minutes to answer. The quiz UI disables copy-paste to reduce the temptation to feed answers back to an AI. The timer is enforced server-side — late submissions aren't accepted.
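Server-side enforcement reduces to comparing two server-recorded timestamps, so the client clock never enters the decision. This is a minimal sketch; the window constant matches the three minutes above, but the grace period and names are assumptions:

```typescript
const QUIZ_WINDOW_MS = 3 * 60 * 1000; // the three-minute window
const GRACE_MS = 5_000;               // assumed allowance for network latency

// startedAt is stamped by the server when it issues the questions;
// submittedAt is stamped by the server on receipt of the answers.
function isSubmissionOnTime(startedAt: number, submittedAt: number): boolean {
  return submittedAt - startedAt <= QUIZ_WINDOW_MS + GRACE_MS;
}
```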
LLM grading
Your answers go back to the same model with a grading prompt. It scores each answer 0–100 and writes feedback. Scores are clamped and re-validated server-side before being persisted — the client can't inflate them.
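The clamp-and-revalidate step might look like this sketch, which assumes the server treats any non-numeric model output as a zero rather than trusting it:

```typescript
// Server-side score validation: never trust a number that came from an
// LLM response (or a client) without forcing it into range first.
function clampScore(raw: unknown): number {
  const n = typeof raw === "number" ? raw : Number(raw);
  if (!Number.isFinite(n)) return 0; // garbage output scores zero
  return Math.min(100, Math.max(0, Math.round(n)));
}
```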
Proof issuance
A passing score (threshold configurable per-team in the future) creates a permanent record in the database. The proof page and SVG badge are generated from that record. Both are public and immutable.
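Generating the badge from the stored record can be as simple as templating an SVG string. This is a minimal sketch of the idea, not the real badge layout or colors:

```typescript
// Hypothetical badge renderer: a small SVG built from the proof record.
function renderBadge(score: number, passed: boolean): string {
  const color = passed ? "#2ea44f" : "#d73a49";
  const label = passed ? "verified" : "failed";
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="120" height="20">` +
    `<rect width="120" height="20" rx="3" fill="${color}"/>` +
    `<text x="60" y="14" text-anchor="middle" fill="#fff" font-size="11">` +
    `${label} \u00b7 ${score}/100</text></svg>`
  );
}
```

Because the badge is derived from an immutable record, regenerating it always yields the same result — that is what makes it safe to hotlink from a PR description.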
Principles
Built on a few firm opinions
Bring your own key
We have no revenue model tied to LLM usage. Your key, your cost, your provider. We don't see your requests. This keeps the incentives clean.
Open source, full stop
Every line is on GitHub. Fork it, audit it, self-host it. A trust claim that you can't verify is just marketing.
No telemetry on your code
Diffs are fetched at challenge time and never written to disk beyond the active request. We store Q&A pairs and scores — not source code.
Proof, not theater
A badge that links to a real Q&A is harder to fake than a description that says "I reviewed this." We're not trying to eliminate all bad actors — we're raising the floor.
FAQ
Common questions
Does PRs.md store my code or diff?
No. The diff is fetched from GitHub at challenge time, sent to your LLM provider for question generation, and then discarded. We store the questions, your answers, and your score — nothing from the raw diff.
Who can see my proof page?
Proof pages are intentionally public. Anyone with the URL can read your Q&A and score. That's the point — reviewers need to be able to verify the badge is real.
What happens if I fail?
You can retry up to 5 times per challenge. Every attempt is recorded, so the proof page shows your best result alongside the attempt count. There's no penalty for failing — only for not trying.
Which LLM providers are supported?
OpenAI, Anthropic, and Google Gemini. You bring your own key — we never route requests through API credits of our own. One key per provider can be saved per account.
Is it possible to cheat?
We don't claim this is cheat-proof. Copy-paste is disabled during the quiz and there's a timed window, but a determined bad actor can work around that. The badge is a signal, not a cryptographic guarantee. The trap question helps catch purely AI-generated answers.
Can I self-host PRs.md?
Yes. The full source is on GitHub. You'll need a Postgres database (Neon works on the free tier), a GitHub OAuth app, and a deployment target. The .env.example in the repo documents every required variable.
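As a purely hypothetical illustration — the authoritative list is the repo's `.env.example` — a self-hosted deployment of this shape typically needs values along these lines:

```shell
# Hypothetical variable names: check .env.example in the repo for the real ones.
DATABASE_URL=postgres://user:pass@host/db   # Neon's free tier works
GITHUB_CLIENT_ID=...                        # from your GitHub OAuth app
GITHUB_CLIENT_SECRET=...
```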
How are answers graded?
The same LLM that generated the questions grades the answers. It scores each response 0–100 based on correctness, specificity, and whether the answer reflects knowledge of the actual diff rather than general domain knowledge. Your overall score is the average of the three.
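The arithmetic is a straight average of the three per-answer scores. Sketched below with an assumed pass-threshold helper — the default of 70 is a placeholder, not PRs.md's real cutoff:

```typescript
// Overall score: mean of the three per-answer scores, rounded.
function overallScore(answerScores: [number, number, number]): number {
  const sum = answerScores[0] + answerScores[1] + answerScores[2];
  return Math.round(sum / 3);
}

// Pass check — the 70 default is an assumption for illustration only.
function hasPassed(score: number, threshold = 70): boolean {
  return score >= threshold;
}
```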
Read the guide or just try it
Sign in with GitHub, drop a PR link, and see how well you actually know your own code.