Testing code you didn't write

A practical approach to verifying AI-generated code — what to test, what to skip, and how to build confidence in code you didn't type.

The first time you ship a feature you did not type, it is uncomfortable. The model wrote four files, you skimmed them, the demo worked. Now what?

Testing AI-generated code is not the same as testing code you wrote yourself. You did not build a mental model of it as you typed, so you do not already know where the fragile parts are. You need a different strategy.

Start from behavior, not coverage

Forget line coverage for a minute. The first question is simple: what is this code supposed to do for a user? Write that down in one sentence. That sentence is your first test.

If the feature is "user uploads a CSV and gets a chart," your first test sends a real-ish CSV and asserts a chart appears. That single end-to-end test buys you more confidence than ten unit tests of helper functions you never read.
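If you do not have a browser harness set up yet, the same sentence can still drive a test one layer down. The sketch below assumes the upload-to-chart feature funnels through a single function, a hypothetical `csvToChartPoints` that turns CSV text into the points the chart draws. It uses Node's built-in `node:test` runner so it runs without installing anything; in Vitest you would import `test` and `expect` from `vitest` instead. It is weaker than a real end-to-end test because it never renders the page, but the test name is the user's sentence and the assertion checks what the user would see.

```typescript
import { test } from "node:test";
import { strict as assert } from "node:assert";

// One point on the chart the user sees.
interface ChartPoint {
  label: string;
  value: number;
}

// Hypothetical core of the feature: CSV text in, chart points out.
// Assumes a header row followed by "label,value" rows.
function csvToChartPoints(csv: string): ChartPoint[] {
  return csv
    .trim()
    .split(/\r?\n/)
    .slice(1) // skip the header row
    .map((line) => line.split(","))
    // Drop rows whose value is missing or not a number instead of charting garbage.
    .filter((cells) => cells.length >= 2 && cells[1].trim() !== "" && !Number.isNaN(Number(cells[1])))
    .map((cells) => ({ label: cells[0].trim(), value: Number(cells[1]) }));
}

// The one-sentence description of the feature doubles as the test name.
test("user uploads a CSV and gets a chart", () => {
  // Real-ish input: Windows line endings, a trailing newline, one bad row.
  const csv = "month,revenue\r\nJan,120\r\nFeb,oops\r\nMar,95\r\n";
  assert.deepEqual(csvToChartPoints(csv), [
    { label: "Jan", value: 120 },
    { label: "Mar", value: 95 },
  ]);
});
```

When you later add Playwright or Cypress, keep the same test name and move the assertion to the rendered chart on the page.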

Three tiers worth writing

For most vibe-coded apps, three tiers of tests are enough:

One end-to-end test per feature. Playwright, Cypress, or even a Vitest browser test. It clicks the buttons and checks the page. This catches "the AI rewrote something and the page is now broken."
Unit tests for the parts you do not trust. Date math, parsing user input, anything with branching logic, anything that touches money. Ask the model to write the tests too, but read them before you trust them.
A smoke test that runs in CI. Boot the app, hit /, expect a 200. Cheap to write, catches deploy-breaking regressions.
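The smoke-test tier really can be a few lines. The sketch below uses only Node's built-in `node:http` module; `createApp` is a hypothetical stand-in for however your project builds its server, and in CI you would boot your real app instead. It starts the server on a free port, requests `/`, and reports the status code.

```typescript
import * as http from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical stand-in for your real app. In a real project you would
// import the function that builds your server instead of defining one here.
function createApp(): http.Server {
  return http.createServer((req, res) => {
    if (req.url === "/") {
      res.writeHead(200, { "Content-Type": "text/html" });
      res.end("<h1>ok</h1>");
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

// Boot the app on a free port, request "/", and resolve with the HTTP status code.
function smokeTest(): Promise<number> {
  const server = createApp();
  return new Promise((resolve, reject) => {
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as AddressInfo;
      // agent: false opens a one-off connection that is not kept alive,
      // so the server can shut down as soon as the response is read.
      http
        .get({ host: "127.0.0.1", port, path: "/", agent: false }, (res) => {
          res.resume(); // drain the body; only the status matters here
          server.close();
          resolve(res.statusCode ?? 0);
        })
        .on("error", (err) => {
          server.close();
          reject(err);
        });
    });
  });
}
```

Listening on port 0 asks the operating system for any free port, so the check does not collide with a dev server you already have running.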

That's it. You do not need a 90% coverage badge to ship a side project. You need to know that the next AI-generated diff did not silently break the thing you launched last week.

Make the AI write its own tests — but read them

A useful pattern: when you ask for a feature, ask for a test alongside it. "Write a function that does X. Then write a Vitest test that covers the happy path and one edge case."
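Here is roughly what that request might produce, with a hypothetical `parsePriceToCents` standing in for X. It uses Node's built-in `node:test` runner and `assert` so it runs without extra packages; a Vitest version has the same shape, with `test` and `expect` imported from `vitest`. It works in integer cents rather than floating-point dollars, which is exactly the kind of detail worth checking when the model writes money code.

```typescript
import { test } from "node:test";
import { strict as assert } from "node:assert";

// Hypothetical "X": turn a price the user typed, like "12.34", into integer cents.
// Returns null for anything that is not a non-negative amount with at most two decimals.
function parsePriceToCents(input: string): number | null {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(input.trim());
  if (!match) return null;
  const dollars = Number(match[1]);
  // Pad "5" to "50" so "12.5" means 12 dollars and 50 cents, not 5 cents.
  const cents = Number((match[2] ?? "0").padEnd(2, "0"));
  return dollars * 100 + cents;
}

test("happy path: a typical price becomes cents", () => {
  assert.equal(parsePriceToCents("12.34"), 1234);
});

test("edge case: a single decimal digit means tens of cents", () => {
  assert.equal(parsePriceToCents("12.5"), 1250);
});
```

The edge case is the one a quick skim misses: an implementation that adds the digits after the dot as-is would turn "12.5" into 1205 cents.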

You still need to read the test. AI-generated tests sometimes:

Test the implementation instead of the behavior. ("It calls helperFn once.") That test breaks the moment you refactor.
Mock the thing you are trying to verify. Be suspicious of any test that mocks the function it is testing.
Pass trivially. A test that asserts expect(true).toBe(true) is not a test.

Read the assertions. If they describe what a user would notice, keep the test. If they describe internal mechanics, rewrite them.
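A sketch of the difference, using a hypothetical `cartTotalCents` function and Node's built-in test runner. The commented-out lines show two of the smells above; the live tests assert things a shopper would notice on the checkout page.

```typescript
import { test } from "node:test";
import { strict as assert } from "node:assert";

// Hypothetical function under test: cart total in integer cents,
// with a flat discount that never drives the total below zero.
function cartTotalCents(prices: number[], discountCents: number): number {
  const subtotal = prices.reduce((sum, p) => sum + p, 0);
  return Math.max(0, subtotal - discountCents);
}

// Smell: passes trivially and would still pass if cartTotalCents were deleted.
//   test("cart works", () => { assert.equal(true, true); });
// Smell: tests the implementation ("reduce is called once"), so it breaks
// the moment someone rewrites the sum as a for loop.

// Keep: each assertion describes what the user sees at checkout.
test("a discount is subtracted from the subtotal", () => {
  assert.equal(cartTotalCents([1000, 250], 200), 1050);
});

test("a discount larger than the cart never produces a negative total", () => {
  assert.equal(cartTotalCents([500], 800), 0);
});
```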

What to skip

Some things are not worth testing in an MVP:

Display logic with no branching ("the button says Submit")
Generated boilerplate
Anything that is one prompt away from being regenerated

Save the testing effort for things that will be expensive if they break: data writes, payments, auth, anything user-facing on the happy path.

A daily habit

Once you have tests, run them. The discipline that saves you is simple: run the tests on every AI diff before you commit. The model is fast; your tests should be fast too. If your suite takes more than 30 seconds, you will start skipping it, and eventually something that a test would have caught will break in production.
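One way to make that habit automatic is a small script your pre-commit hook calls. The sketch below is an assumption throughout: it uses Node's `spawnSync` to run whatever command runs your suite once (for a Vitest project, probably `npx vitest run`), times it against the 30-second figure above, and produces a message. Wiring it into `.git/hooks/pre-commit` or a hook manager is left to you.

```typescript
import { spawnSync } from "node:child_process";

// Rule of thumb from above: suites slower than ~30 seconds get skipped.
const BUDGET_MS = 30_000;

interface RunResult {
  passed: boolean;
  elapsedMs: number;
  overBudget: boolean;
}

// Run the test command once, stream its output to the terminal, and time it.
function runTests(command: string, args: string[], budgetMs: number = BUDGET_MS): RunResult {
  const start = Date.now();
  const result = spawnSync(command, args, { stdio: "inherit" });
  const elapsedMs = Date.now() - start;
  return {
    // status is null if the process could not start or was killed by a signal.
    passed: result.status === 0,
    elapsedMs,
    overBudget: elapsedMs > budgetMs,
  };
}

// The message a pre-commit hook would print for a given run.
function summarize(run: RunResult): string {
  if (!run.passed) return "Tests failed: fix them before committing this diff.";
  if (run.overBudget) {
    const seconds = Math.round(run.elapsedMs / 1000);
    return `Tests passed, but took ${seconds}s. Speed the suite up or you will stop running it.`;
  }
  return "Tests passed.";
}
```

From the hook, call `runTests("npx", ["vitest", "run"])`, print `summarize(...)` of the result, and exit non-zero when `passed` is false; git aborts the commit when a pre-commit hook exits non-zero.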

For the pre-merge step, our checklist for reviewing AI-generated diffs pairs well with this: tests verify behavior, and the review checklist covers everything else.
