
Crit - AI Shipped Product
AI Design Critiques
That Actually Know UX
Designers at early-stage companies don't have access to a senior design voice.
Crit is the on-demand critique partner that fills that gap; AI-powered design
critiques based on validated UX principles.
Role
Solo - design, prompt
engineering & build
Stack
Claude (product thinking +
prompt engineering +
prototyping) → Lovable
(React/Vite build) → Vercel
(deployment) + Microsoft
Clarity + Tally
Year
2026
Duration
Concept to shipped in 3 weeks,
solo
4 min read
Live at crit-six.vercel.appd - upload a design, get a critique grounded in Nielsen's
heuristics and Gestalt principles.
View Site
↗
The Constraint
The obvious version of this
product was the one that fails
Language-based critique is imprecise; "improve the hierarchy"
means nothing without seeing what the designer sees.
The obvious version of this product is a chat interface where you describe your
design and get feedback. That version fails because language-based critique is
imprecise; "improve the hierarchy" means nothing without seeing what the
designer sees.
The harder problem was designing an output contract strict enough that the AI
returned structured, actionable critique every time; not freeform prose that
varied unpredictably between runs.

Structured critique output - scored, labelled, reproducible
3 Key Decisions
Three calls that made the critique trustworthy
Each decision traded an easy default for the harder choice
that made the output defensible, reproducible, and stable.
01
Encoding methodology into the prompt, not the persona.
The first version told the AI to act like a senior designer. The output was confident but
inconsistent; good one run, generic the next. The alternative was grounding each evaluation
dimension in a named framework: Gestalt for hierarchy, WCAG 2.1 AA for accessibility,
Nielsen's heuristics for interaction. This made the critique defensible and reproducible, which
matters specifically for a product where users need to trust the output enough to act on it.
02
One free critique before the paywall, not a time-based trial.
A 3-day trial works for tools people use daily. Designers don't critique designs every day;
they do it at specific project moments. A time-based trial would expire before most users
had a genuine reason to run a second critique. Usage-gating at one critique means the
paywall appears at the exact moment of demonstrated value, not an arbitrary deadline.
03
Structured JSON output contract over natural language
parsing.
The temptation was to let the model respond conversationally and parse it downstream. The
failure mode is obvious in hindsight: any variance in response format breaks the UI. Defining
an explicit JSON schema in the system prompt and validating every field on return —
clamping scores, sanitising labels, slicing arrays — made the interface stable regardless of
model variance. The tradeoff is prompt length and token cost per call, which is acceptable at
this price point.
The AI-Specific Insight
The prompt work that mattered wasn't about tone
“
When AI output feeds directly into
a UI, treat the response schema as
an API contract, not a suggestion.
The prompt engineering work that mattered had nothing to do with tone or persona; it was
making the output structurally deterministic while keeping the qualitative reasoning sharp.
The failure mode I hit early was conflating those two problems: I kept rewriting the
persona to get better reasoning, when the actual issue was that inconsistent output
format was making good reasoning invisible in the UI.
Separating the contract (what shape the response takes) from the methodology (what
framework the reasoning applies) fixed both.
Outcome
Early signals from a product still in its first
weeks live
3 wks
Concept → Shipped
Solo, end to end: product thinking,
prompt engineering, build, and
deployment.
Re-run
The Core Loop Metric
Early Clarity sessions show users
reading the full output and returning to
the upload screen; the behaviour the
product depends on.
Own
work
Strongest Pull
The weekly design-challenge card drew
almost no engagement next to users
bringing their own real designs.
The core loop re-run rate; users who upload a revised design after reading
their critique; is the metric the entire product depends on. Early sessions in
Clarity show users reading the full output and returning to the upload screen,
which is the behaviour the product needs to be worth a subscription.
The output quality question ("did this feel like a senior designer or generic AI?") is
the signal I'm watching most carefully. Initial responses suggest the methodology-
grounded prompt is doing real work; users are citing specific issues from the
critique in follow-up DMs, not just acknowledging that feedback was given.
The thing that didn't work as expected: the weekly design challenge card, which I
built to drive habitual return visits, gets almost no engagement compared to
users who come with their own work. The pull of "I need feedback on this specific
thing" is stronger than any content-driven hook I can create. That's a meaningful
finding about the use case.
One Thing I'd Do Differently
Watch real designers before
writing a single line of prompt
Run five live critique sessions with real designers before
writing the system prompt; watching where they felt the
output missed would have saved two full prompt rewrites.
More Case Studies
Keep exploring the work
MATS
Independent · AI Shipped Product
A post-training logging system that structures messy brain dumps into a queryable backlog using an AI chat interface. For martial artists.
Tatami
Independent · AI Shipped Product
BJJ practitioners have no lightweight tool for tracking training consistency; existing apps are either bloated with competition analytics or too generic to reflect how the sport is actually practiced.
ALONSO ROSADO
AI-first Product Designer shipping full
products, end to end.
© 2026 Alonso Rosado. All rights reserved.