How LLM Arena Works | Stevinator

LLM Arena compares the visible outputs of popular models on practical, funny, and easy-to-inspect prompts. We don't pick a winner — you vote blind for the output you like best, then see which model made it and how everyone else voted.
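One way to run the blind vote described above is to shuffle the outputs, show them under neutral labels, and keep the label-to-model key hidden until the reveal. The following is a minimal sketch that assumes this approach. The `Entry` and `BlindCard` shapes and the `makeBlindRound` and `tally` helpers are hypothetical names, not LLM Arena's actual code.

```typescript
// Hypothetical shapes: the page does not describe its data model.
interface Entry {
  model: string; // hidden from the voter until the reveal
  output: string;
}

interface BlindCard {
  label: string; // neutral label such as "A", "B", "C"
  output: string;
}

// Fisher-Yates shuffle with an injectable random source so tests can be deterministic.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Show entries in random order under neutral labels (assumes at most 26 entries).
// The key maps each label back to its model and stays private until the reveal.
function makeBlindRound(
  entries: readonly Entry[],
  random?: () => number,
): { cards: BlindCard[]; key: Map<string, string> } {
  const order = shuffle(entries, random);
  const labelFor = (i: number) => String.fromCharCode(65 + i); // 65 is "A"
  const cards: BlindCard[] = order.map((e, i) => ({ label: labelFor(i), output: e.output }));
  const key = new Map(order.map((e, i): [string, string] => [labelFor(i), e.model]));
  return { cards, key };
}

// After the reveal, count votes per model; votes for unknown labels are ignored.
function tally(votes: readonly string[], key: ReadonlyMap<string, string>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const label of votes) {
    const model = key.get(label);
    if (model === undefined) continue;
    counts.set(model, (counts.get(model) ?? 0) + 1);
  }
  return counts;
}
```

Passing a fixed `random` function makes the order reproducible in tests; otherwise the default `Math.random` is used.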

Fairness Rules

Every model receives the exact same prompt text.
All runs in a showcase use the same temperature and token cap.
The first response is kept. No retries, no repair prompts, no follow-up.
Each output is saved with the model name, version string, and timestamp, plus token usage and latency when available.
We don't rank the outputs. You vote blind for the one you like, then reveal which model made it.
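If each run is stored as a record, the rules above can be checked mechanically. This is a minimal sketch under stated assumptions: the `RunRecord` fields and the `checkShowcase` helper are hypothetical, and a second record for the same model is treated as a sign that a retry was kept instead of the first response.

```typescript
// Hypothetical record shape; field names are illustrative, not the site's actual schema.
interface RunRecord {
  showcaseId: string;
  prompt: string;
  temperature: number;
  maxTokens: number; // the per-showcase token cap
  model: string;
  modelVersion: string;
  timestamp: string; // e.g. an ISO 8601 string
  output: string;
  tokenUsage?: { input: number; output: number }; // only when the provider reports it
  latencyMs?: number; // only when measured
}

// Returns a list of rule violations; an empty list means the showcase follows the rules.
function checkShowcase(records: readonly RunRecord[]): string[] {
  const problems: string[] = [];
  if (records.length === 0) return problems;
  const ref = records[0];
  const seenModels = new Set<string>();
  for (const r of records) {
    if (r.showcaseId !== ref.showcaseId) problems.push(`${r.model}: different showcase`);
    if (r.prompt !== ref.prompt) problems.push(`${r.model}: prompt differs`);
    if (r.temperature !== ref.temperature) problems.push(`${r.model}: temperature differs`);
    if (r.maxTokens !== ref.maxTokens) problems.push(`${r.model}: token cap differs`);
    // Only the first response is kept, so one record per model is expected.
    if (seenModels.has(r.model)) problems.push(`${r.model}: more than one response`);
    seenModels.add(r.model);
  }
  return problems;
}
```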

Security Rules

Model-generated HTML and JavaScript are untrusted code. Build outputs render inside sandboxed iframes with scripts allowed, but without same-origin privileges.

<iframe sandbox="allow-scripts" referrerpolicy="no-referrer"></iframe>

Generated HTML is not injected into the main page. SVG outputs are isolated or sanitized before publication.
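A common way to load generated HTML into a frame like the one above is the `srcdoc` attribute. The page does not say how LLM Arena delivers outputs to its frames, so treat this as a sketch that assumes `srcdoc`. The `sandboxedFrame` and `escapeAttr` helpers are hypothetical; the `sandbox` and `referrerpolicy` values are the ones shown above.

```typescript
// Escape text for a double-quoted HTML attribute value.
// "&" is replaced first so the entities added afterwards are not double-escaped.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap untrusted model HTML in a sandboxed iframe via srcdoc.
// "allow-scripts" without "allow-same-origin" gives the frame an opaque origin,
// so its scripts cannot read the parent page's DOM, cookies, or storage.
function sandboxedFrame(untrustedHtml: string, title = "Model output"): string {
  return (
    `<iframe sandbox="allow-scripts" referrerpolicy="no-referrer" ` +
    `title="${escapeAttr(title)}" srcdoc="${escapeAttr(untrustedHtml)}"></iframe>`
  );
}
```

Leaving out `allow-same-origin` is the important part: combined with `allow-scripts` on same-origin content, it would let the framed page remove its own sandbox.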

What It Is Not

This is not an Elo leaderboard, a scientific benchmark, or a claim that one model is globally better than another. It is a repeatable content format for seeing real model behavior side by side.

You be the judge, not us.
