Methodology

This is an editorial showcase, not a benchmark lab.

LLM Arena compares the visible outputs of popular models on practical, funny, and easy-to-inspect prompts. The aim is to show differences you can spot at a glance.

Fairness Rules

  • Every model receives the exact same prompt text.
  • Within a showcase, every run uses the same temperature and the same output-token cap.
  • The first response is kept. No retries, no repair prompts, no follow-up.
  • Each output is saved with the model name, version string, and timestamp, plus token usage and latency when available.
  • Build artifacts are reviewed manually before publishing.
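A minimal Python sketch of how a runner could enforce these rules. It assumes a hypothetical `call_model` client that returns a dict with a `text` key plus optional `version` and `usage` keys; `ShowcaseConfig`, `RunRecord`, and their fields are illustrative names, not the site's actual schema. Each model is called exactly once with the shared prompt, temperature, and token cap, and the first reply is stored as-is.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class ShowcaseConfig:
    """Settings shared by every model in one showcase (field names are illustrative)."""
    prompt: str
    temperature: float
    max_tokens: int


@dataclass
class RunRecord:
    """What gets saved per output; token counts stay None when the provider omits them."""
    model: str
    version: Optional[str]
    timestamp: str
    output: str
    input_tokens: Optional[int]
    output_tokens: Optional[int]
    latency_ms: float


# Hypothetical client signature: (model, prompt, temperature, max_tokens) -> reply dict
ModelClient = Callable[[str, str, float, int], dict]


def run_showcase(models: list[str], config: ShowcaseConfig,
                 call_model: ModelClient) -> list[RunRecord]:
    records: list[RunRecord] = []
    for model in models:
        started = time.perf_counter()
        # Exactly one call per model: no retries, no repair prompts, no follow-ups.
        reply = call_model(model, config.prompt, config.temperature, config.max_tokens)
        latency_ms = (time.perf_counter() - started) * 1000
        usage = reply.get("usage") or {}
        records.append(RunRecord(
            model=model,
            version=reply.get("version"),
            timestamp=datetime.now(timezone.utc).isoformat(),
            output=reply["text"],  # first response kept verbatim
            input_tokens=usage.get("input_tokens"),
            output_tokens=usage.get("output_tokens"),
            latency_ms=latency_ms,
        ))
    return records
```

Passing one frozen config object to every call makes it hard for a single model to drift onto different settings, and timing the call locally keeps latency populated even when a provider reports no usage data.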

Security Rules

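Model-generated HTML and JavaScript are untrusted code. Build outputs render inside sandboxed iframes with scripts allowed but without same-origin privileges, so the generated code runs in an opaque origin and cannot read the host page's DOM, cookies, or storage.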

<iframe sandbox="allow-scripts" referrerpolicy="no-referrer"></iframe>

Generated HTML is not injected into the main page. SVG outputs are isolated or sanitized before publication.
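One way to keep generated HTML out of the main page at build time is to pass it, escaped, through the iframe's `srcdoc` attribute. The Python sketch below assumes the site is assembled by a script that emits HTML strings; `sandboxed_iframe` and `SANDBOX_FLAGS` are hypothetical names. It only isolates the markup it is given and does not sanitize SVG.

```python
import html

# Scripts may run, but without allow-same-origin the frame gets an opaque origin.
SANDBOX_FLAGS = "allow-scripts"


def sandboxed_iframe(generated_html: str) -> str:
    """Embed untrusted model output via srcdoc instead of injecting it into the page."""
    # quote=True escapes & < > " ' so the payload cannot close the attribute early.
    srcdoc = html.escape(generated_html, quote=True)
    return (
        f'<iframe sandbox="{SANDBOX_FLAGS}" referrerpolicy="no-referrer" '
        f'srcdoc="{srcdoc}"></iframe>'
    )
```

Keep `allow-same-origin` out of the flag list: a `srcdoc` document inherits the host page's origin, and combining that flag with `allow-scripts` would let the framed code remove its own sandbox.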

What It Is Not

This is not an Elo leaderboard, a scientific benchmark, or a claim that one model is globally better than another. It is a repeatable content format for seeing real model behavior side by side.