This document describes the evaluation process for R5 (20 players, 190 matchups). We're sharing it because we want your expertise: if you see errors in our methodology or results, corrections are welcome.
Before evaluating individual matchups, we generated a "deck plan" for each of the 20 decks: a 2-3 sentence summary of what the deck does, its mana sequencing, and its key vulnerabilities. A human reviewed these plans before we proceeded, because an earlier version contained errors (e.g., claiming a deck could play two lands on T1).
You can see the deck plans we used: R5 Deck Plans
For each of the 20 decks, we launched a Claude AI agent with:
Each agent produced per-direction verdicts (on the play and on the draw) plus a 1-2 sentence narrative for each direction. This means every matchup was evaluated twice — once from each side.
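As a quick sanity check on these counts, the arithmetic can be sketched as follows. The variable names are illustrative and not taken from our pipeline; the only inputs are the 20 decks and the two directions described above.

```python
from math import comb

players = 20
directions = 2  # on the play, on the draw

# Unordered pairs of decks: C(20, 2)
matchups = comb(players, 2)

# Each agent evaluates its deck against the 19 others, in both directions.
verdicts_per_agent = (players - 1) * directions
total_verdicts = players * verdicts_per_agent

# Each (matchup, direction) game is covered by both decks' agents.
evaluations_per_game = total_verdicts // (matchups * directions)

print(matchups, verdicts_per_agent, total_verdicts, evaluations_per_game)
```

So the 20 agents together produce 760 verdicts, which covers each of the 380 directed games (190 matchups, two directions each) exactly twice.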
The prompt instructs agents to think step by step about mana sequencing, interaction timing, and combat math. Both players play optimally: play to win if possible, force a draw if not, accept the loss if neither.
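The "win if possible, else draw, else lose" rule is the usual minimax preference ordering. The sketch below illustrates it on a toy game tree; it is not how the agents evaluate matchups (they reason in natural language), and the tree encoding, `SCORE` table, and `best_outcome` function are hypothetical names chosen for this example.

```python
# Outcomes ranked from P0's point of view: win > draw > loss.
SCORE = {"P0_WINS": 1, "DRAW": 0, "P1_WINS": -1}

def best_outcome(node, p0_to_move=True):
    """Return the outcome under optimal play by both sides.

    A node is either an outcome string (a finished game) or a list of
    child nodes (the choices available to the player to move).
    """
    if isinstance(node, str):
        return node
    results = [best_outcome(child, not p0_to_move) for child in node]
    # P0 picks the best result for P0; P1 picks the worst result for P0.
    pick = max if p0_to_move else min
    return pick(results, key=SCORE.__getitem__)

# P0 has three lines: one loses outright, one lets P1 choose between a
# draw and a P0 win, and one lets P1 choose between winning and losing.
tree = ["P1_WINS", ["DRAW", "P0_WINS"], ["P1_WINS", "P0_WINS"]]
result = best_outcome(tree)
print(result)
```

In this toy tree P0 cannot force a win, so optimal play takes the line that guarantees a draw.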
We compared the two agents' verdicts for each matchup, direction by direction. When Deck X is on the play, Deck Y is on the draw, so if Agent A (evaluating Deck X) says "X wins on the play" and Agent B (evaluating Deck Y) says "Y loses on the draw," they agree. If they disagree in either direction, the matchup is flagged.
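A minimal sketch of this cross-check is below. It assumes each verdict has already been normalized to the evaluating deck's own perspective ("W", "D", or "L"); the dictionary layout and the `find_disagreements` name are assumptions for illustration, not our actual data format.

```python
from itertools import combinations

# Mirror an outcome to the other deck's perspective.
FLIP = {"W": "L", "L": "W", "D": "D"}

def find_disagreements(verdicts):
    """Return the directed games where the two agents' verdicts conflict.

    verdicts[(deck, opponent, direction)] is "W", "D", or "L" from the
    evaluating deck's perspective; direction is "play" or "draw".
    """
    flagged = []
    decks = sorted({key[0] for key in verdicts})
    for x, y in combinations(decks, 2):
        # X on the play is the same game as Y on the draw, and vice versa.
        for x_dir, y_dir in (("play", "draw"), ("draw", "play")):
            x_says = verdicts[(x, y, x_dir)]
            y_says = verdicts[(y, x, y_dir)]
            if x_says != FLIP[y_says]:
                flagged.append((x, y, x_dir, x_says, y_says))
    return flagged

example = {
    ("X", "Y", "play"): "W", ("Y", "X", "draw"): "L",  # agents agree
    ("X", "Y", "draw"): "D", ("Y", "X", "play"): "W",  # agents disagree
}
flagged = find_disagreements(example)
print(flagged)
```

This sketch reports conflicts per direction; a matchup counts as flagged if either of its two directions appears in the list.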
We identified several categories of errors:
We re-ran 5 of the most problematic decks with corrected deck plans and additional guidance in the prompt. This reduced disagreements from 64 to 40.
We also ran 9 targeted re-evaluations for specific matchups where deck plan errors were most likely to have changed the outcome. Three of these flipped the result.
The remaining 40 disagreements were resolved by mod review. For each, we examined both agents' reasoning and narratives, considered the card interactions, and picked the correct result.
These 39 matchups (plus 4 historical) have a placeholder narrative: "Mod-resolved outcome, no narrative. Please argue your case in the thread."
Here's a simplified version of what each agent receives. The actual prompt is ~27,000 characters and includes full Oracle text for all cards.
You are evaluating Three Card Blind (3CB) matchups for one deck against all opponents.

## 3CB Rules
- 3-card hand, no library. Drawing from empty library does NOT cause a loss.
- Normal Magic rules. Starting life: 20.
- Both players play optimally (3 pts win, 1 draw, 0 loss).
- Coin flips/dice = worst outcome for controller.
- Evaluate EACH DIRECTION independently.
- WL (each wins on play) ≠ DD (neither can win).

## Your Deck (@handle)
[Full Oracle text + deck plan]

## Opponent: @handle
[Full Oracle text + deck plan]
... (19 opponents)

## Instructions
For each opponent:
1. On-the-play verdict + narrative (you go first)
2. On-the-draw verdict + narrative (opponent goes first)

Think step by step about mana sequencing, interaction timing, and combat math.

VERDICT: P0_WINS | P1_WINS | DRAW
NARRATIVE: [1-2 sentences, under 200 chars]
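For readers who want to process agent output themselves, here is a rough sketch of a parser for the VERDICT/NARRATIVE format the prompt asks for. It assumes each verdict is followed directly by its narrative; the real responses may contain extra text around these fields, and `parse_blocks` is a hypothetical helper rather than part of our tooling.

```python
import re

VERDICTS = {"P0_WINS", "P1_WINS", "DRAW"}

# One VERDICT line followed by its NARRATIVE, up to the next VERDICT or the end.
PATTERN = re.compile(
    r"VERDICT:\s*(\w+)\s*NARRATIVE:\s*(.+?)(?=VERDICT:|\Z)",
    re.DOTALL,
)

def parse_blocks(text):
    """Extract verdict/narrative pairs and check them against the prompt's limits."""
    results = []
    for verdict, narrative in PATTERN.findall(text):
        if verdict not in VERDICTS:
            raise ValueError(f"unexpected verdict: {verdict!r}")
        narrative = " ".join(narrative.split())  # collapse whitespace and newlines
        results.append({
            "verdict": verdict,
            "narrative": narrative,
            # The prompt asks for narratives under 200 characters.
            "too_long": len(narrative) >= 200,
        })
    return results

sample = (
    "VERDICT: P0_WINS\n"
    "NARRATIVE: Lands a T1 threat.\n"
    "VERDICT: DRAW\n"
    "NARRATIVE: Neither side can attack profitably.\n"
)
parsed = parse_blocks(sample)
print(parsed)
```

Rejecting unknown verdict strings early makes malformed responses visible before they reach the cross-check step.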
Reply to the results thread on Bluesky with your correction. Include:
We'll re-evaluate and update the dashboard. All corrections are tracked with an audit trail.