ARC-Challenge
Science multiple-choice questions
Difficulty: Beginner | Optimizer: ContraPromptOptimizer
Grade-school science questions with four labelled choices (A–D). The model must output a single letter. The main failure mode is explanation text in the answer field instead of the bare letter, which causes the parser to fail.
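To make the parsing failure concrete, here is a minimal sketch of a strict answer parser and an exact-match score. The names `parse_strict` and `score` are hypothetical and only illustrate the idea; the parser actually used for this task may behave differently.

```python
VALID_CHOICES = {"A", "B", "C", "D"}


def parse_strict(answer: str) -> str:
    """Accept only a bare letter A-D (ignoring case and surrounding whitespace)."""
    cleaned = answer.strip().upper()
    if cleaned not in VALID_CHOICES:
        raise ValueError(f"expected a single letter A-D, got: {answer!r}")
    return cleaned


def score(prediction: str, gold: str) -> float:
    """Return 1.0 for a parseable, correct letter; 0.0 otherwise."""
    try:
        return 1.0 if parse_strict(prediction) == gold else 0.0
    except ValueError:
        # Explanation text in the answer field ends up here and scores zero,
        # even when the letter it mentions is the right one.
        return 0.0
```

Under these assumptions, an answer such as `"B, because plants need sunlight"` scores the same as a wrong letter, which is why the output format matters as much as the reasoning.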
Full Example
What the Optimizer Learns
ContraPromptOptimizer works well here because the correct and incorrect answers are
close in structure: the model often has the right reasoning but picks an adjacent
choice. Contrastive examples make the distinction between similar choices explicit,
and the optimizer learns to reinforce instructions that focus on the single best
answer rather than hedging across options.
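One way to picture a contrastive example is as a pair of attempts on the same question, one correct and one not. The sketch below is an illustration under that assumption, not ContraPromptOptimizer's implementation; `Attempt` and `contrastive_pairs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    question: str
    predicted: str  # letter the model chose
    gold: str       # correct letter


def contrastive_pairs(attempts):
    """Pair every correct attempt with every incorrect attempt on the same question."""
    buckets = {}
    for attempt in attempts:
        bucket = buckets.setdefault(attempt.question, {"right": [], "wrong": []})
        key = "right" if attempt.predicted == attempt.gold else "wrong"
        bucket[key].append(attempt)

    pairs = []
    for bucket in buckets.values():
        for good in bucket["right"]:
            for bad in bucket["wrong"]:
                pairs.append((good, bad))
    return pairs
```

Questions with only correct or only incorrect attempts produce no pairs in this sketch, since there is nothing to contrast.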