BIG-Bench Hard
27 challenging reasoning subtasks
Difficulty: Advanced | Optimizer: PromptGradOptimizer
BBH is a collection of 27 reasoning tasks, including boolean logic, object tracking,
word sorting, and date arithmetic. Each example includes a task_description field,
so a single signature handles every subtask. The optimizer must learn instructions that
generalize across task types rather than overfitting to any one of them.
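To illustrate how a task_description field lets one signature cover every subtask, here is a minimal sketch in plain Python. It does not use the optimizer's actual API: the BBHExample class, the build_prompt helper, the question and target field names, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBHExample:
    """Hypothetical record layout; only task_description is named in the text."""
    task_description: str  # tells the model which subtask it is solving
    question: str          # assumed field name for the task input
    target: str            # assumed field name for the gold answer


def build_prompt(instructions: str, example: BBHExample) -> str:
    """Combine shared instructions with the per-example task description."""
    return (
        f"{instructions}\n\n"
        f"Task: {example.task_description}\n"
        f"Question: {example.question}\n"
        "Answer:"
    )


# One set of instructions is reused for every subtask.
SHARED_INSTRUCTIONS = "Reply with the final answer only."

examples = [
    BBHExample(
        task_description="Evaluate the boolean expression.",
        question="not ( True ) and ( True ) is",
        target="False",
    ),
    BBHExample(
        task_description="Sort the words alphabetically.",
        question="cherry apple banana",
        target="apple banana cherry",
    ),
]

prompts = [build_prompt(SHARED_INSTRUCTIONS, ex) for ex in examples]
```

Because the subtask is described inside each example, the shared instructions are the only part an optimizer would need to change, which is why they have to hold up across all task types.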
Full Example
What the Optimizer Learns
Because BBH is so diverse, errors stem from different failure modes in different subtasks. The optimizer accumulates rules that address the most common cross-task patterns. Typically these are: enforce the exact answer format (no surrounding text), sort and count literally rather than approximating, and apply logical operators strictly.
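The sketch below shows one way these failure modes could surface during evaluation, assuming answers are scored by strict exact match. The exact_match and error_rate_by_task helpers and the sample records are hypothetical, not part of any particular library, and the scoring rule itself is an assumption.

```python
from collections import defaultdict


def exact_match(prediction: str, target: str) -> bool:
    """Strict comparison: only leading/trailing whitespace is ignored."""
    return prediction.strip() == target.strip()


def error_rate_by_task(records):
    """Return the fraction of wrong answers per subtask.

    Each record is a (task_name, prediction, target) tuple.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for task, prediction, target in records:
        totals[task] += 1
        if not exact_match(prediction, target):
            errors[task] += 1
    return {task: errors[task] / totals[task] for task in totals}


# Illustrative records, not real model outputs.
records = [
    ("boolean_expressions", "True", "True"),
    # Correct value, but the surrounding text breaks the answer format.
    ("boolean_expressions", "The answer is True.", "True"),
    ("word_sorting", "apple banana cherry", "apple banana cherry"),
    # Approximate ordering instead of a literal sort.
    ("word_sorting", "banana apple cherry", "apple banana cherry"),
]

rates = error_rate_by_task(records)
```

Under this scoring rule, a correct value wrapped in extra text counts as an error, which helps explain why a rule about exact answer format tends to pay off across many subtasks at once.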