BoolQ

Difficulty: Beginner | Optimizer: PromptGradOptimizer

Given a passage and a question, the model must answer true or false. The failure mode is subtle: when the passage or question is phrased ambiguously, the model may hedge or produce text that cannot be parsed as a clean boolean.

Full Example

import dspy
import vizpy
 
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
 
 
class BooleanQA(dspy.Signature):
    """Read the passage and answer the yes/no question."""
 
    question = dspy.InputField()
    passage = dspy.InputField()
    answer = dspy.OutputField(desc="true or false")
 
 
module = dspy.ChainOfThought(BooleanQA)
 
 
def normalize_bool(value):
    """Map a raw model answer to True/False, or None if it cannot be parsed."""
    s = str(value).strip().lower()
    if s in ("true", "yes", "1"):
        return True
    if s in ("false", "no", "0"):
        return False
    return None
 
 
def metric(example, prediction):
    """Score one prediction and tag failures with a typed error_type."""
    gold = example["gold_answer"]
    pred = normalize_bool(getattr(prediction, "answer", ""))

    if pred is not None and pred == gold:
        return vizpy.Score(value=1.0, is_success=True, feedback=f"Correct: {gold}")

    # Separate formatting failures from wrong answers.
    if pred is None:
        error_type = "unparseable"
    else:
        error_type = "false_negative" if gold is True else "false_positive"
    return vizpy.Score(
        value=0.0,
        is_success=False,
        feedback=f"Expected {gold}, got {pred}",
        error_type=error_type,
    )
 
 
train_examples = [
    {
        "question": "is windows 10 a program or operating system",
        "passage": "Windows 10 is a series of personal computer operating systems produced by Microsoft as part of its Windows NT family of operating systems.",
        "gold_answer": True,
    },
    {
        "question": "can you use a debit card for a hotel room",
        "passage": "Hotels typically require a credit card to reserve a room. However, many hotels will also accept a debit card with a Visa or MasterCard logo, though they may place a hold on your account.",
        "gold_answer": True,
    },
    {
        "question": "do you need a passport to go to puerto rico",
        "passage": "Puerto Rico is a territory of the United States. US citizens do not need a passport to travel to Puerto Rico, just as they don't need one to travel between US states.",
        "gold_answer": False,
    },
    {
        "question": "is a masters degree the same as a doctorate",
        "passage": "A master's degree is a graduate degree typically requiring 1-2 years of study. A doctorate (PhD) is a higher degree requiring 4-7 years of study and original research.",
        "gold_answer": False,
    },
    {
        "question": "is pizza originally from italy",
        "passage": "Modern pizza evolved in Naples, Italy. The classic Neapolitan pizza was created in Naples in the late 19th century.",
        "gold_answer": True,
    },
]
 
val_examples = [
    {
        "question": "can a president serve more than two terms",
        "passage": "The 22nd Amendment limits presidents to two terms. Franklin D. Roosevelt served four terms before this amendment was ratified.",
        "gold_answer": False,
    },
    {
        "question": "is pluto considered a planet",
        "passage": "In 2006, the International Astronomical Union reclassified Pluto as a 'dwarf planet' rather than a full planet.",
        "gold_answer": False,
    },
    {
        "question": "is mount everest the tallest mountain on earth",
        "passage": "Mount Everest is the highest mountain above sea level at 29,032 feet.",
        "gold_answer": True,
    },
]
 
 
optimizer = vizpy.PromptGradOptimizer(
    metric=metric,
    config=vizpy.PromptGradConfig.dev(),
)
 
optimized = optimizer.optimize(
    module=module,
    train_examples=train_examples,
    val_examples=val_examples,
)

What the Optimizer Learns

The metric tags each failure with a typed error_type value (false_negative, false_positive, or unparseable), which lets the optimizer distinguish wrong answers from formatting failures. It tends to add an instruction that enforces clean true/false output and clarifies how to handle hedged phrasing in the passage.
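
As a rough illustration of how the three labels separate, the sketch below reuses the parsing rule from the full example and tallies error types over a handful of made-up model outputs. It does not call vizpy or a language model. The `classify` helper and the sample pairs are assumptions introduced here for illustration only.

```python
from collections import Counter


def normalize_bool(value):
    """Same parsing rule as the full example: strict match, else None."""
    s = str(value).strip().lower()
    if s in ("true", "yes", "1"):
        return True
    if s in ("false", "no", "0"):
        return False
    return None


def classify(gold, raw_output):
    """Hypothetical helper: None for a correct answer, else the error_type label."""
    pred = normalize_bool(raw_output)
    if pred is None:
        return "unparseable"
    if pred == gold:
        return None
    return "false_negative" if gold is True else "false_positive"


# Made-up (gold, raw model output) pairs, for illustration only.
samples = [
    (True, "True"),                      # correct
    (True, "It depends on the hotel."),  # hedged text, cannot be parsed
    (False, "yes"),                      # parsed, but wrong
    (True, " no "),                      # parsed after stripping, but wrong
    (False, "False."),                   # trailing period defeats the strict match
]

labels = [classify(gold, raw) for gold, raw in samples]
error_counts = Counter(label for label in labels if label is not None)
print(dict(error_counts))
```

In this sample, two of the four failures are formatting problems rather than wrong answers, which is the distinction the error_type field is meant to expose.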
