PromptGradOptimizer
API reference for the gradient-based prompt optimizer
PromptGrad optimizes your DSPy module by analyzing failure patterns across batches of examples and learning targeted correction rules. It freezes the module's base instructions and accumulates validated improvements as a separate layer, keeping the original prompt intact.
You can provide any base prompt — extract it from the module, pass a custom instruction string, or supply instructions generated by any external optimizer. PromptGrad layers its corrections on top of whatever base you provide.
API Signature
Constructor Parameters
| Parameter | Type | Description |
|---|---|---|
| metric | Metric | Required. Callable that takes (example: dict, prediction) and returns a Score. |
| config | PromptGradConfig | Configuration dataclass with all hyperparameters. |
| base_prompt_source | str | How to obtain base instructions. "module" extracts from the module's existing instructions. Any other string is used directly as the base instructions. |
| example_formatter | ExampleFormatter | Formats examples for internal optimization. Must implement format_for_gradient(example, score) -> str and format_context(example) -> str. |
| rule_acceptor | Callable[[ParsedRule], bool] | Custom filter for proposed improvements. Default accepts additions freely and removals only with high confidence. |
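A minimal sketch of a custom rule_acceptor that mirrors the documented default policy. The ParsedRule dataclass below is an illustrative stand-in; the real class's field names ("action", "confidence") are assumptions, not part of the documented API.

```python
from dataclasses import dataclass

# Illustrative stand-in for the library's ParsedRule type.
# The real class may expose different field names.
@dataclass
class ParsedRule:
    text: str          # the proposed improvement
    action: str        # e.g. "add" or "remove" (assumed)
    confidence: float  # 0.0 - 1.0 (assumed)

def conservative_acceptor(rule: ParsedRule) -> bool:
    """Accept additions freely; accept removals only with high confidence."""
    if rule.action == "add":
        return True
    return rule.confidence >= 0.9

print(conservative_acceptor(ParsedRule("Always cite sources.", "add", 0.4)))  # True
print(conservative_acceptor(ParsedRule("Drop step 2.", "remove", 0.5)))       # False
```

Pass such a callable as the rule_acceptor constructor parameter to tighten or loosen which proposed improvements make it into the prompt.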
optimize() Method
| Parameter | Type | Description |
|---|---|---|
| module | dspy.Module | The DSPy module to optimize. |
| train_examples | list[dict] | Training examples for optimization. |
| val_examples | list[dict] | Validation examples. If None, a holdout is split from train_examples. |
Returns: An optimized dspy.Module with base instructions + learned corrections.
Important Parameters (PromptGradConfig)
These are the parameters you're most likely to tune.
| Parameter | Type | Default | Description |
|---|---|---|---|
| epochs | int | 3 | Number of training epochs. Each epoch samples multiple batches and accumulates improvements. |
| batches_per_epoch | int | 3 | Batches per epoch. More batches = more diverse improvement proposals. Total optimization calls = epochs * batches_per_epoch. |
| batch_size | int | 10 | Examples per batch. Larger batches produce more representative failure analysis but cost more. |
| validate_rules | bool | True | Strongly recommended. Tests each proposed improvement individually before applying. Prevents harmful changes from accumulating. |
| gradient_consensus | bool | True | Recommended. Uses multiple analysis passes for robustness. Reduces variance from single-pass noise. |
| base_prompt_source | str | "module" | "module" extracts from the module. Any other string is used directly as custom base instructions. Also configurable via the constructor parameter. |
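The tuning knobs above can be combined in a single config. A hedged sketch, assuming PromptGradConfig is importable from a promptgrad package (the import path is an assumption):

```python
# Assumed import path; adjust to wherever PromptGradConfig lives in your install.
from promptgrad import PromptGradConfig

config = PromptGradConfig(
    epochs=2,                 # 2 epochs * 4 batches = 8 total optimization calls
    batches_per_epoch=4,      # more batches, more diverse improvement proposals
    batch_size=20,            # larger batches for more representative failure analysis
    validate_rules=True,      # strongly recommended: screen each improvement first
    gradient_consensus=True,  # average multiple analysis passes to reduce noise
)
```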
All Parameters (PromptGradConfig)
Core Loop
| Parameter | Type | Default | Description |
|---|---|---|---|
| epochs | int | 3 | Training epochs. |
| batches_per_epoch | int | 3 | Batches per epoch. |
| batch_size | int | 10 | Examples per batch. |
| max_workers | int | 10 | Thread pool size for parallel evaluation. |
| num_val_runs | int | 1 | Validation runs per epoch, averaged for stability. Increase to 3 for stochastic metrics. |
| verbose | bool | True | Print progress logs. |
| seed | int \| None | None | Seed for deterministic sampling. None = random. |
| val_holdout_fraction | float | 0.2 | Holdout fraction when val_examples is not provided. |
Rule Management
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_local_rules | int | 15 | Maximum improvements before merging or pruning. Hard cap to prevent prompt flooding. |
| merge_threshold | int | 8 | When improvement count exceeds this, similar improvements are consolidated. |
| early_stop_patience | int | 2 | Stop after this many consecutive epochs without improvement. |
| rollback_threshold | float | 0.05 | If score drops by more than this below baseline, rollback to the previous best state. |
| failure_threshold | float | 0.5 | Examples scoring below this are treated as failures during analysis. |
Rule Validation
| Parameter | Type | Default | Description |
|---|---|---|---|
| validate_rules | bool | True | Validate each improvement individually before applying. |
| rule_validation_examples | int | 15 | Validation subset size. |
| rule_validation_min_delta | float | 0.01 | Minimum improvement required to accept a change. |
| rule_validation_mode | str | "delta" | "delta": Fixed threshold. "auto": Confidence-interval-based testing (accepts if the lower bound of a 95% CI is > 0). |
| rule_selection_mode | str | "default" | "default": Per-rule threshold acceptance. "greedy": Forward selection, keeping each improvement only if it raises overall score. More expensive but produces minimal, high-quality sets. |
| rule_pruning_strategy | str | "rollback" | "rollback": Revert to last best state when overloaded. "delta": Keep top improvements by measured quality. |
Conditional Rules
| Parameter | Type | Default | Description |
|---|---|---|---|
| conditional_rules | bool | False | When enabled, improvements are applied selectively per example based on relevance. Prevents irrelevant corrections from interfering with examples they don't apply to. |
| rule_trigger_top_k | int | 5 | Maximum improvements applied per example when conditional mode is active. |
| rule_trigger_min_overlap | float | 0.1 | Minimum relevance threshold for applying an improvement to a given example. |
Sampling
| Parameter | Type | Default | Description |
|---|---|---|---|
| stratified_sampling | bool | True | Ensure batches cover diverse error types by sampling proportionally from each failure category. |
Gradient Consensus
| Parameter | Type | Default | Description |
|---|---|---|---|
| gradient_consensus | bool | True | Use multiple analysis passes at varying temperatures for robustness. |
| gradient_samples | int | 3 | Number of analysis passes for consensus. |
| gradient_temperature_base | float | 0.7 | Base temperature for the first pass. |
| gradient_temperature_step | float | 0.15 | Temperature increment per additional pass. |
| gradient_max_tokens | int | 4000 | Max tokens for analysis responses. |
| gradient_failure_display_limit | int | 5 | Maximum failure examples analyzed per batch. |
Base Prompt Source
| Parameter | Type | Default | Description |
|---|---|---|---|
| base_prompt_source | str | "module" | Source for base instructions. "module" extracts from the module. Any other string is used directly as custom base instructions. |
Usage Example
Basic Usage
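A minimal end-to-end sketch. The promptgrad import path, the model name, and the toy dataset are illustrative assumptions, not part of the documented API:

```python
import dspy

# Assumed import path; adjust to your installation.
from promptgrad import PromptGradOptimizer, PromptGradConfig, Score

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Toy training data; use your real examples in practice.
train = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What color is a clear daytime sky?", "answer": "blue"},
]

def exact_match(example: dict, prediction) -> Score:
    correct = prediction.answer.strip().lower() == example["answer"].lower()
    return Score(value=float(correct))

module = dspy.ChainOfThought("question -> answer")

optimizer = PromptGradOptimizer(metric=exact_match, config=PromptGradConfig())
optimized = optimizer.optimize(module, train_examples=train)  # val split from train
```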
With Custom Base Instructions
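A sketch of supplying your own base instructions, reusing a metric, module, and train list defined as in basic usage (the instruction string is illustrative):

```python
optimizer = PromptGradOptimizer(
    metric=exact_match,
    config=PromptGradConfig(),
    # Any string other than "module" is used verbatim as the base instructions;
    # PromptGrad layers its learned corrections on top of it.
    base_prompt_source="Answer concisely. Show your reasoning before the final answer.",
)
optimized = optimizer.optimize(module, train_examples=train)
```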
With Conditional Rules
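A config sketch for conditional mode, where improvements apply per example by relevance (parameter values here are illustrative choices, not recommendations):

```python
config = PromptGradConfig(
    conditional_rules=True,       # apply improvements per example, by relevance
    rule_trigger_top_k=3,         # at most 3 improvements per example
    rule_trigger_min_overlap=0.2, # raise the relevance bar above the 0.1 default
)
optimizer = PromptGradOptimizer(metric=exact_match, config=config)
```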
With Greedy Selection
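A config sketch for greedy forward selection. The larger validation subset is a judgment call: greedy mode re-evaluates the full set after each candidate, so noisier subsets make its keep/drop decisions less reliable:

```python
config = PromptGradConfig(
    rule_selection_mode="greedy",  # keep a rule only if it raises overall score
    rule_validation_examples=30,   # larger subset stabilizes greedy comparisons
)
optimizer = PromptGradOptimizer(metric=exact_match, config=config)
```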
Metric Protocol
Your metric must return a Score:
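A minimal sketch of the Score shape and a metric that populates it. Only the value and error_type fields are documented here; the local dataclass below is an illustrative stand-in for the library's real Score type, which you would import rather than define:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for the library's Score type.
@dataclass
class Score:
    value: float                      # 0.0 (failure) to 1.0 (success)
    error_type: Optional[str] = None  # failure category, drives stratified sampling

def exact_match_metric(example: dict, prediction) -> Score:
    answer = (getattr(prediction, "answer", "") or "").strip().lower()
    if answer == example["answer"].strip().lower():
        return Score(value=1.0)
    if not answer:
        return Score(value=0.0, error_type="empty_answer")
    return Score(value=0.0, error_type="wrong_answer")
```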
The error_type field is especially valuable for PromptGrad: it enables stratified batch sampling, ensuring the optimizer analyzes diverse failure modes rather than oversampling the most common error.