
PromptGradOptimizer

API reference for the gradient-based prompt optimizer

PromptGrad optimizes your DSPy module by analyzing failure patterns across batches of examples and learning targeted correction rules. It freezes the module's base instructions and accumulates validated improvements as a separate layer, keeping the original prompt intact.

You can provide any base prompt — extract it from the module, pass a custom instruction string, or supply instructions generated by any external optimizer. PromptGrad layers its corrections on top of whatever base you provide.

API Signature

```python
from vizpy import PromptGradOptimizer
from vizpy.optimizers.core import PromptGradConfig, Score

optimizer = PromptGradOptimizer(
    metric=my_metric,                          # Required
    config=PromptGradConfig(...),              # Optional
    base_prompt_source="module",               # Optional: "module" or a custom instruction string
    example_formatter=my_formatter,            # Optional
    rule_acceptor=my_acceptor,                 # Optional
)

optimized_module = optimizer.optimize(
    module=my_dspy_module,
    train_examples=train_data,
    val_examples=val_data,                     # Optional
)
```

Constructor Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `metric` | `Metric` | **Required.** Callable that takes `(example: dict, prediction)` and returns a `Score`. |
| `config` | `PromptGradConfig` | Configuration dataclass with all hyperparameters. |
| `base_prompt_source` | `str` | How to obtain base instructions. `"module"` extracts from the module's existing instructions. Any other string is used directly as the base instructions. |
| `example_formatter` | `ExampleFormatter` | Formats examples for internal optimization. Must implement `format_for_gradient(example, score) -> str` and `format_context(example) -> str`. |
| `rule_acceptor` | `Callable[[ParsedRule], bool]` | Custom filter for proposed improvements. Default accepts additions freely and removals only with high confidence. |
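
The two customization hooks can be sketched as below. The `ExampleFormatter` method signatures come from the table above; the `ParsedRule` fields used here (`action`, `confidence`) are assumptions inferred from the default acceptor's described behavior, not a documented API.

```python
from types import SimpleNamespace

class QAFormatter:
    """Sketch of an ExampleFormatter for question/answer examples."""

    def format_for_gradient(self, example: dict, score) -> str:
        # Rendered during failure analysis: the input plus score feedback.
        return (f"Q: {example['question']}\n"
                f"Score: {score.value:.2f}\n"
                f"Feedback: {score.feedback}")

    def format_context(self, example: dict) -> str:
        # Rendered when only the input context is needed.
        return f"Q: {example['question']}"

def cautious_acceptor(rule) -> bool:
    # Mirrors the documented default: accept additions freely,
    # accept removals only at high confidence. The 0.9 cutoff and
    # the attribute names are illustrative assumptions.
    if getattr(rule, "action", "add") == "remove":
        return getattr(rule, "confidence", 0.0) >= 0.9
    return True
```

These would be passed as `example_formatter=QAFormatter()` and `rule_acceptor=cautious_acceptor`.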

optimize() Method

```python
def optimize(
    module: dspy.Module,
    train_examples: list[dict],
    val_examples: Optional[list[dict]] = None,
) -> dspy.Module
```

| Parameter | Type | Description |
| --- | --- | --- |
| `module` | `dspy.Module` | The DSPy module to optimize. |
| `train_examples` | `list[dict]` | Training examples for optimization. |
| `val_examples` | `Optional[list[dict]]` | Validation examples. If `None`, a holdout is split from `train_examples`. |

Returns: An optimized dspy.Module with base instructions + learned corrections.

Important Parameters (PromptGradConfig)

These are the parameters you're most likely to tune.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `epochs` | `int` | `3` | Number of training epochs. Each epoch samples multiple batches and accumulates improvements. |
| `batches_per_epoch` | `int` | `3` | Batches per epoch. More batches mean more diverse improvement proposals. Total optimization calls = `epochs * batches_per_epoch`. |
| `batch_size` | `int` | `10` | Examples per batch. Larger batches produce more representative failure analysis but cost more. |
| `validate_rules` | `bool` | `True` | **Strongly recommended.** Tests each proposed improvement individually before applying. Prevents harmful changes from accumulating. |
| `gradient_consensus` | `bool` | `True` | **Recommended.** Uses multiple analysis passes for robustness. Reduces variance from single-pass noise. |
| `base_prompt_source` | `str` | `"module"` | `"module"` extracts from the module. Any other string is used directly as custom base instructions. Also configurable via the constructor parameter. |
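
A quick way to budget a run is to multiply these knobs out. The arithmetic below counts only the failure-analysis rounds and the batch evaluations they score; rule validation and consensus passes add further calls on top.

```python
# Defaults from the table above.
epochs = 3
batches_per_epoch = 3
batch_size = 10

analysis_rounds = epochs * batches_per_epoch      # failure-analysis batches
examples_scored = analysis_rounds * batch_size    # module calls to score those batches

print(analysis_rounds, examples_scored)  # 9 90
```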

All Parameters (PromptGradConfig)

Core Loop

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `epochs` | `int` | `3` | Training epochs. |
| `batches_per_epoch` | `int` | `3` | Batches per epoch. |
| `batch_size` | `int` | `10` | Examples per batch. |
| `max_workers` | `int` | `10` | Thread pool size for parallel evaluation. |
| `num_val_runs` | `int` | `1` | Validation runs per epoch, averaged for stability. Increase to 3 for stochastic metrics. |
| `verbose` | `bool` | `True` | Print progress logs. |
| `seed` | `int` | `None` | Seed for deterministic sampling. `None` = random. |
| `val_holdout_fraction` | `float` | `0.2` | Holdout fraction when `val_examples` is not provided. |

Rule Management

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_local_rules` | `int` | `15` | Maximum improvements before merging or pruning. Hard cap to prevent prompt flooding. |
| `merge_threshold` | `int` | `8` | When improvement count exceeds this, similar improvements are consolidated. |
| `early_stop_patience` | `int` | `2` | Stop after this many consecutive epochs without improvement. |
| `rollback_threshold` | `float` | `0.05` | If score drops by more than this below baseline, roll back to the previous best state. |
| `failure_threshold` | `float` | `0.5` | Examples scoring below this are treated as failures during analysis. |

Rule Validation

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `validate_rules` | `bool` | `True` | Validate each improvement individually before applying. |
| `rule_validation_examples` | `int` | `15` | Validation subset size. |
| `rule_validation_min_delta` | `float` | `0.01` | Minimum improvement required to accept a change. |
| `rule_validation_mode` | `str` | `"delta"` | `"delta"`: fixed threshold. `"auto"`: confidence-interval-based testing (accepts if the lower bound of a 95% CI is > 0). |
| `rule_selection_mode` | `str` | `"default"` | `"default"`: per-rule threshold acceptance. `"greedy"`: forward selection — keeps each improvement only if it raises overall score. More expensive but produces minimal, high-quality sets. |
| `rule_pruning_strategy` | `str` | `"rollback"` | `"rollback"`: revert to last best state when overloaded. `"delta"`: keep top improvements by measured quality. |
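
The `"auto"` validation mode is described as accepting an improvement when the lower bound of a 95% confidence interval on its effect is above zero. A minimal sketch of that test over per-example score deltas follows; the library's exact statistic is not documented, so this is illustrative only.

```python
import math

def accept_rule_auto(deltas: list[float]) -> bool:
    # deltas: per-example score change with the rule applied vs. without.
    n = len(deltas)
    mean = sum(deltas) / n
    var = sum((d - mean) ** 2 for d in deltas) / (n - 1)  # sample variance
    stderr = math.sqrt(var / n)
    lower = mean - 1.96 * stderr  # normal-approximation 95% CI lower bound
    return lower > 0.0
```

A uniform gain passes; a zero-mean, noisy delta set does not.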

Conditional Rules

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `conditional_rules` | `bool` | `False` | When enabled, improvements are applied selectively per example based on relevance. Prevents irrelevant corrections from interfering with examples they don't apply to. |
| `rule_trigger_top_k` | `int` | `5` | Maximum improvements applied per example when conditional mode is active. |
| `rule_trigger_min_overlap` | `float` | `0.1` | Minimum relevance threshold for applying an improvement to a given example. |
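
How relevance is computed is not documented; the sketch below uses simple token overlap purely to illustrate what `rule_trigger_top_k` and `rule_trigger_min_overlap` gate: each improvement gets a relevance score against the example, low-overlap improvements are dropped, and at most `top_k` of the rest are applied.

```python
def rule_overlap(rule_text: str, example_text: str) -> float:
    # Hypothetical relevance measure: fraction of the rule's tokens that
    # also appear in the example text.
    rule_tokens = set(rule_text.lower().split())
    example_tokens = set(example_text.lower().split())
    if not rule_tokens:
        return 0.0
    return len(rule_tokens & example_tokens) / len(rule_tokens)

def select_rules(rules: list[str], example_text: str,
                 top_k: int = 5, min_overlap: float = 0.1) -> list[str]:
    # Drop low-relevance improvements, then keep at most top_k of the rest.
    scored = [(rule_overlap(r, example_text), r) for r in rules]
    kept = [(s, r) for s, r in scored if s >= min_overlap]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept[:top_k]]
```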

Sampling

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `stratified_sampling` | `bool` | `True` | Ensure batches cover diverse error types by sampling proportionally from each failure category. |
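
The sampling mechanism itself is not documented; below is a hypothetical sketch of proportional stratified sampling over the `error_type` buckets your metric reports, so rare failure modes still appear in each batch.

```python
import random
from collections import defaultdict

def stratified_batch(failures: list[dict], batch_size: int, seed: int = 0) -> list[dict]:
    # Group failures by the error_type the metric assigned, then draw from
    # each bucket in proportion to its size (at least one per bucket).
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for f in failures:
        buckets[f["error_type"]].append(f)
    batch = []
    for items in buckets.values():
        k = max(1, round(batch_size * len(items) / len(failures)))
        batch.extend(rng.sample(items, min(k, len(items))))
    return batch[:batch_size]
```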

Gradient Consensus

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `gradient_consensus` | `bool` | `True` | Use multiple analysis passes at varying temperatures for robustness. |
| `gradient_samples` | `int` | `3` | Number of analysis passes for consensus. |
| `gradient_temperature_base` | `float` | `0.7` | Base temperature for the first pass. |
| `gradient_temperature_step` | `float` | `0.15` | Temperature increment per additional pass. |
| `gradient_max_tokens` | `int` | `4000` | Max tokens for analysis responses. |
| `gradient_failure_display_limit` | `int` | `5` | Maximum failure examples analyzed per batch. |
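
With the defaults above, pass *i* runs at temperature `base + i * step`, giving the schedule:

```python
gradient_temperature_base = 0.7
gradient_temperature_step = 0.15
gradient_samples = 3

temps = [round(gradient_temperature_base + i * gradient_temperature_step, 2)
         for i in range(gradient_samples)]
print(temps)  # [0.7, 0.85, 1.0]
```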

Base Prompt Source

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `base_prompt_source` | `str` | `"module"` | Source for base instructions. `"module"` extracts from the module. Any other string is used directly as custom base instructions. |

Usage Example

Basic Usage

```python
import dspy
from vizpy import PromptGradOptimizer
from vizpy.optimizers.core import PromptGradConfig, Score

# 1. Define your metric
def my_metric(example: dict, prediction) -> Score:
    correct = prediction.answer.strip() == example["gold"].strip()
    return Score(
        value=1.0 if correct else 0.0,
        is_success=correct,
        feedback="Correct" if correct else f"Expected '{example['gold']}', got '{prediction.answer}'",
        error_type="" if correct else "wrong_answer",
    )

# 2. Configure
config = PromptGradConfig(
    epochs=3,
    batches_per_epoch=3,
    batch_size=10,
    validate_rules=True,
    gradient_consensus=True,
)

optimizer = PromptGradOptimizer(metric=my_metric, config=config)

# 3. Optimize
lm = dspy.LM("openai/gpt-4o-mini", temperature=0.7)
dspy.configure(lm=lm)

module = dspy.ChainOfThought("question -> answer")
optimized = optimizer.optimize(module, train_examples, val_examples)
```

With Custom Base Instructions

```python
custom_instructions = "You are an expert math tutor. Always show your work step by step."

optimizer = PromptGradOptimizer(
    metric=my_metric,
    config=PromptGradConfig(epochs=3, validate_rules=True, gradient_consensus=True),
    base_prompt_source=custom_instructions,
)

optimized = optimizer.optimize(module, train_examples, val_examples)
```

With Conditional Rules

```python
config = PromptGradConfig(
    epochs=3,
    batches_per_epoch=3,
    batch_size=10,
    validate_rules=True,
    conditional_rules=True,       # Per-example relevance filtering
    rule_trigger_top_k=5,
    rule_trigger_min_overlap=0.1,
)

optimizer = PromptGradOptimizer(metric=my_metric, config=config)
optimized = optimizer.optimize(module, train_examples, val_examples)
```

With Greedy Selection

```python
config = PromptGradConfig(
    epochs=3,
    batches_per_epoch=3,
    batch_size=10,
    validate_rules=True,
    rule_selection_mode="greedy",  # Keep only improvements that raise score
    rule_pruning_strategy="delta",
)

optimizer = PromptGradOptimizer(metric=my_metric, config=config)
optimized = optimizer.optimize(module, train_examples, val_examples)
```

Metric Protocol

Your metric must return a Score:

```python
from vizpy.optimizers.core import Score

def my_metric(example: dict, prediction) -> Score:
    """
    Args:
        example: The training example dict (same keys you pass to the module).
        prediction: The dspy.Prediction returned by the module.

    Returns:
        Score with:
            value: float between 0 and 1
            is_success: bool (whether this meets your success threshold)
            feedback: str (human-readable explanation of the score)
            error_type: str (optional categorization — enables stratified
                        sampling and diversity-aware optimization)
    """
    ...
```

The error_type field is especially valuable for PromptGrad: it enables stratified batch sampling, ensuring the optimizer analyzes diverse failure modes rather than oversampling the most common error.
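
A sketch of a metric that populates `error_type` with several categories follows. The `Score` class here is a minimal stand-in so the example is self-contained; in real use, import it from `vizpy.optimizers.core`. The category names and the partial-credit rule are illustrative choices, not library requirements.

```python
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class Score:  # minimal stand-in for vizpy.optimizers.core.Score
    value: float
    is_success: bool
    feedback: str
    error_type: str = ""

def qa_metric(example: dict, prediction) -> Score:
    answer = (getattr(prediction, "answer", "") or "").strip()
    gold = example["gold"].strip()
    if not answer:
        return Score(0.0, False, "Empty answer", "empty_answer")
    if answer.lower() == gold.lower():
        return Score(1.0, True, "Correct")
    if gold.lower() in answer.lower():
        # Gold string appears, but the answer is not in the expected format.
        return Score(0.5, False, "Gold present but answer not exact", "format_error")
    return Score(0.0, False, f"Expected '{gold}', got '{answer}'", "wrong_answer")
```

With distinct `empty_answer`, `format_error`, and `wrong_answer` buckets, stratified sampling can draw batches that cover all three failure modes.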