
ContraPromptOptimizer

API reference for the contrastive prompt optimizer

ContraPrompt is a contrastive prompt optimizer that automatically improves your DSPy module's instructions by learning from its own successes and failures on your training data. It is domain-agnostic — you supply a Metric that returns a Score object, and the optimizer handles everything else.

ContraPrompt works iteratively: each iteration evaluates your module, identifies where it can improve, and refines the instructions accordingly. It includes built-in early stopping, rule validation, and safeguards against prompt degradation.
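The iterate-evaluate-refine loop with patience-based early stopping can be sketched in a few lines. This is a conceptual sketch of the behavior described above, not the library's internal code; `score_fn` and `refine_fn` are hypothetical stand-ins for validation scoring and instruction refinement:

```python
def optimize_loop(score_fn, refine_fn, instructions, max_iterations=5, patience=3):
    # Track the best instructions seen so far and how long we've gone
    # without improvement (the "patience" counter).
    best = instructions
    best_score = score_fn(instructions)
    stale = 0
    for _ in range(max_iterations):
        candidate = refine_fn(best)   # learn from successes/failures
        s = score_fn(candidate)
        if s > best_score:
            best, best_score, stale = candidate, s, 0
        else:
            stale += 1                # no improvement this iteration
            if stale >= patience:
                break                 # early stopping
    return best
```

Safeguards against degradation follow the same shape: a candidate only replaces the current best when it actually scores higher on validation.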

API Signature

from vizpy import ContraPromptOptimizer
from vizpy.optimizers.core import ContraPromptConfig, Score
 
optimizer = ContraPromptOptimizer(
    metric=my_metric,                          # Required
    config=ContraPromptConfig(...),             # Optional, all defaults are reasonable
    feedback_generator=my_feedback_gen,         # Optional
    example_formatter=my_formatter,             # Optional
)
 
optimized_module = optimizer.optimize(
    module=my_dspy_module,
    train_examples=train_data,
    val_examples=val_data,                      # Optional, splits from train if omitted
)

Constructor Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| metric | Metric | Required. Callable that takes (example: dict, prediction) and returns a Score. This is the only thing you must implement. |
| config | ContraPromptConfig | Configuration dataclass containing all hyperparameters. Defaults are tuned for general use. See the parameter reference below. |
| feedback_generator | FeedbackGenerator | Callable that takes (example, attempts) and returns feedback text used during optimization. Default shows score + feedback from each prior attempt. |
| example_formatter | ExampleFormatter | Formats examples for internal optimization prompts. Must implement format_for_gradient(example, score) -> str and format_context(example) -> str. Default uses str(example)[:500]. |
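A custom formatter only needs the two methods named above. The sketch below is a minimal example, assuming Score exposes the value and feedback fields described in the Metric Protocol section; CompactFormatter is a hypothetical name:

```python
class CompactFormatter:
    """Minimal sketch of the ExampleFormatter protocol."""

    def format_for_gradient(self, example: dict, score) -> str:
        # Show the example's inputs alongside its score and feedback so the
        # optimizer can reason about what went wrong.
        fields = ", ".join(f"{k}={v!r}" for k, v in example.items())
        return f"[score={score.value:.2f}] {fields} | feedback: {score.feedback}"

    def format_context(self, example: dict) -> str:
        # Truncate like the documented default (str(example)[:500]).
        return str(example)[:500]
```

Pass an instance as example_formatter=CompactFormatter() when constructing the optimizer.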

optimize() Method

def optimize(
    module: dspy.Module,
    train_examples: list[dict],
    val_examples: Optional[list[dict]] = None,
) -> dspy.Module

| Parameter | Type | Description |
| --- | --- | --- |
| module | dspy.Module | The DSPy module to optimize. Must have a .signature with .instructions. |
| train_examples | list[dict] | Training examples passed as keyword arguments to the module. |
| val_examples | list[dict] | Validation examples for scoring. If None, a holdout fraction is split from train_examples automatically. |

Returns: An optimized dspy.Module with improved instructions (or the original module if no improvement was found).

Important Parameters (ContraPromptConfig)

These are the parameters you're most likely to tune.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_iterations | int | 5 | Number of optimization iterations. More iterations = more opportunity for improvement, but diminishing returns after 3-5. |
| max_attempts | int | 3 | Retries per example during optimization. The optimizer evaluates each example multiple times to learn from variation. Higher values improve signal quality but cost more LLM calls. |
| validate_rules | bool | True | Strongly recommended to keep enabled. Validates each learned improvement individually before applying it. Prevents harmful changes from degrading the prompt. |
| max_rules | int | 8 | Maximum number of improvements accumulated across iterations. Prevents the prompt from becoming overloaded. |
| patience | int | 3 | Early stop after this many iterations without improvement on validation. |

All Parameters (ContraPromptConfig)

Core Loop

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_attempts | int | 3 | Retries per example during optimization. |
| max_iterations | int | 5 | Optimization iterations. |
| min_improvement | float | 0.02 | Minimum score improvement required to consider a signal meaningful. |
| patience | int | 3 | Early stop after this many iterations without improvement. |
| max_workers | int | 10 | Thread pool size for parallel evaluation. |
| num_val_runs | int | 1 | Validation runs per iteration, averaged for noise reduction. Set to 3 if your metric is stochastic. |
| verbose | bool | True | Print progress logs. |
| seed | int | None | Seed for deterministic train/val splitting and subsampling. None = random. |
| val_holdout_fraction | float | 0.2 | Fraction of training data held out for validation when val_examples is not provided. |
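What seed and val_holdout_fraction imply can be illustrated with a small sketch. holdout_split is a hypothetical helper, not the library's internal code; the point is that a fixed seed sends the same examples to validation on every run:

```python
import random

def holdout_split(train, fraction=0.2, seed=None):
    # Shuffle indices with a seeded RNG, then carve off the first
    # `fraction` of examples as the validation set.
    rng = random.Random(seed)
    idx = list(range(len(train)))
    rng.shuffle(idx)
    n_val = max(1, int(len(train) * fraction))
    val = [train[i] for i in idx[:n_val]]
    tr = [train[i] for i in idx[n_val:]]
    return tr, val
```

With seed=None each run draws a different holdout, which adds noise to iteration-over-iteration score comparisons.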

Signal Extraction

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| demonstrations_k | int | 0 | Number of improvement signals to extract per iteration. 0 = auto-scale with training size. |
| auto_k_min | int | 3 | Lower bound when auto-scaling. |
| auto_k_max | int | 10 | Upper bound when auto-scaling. |
| auto_k_divisor | int | 10 | Auto-scaling divisor: k = len(train) // auto_k_divisor. |
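The auto-scaling rule reduces to one line of arithmetic. This is a hedged reading of the table above (the divisor-based value clamped to [auto_k_min, auto_k_max]; the exact clamping order inside the library is an assumption):

```python
def auto_k(n_train: int, k_min: int = 3, k_max: int = 10, divisor: int = 10) -> int:
    # demonstrations_k = 0 triggers auto-scaling: roughly one signal per
    # `divisor` training examples, clamped to the configured bounds.
    return max(k_min, min(k_max, n_train // divisor))
```

So 20 training examples yield 3 signals (floored at auto_k_min), 50 yield 5, and 500 are capped at auto_k_max = 10.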

Rule Management

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_rules | int | 8 | Maximum accumulated improvements. Oldest or weakest are pruned when this limit is hit. |
| tip_mode | str | "synthesis" | "synthesis": Consolidates improvements into concise guidance. "injection": Applies improvements directly. |

Rule Validation

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| validate_rules | bool | True | Validate each improvement individually before applying. |
| rule_validation_examples | int | 15 | Number of examples used for validation. |
| rule_validation_min_delta | float | -0.02 | Minimum acceptable score delta. Negative value = soft threshold (reject only actively harmful changes). |
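The soft threshold can be stated as a one-line predicate. A sketch of the acceptance rule described above, not the library's internal code:

```python
def keep_rule(delta: float, min_delta: float = -0.02) -> bool:
    # With the default of -0.02, a rule that slightly hurts validation
    # score is still tolerated (measurement noise); only clearly harmful
    # rules (delta below -0.02) are rejected.
    return delta >= min_delta
```

Set rule_validation_min_delta to 0.0 or higher for a strict threshold that only accepts rules with a measured improvement.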

Rule Selection

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| rule_selection | str | "delta" | "delta": Rank by measured improvement. "preference": Use preference-based scoring (more expensive but higher quality). |
| rule_coverage_weight | float | 0.0 | Extra weight for covering diverse error types when ranking. 0 = disabled. |
| preference_beta | float | 5.0 | Scaling factor for preference scoring. Only used when rule_selection="preference". |
| preference_examples | int | 0 | Examples for preference scoring. 0 = reuse rule_validation_examples. |

Advanced Features

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| progressive_rules | bool | False | Maintain improvements across iterations with periodic re-validation and pruning. Useful for long runs (5+ iterations). |
| contrastive_demos | bool | False | Include concrete examples of improved reasoning in the optimized prompt. Requires chain-of-thought modules (e.g., dspy.ChainOfThought). |
| max_contrastive_demos | int | 2 | Maximum demo examples to include. |
| tiered_mining | bool | False | Extract more granular improvement signals (incremental gains, not just worst-to-best). |
| failure_focused | bool | False | After iteration 1, focus on the hardest examples (those the model is inconsistent on). |
| failure_focus_floor | float | 0.3 | Never reduce the training set below this fraction when failure-focused. |
| failure_score_threshold | float | 0.5 | Examples scoring below this are treated as failures. |
| max_failure_analysis_examples | int | 10 | Maximum failure examples analyzed per iteration. |
| subsample_fraction | float | 0.6 | Fraction of training data used per iteration. Subsampling adds diversity across iterations. |
| min_subsample_size | int | 10 | Minimum training examples per iteration. |
| diversity_mining | bool | False | Use embedding-based clustering to ensure diverse error coverage. Requires sentence-transformers and scikit-learn. |
| diversity_embedding_model | str | "sentence-transformers/all-MiniLM-L6-v2" | Embedding model for diversity-aware optimization. |

Usage Example

Basic Usage

import dspy
from vizpy import ContraPromptOptimizer
from vizpy.optimizers.core import ContraPromptConfig, Score
 
# 1. Define your metric
def my_metric(example: dict, prediction) -> Score:
    correct = prediction.answer.strip() == example["gold"].strip()
    return Score(
        value=1.0 if correct else 0.0,
        is_success=correct,
        feedback="Correct" if correct else f"Expected '{example['gold']}', got '{prediction.answer}'",
        error_type="" if correct else "wrong_answer",
    )
 
# 2. Configure and create optimizer
config = ContraPromptConfig(
    max_iterations=5,
    max_attempts=3,
    validate_rules=True,
)
 
optimizer = ContraPromptOptimizer(metric=my_metric, config=config)
 
# 3. Optimize
lm = dspy.LM("openai/gpt-4o-mini", temperature=0.7)
dspy.configure(lm=lm)
 
module = dspy.ChainOfThought("question -> answer")
optimized = optimizer.optimize(module, train_examples, val_examples)

With Advanced Features

config = ContraPromptConfig(
    max_iterations=7,
    max_attempts=3,
    validate_rules=True,
    progressive_rules=True,       # Persistent improvements across iterations
    contrastive_demos=True,       # Include reasoning examples in prompt
    failure_focused=True,         # Focus on hardest examples after iter 1
    diversity_mining=True,        # Ensure diverse error coverage
    rule_selection="preference",  # Preference-based ranking
)
 
optimizer = ContraPromptOptimizer(metric=my_metric, config=config)
optimized = optimizer.optimize(module, train_examples, val_examples)

Metric Protocol

Your metric must conform to this protocol:

from vizpy.optimizers.core import Score
 
def my_metric(example: dict, prediction) -> Score:
    """
    Args:
        example: The training example dict (same keys you pass to the module).
        prediction: The dspy.Prediction returned by the module.
 
    Returns:
        Score with:
            value: float between 0 and 1
            is_success: bool (whether this meets your success threshold)
            feedback: str (human-readable explanation of the score)
            error_type: str (optional categorization like "wrong_format",
                        "missing_info" — enables diversity-aware optimization)
    """
    ...

The feedback field is particularly important: more specific feedback produces better optimization results. Instead of "Wrong", prefer "Expected a 3-digit number but got a sentence."

The error_type field is optional but recommended: it enables diversity-aware features that ensure the optimizer addresses all categories of failures, not just the most common one.
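A metric that distinguishes failure categories might look like the sketch below. The Score dataclass here is a stand-in mirroring the documented fields (in practice, import it from vizpy.optimizers.core), and the digit-format check is purely illustrative:

```python
from dataclasses import dataclass

# Stand-in matching the documented Score fields; in real code use:
# from vizpy.optimizers.core import Score
@dataclass
class Score:
    value: float
    is_success: bool
    feedback: str
    error_type: str = ""

def format_aware_metric(example: dict, prediction) -> Score:
    answer = prediction.answer.strip()
    gold = example["gold"].strip()
    if answer == gold:
        return Score(1.0, True, "Correct")
    if not answer.isdigit():  # illustrative format rule for a numeric task
        return Score(0.0, False, f"Expected a number, got '{answer}'",
                     error_type="wrong_format")
    return Score(0.0, False, f"Expected '{gold}', got '{answer}'",
                 error_type="wrong_answer")
```

With distinct wrong_format and wrong_answer categories, diversity-aware features can make sure rules are learned for both failure modes rather than only the most frequent one.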
