# ContraPromptOptimizer

API reference for the contrastive prompt optimizer.
ContraPrompt is a contrastive prompt optimizer that automatically improves your DSPy module's instructions by learning from its own successes and failures on your training data. It is domain-agnostic — you supply a Metric that returns a Score object, and the optimizer handles everything else.
ContraPrompt works iteratively: each iteration evaluates your module, identifies where it can improve, and refines the instructions accordingly. It includes built-in early stopping, rule validation, and safeguards against prompt degradation.
## API Signature

### Constructor Parameters
| Parameter | Type | Description |
|---|---|---|
| `metric` | `Metric` | **Required.** Callable that takes `(example: dict, prediction)` and returns a `Score`. This is the only thing you must implement. |
| `config` | `ContraPromptConfig` | Configuration dataclass containing all hyperparameters. Defaults are tuned for general use. See the parameter reference below. |
| `feedback_generator` | `FeedbackGenerator` | Callable that takes `(example, attempts)` and returns feedback text used during optimization. The default shows the score and feedback from each prior attempt. |
| `example_formatter` | `ExampleFormatter` | Formats examples for internal optimization prompts. Must implement `format_for_gradient(example, score) -> str` and `format_context(example) -> str`. The default uses `str(example)[:500]`. |
### `optimize()` Method
| Parameter | Type | Description |
|---|---|---|
| `module` | `dspy.Module` | The DSPy module to optimize. Must have a `.signature` with `.instructions`. |
| `train_examples` | `list[dict]` | Training examples passed as keyword arguments to the module. |
| `val_examples` | `list[dict]` | Validation examples for scoring. If `None`, a holdout fraction is split from `train_examples` automatically. |
**Returns:** An optimized `dspy.Module` with improved instructions (or the original module if no improvement was found).
## Important Parameters (`ContraPromptConfig`)
These are the parameters you're most likely to tune.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_iterations` | `int` | 5 | Number of optimization iterations. More iterations give more opportunity for improvement, with diminishing returns after 3-5. |
| `max_attempts` | `int` | 3 | Retries per example during optimization. The optimizer evaluates each example multiple times to learn from variation. Higher values improve signal quality but cost more LLM calls. |
| `validate_rules` | `bool` | `True` | Strongly recommended to keep enabled. Validates each learned improvement individually before applying it, preventing harmful changes from degrading the prompt. |
| `max_rules` | `int` | 8 | Maximum number of improvements accumulated across iterations. Prevents the prompt from becoming overloaded. |
| `patience` | `int` | 3 | Early stop after this many iterations without improvement on validation. |
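As a sketch, a lightly tuned configuration covering these parameters might look like the following. The import path is an assumption; adjust it to wherever `ContraPromptConfig` lives in your install.

```python
# Hypothetical import path -- adjust to your installation.
from contraprompt import ContraPromptConfig

config = ContraPromptConfig(
    max_iterations=5,     # diminishing returns beyond 3-5
    max_attempts=3,       # higher = better signal, more LLM calls
    validate_rules=True,  # keep enabled: rejects harmful changes
    max_rules=8,          # cap accumulated improvements
    patience=3,           # early-stop after 3 stale iterations
)
```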
## All Parameters (`ContraPromptConfig`)

### Core Loop
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_attempts` | `int` | 3 | Retries per example during optimization. |
| `max_iterations` | `int` | 5 | Optimization iterations. |
| `min_improvement` | `float` | 0.02 | Minimum score improvement required to consider a signal meaningful. |
| `patience` | `int` | 3 | Early stop after this many iterations without improvement. |
| `max_workers` | `int` | 10 | Thread pool size for parallel evaluation. |
| `num_val_runs` | `int` | 1 | Validation runs per iteration, averaged for noise reduction. Set to 3 if your metric is stochastic. |
| `verbose` | `bool` | `True` | Print progress logs. |
| `seed` | `int` | `None` | Seed for deterministic train/val splitting and subsampling. `None` = random. |
| `val_holdout_fraction` | `float` | 0.2 | Fraction of training data held out for validation when `val_examples` is not provided. |
### Signal Extraction
| Parameter | Type | Default | Description |
|---|---|---|---|
| `demonstrations_k` | `int` | 0 | Number of improvement signals to extract per iteration. 0 = auto-scale with training size. |
| `auto_k_min` | `int` | 3 | Lower bound when auto-scaling. |
| `auto_k_max` | `int` | 10 | Upper bound when auto-scaling. |
| `auto_k_divisor` | `int` | 10 | Auto-scaling divisor: `k = len(train) // auto_k_divisor`. |
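Combining the divisor formula with the bounds above, the auto-scaling rule can be sketched as follows (the clamping order is an assumption; the actual implementation may differ):

```python
def auto_k(train_size: int, k_min: int = 3, k_max: int = 10, divisor: int = 10) -> int:
    """Sketch of demonstrations_k auto-scaling: divide, then clamp to [auto_k_min, auto_k_max]."""
    return max(k_min, min(k_max, train_size // divisor))
```

With the defaults, 40 training examples yield `k = 4`, while very small or very large training sets are clamped to the bounds.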
### Rule Management
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_rules` | `int` | 8 | Maximum accumulated improvements. The oldest or weakest are pruned when this limit is hit. |
| `tip_mode` | `str` | `"synthesis"` | `"synthesis"`: consolidates improvements into concise guidance. `"injection"`: applies improvements directly. |
### Rule Validation
| Parameter | Type | Default | Description |
|---|---|---|---|
| `validate_rules` | `bool` | `True` | Validate each improvement individually before applying it. |
| `rule_validation_examples` | `int` | 15 | Number of examples used for validation. |
| `rule_validation_min_delta` | `float` | -0.02 | Minimum acceptable score delta. A negative value is a soft threshold: only actively harmful changes are rejected. |
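The soft threshold can be read as "keep a rule unless it measurably hurts." A sketch of the acceptance test (the exact comparison used internally is an assumption):

```python
def accept_rule(score_with_rule: float, score_without_rule: float,
                min_delta: float = -0.02) -> bool:
    """Sketch: keep a rule unless it degrades validation score past min_delta."""
    return (score_with_rule - score_without_rule) >= min_delta
```

With the default of -0.02, a neutral or slightly noisy rule passes, while one that drops the validation score by more than two points is rejected.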
### Rule Selection
| Parameter | Type | Default | Description |
|---|---|---|---|
| `rule_selection` | `str` | `"delta"` | `"delta"`: rank by measured improvement. `"preference"`: use preference-based scoring (more expensive but higher quality). |
| `rule_coverage_weight` | `float` | 0.0 | Extra weight for covering diverse error types when ranking. 0 = disabled. |
| `preference_beta` | `float` | 5.0 | Scaling factor for preference scoring. Only used when `rule_selection="preference"`. |
| `preference_examples` | `int` | 0 | Examples for preference scoring. 0 = reuse `rule_validation_examples`. |
### Advanced Features
| Parameter | Type | Default | Description |
|---|---|---|---|
| `progressive_rules` | `bool` | `False` | Maintain improvements across iterations with periodic re-validation and pruning. Useful for long runs (5+ iterations). |
| `contrastive_demos` | `bool` | `False` | Include concrete examples of improved reasoning in the optimized prompt. Requires chain-of-thought modules (e.g., `dspy.ChainOfThought`). |
| `max_contrastive_demos` | `int` | 2 | Maximum demo examples to include. |
| `tiered_mining` | `bool` | `False` | Extract more granular improvement signals (incremental gains, not just worst-to-best). |
| `failure_focused` | `bool` | `False` | After iteration 1, focus on the hardest examples (those the model is inconsistent on). |
| `failure_focus_floor` | `float` | 0.3 | Never reduce the training set below this fraction when failure-focused. |
| `failure_score_threshold` | `float` | 0.5 | Examples scoring below this are treated as failures. |
| `max_failure_analysis_examples` | `int` | 10 | Maximum failure examples analyzed per iteration. |
| `subsample_fraction` | `float` | 0.6 | Fraction of training data used per iteration. Subsampling adds diversity across iterations. |
| `min_subsample_size` | `int` | 10 | Minimum training examples per iteration. |
| `diversity_mining` | `bool` | `False` | Use embedding-based clustering to ensure diverse error coverage. Requires `sentence-transformers` and `scikit-learn`. |
| `diversity_embedding_model` | `str` | `"sentence-transformers/all-MiniLM-L6-v2"` | Embedding model for diversity-aware optimization. |
## Usage Example
### Basic Usage
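A minimal end-to-end sketch. The import path is an assumption, and the `Score` field names (`value`, `feedback`) are illustrative where this reference does not pin them down:

```python
import dspy
# Hypothetical import path -- adjust to your installation.
from contraprompt import ContraPromptOptimizer, Score

def exact_match_metric(example: dict, prediction) -> Score:
    correct = prediction.answer.strip() == example["answer"].strip()
    return Score(
        value=1.0 if correct else 0.0,  # "value" is an assumed field name
        feedback="Correct." if correct
                 else f"Expected {example['answer']!r}, got {prediction.answer!r}.",
    )

module = dspy.ChainOfThought("question -> answer")
optimizer = ContraPromptOptimizer(metric=exact_match_metric)
optimized = optimizer.optimize(
    module=module,
    train_examples=train_data,  # your list[dict] of examples
)
```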
### With Advanced Features
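A sketch enabling several of the advanced flags described above; `my_metric`, `module`, `train`, and `val` are placeholders for your own objects, and the import path is an assumption:

```python
# Hypothetical import path -- adjust to your installation.
from contraprompt import ContraPromptOptimizer, ContraPromptConfig

config = ContraPromptConfig(
    max_iterations=8,
    progressive_rules=True,  # re-validate and prune rules across long runs
    failure_focused=True,    # concentrate on inconsistent examples after iteration 1
    tiered_mining=True,      # mine incremental gains, not just worst-to-best
    diversity_mining=True,   # requires sentence-transformers and scikit-learn
)
optimizer = ContraPromptOptimizer(metric=my_metric, config=config)
optimized = optimizer.optimize(module=module, train_examples=train, val_examples=val)
```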
## Metric Protocol
Your metric must conform to this protocol:
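A plausible reconstruction of the protocol, built from the fields this reference mentions (`feedback`, `error_type`); the numeric field name `value` and the exact type hints are assumptions:

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol

@dataclass
class Score:
    value: float                      # numeric score, e.g. in [0.0, 1.0] ("value" is an assumed name)
    feedback: str = ""                # specific, actionable feedback
    error_type: Optional[str] = None  # optional failure category for diversity-aware features

class Metric(Protocol):
    def __call__(self, example: dict, prediction: Any) -> Score: ...
```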
The `feedback` field is particularly important: more specific feedback produces better optimization results. Instead of "Wrong", prefer "Expected a 3-digit number but got a sentence."
The `error_type` field is optional but recommended: it enables diversity-aware features that ensure the optimizer addresses every category of failure, not just the most common one.
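Putting both recommendations together, a metric might look like this sketch (with an inline stand-in for `Score` so the snippet is self-contained; field names are assumptions):

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Score:  # stand-in for the library's Score; field names are assumptions
    value: float
    feedback: str = ""
    error_type: Optional[str] = None

def area_code_metric(example: dict, prediction) -> Score:
    """Expects a 3-digit answer; returns specific feedback plus an error category."""
    answer = str(prediction).strip()
    if not re.fullmatch(r"\d{3}", answer):
        return Score(0.0, f"Expected a 3-digit number but got {answer!r}.",
                     error_type="format")
    if answer != example["answer"]:
        return Score(0.0, f"Expected {example['answer']}, got {answer}.",
                     error_type="wrong_value")
    return Score(1.0, "Correct.")
```

Distinguishing `"format"` from `"wrong_value"` lets diversity-aware selection address both failure modes rather than only the more frequent one.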