# ContraPromptOptimizer

API reference for the contrastive prompt optimizer.
ContraPrompt is a contrastive prompt optimizer that automatically improves your DSPy module's instructions by learning from its own successes and failures on your training data. It is domain-agnostic — you supply a Metric that returns a Score object, and the optimizer handles everything else.
ContraPrompt works iteratively: each iteration evaluates your module, identifies where it can improve, and refines the instructions accordingly. It includes built-in early stopping, rule validation, and safeguards against prompt degradation.
## API Signature

### Constructor Parameters
| Parameter | Type | Description |
|---|---|---|
| `metric` | `Metric` | **Required.** Callable that takes `(example: dict, prediction)` and returns a `Score`. This is the only thing you must implement. |
| `config` | `ContraPromptConfig` | Configuration dataclass containing all hyperparameters. Defaults are tuned for general use. See the parameter reference below. |
| `feedback_generator` | `FeedbackGenerator` | Callable that takes `(example, attempts)` and returns feedback text used during optimization. The default shows the score and feedback from each prior attempt. |
| `example_formatter` | `ExampleFormatter` | Formats examples for internal optimization prompts. Must implement `format_for_gradient(example, score) -> str` and `format_context(example) -> str`. The default uses `str(example)[:500]`. |
### `optimize()` Method
| Parameter | Type | Description |
|---|---|---|
| `module` | `dspy.Module` | The DSPy module to optimize. Must have a `.signature` with `.instructions`. |
| `train_examples` | `list[dict]` | Training examples passed as keyword arguments to the module. |
| `val_examples` | `list[dict]` | Validation examples for scoring. If `None`, a holdout fraction is split from `train_examples` automatically. |
**Returns:** An optimized `dspy.Module` with improved instructions (or the original module if no improvement was found).
## Important Parameters (`ContraPromptConfig`)
These are the parameters you're most likely to tune.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_iterations` | `int` | 5 | Number of optimization iterations. More iterations give more opportunity for improvement, with diminishing returns after 3-5. |
| `max_attempts` | `int` | 3 | Retries per example during optimization. The optimizer evaluates each example multiple times to learn from variation. Higher values improve signal quality but cost more LLM calls. |
| `validate_rules` | `bool` | `True` | Strongly recommended to keep enabled. Validates each learned improvement individually before applying it, preventing harmful changes from degrading the prompt. |
| `max_rules` | `int` | 8 | Maximum number of improvements accumulated across iterations. Prevents the prompt from becoming overloaded. |
| `patience` | `int` | 3 | Early stop after this many iterations without improvement on validation. |
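As a sketch, a lightly tuned configuration covering these parameters might look like the following. The import path is an assumption; adjust it to wherever `ContraPromptConfig` lives in your install.

```python
# Hypothetical import path -- adjust to your installation.
from contraprompt import ContraPromptConfig

config = ContraPromptConfig(
    max_iterations=5,     # diminishing returns beyond 3-5
    max_attempts=3,       # higher = better signal, more LLM calls
    validate_rules=True,  # keep enabled: rejects harmful changes
    max_rules=8,          # cap accumulated improvements
    patience=3,           # early-stop after 3 stale iterations
)
```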
## All Parameters (`ContraPromptConfig`)

### Core Loop
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_attempts` | `int` | 3 | Retries per example during optimization. |
| `max_iterations` | `int` | 5 | Optimization iterations. |
| `min_improvement` | `float` | 0.02 | Minimum score improvement required to consider a signal meaningful. |
| `patience` | `int` | 3 | Early stop after this many iterations without improvement. |
| `max_workers` | `int` | 10 | Thread pool size for parallel evaluation. |
| `num_val_runs` | `int` | 1 | Validation runs per iteration, averaged for noise reduction. Set to 3 if your metric is stochastic. |
| `verbose` | `bool` | `True` | Print progress logs. |
| `seed` | `int` | `None` | Seed for deterministic train/val splitting and subsampling. `None` = random. |
| `val_holdout_fraction` | `float` | 0.2 | Fraction of training data held out for validation when `val_examples` is not provided. |
### Signal Extraction
| Parameter | Type | Default | Description |
|---|---|---|---|
| `demonstrations_k` | `int` | 0 | Number of improvement signals to extract per iteration. 0 = auto-scale with training size. |
| `auto_k_min` | `int` | 3 | Lower bound when auto-scaling. |
| `auto_k_max` | `int` | 10 | Upper bound when auto-scaling. |
| `auto_k_divisor` | `int` | 10 | Auto-scaling divisor: `k = len(train) // auto_k_divisor`. |
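Combining the divisor formula with the bounds above, the auto-scaling rule can be sketched as follows (the clamping order is an assumption; the actual implementation may differ):

```python
def auto_k(train_size: int, k_min: int = 3, k_max: int = 10, divisor: int = 10) -> int:
    """Sketch of demonstrations_k auto-scaling: divide, then clamp to [auto_k_min, auto_k_max]."""
    return max(k_min, min(k_max, train_size // divisor))
```

With the defaults, 40 training examples yield `k = 4`, while very small or very large training sets are clamped to the bounds.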
### Rule Management
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_rules` | `int` | 8 | Maximum accumulated improvements. The oldest or weakest are pruned when this limit is hit. |
| `tip_mode` | `str` | `"synthesis"` | `"synthesis"`: consolidates improvements into concise guidance. `"injection"`: applies improvements directly. |
### Rule Validation
| Parameter | Type | Default | Description |
|---|---|---|---|
| `validate_rules` | `bool` | `True` | Validate each improvement individually before applying it. |
| `rule_validation_examples` | `int` | 15 | Number of examples used for validation. |
| `rule_validation_min_delta` | `float` | -0.02 | Minimum acceptable score delta. A negative value is a soft threshold: only actively harmful changes are rejected. |
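The soft threshold can be read as "keep a rule unless it measurably hurts." A sketch of the acceptance test (the exact comparison used internally is an assumption):

```python
def accept_rule(score_with_rule: float, score_without_rule: float,
                min_delta: float = -0.02) -> bool:
    """Sketch: keep a rule unless it degrades validation score past min_delta."""
    return (score_with_rule - score_without_rule) >= min_delta
```

With the default of -0.02, a neutral or slightly noisy rule passes, while one that drops the validation score by more than two points is rejected.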
### Rule Selection
| Parameter | Type | Default | Description |
|---|---|---|---|
| `rule_selection` | `str` | `"delta"` | `"delta"`: rank by measured improvement. `"preference"`: use preference-based scoring (more expensive but higher quality). |
| `rule_coverage_weight` | `float` | 0.0 | Extra weight for covering diverse error types when ranking. 0 = disabled. |
| `preference_beta` | `float` | 5.0 | Scaling factor for preference scoring. Only used when `rule_selection="preference"`. |
| `preference_examples` | `int` | 0 | Examples for preference scoring. 0 = reuse `rule_validation_examples`. |
### Advanced Features
| Parameter | Type | Default | Description |
|---|---|---|---|
| `progressive_rules` | `bool` | `False` | Maintain improvements across iterations with periodic re-validation and pruning. Useful for long runs (5+ iterations). |
| `contrastive_demos` | `bool` | `False` | Include concrete examples of improved reasoning in the optimized prompt. Requires chain-of-thought modules (e.g., `dspy.ChainOfThought`). |
| `max_contrastive_demos` | `int` | 2 | Maximum demo examples to include. |
| `tiered_mining` | `bool` | `False` | Extract more granular improvement signals (incremental gains, not just worst-to-best). |
| `failure_focused` | `bool` | `False` | After iteration 1, focus on the hardest examples (those the model is inconsistent on). |
| `failure_focus_floor` | `float` | 0.3 | Never reduce the training set below this fraction when failure-focused. |
| `failure_score_threshold` | `float` | 0.5 | Examples scoring below this are treated as failures. |
| `max_failure_analysis_examples` | `int` | 10 | Maximum failure examples analyzed per iteration. |
| `subsample_fraction` | `float` | 0.6 | Fraction of training data used per iteration. Subsampling adds diversity across iterations. |
| `min_subsample_size` | `int` | 10 | Minimum training examples per iteration. |
| `diversity_mining` | `bool` | `False` | Use embedding-based clustering to ensure diverse error coverage. Requires `sentence-transformers` and `scikit-learn`. |
| `diversity_embedding_model` | `str` | `"sentence-transformers/all-MiniLM-L6-v2"` | Embedding model for diversity-aware optimization. |
## Usage Example
### Basic Usage
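A minimal end-to-end sketch. The import path is an assumption, and the `Score` field names (`value`, `feedback`) are illustrative where this reference does not pin them down:

```python
import dspy
# Hypothetical import path -- adjust to your installation.
from contraprompt import ContraPromptOptimizer, Score

def exact_match_metric(example: dict, prediction) -> Score:
    correct = prediction.answer.strip() == example["answer"].strip()
    return Score(
        value=1.0 if correct else 0.0,  # "value" is an assumed field name
        feedback="Correct." if correct
                 else f"Expected {example['answer']!r}, got {prediction.answer!r}.",
    )

module = dspy.ChainOfThought("question -> answer")
optimizer = ContraPromptOptimizer(metric=exact_match_metric)
optimized = optimizer.optimize(
    module=module,
    train_examples=train_data,  # your list[dict] of examples
)
```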
### With Advanced Features
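A sketch enabling several of the advanced flags described above; `my_metric`, `module`, `train`, and `val` are placeholders for your own objects, and the import path is an assumption:

```python
# Hypothetical import path -- adjust to your installation.
from contraprompt import ContraPromptOptimizer, ContraPromptConfig

config = ContraPromptConfig(
    max_iterations=8,
    progressive_rules=True,  # re-validate and prune rules across long runs
    failure_focused=True,    # concentrate on inconsistent examples after iteration 1
    tiered_mining=True,      # mine incremental gains, not just worst-to-best
    diversity_mining=True,   # requires sentence-transformers and scikit-learn
)
optimizer = ContraPromptOptimizer(metric=my_metric, config=config)
optimized = optimizer.optimize(module=module, train_examples=train, val_examples=val)
```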
## Metric Protocol
Your metric must conform to this protocol:
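A plausible reconstruction of the protocol, built from the fields this reference mentions (`feedback`, `error_type`); the numeric field name `value` and the exact type hints are assumptions:

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol

@dataclass
class Score:
    value: float                      # numeric score, e.g. in [0.0, 1.0] ("value" is an assumed name)
    feedback: str = ""                # specific, actionable feedback
    error_type: Optional[str] = None  # optional failure category for diversity-aware features

class Metric(Protocol):
    def __call__(self, example: dict, prediction: Any) -> Score: ...
```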
The `feedback` field is particularly important: more specific feedback produces better optimization results. Instead of "Wrong", prefer "Expected a 3-digit number but got a sentence."
The `error_type` field is optional but recommended: it enables diversity-aware features that ensure the optimizer addresses every category of failure, not just the most common one.
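Putting both recommendations together, a metric might look like this sketch (with an inline stand-in for `Score` so the snippet is self-contained; field names are assumptions):

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Score:  # stand-in for the library's Score; field names are assumptions
    value: float
    feedback: str = ""
    error_type: Optional[str] = None

def area_code_metric(example: dict, prediction) -> Score:
    """Expects a 3-digit answer; returns specific feedback plus an error category."""
    answer = str(prediction).strip()
    if not re.fullmatch(r"\d{3}", answer):
        return Score(0.0, f"Expected a 3-digit number but got {answer!r}.",
                     error_type="format")
    if answer != example["answer"]:
        return Score(0.0, f"Expected {example['answer']}, got {answer}.",
                     error_type="wrong_value")
    return Score(1.0, "Correct.")
```

Distinguishing `"format"` from `"wrong_value"` lets diversity-aware selection address both failure modes rather than only the more frequent one.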