PromptGradOptimizer
API reference for the gradient-based prompt optimizer
PromptGrad optimizes your DSPy module by analyzing failure patterns across batches of examples and learning targeted correction rules. It freezes the module's base instructions and accumulates validated improvements as a separate layer, keeping the original prompt intact.
You can provide any base prompt — extract it from the module, pass a custom instruction string, or supply instructions generated by any external optimizer. PromptGrad layers its corrections on top of whatever base you provide.
API Signature
Constructor Parameters
| Parameter | Type | Description |
|---|---|---|
| metric | Metric | Required. Callable that takes (example: dict, prediction) and returns a Score. |
| config | PromptGradConfig | Configuration dataclass with all hyperparameters. |
| base_prompt_source | str | How to obtain base instructions. "module" extracts from the module's existing instructions. Any other string is used directly as the base instructions. |
| example_formatter | ExampleFormatter | Formats examples for internal optimization. Must implement format_for_gradient(example, score) -> str and format_context(example) -> str. |
| rule_acceptor | Callable[[ParsedRule], bool] | Custom filter for proposed improvements. Default accepts additions freely and removals only with high confidence. |
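A minimal sketch of a custom rule_acceptor that mirrors the documented default policy. The ParsedRule dataclass below is an illustrative stand-in; the real class's field names ("action", "confidence") are assumptions, not part of the documented API.

```python
from dataclasses import dataclass

# Illustrative stand-in for the library's ParsedRule type.
# The real class may expose different field names.
@dataclass
class ParsedRule:
    text: str          # the proposed improvement
    action: str        # e.g. "add" or "remove" (assumed)
    confidence: float  # 0.0 - 1.0 (assumed)

def conservative_acceptor(rule: ParsedRule) -> bool:
    """Accept additions freely; accept removals only with high confidence."""
    if rule.action == "add":
        return True
    return rule.confidence >= 0.9

print(conservative_acceptor(ParsedRule("Always cite sources.", "add", 0.4)))  # True
print(conservative_acceptor(ParsedRule("Drop step 2.", "remove", 0.5)))       # False
```

Pass such a callable as the rule_acceptor constructor parameter to tighten or loosen which proposed improvements make it into the prompt.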
optimize() Method
| Parameter | Type | Description |
|---|---|---|
| module | dspy.Module | The DSPy module to optimize. |
| train_examples | list[dict] | Training examples for optimization. |
| val_examples | list[dict] | Validation examples. If None, a holdout is split from train_examples. |
Returns: An optimized dspy.Module with base instructions + learned corrections.
Important Parameters (PromptGradConfig)
These are the parameters you're most likely to tune.
| Parameter | Type | Default | Description |
|---|---|---|---|
| epochs | int | 3 | Number of training epochs. Each epoch samples multiple batches and accumulates improvements. |
| batches_per_epoch | int | 3 | Batches per epoch. More batches = more diverse improvement proposals. Total optimization calls = epochs * batches_per_epoch. |
| batch_size | int | 10 | Examples per batch. Larger batches produce more representative failure analysis but cost more. |
| validate_rules | bool | True | Strongly recommended. Tests each proposed improvement individually before applying. Prevents harmful changes from accumulating. |
| gradient_consensus | bool | True | Recommended. Uses multiple analysis passes for robustness. Reduces variance from single-pass noise. |
| base_prompt_source | str | "module" | "module" extracts from the module. Any other string is used directly as custom base instructions. Also configurable via the constructor parameter. |
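The tuning knobs above can be combined in a single config. A hedged sketch, assuming PromptGradConfig is importable from a promptgrad package (the import path is an assumption):

```python
# Assumed import path; adjust to wherever PromptGradConfig lives in your install.
from promptgrad import PromptGradConfig

config = PromptGradConfig(
    epochs=2,                 # 2 epochs * 4 batches = 8 total optimization calls
    batches_per_epoch=4,      # more batches, more diverse improvement proposals
    batch_size=20,            # larger batches for more representative failure analysis
    validate_rules=True,      # strongly recommended: screen each improvement first
    gradient_consensus=True,  # average multiple analysis passes to reduce noise
)
```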
All Parameters (PromptGradConfig)
Core Loop
| Parameter | Type | Default | Description |
|---|---|---|---|
| epochs | int | 3 | Training epochs. |
| batches_per_epoch | int | 3 | Batches per epoch. |
| batch_size | int | 10 | Examples per batch. |
| max_workers | int | 10 | Thread pool size for parallel evaluation. |
| num_val_runs | int | 1 | Validation runs per epoch, averaged for stability. Increase to 3 for stochastic metrics. |
| verbose | bool | True | Print progress logs. |
| seed | int \| None | None | Seed for deterministic sampling. None = random. |
| val_holdout_fraction | float | 0.2 | Holdout fraction when val_examples is not provided. |
Rule Management
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_local_rules | int | 15 | Maximum improvements before merging or pruning. Hard cap to prevent prompt flooding. |
| merge_threshold | int | 8 | When improvement count exceeds this, similar improvements are consolidated. |
| early_stop_patience | int | 2 | Stop after this many consecutive epochs without improvement. |
| rollback_threshold | float | 0.05 | If score drops by more than this below baseline, rollback to the previous best state. |
| failure_threshold | float | 0.5 | Examples scoring below this are treated as failures during analysis. |
Rule Validation
| Parameter | Type | Default | Description |
|---|---|---|---|
| validate_rules | bool | True | Validate each improvement individually before applying. |
| rule_validation_examples | int | 15 | Validation subset size. |
| rule_validation_min_delta | float | 0.01 | Minimum improvement required to accept a change. |
| rule_validation_mode | str | "delta" | "delta": Fixed threshold. "auto": Confidence-interval-based testing (accepts if the lower bound of a 95% CI is > 0). |
| rule_selection_mode | str | "default" | "default": Per-rule threshold acceptance. "greedy": Forward selection, keeping each improvement only if it raises overall score. More expensive but produces minimal, high-quality sets. |
| rule_pruning_strategy | str | "rollback" | "rollback": Revert to last best state when overloaded. "delta": Keep top improvements by measured quality. |
Conditional Rules
| Parameter | Type | Default | Description |
|---|---|---|---|
| conditional_rules | bool | False | When enabled, improvements are applied selectively per example based on relevance. Prevents irrelevant corrections from interfering with examples they don't apply to. |
| rule_trigger_top_k | int | 5 | Maximum improvements applied per example when conditional mode is active. |
| rule_trigger_min_overlap | float | 0.1 | Minimum relevance threshold for applying an improvement to a given example. |
Sampling
| Parameter | Type | Default | Description |
|---|---|---|---|
| stratified_sampling | bool | True | Ensure batches cover diverse error types by sampling proportionally from each failure category. |
Gradient Consensus
| Parameter | Type | Default | Description |
|---|---|---|---|
| gradient_consensus | bool | True | Use multiple analysis passes at varying temperatures for robustness. |
| gradient_samples | int | 3 | Number of analysis passes for consensus. |
| gradient_temperature_base | float | 0.7 | Base temperature for the first pass. |
| gradient_temperature_step | float | 0.15 | Temperature increment per additional pass. |
| gradient_max_tokens | int | 4000 | Max tokens for analysis responses. |
| gradient_failure_display_limit | int | 5 | Maximum failure examples analyzed per batch. |
Base Prompt Source
| Parameter | Type | Default | Description |
|---|---|---|---|
| base_prompt_source | str | "module" | Source for base instructions. "module" extracts from the module. Any other string is used directly as custom base instructions. |
Usage Example
Basic Usage
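A minimal end-to-end sketch. The promptgrad import path, the model name, and the toy dataset are illustrative assumptions, not part of the documented API:

```python
import dspy

# Assumed import path; adjust to your installation.
from promptgrad import PromptGradOptimizer, PromptGradConfig, Score

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Toy training data; use your real examples in practice.
train = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What color is a clear daytime sky?", "answer": "blue"},
]

def exact_match(example: dict, prediction) -> Score:
    correct = prediction.answer.strip().lower() == example["answer"].lower()
    return Score(value=float(correct))

module = dspy.ChainOfThought("question -> answer")

optimizer = PromptGradOptimizer(metric=exact_match, config=PromptGradConfig())
optimized = optimizer.optimize(module, train_examples=train)  # val split from train
```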
With Custom Base Instructions
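A sketch of supplying your own base instructions, reusing a metric, module, and train list defined as in basic usage (the instruction string is illustrative):

```python
optimizer = PromptGradOptimizer(
    metric=exact_match,
    config=PromptGradConfig(),
    # Any string other than "module" is used verbatim as the base instructions;
    # PromptGrad layers its learned corrections on top of it.
    base_prompt_source="Answer concisely. Show your reasoning before the final answer.",
)
optimized = optimizer.optimize(module, train_examples=train)
```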
With Conditional Rules
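A config sketch for conditional mode, where improvements apply per example by relevance (parameter values here are illustrative choices, not recommendations):

```python
config = PromptGradConfig(
    conditional_rules=True,       # apply improvements per example, by relevance
    rule_trigger_top_k=3,         # at most 3 improvements per example
    rule_trigger_min_overlap=0.2, # raise the relevance bar above the 0.1 default
)
optimizer = PromptGradOptimizer(metric=exact_match, config=config)
```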
With Greedy Selection
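A config sketch for greedy forward selection. The larger validation subset is a judgment call: greedy mode re-evaluates the full set after each candidate, so noisier subsets make its keep/drop decisions less reliable:

```python
config = PromptGradConfig(
    rule_selection_mode="greedy",  # keep a rule only if it raises overall score
    rule_validation_examples=30,   # larger subset stabilizes greedy comparisons
)
optimizer = PromptGradOptimizer(metric=exact_match, config=config)
```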
Metric Protocol
Your metric must return a Score:
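A minimal sketch of the Score shape and a metric that populates it. Only the value and error_type fields are documented here; the local dataclass below is an illustrative stand-in for the library's real Score type, which you would import rather than define:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for the library's Score type.
@dataclass
class Score:
    value: float                      # 0.0 (failure) to 1.0 (success)
    error_type: Optional[str] = None  # failure category, drives stratified sampling

def exact_match_metric(example: dict, prediction) -> Score:
    answer = (getattr(prediction, "answer", "") or "").strip().lower()
    if answer == example["answer"].strip().lower():
        return Score(value=1.0)
    if not answer:
        return Score(value=0.0, error_type="empty_answer")
    return Score(value=0.0, error_type="wrong_answer")
```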
The error_type field is especially valuable for PromptGrad: it enables stratified batch sampling, ensuring the optimizer analyzes diverse failure modes rather than oversampling the most common error.