Migration from DSPy

If you're using DSPy's built-in optimizers (MIPROv2, BootstrapFewShot, COPRO), migrating to Vizpy is mostly a one-line change. The conceptual difference is deeper and worth understanding.

What Changes — and Why

DSPy's built-in optimizers are demonstration-based: they search for good few-shot examples to prepend to your prompt. They work by running your module on a training set, finding high-scoring traces, and using those as demonstrations.

Vizpy's optimizers are failure-based: they analyze why predictions are wrong, extract correction rules from that analysis, and synthesize those rules into instructions. The optimizer reads your metric feedback, not just your metric score.

This distinction matters most when:

The model is making a systematic error (always does X when it should do Y)
The error requires understanding context, not just seeing more examples
You want to know what rule was learned, not just whether the score went up
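
To make the first point concrete, here is a small library-free sketch. It uses neither DSPy nor Vizpy, and the sample labels are invented for illustration. An aggregate score reports only that some predictions are wrong, while tallying (expected, predicted) pairs exposes the systematic confusion that a failure-based approach needs to see.

```python
from collections import Counter

# Hypothetical (expected, predicted) labels from an evaluation run.
results = [
    ("feat", "feat"),
    ("fix", "fix"),
    ("refactor", "fix"),
    ("refactor", "fix"),
    ("docs", "docs"),
    ("refactor", "fix"),
    ("fix", "fix"),
    ("chore", "chore"),
]

def accuracy(pairs):
    """Score-only view: fraction of exact matches."""
    return sum(expected == predicted for expected, predicted in pairs) / len(pairs)

def confusions(pairs):
    """Failure view: count each (expected, predicted) mismatch."""
    return Counter((e, p) for e, p in pairs if e != p)

score = accuracy(results)
worst = confusions(results).most_common(1)
print(f"accuracy = {score:.3f}")
print(f"most common failure: {worst}")
```

The accuracy figure alone cannot distinguish three unrelated mistakes from one repeated mistake; the tally shows that every failure here is the same refactor/fix confusion.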

The Metric Change

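Apart from the renamed optimizer call, this is the only breaking change. DSPy metrics return a bool or float; Vizpy metrics return a vizpy.Score. The examples on this page also drop DSPy's trace=None parameter from the metric signature.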

Before:

def metric(example, prediction, trace=None) -> bool:
    return prediction.answer.lower() == example["answer"].lower()

After:

import vizpy

def metric(example, prediction) -> vizpy.Score:
    correct = prediction.answer.lower() == example["answer"].lower()
    return vizpy.Score(
        value=1.0 if correct else 0.0,
        is_success=correct,
        feedback=f"Expected '{example['answer']}', got '{prediction.answer}'" if not correct else "",
    )

The feedback field is optional — you'll get a working optimizer without it. But feedback is how the optimizer understands why a prediction failed, which determines the quality of the rules it generates. Richer feedback → more precise rules.
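
If you already have several DSPy-style metrics, wrapping them may be quicker than rewriting each one. The sketch below rests on stated assumptions: `Score` is a local stand-in dataclass that mirrors only the three fields shown above (use `vizpy.Score` in real code), and `as_score_metric`, `describe`, and the `threshold` parameter are hypothetical names introduced for illustration, not part of either library.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable

@dataclass
class Score:
    """Stand-in for vizpy.Score with only the fields shown in this guide."""
    value: float
    is_success: bool
    feedback: str = ""

def as_score_metric(
    legacy_metric: Callable[..., Any],
    describe: Callable[[Any, Any], str],
    threshold: float = 1.0,  # assumption: values at or above this count as success
) -> Callable[[Any, Any], Score]:
    """Wrap a bool/float metric so that it returns a Score with feedback."""
    def metric(example, prediction) -> Score:
        value = float(legacy_metric(example, prediction))
        ok = value >= threshold
        feedback = "" if ok else describe(example, prediction)
        return Score(value=value, is_success=ok, feedback=feedback)
    return metric

# An existing DSPy-style metric, unchanged.
def legacy(example, prediction, trace=None) -> bool:
    return prediction.answer.lower() == example["answer"].lower()

# Feedback text explaining a failure in words.
def describe(example, prediction) -> str:
    return f"Expected '{example['answer']}', got '{prediction.answer}'"

metric = as_score_metric(legacy, describe)
hit = metric({"answer": "Paris"}, SimpleNamespace(answer="paris"))
miss = metric({"answer": "Paris"}, SimpleNamespace(answer="Lyon"))
print(miss.feedback)
```

Keeping the feedback in a separate `describe` function means the old metric stays untouched while the explanation of each failure can be enriched over time.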

API Changes

DSPy	Vizpy	Notes
`optimizer.compile(module, trainset=examples)`	`optimizer.optimize(module, examples)`	`compile` → `optimize`; the training examples are passed positionally instead of via `trainset=`
`dspy.MIPROv2(metric=metric)`	`vizpy.ContraPromptOptimizer(metric=metric)`	or `PromptGradOptimizer`
`dspy.BootstrapFewShot(metric=metric)`	`vizpy.ContraPromptOptimizer(metric=metric)`
Metric returns `bool` or `float`	Metric returns `vizpy.Score`	Add `feedback` for best results

Full Migration Example

Here's a before/after for a real task: classifying commit messages by type.

The DSPy version finds good few-shot examples. The Vizpy version learns the rule that separates feat from fix from refactor — which is harder to convey with examples alone when the commits are ambiguous.

Before (DSPy MIPROv2):

import dspy
 
class ClassifyCommit(dspy.Signature):
    """Classify a git commit message by type."""
    message: str = dspy.InputField()
    commit_type: str = dspy.OutputField(desc="One of: feat, fix, refactor, docs, test, chore")
 
module = dspy.Predict(ClassifyCommit)
 
def metric(example, prediction, trace=None):
    return prediction.commit_type.lower() == example["commit_type"].lower()
 
# train_data is assumed to be a list of dicts with "message" and "commit_type" keys
examples = [dspy.Example(**ex).with_inputs("message") for ex in train_data]
 
optimizer = dspy.MIPROv2(metric=metric, auto="light")
optimized = optimizer.compile(module, trainset=examples)

After (Vizpy ContraPromptOptimizer):

import dspy
import vizpy
 
class ClassifyCommit(dspy.Signature):
    """Classify a git commit message by type."""
    message: str = dspy.InputField()
    commit_type: str = dspy.OutputField(desc="One of: feat, fix, refactor, docs, test, chore")
 
module = dspy.Predict(ClassifyCommit)
 
# The feedback explains the key distinction the model keeps missing
COMMIT_TYPE_RULES = {
    "feat": "feat = new user-visible behaviour that didn't exist before",
    "fix": "fix = corrects behaviour that was broken; user experienced a bug",
    "refactor": "refactor = code restructure with no behaviour change; user sees nothing",
    "docs": "docs = only documentation files changed",
    "test": "test = only test files changed",
    "chore": "chore = tooling, deps, CI — nothing that touches runtime behaviour",
}
 
def metric(example, prediction):
    expected = example["commit_type"].strip().lower()
    actual = prediction.commit_type.strip().lower()
    is_correct = expected == actual
 
    feedback = ""
    if not is_correct:
        feedback = (
            f"Classified as '{actual}', should be '{expected}'. "
            f"Rule for '{expected}': {COMMIT_TYPE_RULES.get(expected, '')} "
            f"Rule for '{actual}': {COMMIT_TYPE_RULES.get(actual, '')}"
        )
 
    return vizpy.Score(
        value=1.0 if is_correct else 0.0,
        is_success=is_correct,
        feedback=feedback,
        error_type=f"{actual}_as_{expected}" if not is_correct else "",
    )
 
optimizer = vizpy.ContraPromptOptimizer(metric=metric)
optimized = optimizer.optimize(module, train_data)

The key insight: by providing the rule for both the wrong label and the right label in the feedback, the optimizer can extract a precise decision boundary — not just "use feat more often" but "feat = new user-visible behaviour; fix = corrects broken behaviour."
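
Before starting an optimization run, it can help to call the metric directly on a hand-made failure and read the feedback it produces. The sketch below reuses the metric logic from the example under two assumptions: `Score` is a local stand-in for `vizpy.Score` (so the snippet runs without Vizpy installed), and `SimpleNamespace` stands in for a DSPy prediction object. The sample commit message is invented.

```python
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class Score:
    """Stand-in for vizpy.Score with the fields used in the example above."""
    value: float
    is_success: bool
    feedback: str = ""
    error_type: str = ""

# Abbreviated copy of the rules table, enough for this check.
COMMIT_TYPE_RULES = {
    "feat": "feat = new user-visible behaviour that didn't exist before",
    "fix": "fix = corrects behaviour that was broken; user experienced a bug",
}

def metric(example, prediction):
    expected = example["commit_type"].lower()
    actual = prediction.commit_type.strip().lower()
    is_correct = expected == actual
    feedback = ""
    if not is_correct:
        feedback = (
            f"Classified as '{actual}', should be '{expected}'. "
            f"Rule for '{expected}': {COMMIT_TYPE_RULES.get(expected, '')} "
            f"Rule for '{actual}': {COMMIT_TYPE_RULES.get(actual, '')}"
        )
    return Score(
        value=1.0 if is_correct else 0.0,
        is_success=is_correct,
        feedback=feedback,
        error_type=f"{actual}_as_{expected}" if not is_correct else "",
    )

example = {"message": "Handle empty config file without crashing", "commit_type": "fix"}
wrong = metric(example, SimpleNamespace(commit_type=" Feat "))
right = metric(example, SimpleNamespace(commit_type="fix"))
print(wrong.feedback)
```

Reading the printed feedback confirms that both rules appear side by side for a misclassification and that a correct prediction produces no feedback at all.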

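Choosing Between the Two Optimizers

The example above uses ContraPromptOptimizer. Consider PromptGradOptimizer instead when:

Your metric is a continuous score (rubric-based evaluation, semantic similarity)
You have 50+ training examples
The failure mode is distributed across many examples rather than appearing as clear contrastive pairs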

See the examples section for side-by-side comparisons across different task types.
