
Examples

Runnable examples for VizPy prompt optimizers — from real product problems to benchmarks

These examples are designed around one question: when does the optimizer actually matter?

Each real-world example has a specific, non-obvious failure mode — the kind where you'd spend a day rewriting your prompt and still not fix it, because the issue isn't word choice, it's that the model has the wrong mental model of the task. The optimizer finds and articulates that mental model for you.

Real-World Use Cases

Benchmarks

Standard research benchmarks — useful for measuring optimizer performance with ground-truth labels and comparing across runs.

Which Optimizer for Which Task?

| Task type | Recommended optimizer | Why |
| --- | --- | --- |
| Classification with a systematic bias | ContraPromptOptimizer | Contrastive mining finds what separates correct from incorrect |
| Open-ended generation quality | PromptGradOptimizer | Batch gradient analysis handles rubric-based metrics better |
| Extraction with subtle ownership/attribution | PromptGradOptimizer | Accumulates rules across many failure examples |
| Translation between registers (technical→user) | ContraPromptOptimizer | Clear contrastive pairs exist between good and bad output |

Both optimizers accept the same interface — you can swap them without changing your metric or examples.
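A minimal sketch of what that swap looks like. The two optimizer names come from the table above, but the constructor and `optimize(prompt, examples, metric)` signature shown here are assumptions for illustration, not VizPy's documented API — check the library reference for the real interface.

```python
# Hypothetical sketch: stub optimizers standing in for VizPy's real classes,
# to show that swapping them leaves the metric and examples untouched.

def exact_match(prediction: str, label: str) -> float:
    # Toy metric: 1.0 if the prediction equals the label, else 0.0.
    return 1.0 if prediction == label else 0.0

class ContraPromptOptimizer:
    def optimize(self, prompt, examples, metric):
        # Stand-in body: a real optimizer would mine contrastive
        # correct/incorrect pairs here and rewrite the prompt.
        return prompt + "\nRule learned via contrastive mining."

class PromptGradOptimizer:
    def optimize(self, prompt, examples, metric):
        # Stand-in body: a real optimizer would accumulate rules from
        # batch "gradient" analysis of many failures here.
        return prompt + "\nRule learned via batch gradient analysis."

examples = [("2+2", "4"), ("3+3", "6")]
prompt = "Answer the arithmetic question."

# Swapping optimizers requires no change to the metric or examples:
for Optimizer in (ContraPromptOptimizer, PromptGradOptimizer):
    improved = Optimizer().optimize(prompt, examples, exact_match)
```

Because both classes expose the same `optimize` call in this sketch, the loop body never changes — only the class being instantiated does.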
