Introduction
Better prompts in one API call
Vizpy automatically optimizes your LLM prompts by learning from failures. One API call, dramatically better results.
Quickstart
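The docs describe the call shape as "pass your module, examples, and metric; get back an optimized module." Below is a minimal sketch of that flow. The `optimize` function here is a local stand-in for the real client (its name, argument order, and the metric's return shape are assumptions, not confirmed API); the real call runs on Vizpy's servers via your `VIZPY_API_KEY`.

```python
# Sketch of the one-call flow. `optimize` below is a local stand-in for
# the real client; its name, arguments, and the metric's return shape
# are assumptions, not confirmed API.

def exact_match(example, prediction):
    """Toy metric: full score on a match, zero plus feedback otherwise."""
    if prediction == example["label"]:
        return {"score": 1.0, "feedback": ""}
    return {"score": 0.0,
            "feedback": f"expected {example['label']!r}, got {prediction!r}"}

def optimize(module, trainset, metric):
    """Stand-in: returns the module with improved instructions only."""
    improved = dict(module)
    improved["instructions"] += "\n\n(learned rules would be appended here)"
    return improved

module = {"signature": "email -> severity",
          "instructions": "Classify the email's severity."}
trainset = [{"email": "CI is down, fix ASAP", "label": "HIGH"}]

optimized = optimize(module, trainset, exact_match)
# Signature and structure are unchanged; only the instructions differ.
```

The invariant to notice: the returned module keeps the original signature, and everything learned lands in the instructions text.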
Supported Models
Vizpy only optimizes the prompt — it never calls your model directly. You configure the model through DSPy, and Vizpy works with whatever you point it at.
| Type | Examples |
|---|---|
| Hosted APIs | OpenAI, Anthropic, Mistral, Google Gemini, Cohere |
| Self-hosted | Ollama, vLLM, LM Studio, any OpenAI-compatible endpoint |
| Custom endpoints | Internal proxies, fine-tuned models behind an API |
The optimizer runs on Vizpy's servers using your VIZPY_API_KEY. Your model and your data stay wherever you host them.
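Because the model is configured entirely through DSPy, pointing the optimizer at any backend is standard DSPy setup. A sketch using DSPy's `dspy.LM` interface; the model identifiers and the local endpoint URL below are illustrative placeholders, not recommendations:

```python
import dspy

# Hosted API: "provider/model" identifier; the API key is read from the
# environment (e.g. OPENAI_API_KEY).
lm = dspy.LM("openai/gpt-4o-mini")

# Self-hosted, OpenAI-compatible endpoint (e.g. vLLM or Ollama); the URL
# and model name are placeholders for your own deployment:
# lm = dspy.LM("openai/my-local-model",
#              api_base="http://localhost:8000/v1", api_key="unused")

dspy.configure(lm=lm)
```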
The Problem
Prompts fail for specific, fixable reasons: the model has the wrong mental model of your task. You can't fix that by wordsmithing the prompt. You need to know what the model thinks it's supposed to do, and correct that.
Vizpy finds it. It runs your examples, extracts the rule that explains each failure, validates that the rule actually helps, and synthesizes everything into precise instructions you can read.
Quick Example
GPT-4o-mini misclassifies workflow blockers as CRITICAL because it pattern-matches on "ASAP" and "blocking" instead of reasoning about impact. CRITICAL should be reserved for production outages and security incidents — not a broken CI pipeline.
What the optimizer learned:
"CRITICAL = customer-facing impact (outage, data loss, security breach). HIGH = internal team velocity blocked (CI, staging, sprint). This distinction applies even when the email uses urgent language — impact radius determines level, not tone."
That rule is injected into the module's instructions. You can read it, audit it, and edit it if it's wrong.
See the full example with training data →
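A metric encoding the impact-radius rule above might look like the following sketch. The return shape (a score plus a `feedback` field) matches the metric description under "How It Works", but the exact field names are assumptions:

```python
def triage_metric(example, prediction):
    """Score a severity label; on failure, explain the impact-radius rule."""
    if prediction == example["severity"]:
        return {"score": 1.0, "feedback": ""}
    return {
        "score": 0.0,
        "feedback": (
            f"Expected {example['severity']}, got {prediction}. "
            "CRITICAL is reserved for customer-facing impact (outage, data "
            "loss, security breach); internal blockers like CI are HIGH, "
            "regardless of urgent wording."
        ),
    }

result = triage_metric({"severity": "HIGH"}, "CRITICAL")  # score 0.0
ok = triage_metric({"severity": "HIGH"}, "HIGH")          # score 1.0
```

The more specific the feedback string, the more precise the extracted rules can be, since the feedback is what the optimizer uses as a retry hint.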
Key Features
One API Call
Pass your module, examples, and metric. Get back an optimized module with better instructions.
Learns from Failures
Extracts the rule that explains each failure, rather than just collecting examples of correct outputs.
Validates Every Rule
Each candidate rule is tested on held-out examples before being applied. Regressions are rejected.
Interpretable
Learned rules are plain English. You can read exactly what changed and why.
Works with DSPy
Pass any dspy.Module, get back an optimized dspy.Module. Your signature and structure are unchanged.
Two Optimizers
ContraPromptOptimizer for classification tasks. PromptGradOptimizer for generation and rubric-based metrics.
How It Works
Solve with retries
Each training example is run through your module. On failure, the module retries using your metric's feedback field as a hint. This generates contrastive pairs — the wrong attempt and the corrected one.
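The retry step can be sketched in a few lines. The toy module and metric below are hypothetical stand-ins; the real loop runs your dspy.Module and feeds it your metric's feedback as the hint:

```python
def solve_with_retries(module, example, metric, max_retries=2):
    """Run an example; on failure, retry with the metric's feedback as a hint.
    Returns a (failed_attempt, corrected_attempt) contrastive pair, or None."""
    hint = ""
    first_failure = None
    for _ in range(max_retries + 1):
        prediction = module(example["input"], hint)
        result = metric(example, prediction)
        if result["score"] >= 1.0:
            if first_failure is None:
                return None  # solved on the first try: no contrastive signal
            return (first_failure, prediction)
        if first_failure is None:
            first_failure = prediction
        hint = result["feedback"]
    return None  # never solved within the retry budget

# Toy module: answers wrong until the hint mentions "HIGH".
def toy_module(text, hint):
    return "HIGH" if "HIGH" in hint else "CRITICAL"

def toy_metric(example, prediction):
    ok = prediction == example["label"]
    return {"score": 1.0 if ok else 0.0,
            "feedback": "" if ok else "Internal blockers are HIGH."}

pair = solve_with_retries(toy_module,
                          {"input": "CI down ASAP", "label": "HIGH"},
                          toy_metric)
# pair == ("CRITICAL", "HIGH"): the wrong attempt and the corrected one.
```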
Mine the signal
The optimizer selects pairs where the gap between failure and success is largest. These are the cases that most clearly reveal where the model's understanding breaks down.
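One way to rank pairs by that gap, assuming a graded (non-binary) metric; this selection heuristic is an illustrative sketch, not the optimizer's actual scoring:

```python
def mine_pairs(pairs, scores, top_k=4):
    """Keep the contrastive pairs with the largest failure-to-success jump."""
    ranked = sorted(zip(pairs, scores),
                    key=lambda item: item[1][1] - item[1][0],
                    reverse=True)
    return [pair for pair, _ in ranked[:top_k]]

pairs = ["a", "b", "c"]
scores = [(0.2, 0.9), (0.6, 0.7), (0.0, 1.0)]  # (failed, corrected) scores
selected = mine_pairs(pairs, scores, top_k=2)  # -> ["c", "a"]
```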
Extract rules
An LLM analyzes the pairs and generates candidate rules: "When X happens, do Y instead of Z." The feedback from your metric shapes these rules directly — more specific feedback produces more precise rules.
Validate
Each candidate rule is tested independently on held-out examples. A rule is only accepted if it improves the score without causing regressions elsewhere.
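The acceptance test reduces to two checks, sketched here over held-out scores (the exact acceptance criteria are a guess at the behavior described above):

```python
def validate_rule(new_scores, baseline_scores):
    """Accept a candidate rule only if the held-out mean improves
    and no individual example regresses."""
    improved = (sum(new_scores) / len(new_scores)
                > sum(baseline_scores) / len(baseline_scores))
    no_regression = all(n >= b for n, b in zip(new_scores, baseline_scores))
    return improved and no_regression

accept = validate_rule([1.0, 1.0, 1.0], [0.0, 1.0, 1.0])  # net win, no losses
reject = validate_rule([1.0, 1.0, 0.0], [0.0, 1.0, 1.0])  # fixes one, breaks another
```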
Synthesize and inject
Accepted rules are merged into clear instructions and injected into your module's prompt. The original signature and structure are preserved — only the instructions change.
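A sketch of the injection step, using a minimal stand-in for a dspy.Module so the invariant is visible: the instructions change, the signature does not.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Module:
    """Minimal stand-in for a dspy.Module: a signature plus instructions."""
    signature: str
    instructions: str

def inject_rules(module, accepted_rules):
    """Merge accepted rules into the instructions; the signature is untouched."""
    merged = (module.instructions + "\n\nRules:\n"
              + "\n".join(f"- {r}" for r in accepted_rules))
    return replace(module, instructions=merged)

base = Module("email -> severity", "Classify the email's severity.")
optimized = inject_rules(base, ["CRITICAL = customer-facing impact only."])
```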
Iterate
The loop repeats with the updated module. Each round builds on the last. Early stopping triggers when no further improvement is found.
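The outer loop with early stopping might look like this sketch; `run_round` stands in for one full solve/mine/extract/validate pass, and the patience threshold is an assumed parameter:

```python
def optimize_loop(initial_score, run_round, patience=2, max_rounds=10):
    """Repeat the optimization round; stop early after `patience`
    consecutive rounds with no improvement."""
    best = initial_score
    stale = 0
    for round_num in range(max_rounds):
        score = run_round(round_num)
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best

# Toy run: improves twice, then plateaus, so the loop stops early.
round_scores = [0.5, 0.7, 0.7, 0.7, 0.9]
best = optimize_loop(0.4, lambda i: round_scores[i])  # stops at 0.7
```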
Pricing
Simple, predictable pricing. One credit = one optimize() call.
| Plan | Price | Credits/Month | Best For |
|---|---|---|---|
| Free | $0 | 10 | Trying it out |
| Pro | $20 | 200 | Indie devs |
| Enterprise | $200 | 1,000 | Scale |