Vizpy

Introduction

Better prompts in one API call

Vizpy automatically optimizes your LLM prompts by learning from failures. One API call, dramatically better results.

Quickstart

pip install vizpy dspy-ai
export VIZPY_API_KEY="..."
export OPENAI_API_KEY="sk-..."   # or your provider's key — see Supported Models below
import dspy
import vizpy
 
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # use any model you have access to
 
# 1. Define your task
class Sentiment(dspy.Signature):
    """Classify the sentiment of a product review."""
    review: str = dspy.InputField()
    label: str = dspy.OutputField(desc="One of: POSITIVE, NEGATIVE, NEUTRAL")
 
module = dspy.ChainOfThought(Sentiment)
 
# 2. Define a metric — feedback is what the optimizer learns from
def metric(example, pred):
    expected = example["label"]
    actual = pred.label.strip().upper()
    return vizpy.Score(
        value=1.0 if expected == actual else 0.0,
        is_success=expected == actual,
        feedback="" if expected == actual else f"Expected {expected}, got {actual}.",
    )
 
# 3. A handful of labelled examples
train = [
    {"review": "Broke after one week.", "label": "NEGATIVE"},
    {"review": "Exceeded my expectations, very happy.", "label": "POSITIVE"},
    {"review": "Works as described, nothing special.", "label": "NEUTRAL"},
    {"review": "Stopped working on day two.", "label": "NEGATIVE"},
    {"review": "Solid build quality, does exactly what it promises.", "label": "POSITIVE"},
]
 
# 4. Optimize
optimizer = vizpy.ContraPromptOptimizer(metric=metric)
optimized = optimizer.optimize(module, train_examples=train)
 
# 5. Use the result — same interface, better instructions
print(optimized(review="Feels cheap and the buttons stick.").label)  # NEGATIVE

Supported Models

Vizpy only optimizes the prompt — it never calls your model directly. You configure the model through DSPy, and Vizpy works with whatever you point it at.

Type               Examples
Hosted APIs        OpenAI, Anthropic, Mistral, Google Gemini, Cohere
Self-hosted        Ollama, vLLM, LM Studio, any OpenAI-compatible endpoint
Custom endpoints   Internal proxies, fine-tuned models behind an API
# Hosted API
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
dspy.configure(lm=dspy.LM("anthropic/claude-haiku-4-5-20251001"))
 
# Self-hosted via Ollama
dspy.configure(lm=dspy.LM("ollama/llama3", api_base="http://localhost:11434"))
 
# Any OpenAI-compatible endpoint (vLLM, LM Studio, internal proxy, etc.)
dspy.configure(lm=dspy.LM("openai/your-model", api_base="http://your-host/v1", api_key="..."))

The optimizer runs on Vizpy's servers using your VIZPY_API_KEY. Your model and your data stay wherever you host them.

The Problem

Prompts fail for specific, fixable reasons — the model has the wrong mental model of your task. You can't fix this by rewriting words. You need to know what the model thinks it's supposed to do, and correct that.

Vizpy finds that mismatch. It runs your examples, extracts the rule that explains each failure, validates that the rule actually helps, and synthesizes everything into precise instructions you can read.

Quick Example

GPT-4o-mini misclassifies workflow blockers as CRITICAL because it pattern-matches on "ASAP" and "blocking" instead of reasoning about impact. CRITICAL should be reserved for production outages and security incidents — not a broken CI pipeline.

import dspy
import vizpy
 
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
 
class EmailUrgency(dspy.Signature):
    """Classify the urgency level of an email."""
    email: str = dspy.InputField()
    urgency: str = dspy.OutputField(desc="One of: CRITICAL, HIGH, MEDIUM, LOW")
 
module = dspy.ChainOfThought(EmailUrgency)
 
# Before optimization
test = "Blocking our sprint. Tests failing, can't merge. Need help ASAP."
print(module(email=test).urgency)   # CRITICAL  ← wrong, this is HIGH
 
# Metric feedback tells the optimizer *why* it's wrong
def metric(example, prediction):
    expected, actual = example["gold_urgency"], prediction.urgency.strip().upper()
    is_correct = expected == actual
    feedback = (
        "CRITICAL = customer-facing outage/breach. Workflow blockers = HIGH."
        if not is_correct and expected == "HIGH" and actual == "CRITICAL" else
        f"Expected {expected}, got {actual}." if not is_correct else ""
    )
    return vizpy.Score(value=1.0 if is_correct else 0.0, is_success=is_correct, feedback=feedback)
 
optimizer = vizpy.PromptGradOptimizer(metric=metric)
# train_examples / val_examples: lists of {"email": ..., "gold_urgency": ...} dicts
optimized = optimizer.optimize(module, train_examples, val_examples)
 
# After optimization
print(optimized(email=test).urgency)  # HIGH  ← correct

What the optimizer learned:

"CRITICAL = customer-facing impact (outage, data loss, security breach). HIGH = internal team velocity blocked (CI, staging, sprint). This distinction applies even when the email uses urgent language — impact radius determines level, not tone."

That rule is injected into the module's instructions. You can read it, audit it, and edit it if it's wrong.

See the full example with training data →

Key Features

One API Call

Pass your module, examples, and metric. Get back an optimized module with better instructions.

Learns from Failures

Extracts the rule that explains each failure, rather than just collecting examples of getting it right.

Validates Every Rule

Each candidate rule is tested on held-out examples before being applied. Regressions are rejected.

Interpretable

Learned rules are plain English. You can read exactly what changed and why.

Works with DSPy

Pass any dspy.Module, get back an optimized dspy.Module. Your signature and structure are unchanged.

Two Optimizers

ContraPromptOptimizer for classification tasks. PromptGradOptimizer for generation and rubric-based metrics.

How It Works

Solve with retries

Each training example is run through your module. On failure, the module retries using your metric's feedback field as a hint. This generates contrastive pairs — the wrong attempt and the corrected one.
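The retry loop can be sketched in plain Python (a minimal sketch; `run_module`, `Pair`, and the retry budget are illustrative stand-ins, not Vizpy's actual internals):

```python
from dataclasses import dataclass

@dataclass
class Pair:
    example: dict
    failed_output: object
    corrected_output: object
    feedback: str

def solve_with_retries(run_module, metric, example, max_retries=2):
    """Run one example; on failure, retry with the metric's feedback as a hint.

    Returns a contrastive Pair when a retry succeeds where the first
    attempt failed, else None.
    """
    first = run_module(example, hint=None)
    score = metric(example, first)
    if score.is_success:
        return None  # nothing to learn from examples the module already gets right
    for _ in range(max_retries):
        retry = run_module(example, hint=score.feedback)
        if metric(example, retry).is_success:
            return Pair(example, first, retry, score.feedback)
    return None  # feedback wasn't enough to correct this one
```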

Mine the signal

The optimizer selects pairs where the gap between failure and success is largest. These are the cases that most clearly reveal where the model's understanding breaks down.
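In outline, pair selection is a ranking by score gap (a sketch; the tuple shape and the `top_k` default are assumptions):

```python
def mine_contrastive_pairs(scored_pairs, top_k=8):
    """Keep the pairs where the corrected attempt improves most over the
    failed one; these localize where the model's understanding breaks down.

    scored_pairs: iterable of (pair, failed_score, corrected_score) tuples.
    """
    ranked = sorted(scored_pairs, key=lambda t: t[2] - t[1], reverse=True)
    return [pair for pair, _, _ in ranked[:top_k]]
```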

Extract rules

An LLM analyzes the pairs and generates candidate rules: "When X happens, do Y instead of Z." The feedback from your metric shapes these rules directly — more specific feedback produces more precise rules.
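The analysis prompt for one pair might look like the following sketch (the template and argument names are illustrative; Vizpy does not publish its actual prompt):

```python
def rule_extraction_prompt(example, failed, corrected, feedback):
    """Build an analysis prompt for one contrastive pair (illustrative template)."""
    return (
        "A model answered the same input twice.\n"
        f"Input: {example}\n"
        f"Failed answer: {failed}\n"
        f"Corrected answer: {corrected}\n"
        f"Grader feedback: {feedback}\n"
        "State one general rule of the form 'When X happens, do Y instead of Z' "
        "that explains why the correction succeeds where the failure did not."
    )
```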

Validate

Each candidate rule is tested independently on held-out examples. A rule is only accepted if it improves the score without causing regressions elsewhere.
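That accept/reject check can be sketched as follows (the names and the exact no-regression policy are assumptions):

```python
def validate_rule(rule, baseline_scores, eval_with_rule, val_examples):
    """Accept a candidate rule only if it raises the mean held-out score
    and never regresses an example the baseline already got right.

    eval_with_rule(rule, example) -> float in [0, 1]; names are illustrative.
    """
    new_scores = [eval_with_rule(rule, ex) for ex in val_examples]
    regressed = any(
        new < old for new, old in zip(new_scores, baseline_scores) if old == 1.0
    )
    improved = sum(new_scores) / len(new_scores) > sum(baseline_scores) / len(baseline_scores)
    return improved and not regressed
```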

Synthesize and inject

Accepted rules are merged into clear instructions and injected into your module's prompt. The original signature and structure are preserved — only the instructions change.
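Instruction synthesis amounts to appending the accepted rules to the original task description (a sketch; the exact template is an assumption):

```python
def synthesize_instructions(base_instructions, accepted_rules):
    """Merge accepted rules into the module's instruction text.

    Only the instructions change; the signature's fields stay as declared.
    The formatting here is illustrative, not Vizpy's exact template.
    """
    bullets = "\n".join(f"- {rule}" for rule in accepted_rules)
    return f"{base_instructions}\n\nRules learned from failures:\n{bullets}"
```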

Iterate

The loop repeats with the updated module. Each round builds on the last. Early stopping triggers when no further improvement is found.
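The outer loop with early stopping might look like this (a sketch; `step`, `score_fn`, and the patience policy are illustrative):

```python
def optimize_loop(step, score_fn, module, max_rounds=5, patience=1):
    """Repeat the mine/extract/validate/inject cycle, keeping the best
    module and stopping early once rounds stop improving the score.
    """
    best, best_score, stale = module, score_fn(module), 0
    for _ in range(max_rounds):
        candidate = step(best)           # one full optimization round
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best, best_score, stale = candidate, candidate_score, 0
        else:
            stale += 1
            if stale > patience:
                break                    # no further improvement found
    return best
```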

Pricing

Simple, predictable pricing. One credit = one optimize() call.

Plan         Price   Credits/Month   Best For
Free         $0      10              Trying it out
Pro          $20     200             Indie devs
Enterprise   $200    1,000           Scale

View full pricing →
