Recipe Difficulty Rating

LLMs rate recipe difficulty by the wrong signals. Five ingredients, short prep time, few steps: that must be Easy, right? Except Beef Wellington has only six steps and is genuinely hard. The model has no concept of technique difficulty. It can't tell the difference between "chop onions" and "julienne carrots into 2mm matchsticks while keeping them cold so they don't wilt."

This example uses ContraPromptOptimizer to teach the model what actually makes a recipe hard: knife technique, timing parallelism, temperature precision, and the gap between reading a technique and executing it under pressure.

Optimizer: ContraPromptOptimizer
Difficulty: Beginner

The Failure Mode

# module and format_recipe are defined in the full example below
recipe = {
    "name": "Classic Beef Wellington",
    "steps": 6,
    "time_minutes": 90,
    "ingredients": 8,
    # format_recipe reads r["desc"], so the key must be "desc"
    "desc": "Sear fillet, make duxelles, wrap in prosciutto and puff pastry, bake to medium-rare.",
}

result = module(recipe=format_recipe(recipe))
print(result.difficulty)  # Easy: completely wrong

The model counts steps and ingredients. It misses that:

Getting the beef to exactly medium-rare through pastry requires a probe thermometer and experience.
Duxelles must be cooked completely dry or the pastry goes soggy; judging dryness takes feel, not just instructions.
You have to rest the beef twice at specific temperatures, coordinated with the pastry timing.
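To make the failure mode concrete, here is a toy rater that looks only at counts, like the heuristic described above. `surface_rating` is hypothetical, and so are its weights and thresholds, which were picked purely for illustration. Neither DSPy nor vizpy provides anything like it.

```python
def surface_rating(steps: int, ingredients: int, time_minutes: int) -> str:
    """Hypothetical count-based rater: more steps, ingredients, and time mean harder.

    It never looks at technique, so it cannot see why Beef Wellington is hard.
    """
    # Arbitrary illustrative weights and thresholds
    score = steps + ingredients + time_minutes // 30
    if score <= 20:
        return "Easy"
    if score <= 30:
        return "Medium"
    if score <= 40:
        return "Hard"
    return "Expert"


wellington = {"steps": 6, "ingredients": 8, "time_minutes": 90}
print(surface_rating(**wellington))  # Easy (6 + 8 + 3 = 17)
```

The score never encodes temperature windows, emulsions, or parallel timing. Tuning the thresholds cannot teach the rater those signals.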

Full Example

import dspy
import vizpy
 
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
 
 
class RecipeDifficulty(dspy.Signature):
    """Rate the difficulty of a recipe for a home cook.
    Consider technique required, not just ingredient count or time."""
 
    recipe: str = dspy.InputField(desc="Recipe name and description")
    difficulty: str = dspy.OutputField(desc="One of: Easy, Medium, Hard, Expert")
 
 
module = dspy.Predict(RecipeDifficulty)
 
 
_recipes = [
    {
        "name": "Scrambled Eggs",
        "desc": "Beat eggs, cook on low heat stirring constantly, remove just before set.",
        "difficulty": "Easy",
        "reason": "Single technique, immediate feedback, very forgiving.",
    },
    {
        "name": "Beef Wellington",
        "desc": "Sear beef fillet, make mushroom duxelles (must be bone-dry), wrap in prosciutto and puff pastry, bake to 125°F internal. Must rest twice.",
        "difficulty": "Expert",
        "reason": "Temperature precision, duxelles texture judgment, timing two proteins through pastry.",
    },
    {
        "name": "Pasta Carbonara",
        "desc": "Cook guanciale, whisk eggs with pecorino, combine off heat — egg must not scramble.",
        "difficulty": "Medium",
        "reason": "The off-heat emulsification is easy to get wrong, but recoverable with practice.",
    },
    {
        "name": "Croissants",
        "desc": "Laminate butter into dough through 27 layers via repeated folds, proof 2 hours, bake.",
        "difficulty": "Expert",
        "reason": "Lamination requires consistent butter temperature throughout; collapses if rushed.",
    },
    {
        "name": "Roast Chicken",
        "desc": "Season, truss, roast at 425°F for 1 hour, rest 15 minutes.",
        "difficulty": "Easy",
        "reason": "Forgiving technique, single timer, no precision required.",
    },
    {
        "name": "Hollandaise Sauce",
        "desc": "Emulsify egg yolks with clarified butter over a bain-marie. Temperature must stay between 140-160°F or sauce breaks.",
        "difficulty": "Hard",
        "reason": "Narrow temperature window, immediate breakage if overheated, can't be rescued easily.",
    },
    {
        "name": "French Omelette",
        "desc": "Beat eggs, cook in butter on medium-high heat, shake pan constantly, fold and roll out in 90 seconds.",
        "difficulty": "Hard",
        "reason": "Speed and pan control — looks simple, requires 50+ attempts to get right.",
    },
    {
        "name": "Banana Bread",
        "desc": "Mash bananas, mix with flour, sugar, butter, eggs, bake 60 minutes.",
        "difficulty": "Easy",
        "reason": "No technique, very forgiving, no timing precision needed.",
    },
]
 
 
def format_recipe(r):
    return f"{r['name']}: {r['desc']}"
 
 
LEVELS = ["Easy", "Medium", "Hard", "Expert"]


def metric(example, prediction):
    # Normalize casing and stray punctuation, e.g. "expert." -> "Expert"
    predicted = prediction.difficulty.strip().rstrip(".").capitalize()
    expected = example["difficulty"]
    is_correct = predicted == expected

    feedback = ""
    error_type = ""
    if not is_correct:
        feedback = (
            f"Rated '{predicted}', should be '{expected}'. "
            f"Reason: {example['reason']} "
            f"Technique and precision requirements matter more than step count."
        )
        if predicted not in LEVELS:
            # Off-scale output such as "Medium-Hard"; LEVELS.index() would raise ValueError
            error_type = "invalid_label"
        elif LEVELS.index(predicted) < LEVELS.index(expected):
            error_type = "underrated"
        else:
            error_type = "overrated"

    return vizpy.Score(
        value=1.0 if is_correct else 0.0,
        is_success=is_correct,
        feedback=feedback,
        error_type=error_type,
    )
 
 
train_examples = [
    {"recipe": format_recipe(r), "difficulty": r["difficulty"], "reason": r["reason"]}
    for r in _recipes
]
 
optimizer = vizpy.ContraPromptOptimizer(metric=metric)
optimized = optimizer.optimize(module, train_examples)
 
 
# Test
result = optimized(recipe="Beef Wellington: Sear fillet, make mushroom duxelles (bone-dry), wrap in prosciutto and puff pastry, bake to 125°F internal.")
print(result.difficulty)  # expected: Expert (this input nearly duplicates a training example, so also check held-out recipes)

What the Optimizer Discovers

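The contrastive pairs here are particularly rich. Hollandaise, with only two ingredients listed, rates Hard, while banana bread, with five, rates Easy. Scrambled eggs and a French omelette use nearly the same ingredients, yet one rates Easy and the other Hard. The optimizer sees these pairs and extracts rules that override the surface-level heuristics, for example: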

"Rate based on technique precision, not ingredient count. Key hard signals: narrow temperature windows, emulsification steps, simultaneous timing coordination, and any technique where failure produces an unrecoverable result."
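One way to see why these pairs are informative is to list them explicitly. A pair qualifies when the recipe with the higher rating looks no more complex on the surface. The sketch below uses ingredient counts read off four of the training descriptions. `contrastive_pairs` is a hypothetical helper that illustrates what makes a pair contrastive. It does not describe how ContraPromptOptimizer actually selects pairs.

```python
from itertools import permutations

LEVELS = ["Easy", "Medium", "Hard", "Expert"]

# Ingredient counts read off the recipe descriptions in the training data
recipes = [
    {"name": "Scrambled Eggs", "ingredients": 1, "difficulty": "Easy"},
    {"name": "French Omelette", "ingredients": 2, "difficulty": "Hard"},
    {"name": "Hollandaise Sauce", "ingredients": 2, "difficulty": "Hard"},
    {"name": "Banana Bread", "ingredients": 5, "difficulty": "Easy"},
]


def contrastive_pairs(items):
    """Yield (harder, easier) name pairs where the harder recipe has no more ingredients."""
    for a, b in permutations(items, 2):
        harder = LEVELS.index(a["difficulty"]) > LEVELS.index(b["difficulty"])
        looks_simpler = a["ingredients"] <= b["ingredients"]
        if harder and looks_simpler:
            yield a["name"], b["name"]


for hard, easy in contrastive_pairs(recipes):
    print(f"{hard} is rated harder than {easy} despite no more ingredients")
```

With these four recipes, it yields two pairs, both against Banana Bread. Those are exactly the cases a count-based heuristic gets backwards.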

The error_type field (underrated / overrated) enables stratified sampling — the optimizer sees both directions of error in each batch instead of just the more common one.
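Here, stratified sampling means drawing failures from every `error_type` bucket instead of sampling uniformly, so a rare error direction is not crowded out by a common one. The sketch assumes failures arrive as dicts with an `error_type` key. `stratified_batch` is a hypothetical illustration of the idea, not vizpy's implementation, and ContraPromptOptimizer's actual sampler may differ.

```python
import random
from collections import defaultdict


def stratified_batch(failures, batch_size, seed=0):
    """Round-robin over error_type buckets so every error direction appears in the batch."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for f in failures:
        buckets[f["error_type"]].append(f)
    for items in buckets.values():
        rng.shuffle(items)

    batch = []
    # Take one item from each bucket in turn until the batch is full or all buckets are empty
    while len(batch) < batch_size and any(buckets.values()):
        for items in buckets.values():
            if items and len(batch) < batch_size:
                batch.append(items.pop())
    return batch


# Example data: underrated failures outnumber overrated ones 9 to 1
failures = [{"id": i, "error_type": "underrated"} for i in range(9)]
failures.append({"id": 9, "error_type": "overrated"})

batch = stratified_batch(failures, batch_size=4)
print(sorted(f["error_type"] for f in batch))
```

Uniform sampling would include the single overrated failure in a batch of 4 only 40% of the time. Round-robin guarantees it a slot.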
