HotPotQA

Multi-hop question answering benchmark

Difficulty: Intermediate | File: examples/04_hotpotqa.py

Multi-hop question answering requiring reasoning over multiple documents.

Usage

    python examples/04_hotpotqa.py --lm anthropic/claude-haiku-4-5-20251001 --demo
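
To make "reasoning over multiple documents" concrete, here is a toy sketch of a two-hop question. It is only an illustration, not what examples/04_hotpotqa.py does: the `DOCS` dictionary and the helpers `located_in`, `capital_of`, and `answer` are hypothetical names invented for this example. The point is that neither document alone answers "The Eiffel Tower is located in the capital of which country?"; the answer from the first document is needed to pick the second.

```python
# Toy illustration only: two tiny "documents" and a two-hop lookup.
# All names here are hypothetical and do not come from examples/04_hotpotqa.py.
DOCS = {
    "Eiffel Tower": "The Eiffel Tower is located in Paris.",
    "Paris": "Paris is the capital of France.",
}


def located_in(entity):
    """Hop 1: read the city out of the entity's document."""
    text = DOCS[entity].rstrip(".")
    return text.split(" is located in ")[1]


def capital_of(city):
    """Hop 2: read the country out of the city's document."""
    text = DOCS[city].rstrip(".")
    return text.split(" is the capital of ")[1]


def answer(entity):
    """Chain the two hops: entity -> city -> country."""
    city = located_in(entity)   # uses the first document
    return capital_of(city)     # uses the second document


if __name__ == "__main__":
    print(answer("Eiffel Tower"))
```

A real multi-hop system would replace the string splitting with retrieval and a language model, but the dependency between the hops is the same.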