CSE/DSC 234 Spring 2026 Guest Lecture: Lakshya Agrawal (UC Berkeley)

Arun Kumar

Overview

This lecture introduces Jeppa, a framework for "reflective optimization" in which AI systems improve by learning from their own rollouts and the textual feedback those rollouts produce. Unlike traditional methods that rely on massive datasets and gradient descent, Jeppa optimizes an AI system by refining its prompts and system specifications. The approach is shown to be highly sample-efficient, improving performance on complex tasks with minimal data, and it applies to a range of settings, including code generation, agent design, and even the training of model weights. The core idea is to use the rich information in text-based feedback to guide the system toward better performance, automating work that previously required extensive human engineering.

Chapters

Traditional AI training methods (pre-training, fine-tuning, RL) require vast amounts of data (trillions of tokens, thousands of examples).
As AI tackles more complex problems, sample efficiency (learning from fewer examples) becomes a critical bottleneck, especially in domains with limited data.
Real-world applications and tool integrations can be slow or expensive, further exacerbating the sample inefficiency problem.
Current reinforcement learning methods lose valuable information by only using binary reward signals, ignoring detailed traces of thought, tool calls, and error messages.

Understanding the limitations of current AI training methods highlights the need for more efficient learning paradigms, especially for real-world applications where data is scarce or interactions are costly.

Consider an AI that must wait for a slow physical action to complete during training: the environment itself becomes the bottleneck, regardless of model speed or GPU power.

Jeppa proposes 'reflective optimization' where AI reflects on its own past actions and feedback, not just numerical rewards.
The AI analyzes detailed traces of its rollouts (thoughts, tool calls, errors) to diagnose failures and learn.
Instead of solely updating model weights, Jeppa can update the AI's system prompt, which can induce significant behavioral changes with natural language instructions.
This allows learning from as few as one rollout by correcting mistakes and refining the prompt.

This approach fundamentally changes how AI learns by utilizing rich textual feedback, enabling faster adaptation and improvement with significantly less data.

If a code generation AI receives a compiler error about an unavailable API, it can reflect on this error, update its prompt to avoid that API, and learn to use an alternative.
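The compiler-error example above can be sketched as a single reflect-and-update step. This is a minimal toy under stated assumptions, not Jeppa's implementation: `run_rollout` stands in for the model plus compiler, `reflect` stands in for the LLM that reads the trace and rewrites the prompt, and `old_api` is a hypothetical identifier.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    output: str    # what the model produced
    score: float   # scalar reward (1.0 means success)
    feedback: str  # textual trace: compiler errors, tool output, etc.


def run_rollout(system_prompt: str) -> Rollout:
    """Toy stand-in for model + compiler: fails unless the prompt bans old_api."""
    if "do not use old_api" in system_prompt.lower():
        return Rollout("new_api()", 1.0, "compiled successfully")
    return Rollout("old_api()", 0.0, "error: 'old_api' is not available on this target")


def reflect(system_prompt: str, rollout: Rollout) -> str:
    """Toy stand-in for the LLM reflection step: turn an error into an instruction."""
    if rollout.score < 1.0 and "is not available" in rollout.feedback:
        # Pull the quoted identifier out of the error message.
        name = rollout.feedback.split("'")[1]
        return f"{system_prompt}\nDo not use {name}; choose an alternative API."
    return system_prompt


prompt = "You write kernel code."
first = run_rollout(prompt)
prompt = reflect(prompt, first)   # one rollout is enough to update the prompt
second = run_rollout(prompt)
print(first.score, second.score)  # 0.0 1.0
```

A scalar reward alone would only report the `0.0`; the textual feedback is what tells the reflection step which instruction to add.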

Jeppa uses a genetic algorithm where prompts are treated as 'genes' that are mutated and selected.
It employs a multi-objective selection strategy using a Pareto frontier to balance exploration and exploitation.
A scoring matrix tracks prompt performance across validation items, identifying the best prompts for each task.
The system iteratively selects prompts from the Pareto frontier, runs them on dev examples, reflects on feedback, and updates the prompt pool.
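The selection step described above can be sketched as follows. This assumes one plausible reading of the scoring matrix: a prompt sits on the frontier if it achieves the best score on at least one validation item, and parents are sampled in proportion to the number of items they win. The prompt names and scores are made up for illustration, and the lecture summary does not confirm these exact rules.

```python
import random


def pareto_frontier(scores: dict[str, list[float]]) -> dict[str, int]:
    """Map each frontier prompt to the number of validation items it wins (ties count)."""
    n_items = len(next(iter(scores.values())))
    wins: dict[str, int] = {}
    for i in range(n_items):
        best = max(row[i] for row in scores.values())
        for name, row in scores.items():
            if row[i] == best:
                wins[name] = wins.get(name, 0) + 1
    return wins


def sample_parent(wins: dict[str, int], rng: random.Random) -> str:
    """Pick the next prompt to mutate, weighted by how many items it wins."""
    names = list(wins)
    return rng.choices(names, weights=[wins[n] for n in names], k=1)[0]


# Rows are prompts, columns are validation items (hypothetical numbers).
scores = {
    "prompt_a": [0.9, 0.2, 0.4],
    "prompt_b": [0.3, 0.8, 0.4],
    "prompt_c": [0.3, 0.2, 0.3],  # never best on any item, so it is dropped
}
wins = pareto_frontier(scores)
print(wins)  # {'prompt_a': 2, 'prompt_b': 2}
parent = sample_parent(wins, random.Random(0))
```

Keeping both `prompt_a` and `prompt_b` preserves two different strategies, each strong on a different item, instead of committing to a single best-on-average prompt.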

The Pareto frontier approach is crucial for avoiding local optima and ensuring diverse exploration of strategies, leading to more robust and higher-performing prompts.

Instead of a greedy approach getting stuck after finding one improved prompt, Jeppa maintains multiple good prompts on the Pareto frontier, allowing it to explore diverse strategies and avoid getting trapped in suboptimal solutions.

Jeppa achieves significant performance improvements with far fewer rollouts compared to state-of-the-art methods like GRPO.
It automates prompt engineering, a process that can take weeks for human teams, by discovering latent task specifications and edge cases.
Jeppa can optimize proprietary, black-box models, improving their performance even beyond their original capabilities.
It demonstrates remarkable sample efficiency, optimizing LLMs for novel hardware accelerators with minimal initial training data.

Jeppa offers a powerful, automated way to enhance AI performance, making advanced AI capabilities accessible even in data-scarce environments and for proprietary models.

Jeppa optimized GPT-4.1 Mini to outperform the full GPT-4.1 on a live benchmark by discovering and incorporating specific strategies into its prompt, such as avoiding a particular library (adf.h) that was incompatible with the target hardware.

Jeppa's 'Optimize Anything' API extends reflective optimization to any text artifact, not just prompts.
This includes code, agent architectures, numerical parameters, and even operating policies for data centers.
The core idea is to use actionable side information (like compiler traces, gradients, SLA violations) as textual feedback to guide optimization.
It offers modes for generalization, single-task optimization, and multi-task optimization, adapting to different goals.
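The idea of optimizing an arbitrary text artifact with actionable side information can be sketched as a small loop. Everything here is hypothetical: `optimize_text`, `evaluate`, and `propose` are illustrative names and not the actual 'Optimize Anything' API, and a rule-based `propose` stands in for the LLM that would read the side information. The toy artifact is a one-line data-center setting with an assumed 100 ms latency SLA.

```python
from typing import Callable

Evaluator = Callable[[str], tuple[float, str]]  # artifact -> (score, side information)
Proposer = Callable[[str, str], str]            # (artifact, side information) -> new artifact


def optimize_text(seed: str, evaluate: Evaluator, propose: Proposer, budget: int = 10):
    """Keep the best-scoring artifact; feed its side information to the proposer."""
    best = seed
    best_score, side_info = evaluate(best)
    for _ in range(budget):
        candidate = propose(best, side_info)
        score, info = evaluate(candidate)
        if score > best_score:
            best, best_score, side_info = candidate, score, info
    return best, best_score


def evaluate(artifact: str) -> tuple[float, str]:
    """Toy evaluator: latency falls with replicas, cost rises with replicas."""
    replicas = int(artifact.split("=")[1])
    latency_ms = 400 / replicas
    if latency_ms > 100:
        return -latency_ms, f"SLA violation: latency {latency_ms:.0f}ms exceeds 100ms"
    return -float(replicas), "SLA met; cost grows with replicas"


def propose(artifact: str, side_info: str) -> str:
    """Toy proposer: add a replica on an SLA violation, otherwise try removing one."""
    replicas = int(artifact.split("=")[1])
    step = 1 if "SLA violation" in side_info else -1
    return f"replicas={max(1, replicas + step)}"


best, score = optimize_text("replicas=1", evaluate, propose)
print(best, score)  # replicas=4 -4.0
```

The evaluator returns text alongside the score, so the proposer knows which direction to move rather than guessing from a number alone.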

This framework democratizes optimization by framing diverse problems as text optimization tasks, allowing AI to solve complex, non-differentiable problems efficiently.

For example, a CUDA kernel can be optimized for performance by providing the kernel code as text and using compiler errors and profiler traces as actionable side information to guide Jeppa's improvements.

Jeppa can automatically design and optimize agent architectures, including control flow, prompts, and multi-agent interactions.
It automates the discovery of complex agent pipelines that significantly outperform simpler designs.
The 'fast slow training' paradigm combines Jeppa's prompt/context optimization (fast loop) with traditional RL weight updates (slow loop) for more robust learning.
This hybrid approach mitigates issues like catastrophic forgetting in weight updates and performance plateaus in prompt optimization alone.
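The interleaving of the two loops can be illustrated with a deliberately small numeric toy. It is a scheduling sketch only, under assumptions that are not from the lecture: an integer `context` stands in for the prompt, a float `weight` stands in for model weights, the reward is a made-up quadratic, and the slow loop runs once every five fast steps.

```python
def reward(weight: float, context: int) -> float:
    """Toy objective: highest (zero) when weight + context equals 10."""
    return -((weight + context - 10.0) ** 2)


def fast_step(weight: float, context: int) -> int:
    """Fast loop: gradient-free search over nearby contexts, weight held fixed."""
    return max((context - 1, context, context + 1), key=lambda c: reward(weight, c))


def slow_step(weight: float, context: int, lr: float = 0.05) -> float:
    """Slow loop: one small gradient-ascent step on the weight, context held fixed."""
    grad = -2.0 * (weight + context - 10.0)
    return weight + lr * grad


def fast_slow_train(steps: int = 20, slow_every: int = 5):
    weight, context = 0.5, 0
    history = []
    for step in range(1, steps + 1):
        context = fast_step(weight, context)      # runs every step
        if step % slow_every == 0:
            weight = slow_step(weight, context)   # runs only occasionally
        history.append(reward(weight, context))
    return weight, context, history


weight, context, history = fast_slow_train()
print(context, round(weight, 3), history[0], history[-1])
```

In this toy the integer context does most of the work quickly but cannot close the last fractional gap on its own; the occasional small weight updates shrink that residual without large changes to the weight.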

By automating agent design and integrating prompt optimization with weight updates, this approach pushes the boundaries of AI performance and learning efficiency.

Jeppa automatically designed a complex agent architecture for a difficult benchmark, tripling its performance by discovering a multi-step process involving rule induction, code synthesis, and a debugging loop, all without human intervention.

Jeppa's principles apply beyond text-only models to multimodal vision-language models (VLMs), improving tasks like OCR and medical diagnosis.
It works across a wide range of model scales, from small 1B-parameter models to large frontier models, often achieving significant cost reductions.
Jeppa can optimize subjective tasks by using LLMs as judges trained on human annotations, creating a data flywheel for continuous improvement.
The core insight is that as models improve their instruction-following capabilities, precise textual specifications become increasingly critical for unlocking their full potential.

Jeppa's versatility and effectiveness across diverse AI tasks and models suggest it's a foundational technique for future AI development and deployment.

Jeppa was used to optimize agent skills (markdown files) for a Go coding repository, which dramatically increased problem-solving rates and enabled even strong coding agents such as Claude Code to achieve near-perfect performance with reduced runtime.

Key takeaways

1. AI training is increasingly bottlenecked by sample efficiency, necessitating methods that learn effectively from limited data.
2. Reflective optimization, as implemented by Jeppa, leverages rich textual feedback (errors, traces) to enable AI to learn and improve autonomously.
3. Updating system prompts can be a highly effective way to induce significant behavioral changes in LLMs, often more efficiently than weight updates.
4. Jeppa's Pareto frontier approach ensures diverse exploration of optimization strategies, preventing local optima and leading to more robust solutions.
5. The 'Optimize Anything' framework extends reflective optimization to various text-based artifacts, enabling AI to tackle complex, non-differentiable problems.
6. Combining prompt/context optimization with weight updates (fast slow training) offers a powerful paradigm for overcoming the limitations of each individual method.
7. As AI models improve instruction following, precise textual specifications and prompt optimization become even more critical for maximizing performance.

Key terms

Reflective Optimization
Sample Efficiency
Jeppa
System Prompt
Pareto Frontier
Genetic Algorithm
Actionable Side Information
Optimize Anything
Fast Slow Training
Prompt Engineering

Test your understanding

1. How does Jeppa's approach to learning from AI rollouts differ from traditional reinforcement learning?
2. Explain the role of the Pareto frontier in Jeppa's optimization process and why it is important for exploration.
3. What does the 'Optimize Anything' framework allow AI to optimize beyond just system prompts?
4. How does the 'fast slow training' paradigm combine different learning mechanisms to improve AI training?
5. Why is prompt optimization expected to remain crucial even as AI models become more capable?