Teaching AI to Generate Reviewable Code
Master prompting strategies that produce focused diffs, feed project context effectively, and specify version constraints upfront.
Introduction: Why Teaching AI Matters More Than Prompting It
You've probably experienced this: you ask an AI to generate some code, it spits out 200 lines that look reasonable, and then you spend an hour trying to figure out if it's actually correct. Maybe it works on the first run. Maybe it passes your tests. But something feels off—the structure is unfamiliar, the naming conventions don't match your codebase, and there's a subtle logic error buried in a nested conditional that you almost missed. By the time you finish reviewing and refactoring, you wonder if writing it yourself would have been faster. If this sounds familiar, you're not alone, and this lesson will teach you a better approach.
The uncomfortable truth is that we're entering a world where most code will be generated by AI. Not some code. Not boilerplate or scaffolding. Most code. GitHub's data already shows that developers using Copilot accept AI suggestions for 30-40% of their code, and that number climbs every quarter. But here's what the data doesn't capture: the hidden cost of reviewing that code, the technical debt from accepting suggestions that "work" but don't fit, and the exponential complexity that builds when you treat AI as a magic box instead of a tool you can train.
The paradigm shift happening right now isn't about AI replacing developers—it's about prompting versus teaching, and understanding this distinction will determine whether you thrive or struggle in the next five years of software development.
The Prompting Trap: Why Ad-Hoc Requests Create Chaos
Most developers interact with AI code generators through ad-hoc prompting—one-off requests treated as isolated transactions. You need a function to parse CSV files, so you ask: "Write a Python function to parse CSV files." The AI generates something. You use it. Tomorrow, you need to parse JSON. You ask again. The AI generates something with a completely different error handling approach, different naming patterns, and different assumptions about logging.
This is the prompting trap: each interaction is a fresh start, disconnected from every other interaction. The AI has no memory of your preferences, your codebase's conventions, or the architectural decisions you made yesterday. It's like hiring a new contractor for every single task in building a house—one person does the framing in imperial measurements, another does the electrical in metric, and a third uses entirely different building codes.
Here's what ad-hoc prompting produces:
```python
# First request: "Generate a function to fetch user data"
def getUserData(userId):
    response = requests.get(f"https://api.example.com/users/{userId}")
    return response.json()

# Second request: "Generate a function to fetch product data"
def fetch_product_information(product_identifier: str) -> dict:
    """
    Retrieves product information from the API.

    Args:
        product_identifier: The unique product ID

    Returns:
        Dictionary containing product details
    """
    try:
        response = requests.get(
            f"https://api.example.com/products/{product_identifier}",
            timeout=5
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        logger.error(f"Failed to fetch product: {e}")
        return {}
```
Notice the chaos? The first function uses camelCase, no type hints, no error handling, and no timeout. The second uses snake_case, full type annotations, comprehensive error handling, logging, and timeouts. Both "work," but reviewing code like this is exhausting. You're not just checking logic—you're constantly context-switching between different patterns, mentally translating between styles, and making architectural decisions about which approach to keep.
🤔 Did you know? Research from Microsoft shows that code review time increases exponentially, not linearly, with stylistic inconsistency. A codebase with three different error-handling patterns takes 4-7 times longer to review than one with a single consistent approach.
From Prompting to Teaching: A Fundamental Shift
Now consider a different approach: systematic AI training. Instead of treating each prompt as an isolated request, you teach the AI your requirements upfront. You establish patterns, constraints, and conventions that persist across every generation. You're not asking for "a function"—you're teaching the AI how your team writes functions.
This is the difference between:
- ❌ "Generate a function to validate email addresses"
- ✅ "Following our established pattern of input validation (explicit error types, no exceptions for flow control, return Result types), generate a function to validate email addresses"
The second approach front-loads context that makes every subsequent generation more reviewable. But here's the crucial insight: this isn't just about longer prompts. It's about building a teaching system—a repeatable framework where you progressively train the AI to understand your codebase's DNA.
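A generation that follows that taught pattern might look like this minimal sketch. The Result shape and the email regex are illustrative assumptions, not any team's actual convention:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    """Explicit success/failure value: no exceptions used for flow control."""
    ok: bool
    value: Optional[str] = None
    error: Optional[str] = None

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple

def validate_email(raw: str) -> Result:
    """Validate and normalize an email address, returning an explicit Result."""
    candidate = raw.strip().lower()
    if not EMAIL_PATTERN.match(candidate):
        return Result(ok=False, error=f"not a valid email: {raw!r}")
    return Result(ok=True, value=candidate)
```

Because the shape of success and failure is fixed upfront, the reviewer only checks the validation logic itself; the error-handling contract is already known.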
💡 Mental Model: Think of ad-hoc prompting like giving directions one turn at a time: "Turn left. Now go straight. Turn right." Teaching AI is like giving someone a map and explaining how to read it: once they understand the system, they can navigate independently while staying aligned with your expectations.
Reviewability: The New Measure of Code Quality
In a world where you write most code yourself, quality means correctness, maintainability, and performance. In a world where AI generates most code, quality means something different: reviewability. Can you quickly and confidently verify that this generated code is correct? Can you spot errors without mentally executing every branch? Can you understand the intent and implications in seconds, not minutes?
Reviewability becomes the primary quality metric because it determines the bottleneck in your workflow. If AI can generate code in 10 seconds but you need 30 minutes to review it, the generation speed is irrelevant. The system is only as fast as the review process.
Here's what makes code reviewable:
🎯 Key Principle: Reviewable code minimizes the cognitive load required to verify correctness. It makes errors obvious and correct behavior self-evident.
Consider these two implementations of the same logic:
```python
# Low reviewability - requires mental execution to verify
def process_order(order, user):
    if order['status'] != 'pending':
        return False
    if not user['verified']:
        if order['amount'] > 100:
            return False
    if len([i for i in order['items'] if i['stock'] > 0]) != len(order['items']):
        return False
    if user['payment_methods']:
        order['status'] = 'processing'
        return True
    return False

# High reviewability - errors are obvious, logic is transparent
def process_order(order: Order, user: User) -> OrderProcessingResult:
    """Process an order if it meets all requirements."""
    if not order.is_pending():
        return OrderProcessingResult.already_processed(order.status)
    if not user.can_place_order(order.amount):
        return OrderProcessingResult.user_not_authorized()
    if not order.all_items_in_stock():
        return OrderProcessingResult.insufficient_stock()
    if not user.has_payment_method():
        return OrderProcessingResult.no_payment_method()
    order.mark_as_processing()
    return OrderProcessingResult.success()
```
The second version is immediately reviewable because:
- 🧠 Named predicates replace complex conditions ("is_pending" vs "status != 'pending'")
- 🔍 Explicit return types document all possible outcomes
- 📖 Linear flow eliminates nested logic that requires mental stack management
- 🎯 Domain language makes business logic transparent ("user can place order" vs checking verification and amount)
When you teach AI to generate the second style, your review time drops from minutes to seconds. You're not mentally executing code—you're scanning for deviations from expected patterns.
The Economic Reality: Review Efficiency as Job Security
Let's talk about the elephant in the room: will AI replace developers? The answer is nuanced, but here's the part that matters for your career: developers who can't review AI-generated code efficiently are already becoming less valuable.
Think about the economics. If Developer A can review AI-generated code at 500 lines per hour with high confidence, and Developer B reviews at 100 lines per hour with low confidence, which one delivers more value? When AI generates 80% of the code, the productivity difference between these developers is 5x, not 20%. The bottleneck shifts from writing speed to review speed.
But here's the insight most people miss: review efficiency isn't about reading faster. You can't skim-read your way to 5x productivity. Review efficiency comes from getting AI to generate code that's inherently easier to review—and that requires teaching the AI upfront.
💡 Real-World Example: A senior engineer at a fintech company told me they now spend 70% of their time reviewing AI-generated code and 30% writing code directly. But through systematic AI training, they reduced review time per pull request from 45 minutes to 12 minutes. That's a 4x productivity improvement, achieved not through faster reviewing but through teaching AI to generate more reviewable code.
The developers who survive and thrive in this transition are those who master the teaching-reviewing feedback loop: they invest time upfront to train AI, which reduces review burden, which creates time to refine their teaching approach, which further reduces review burden. It's a compounding advantage.
Developers who skip the teaching step and rely on ad-hoc prompting end up in a different loop: chaotic generation requires heavy review, which consumes all available time, leaving no time to improve the process, which perpetuates chaotic generation. This is a death spiral for productivity.
❌ Wrong thinking: "I don't have time to teach AI my requirements; I'll just review whatever it generates."
✅ Correct thinking: "Teaching AI my requirements is an investment that pays exponential dividends in reduced review time."
The Teaching-Reviewing Feedback Loop
The core skill for surviving—and thriving—as a developer in an AI-augmented world is mastering the teaching-reviewing feedback loop. This isn't a one-time setup; it's a continuous cycle where each iteration improves both the AI's generations and your ability to review them.
Here's how the loop works:
1. TEACH: Specify constraints, patterns, requirements
        │
        ▼
2. GENERATE: AI produces code following specifications
        │
        ▼
3. REVIEW: Evaluate code against reviewability metrics
        │
        ▼
4. FEEDBACK: Identify where specifications were insufficient or unclear
        │
        ▼
5. REFINE: Update teaching approach based on gaps
        │
        └────────────► repeat from step 1
Each cycle through this loop:
- 📈 Reduces review time as AI learns your requirements
- 🎯 Improves generation accuracy through refined specifications
- 🧠 Builds pattern recognition in your reviewing ability
- 📚 Creates reusable context for future projects
The developers who master this loop gain a compounding advantage. After six months, they've built a mental and documented framework for teaching AI that makes every new generation more reviewable. After a year, their review speed is 10x what it was at the start—not because they review faster, but because the code they're reviewing is fundamentally more reviewable.
💡 Pro Tip: Track your review time per AI-generated pull request over time. If this number isn't decreasing, you're prompting, not teaching. A good target: 50% reduction in review time within the first month of systematic teaching.
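Tracking this is a one-liner worth automating. A sketch with an invented review log (the dates and minutes are made-up sample data):

```python
from datetime import date

# Hypothetical log of (merge date, minutes spent reviewing an AI-generated PR).
review_log = [
    (date(2024, 5, 1), 45),
    (date(2024, 5, 8), 38),
    (date(2024, 5, 15), 27),
    (date(2024, 5, 29), 19),
]

def reduction_since_start(log):
    """Fractional drop in review time from the first entry to the latest one."""
    entries = sorted(log)
    first, latest = entries[0][1], entries[-1][1]
    return (first - latest) / first

# 45 -> 19 minutes is about a 58% reduction, past the 50% one-month target.
```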
⚠️ Common Mistake: Treating teaching as a one-time setup.
Many developers create a "prompt template" once and expect it to solve all problems. But effective teaching is iterative. Your requirements evolve as you encounter edge cases. Your understanding of reviewability deepens through experience. The teaching-reviewing feedback loop must be continuous, not a one-time configuration.
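One lightweight way to keep teaching iterative is to store the context as data rather than retyping it in every prompt. This is a sketch; the rule names and helper functions are invented for illustration:

```python
# Persist teaching context as data, so every review finding feeds the next generation.
teaching_context = {
    "naming": "snake_case for functions and variables",
    "errors": "raise exceptions; never return None to signal failure",
    "length": "no function longer than 50 lines",
}

def build_prompt(task, context):
    """Prepend the persistent constraints so each generation starts from the same rules."""
    rules = "\n".join(f"- {topic}: {rule}" for topic, rule in context.items())
    return f"Follow these project conventions:\n{rules}\n\nTask: {task}"

def record_review_gap(context, topic, rule):
    """Steps 4-5 of the loop: turn a review finding into a new persistent rule."""
    context[topic] = rule

# A review reveals the AI keeps using print(); capture that as a rule once.
record_review_gap(teaching_context, "logging", "use the module-level logger, never print()")
prompt = build_prompt("Generate a function to validate email addresses", teaching_context)
```

Each review discovery becomes a one-line update to the context, and every future prompt inherits it automatically.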
Why Front-Loading Context Prevents Technical Debt
Here's a counterintuitive insight: spending 10 minutes teaching AI your requirements before generation saves you hours of refactoring later. Yet most developers do the opposite—they spend 30 seconds on a prompt, then spend hours cleaning up the result.
This happens because we underestimate the exponential cost of technical debt in AI-generated code. When you accept AI code that "works but doesn't fit," you're not just creating one problem—you're creating a template that influences future generations.
Consider this scenario:
Day 1: You accept an AI-generated function that handles errors by returning None instead of raising exceptions, even though your codebase uses exceptions.
Day 5: You prompt AI to generate a function that calls the Day 1 function. The AI sees that the function returns None for errors, so it generates defensive `if result is None` checks.
Day 10: You generate integration tests. The AI sees the None-checking pattern and generates tests that expect None returns.
Day 20: You realize this pattern has proliferated across 15 files. Refactoring would require changing functions, call sites, and tests. You decide it's not worth it.
This is how technical debt compounds. The initial shortcut creates a pattern that influences subsequent generations, which reinforces the pattern, which makes it harder to refactor. Within weeks, you have a parallel architecture that violates your team's conventions.
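The compounding effect is easy to see in code. Here is a hedged sketch of the Day 1 scenario, using an invented `FakeDB` stand-in for the real data layer:

```python
class FakeDB:
    """Hypothetical stand-in for the real data layer."""
    users = {"u1": {"id": "u1", "name": "Ada"}}

    def get(self, user_id):
        return self.users.get(user_id)

db = FakeDB()

# Day 1 shortcut: silently returns None even though the (assumed) codebase
# convention is to raise on missing records.
def load_user(user_id):
    record = db.get(user_id)
    if record is None:
        return None  # the shortcut that becomes a template
    return record

# Day 5: every caller now needs a defensive check, so the pattern spreads.
def load_user_name(user_id):
    user = load_user(user_id)
    if user is None:  # proliferating None-check
        return None
    return user["name"]

# The convention-following version keeps call sites clean and failures loud.
def load_user_checked(user_id):
    record = db.get(user_id)
    if record is None:
        raise KeyError(f"unknown user: {user_id}")
    return record
```

Every None-returning function forces a None-check into every caller, while the raising version lets callers stay linear and lets failures surface immediately.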
🎯 Key Principle: Front-loading context prevents exponential debt. Teaching AI upfront is not extra work—it's avoiding the exponential cost of cleaning up inconsistent generations later.
The Five Levels of AI Interaction
To understand where you are and where you need to go, it helps to see the progression of AI interaction skills:
📋 Quick Reference Card: Five Levels of AI Code Generation Mastery
| Level | 🎯 Approach | ⏱️ Review Time | 📊 Output Quality | 🔄 Consistency |
|---|---|---|---|---|
| 1. Naive Prompting | "Generate a function to..." | 30-60 min | Unpredictable | Random |
| 2. Detailed Prompting | Specify language, libraries, approach | 20-40 min | Better but inconsistent | Low |
| 3. Pattern Specification | Include structural and style requirements | 10-20 min | Good with deviations | Medium |
| 4. Systematic Teaching | Reusable context with examples | 5-10 min | High, reviewable | High |
| 5. Feedback Loop Mastery | Continuous refinement system | 2-5 min | Excellent, predictable | Very High |
Most developers are at Level 2. They've learned that detailed prompts produce better results than vague ones. But the jump from Level 2 to Level 4 is where the real transformation happens—where you shift from prompting to teaching.
Level 3 introduces pattern specification: "Use snake_case, return explicit Result types, handle errors without exceptions." This is better, but still requires you to remember and type these requirements for every generation.
Level 4 builds systematic teaching: you create reusable context (templates, examples, documented patterns) that you can reference across sessions. The AI doesn't just know what to do—it understands why you have these requirements.
Level 5 is mastery: you've internalized the feedback loop, continuously refining your teaching approach based on review discoveries. You can generate complex, multi-file implementations and review them confidently in minutes because you've trained the AI to generate in your team's "voice."
💡 Mental Model: Think of these levels like learning a language. Level 2 is using a phrasebook (memorized requests). Level 4 is having conversational fluency (teaching grammar rules). Level 5 is thinking directly in the language (intuitive, continuous refinement).
What Makes This Hard (And Why It Matters)
If teaching AI is so valuable, why don't more developers do it? Because it requires skills that weren't central to traditional development:
- 🎯 Explicit articulation of implicit knowledge: You need to explain conventions you follow instinctively
- 🧠 Pattern recognition and abstraction: You must identify why certain code is more reviewable
- 🔄 Systematic iteration: You must treat teaching as a process, not a one-time task
- 📊 Meta-cognition about review: You must understand your own review process to optimize it
These are fundamentally different skills from writing algorithms or debugging systems. Many senior developers struggle because their expertise is implicit—they can write great code but can't easily explain the principles they follow.
But here's why this matters: these skills are becoming more valuable than pure coding ability. A junior developer who can effectively teach AI and efficiently review generated code may deliver more value than a senior developer who writes better code but can't scale through AI.
This is the uncomfortable transition we're in. The skills that made you successful as a developer five years ago aren't enough. You need to add systematic AI teaching and efficient reviewing to your toolkit—not as nice-to-haves, but as core competencies that determine your productivity and career trajectory.
⚠️ Common Mistake: Assuming good developers automatically become good AI teachers.
Being able to write clean code doesn't mean you can teach AI to write clean code. These are separate skills. Many excellent developers struggle with AI because they've never had to explicitly articulate their decision-making process. The good news: this is a learnable skill, and this lesson will show you how.
The Path Forward
The rest of this lesson will give you a systematic framework for teaching AI to generate reviewable code. We'll cover:
- 🔍 The anatomy of reviewability: What specific characteristics make code quick to review
- 🎓 Constraint specification: How to communicate your requirements effectively
- 🔄 The feedback loop: How to refine AI's generation through progressive teaching
- ⚠️ Common pitfalls: What mistakes compound review complexity
- 🛠️ Building your system: Creating a reusable teaching framework
But before we dive into techniques, understand this: the goal isn't to make AI generate perfect code. The goal is to make AI generate code you can confidently review in minutes instead of hours. Reviewability, not perfection, is the target.
Your survival as a developer doesn't depend on being better than AI at generating code. It depends on being better than other developers at teaching AI to generate reviewable code, and then reviewing that code efficiently. This is the new competitive advantage.
The developers who treat AI as a magic box—prompting ad-hoc and hoping for the best—will drown in review backlog and technical debt. The developers who build systematic teaching approaches will achieve productivity levels that seem impossible today.
Which developer will you be? The rest of this lesson gives you the tools to choose wisely.
The Anatomy of Reviewable AI-Generated Code
When you review code written by a junior developer, you know what to look for: unclear variable names, tangled logic, missing context. But when you review AI-generated code, you're facing a different challenge entirely. AI doesn't get tired or distracted—it generates exactly what its training and your instructions tell it to generate. The question isn't whether AI can write good code, but whether you can teach it to write code you can actually review.
The difference is critical. A human developer learns from code reviews and gradually improves. An AI system will happily generate the same problematic patterns forever unless you explicitly define what "reviewable" means in terms it can operationalize. This section establishes those definitions and shows you how to recognize reviewable versus non-reviewable code before it becomes technical debt.
Understanding Reviewability as a Spectrum
Reviewability isn't binary—it's a spectrum from "instantly comprehensible" to "requires archaeological expedition." When you review code, you're performing several cognitive tasks simultaneously: understanding intent, verifying correctness, assessing maintainability, and checking for edge cases. Reviewable code minimizes the cognitive load for each of these tasks.
AI-generated code occupies a peculiar position on this spectrum. It's often syntactically perfect and functionally correct, yet somehow harder to review than equivalent human code. Why? Because AI optimizes for the prompt, not for the reviewer. It might generate a perfectly working 50-line function when five smaller functions would be easier to verify. It might choose technically accurate but obscure variable names. It might implement the letter of your requirement while missing the spirit.
🎯 Key Principle: Reviewability is about cognitive efficiency. Code is reviewable when a competent developer can verify its correctness and understand its implications in minimal time with minimal context switching.
The Five Pillars of Reviewable Code
Every piece of reviewable code—whether human or AI-generated—rests on five foundational pillars. Understanding these pillars allows you to communicate clear quality standards to AI systems.
Pillar 1: Structure (Cognitive Chunking)
Structure refers to how code is organized into logical units that match human cognitive limits. Research shows we can hold about 7±2 items in working memory. Reviewable code respects this limitation.
Consider this AI-generated function:
```python
def process_order(order_data):
    # Validate, calculate, update inventory, send notifications, log
    if not order_data or 'items' not in order_data or 'customer_id' not in order_data:
        return {'error': 'Invalid order data'}
    total = 0
    for item in order_data['items']:
        if item['id'] not in inventory:
            return {'error': f'Item {item["id"]} not found'}
        if inventory[item['id']]['quantity'] < item['quantity']:
            return {'error': f'Insufficient stock for {item["id"]}'}
        total += inventory[item['id']]['price'] * item['quantity']
        inventory[item['id']]['quantity'] -= item['quantity']
    discount = 0
    if order_data['customer_id'] in premium_customers:
        discount = total * 0.1
    final_total = total - discount
    send_email(order_data['customer_id'], f'Order confirmed: ${final_total}')
    send_sms(order_data['customer_id'], 'Order received')
    log_order(order_data, final_total)
    return {'success': True, 'total': final_total}
```
This function works, but it's difficult to review because it performs validation, calculation, inventory management, and notifications all in one scope. Now consider the structured version:
```python
def process_order(order_data):
    """Process customer order through validation, pricing, and fulfillment."""
    validation_result = validate_order_data(order_data)
    if validation_result['error']:
        return validation_result
    pricing_result = calculate_order_total(order_data)
    if pricing_result['error']:
        return pricing_result
    inventory_result = reserve_inventory(order_data['items'])
    if inventory_result['error']:
        rollback_inventory(order_data['items'])
        return inventory_result
    notify_customer(order_data['customer_id'], pricing_result['total'])
    log_order_completion(order_data, pricing_result['total'])
    return {'success': True, 'total': pricing_result['total']}

def validate_order_data(order_data):
    """Verify order contains required fields."""
    if not order_data:
        return {'error': 'Order data is empty'}
    if 'items' not in order_data:
        return {'error': 'No items in order'}
    if 'customer_id' not in order_data:
        return {'error': 'No customer ID provided'}
    return {'error': None}

def calculate_order_total(order_data):
    """Calculate final order total including discounts."""
    subtotal = sum(
        inventory[item['id']]['price'] * item['quantity']
        for item in order_data['items']
    )
    discount = apply_customer_discount(order_data['customer_id'], subtotal)
    return {'error': None, 'total': subtotal - discount}
```
The structured version is longer but dramatically more reviewable. Each function has a single responsibility, making it easy to verify correctness. You can review validate_order_data in seconds because it only does one thing.
💡 Mental Model: Think of structured code like a well-organized filing cabinet. You can find what you need without reading everything. Unstructured code is like a pile of papers—technically it contains the same information, but you must read it all to understand any of it.
Pillar 2: Naming (Semantic Clarity)
Naming is how code communicates intent without requiring the reviewer to execute it mentally. AI systems often generate technically accurate but semantically weak names because they optimize for uniqueness rather than clarity.
Consider these AI-generated variable names for a payment processing system:
```javascript
// AI's first attempt
const val1 = getUserInput();
const val2 = processPaymentData(val1);
const res = validateTransaction(val2);
if (res.ok) {
  const final = applyFee(val2.amt);
  submitToGateway(final);
}
```
These names force the reviewer to mentally track what each variable represents. Now compare:
```javascript
// After teaching AI about naming standards
const userPaymentInput = getUserInput();
const normalizedPaymentData = processPaymentData(userPaymentInput);
const validationResult = validateTransaction(normalizedPaymentData);
if (validationResult.isValid) {
  const amountWithProcessingFee = applyFee(normalizedPaymentData.amount);
  submitToGateway(amountWithProcessingFee);
}
```
The second version requires no mental translation. Each name describes what the variable contains and often hints at its lifecycle stage (input → normalized → validated → final).
⚠️ Common Mistake: Accepting AI's first-pass naming because "you can understand it if you read the whole function." In a review, you shouldn't need to read everything to understand one part.
🎯 Key Principle: Names should reveal intent, not just describe type. paymentAmount is better than amount; validatedPaymentAmount is even better because it reveals state.
Pillar 3: Granularity (Unit of Change)
Granularity determines the size of reviewable units. AI systems often generate monolithic solutions because they receive requirements as single prompts. This creates code that's difficult to review incrementally.
Think about reviewing a 500-line function versus reviewing ten 50-line functions. The total lines are identical, but the cognitive load differs dramatically. With smaller units:
- You can verify each piece independently
- You can test each piece in isolation
- You can understand the system incrementally
- Changes have a smaller blast radius
Here's how granularity manifests in practice:
```typescript
// Poor granularity - one function does everything
function handleUserRegistration(userData: any) {
  // 200 lines of validation, database operations,
  // email sending, analytics tracking, etc.
}

// Good granularity - composed operations
function handleUserRegistration(userData: UserRegistrationInput): RegistrationResult {
  const validationResult = validateRegistrationData(userData);
  if (!validationResult.isValid) {
    return { success: false, errors: validationResult.errors };
  }
  const user = createUserAccount(validationResult.sanitizedData);
  sendWelcomeEmail(user.email, user.firstName);
  trackRegistrationEvent(user.id, userData.source);
  return { success: true, userId: user.id };
}
```
The granular version allows you to review each operation's correctness independently. If there's a bug in email sending, you don't need to re-review validation logic.
💡 Pro Tip: When teaching AI to generate code, specify maximum function lengths (e.g., "no function should exceed 50 lines"). This forces the AI to decompose problems appropriately.
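You can also enforce such a limit mechanically before accepting a generation. A sketch using Python's standard ast module; the 50-line limit is the example threshold from the tip above:

```python
import ast

MAX_FUNCTION_LINES = 50  # the limit you would state in the teaching prompt

def oversized_functions(source: str, limit: int = MAX_FUNCTION_LINES) -> list:
    """Return (name, line_count) for every function definition longer than `limit` lines."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > limit:
                findings.append((node.name, length))
    return findings
```

Run it over a generated file before review: an empty result means every function fits in one cognitive chunk.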
Pillar 4: Testability (Verification Pathway)
Testability measures how easily you can verify code behavior. Reviewable code makes testing obvious; non-reviewable code makes you wonder how to test it. AI-generated code often lacks testability because AI doesn't automatically consider the reviewer's verification needs.
Compare these approaches:
```python
# Difficult to test - hidden dependencies
def apply_discount(cart):
    now = datetime.now()  # Hidden time dependency
    if now.hour >= 18:  # Happy hour
        discount = 0.2
    else:
        discount = 0.1
    if random.random() < 0.1:  # Random flash discount
        discount += 0.05
    for item in cart.items:
        item.price *= (1 - discount)
    return cart

# Easy to test - explicit dependencies
def apply_discount(cart, current_time, random_value):
    """Apply time-based and random discounts to cart.

    Args:
        cart: Shopping cart with items
        current_time: datetime for time-based discount calculation
        random_value: float between 0-1 for flash discount determination
    """
    base_discount = 0.2 if current_time.hour >= 18 else 0.1
    flash_discount = 0.05 if random_value < 0.1 else 0.0
    total_discount = base_discount + flash_discount
    for item in cart.items:
        item.price *= (1 - total_discount)
    return cart
```
The testable version exposes dependencies as parameters. During review, you can immediately see what affects behavior and mentally verify the logic. You can also write deterministic tests.
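A deterministic check then takes only a few lines. This is a sketch assuming a minimal Cart/Item structure (the dataclasses are invented for illustration), with the testable version restated so the snippet runs on its own:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class Item:
    price: float

@dataclass
class Cart:
    items: List[Item]

def apply_discount(cart: Cart, current_time: datetime, random_value: float) -> Cart:
    """Same logic as the explicit-dependency version above, restated for self-containment."""
    base_discount = 0.2 if current_time.hour >= 18 else 0.1
    flash_discount = 0.05 if random_value < 0.1 else 0.0
    total_discount = base_discount + flash_discount
    for item in cart.items:
        item.price *= (1 - total_discount)
    return cart

# Happy hour (19:00) plus a flash discount (0.05 < 0.1): 25% off 100.0 -> 75.0
evening = apply_discount(Cart([Item(100.0)]), datetime(2024, 1, 1, 19, 0), 0.05)
assert abs(evening.items[0].price - 75.0) < 1e-9
```

With time and randomness injected, every branch can be exercised on demand; the hidden-dependency version could never assert an exact price.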
🤔 Did you know? Code that's difficult to test is often difficult to review because testing and reviewing both require understanding inputs, outputs, and transformations. Making code testable automatically makes it more reviewable.
Pillar 5: Documentation (Context Preservation)
Documentation in reviewable code isn't about verbose comments—it's about preserving the why when the what isn't obvious. AI-generated code often includes either no documentation or excessive documentation that states the obvious.
Consider this spectrum:
```python
# Under-documented (what's the business rule?)
def calc(a, b):
    if b > 100:
        return a * 0.9
    return a

# Over-documented (states the obvious)
def calculate_price(base_price, quantity):
    # Check if quantity is greater than 100
    if quantity > 100:
        # Multiply base price by 0.9 to get discounted price
        discounted_price = base_price * 0.9
        # Return the discounted price
        return discounted_price
    # Return the base price without discount
    return base_price

# Appropriately documented (explains the why)
def calculate_price(base_price, quantity):
    """Calculate final price with bulk discount.

    Business rule: Orders of 100+ units receive 10% discount.
    Defined in pricing policy doc: /docs/pricing-2024.md
    """
    BULK_THRESHOLD = 100
    BULK_DISCOUNT = 0.10
    if quantity >= BULK_THRESHOLD:
        return base_price * (1 - BULK_DISCOUNT)
    return base_price
```
The appropriately documented version preserves context (the business rule origin) and makes magic numbers self-explanatory through named constants. During review, you can verify that the implementation matches the documented business rule.
💡 Real-World Example: A developer reviewing AI-generated code for a financial system couldn't verify correctness because the AI had implemented a complex calculation without documenting which regulation required it. The code was syntactically perfect but unreviewable because the reviewer couldn't confirm it was right without reverse-engineering the requirements.
How AI-Generated Code Differs in Review Complexity
AI-generated code presents unique review challenges that differ from human-written code. Understanding these differences helps you teach AI to compensate for its blind spots.
The Confidence Problem: Human developers signal uncertainty through code structure. They might add extra comments, use cautious variable names, or structure code conservatively when they're unsure. AI generates everything with equal confidence, making it harder to identify areas that need extra scrutiny.
The Context Gap: Human developers write code with implicit context from team discussions, previous iterations, and domain knowledge. AI generates code from explicit prompts only, potentially missing crucial context that "everyone knows" but nobody stated.
The Pattern Mixing Issue: AI has seen millions of code patterns and might blend idioms from different paradigms, frameworks, or eras in ways that are individually correct but collectively confusing. For example:
// AI mixing callback, promise, and async/await patterns
function fetchUserData(userId, callback) {
return new Promise(async (resolve, reject) => {
try {
const response = await fetch(`/api/users/${userId}`);
const data = await response.json();
callback(null, data); // Callback pattern
resolve(data); // Promise pattern
} catch (error) {
callback(error, null);
reject(error);
}
});
}
This code technically works but mixes three different async patterns, creating confusion during review. A human developer would typically stick to one pattern.
⚠️ Common Mistake: Assuming AI-generated code is correct because it runs without errors. AI optimizes for execution, not correctness, maintainability, or reviewability. ⚠️
Pattern Recognition: Identifying Problem Code Before Generation
Experienced developers learn to spot code that will be difficult to review before they fully read it. Similarly, you can learn to predict when AI will generate hard-to-review code based on your prompt and initial output.
Warning Sign 1: Unclear Boundaries
When your prompt doesn't specify clear function boundaries, AI generates monolithic solutions:
❌ Problematic prompt:
"Create a user authentication system"
✅ Reviewable prompt:
"Create a user authentication system with separate functions for:
- Password validation (returns boolean)
- Token generation (returns JWT string)
- Session creation (returns session object)
- Each function should be under 30 lines"
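Under constraints like these, the generated output tends toward small, independently verifiable functions. A minimal sketch of the resulting shape (the password policy, token format, and session fields here are illustrative assumptions; a real system would use a JWT library and hashed credentials):

```python
import secrets
import time

def validate_password(password: str) -> bool:
    """Return True if the password meets the minimum policy (illustrative: length >= 12)."""
    return isinstance(password, str) and len(password) >= 12

def generate_token(user_id: int) -> str:
    """Return an opaque session token string (stand-in for a real JWT)."""
    return f"{user_id}.{int(time.time())}.{secrets.token_hex(16)}"

def create_session(user_id: int) -> dict:
    """Return a session object for the given user."""
    return {
        "user_id": user_id,
        "token": generate_token(user_id),
        "created_at": int(time.time()),
    }
```

Each function fits on one screen and can be reviewed and rejected independently.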
Warning Sign 2: Implicit Requirements
When requirements are implicit, AI fills gaps with assumptions:
❌ Problematic prompt:
"Add error handling to this function"
✅ Reviewable prompt:
"Add error handling that:
- Catches network errors separately from validation errors
- Logs errors with structured data (timestamp, user_id, error_type)
- Returns user-friendly error messages
- Preserves stack traces for debugging"
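A sketch of error handling that satisfies those four points (the injected fetch callable, log fields, and user-facing messages are illustrative assumptions):

```python
import json
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)

class ValidationError(ValueError):
    """Raised when fetched data fails validation."""

def fetch_profile(user_id: int, fetch: Callable[[int], dict]) -> dict:
    """Fetch a user profile, handling network and validation errors separately."""
    try:
        profile = fetch(user_id)
        if "email" not in profile:
            raise ValidationError("profile missing email")
        return {"ok": True, "data": profile}
    except ConnectionError:
        # Structured log entry; exc_info=True preserves the stack trace
        logger.error(json.dumps({"timestamp": time.time(), "user_id": user_id,
                                 "error_type": "network"}), exc_info=True)
        return {"ok": False, "message": "Service temporarily unavailable."}
    except ValidationError as exc:
        logger.error(json.dumps({"timestamp": time.time(), "user_id": user_id,
                                 "error_type": "validation"}), exc_info=True)
        return {"ok": False, "message": str(exc)}
```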
Warning Sign 3: Missing Constraints
Without constraints, AI optimizes for completeness over reviewability:
❌ Problematic prompt:
"Implement data validation for user profiles"
✅ Reviewable prompt:
"Implement data validation for user profiles:
- One validation function per field (email, phone, age, etc.)
- Each validator returns {isValid: boolean, error: string}
- Maximum cyclomatic complexity of 3 per function
- Include JSDoc with examples of valid/invalid inputs"
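Rendered in Python for illustration, validators meeting those constraints might look like this (the specific field rules, such as the age range, are assumptions rather than requirements from the prompt):

```python
import re

def validate_email(value) -> dict:
    """Validate an email field; one validator, one field, low complexity."""
    if isinstance(value, str) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value):
        return {"isValid": True, "error": ""}
    return {"isValid": False, "error": "Invalid email format"}

def validate_age(value) -> dict:
    """Validate an age field (illustrative rule: integer between 13 and 120)."""
    if isinstance(value, int) and 13 <= value <= 120:
        return {"isValid": True, "error": ""}
    return {"isValid": False, "error": "Age must be between 13 and 120"}
```

Because every validator shares the same return shape, a reviewer can scan the module and verify each rule in isolation.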
💡 Mental Model: Think of AI as an extremely literal contractor. If you don't specify how you want the house organized internally, you'll get a structure that technically meets building codes but is impossible to navigate.
Comparative Analysis: Reviewable vs. Non-Reviewable Outputs
Let's examine the same requirement implemented in ways that illustrate the full spectrum of reviewability. The requirement: "Create a function that processes user feedback, stores it in a database, and sends a confirmation email."
Non-Reviewable Implementation:
def process_feedback(feedback_data):
# Process user feedback
import sqlite3
import smtplib
from email.mime.text import MIMEText
conn = sqlite3.connect('app.db')
c = conn.cursor()
# Validate and clean data
if not feedback_data or not isinstance(feedback_data, dict):
return False
user_id = feedback_data.get('user_id', 0)
if user_id <= 0:
return False
text = feedback_data.get('text', '').strip()
if len(text) < 10 or len(text) > 5000:
return False
rating = feedback_data.get('rating', 0)
if rating < 1 or rating > 5:
rating = 3
# Store in database
try:
c.execute("INSERT INTO feedback (user_id, text, rating, created_at) VALUES (?, ?, ?, datetime('now'))",
(user_id, text, rating))
conn.commit()
feedback_id = c.lastrowid
except Exception as e:
conn.rollback()
return False
finally:
conn.close()
# Send email
c2 = sqlite3.connect('app.db').cursor()
c2.execute("SELECT email, name FROM users WHERE id = ?", (user_id,))
user_data = c2.fetchone()
if user_data:
msg = MIMEText(f"Thanks for your feedback! We received your {rating}-star rating and will review your comments.")
msg['Subject'] = 'Feedback Received'
msg['From'] = 'noreply@example.com'
msg['To'] = user_data[0]
try:
smtp = smtplib.SMTP('localhost')
smtp.send_message(msg)
smtp.quit()
except:
pass # Email failure shouldn't stop processing
return True
Problems that make this hard to review:
- 🔴 Imports buried inside function
- 🔴 Multiple responsibilities (validation, storage, email)
- 🔴 Hardcoded values scattered throughout
- 🔴 Silent failures (email exception swallowed)
- 🔴 Database connections created twice
- 🔴 Magic numbers (10, 5000, 1, 5) without explanation
- 🔴 Impossible to test without real database and SMTP server
Reviewable Implementation:
from dataclasses import dataclass
from typing import Optional
import logging
logger = logging.getLogger(__name__)
@dataclass
class FeedbackValidationResult:
is_valid: bool
error_message: Optional[str] = None
sanitized_data: Optional[dict] = None
class FeedbackProcessor:
"""Handles user feedback processing workflow.
Business rules:
- Feedback must be 10-5000 characters (UX requirement)
- Rating defaults to 3 if invalid (per product team)
- Email failures are logged but don't fail the operation
"""
MIN_FEEDBACK_LENGTH = 10
MAX_FEEDBACK_LENGTH = 5000
MIN_RATING = 1
MAX_RATING = 5
DEFAULT_RATING = 3
def __init__(self, db_repository, email_service):
self.db = db_repository
self.email = email_service
def process_feedback(self, feedback_data: dict) -> bool:
"""Process user feedback through validation, storage, and notification."""
validation_result = self._validate_feedback(feedback_data)
if not validation_result.is_valid:
logger.warning(f"Invalid feedback: {validation_result.error_message}")
return False
feedback_id = self._store_feedback(validation_result.sanitized_data)
if not feedback_id:
logger.error("Failed to store feedback")
return False
self._send_confirmation_email(validation_result.sanitized_data)
return True
def _validate_feedback(self, data: dict) -> FeedbackValidationResult:
"""Validate feedback data and return sanitized version."""
if not data or not isinstance(data, dict):
return FeedbackValidationResult(False, "Invalid data format")
user_id = data.get('user_id', 0)
if user_id <= 0:
return FeedbackValidationResult(False, "Invalid user ID")
text = data.get('text', '').strip()
if not (self.MIN_FEEDBACK_LENGTH <= len(text) <= self.MAX_FEEDBACK_LENGTH):
return FeedbackValidationResult(
False,
f"Feedback must be {self.MIN_FEEDBACK_LENGTH}-{self.MAX_FEEDBACK_LENGTH} characters"
)
rating = data.get('rating', self.DEFAULT_RATING)
if not isinstance(rating, (int, float)) or not (self.MIN_RATING <= rating <= self.MAX_RATING):
rating = self.DEFAULT_RATING
sanitized = {
'user_id': user_id,
'text': text,
'rating': rating
}
return FeedbackValidationResult(True, None, sanitized)
def _store_feedback(self, feedback_data: dict) -> Optional[int]:
"""Store feedback in database and return feedback ID."""
try:
return self.db.create_feedback(feedback_data)
except Exception as e:
logger.error(f"Database error: {e}", exc_info=True)
return None
def _send_confirmation_email(self, feedback_data: dict) -> None:
"""Send confirmation email to user (best-effort, failures logged)."""
try:
user = self.db.get_user(feedback_data['user_id'])
if user:
self.email.send_feedback_confirmation(
to_email=user.email,
user_name=user.name,
rating=feedback_data['rating']
)
except Exception as e:
# Email is non-critical; log and continue
logger.warning(f"Failed to send confirmation email: {e}")
What makes this reviewable:
- ✅ Clear separation of concerns (validate, store, notify)
- ✅ Named constants explain magic numbers
- ✅ Explicit dependencies (injected repository and email service)
- ✅ Each method has single responsibility
- ✅ Error handling with appropriate logging
- ✅ Business rules documented in docstrings
- ✅ Testable without real database or email server
- ✅ Type hints clarify expectations
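That last point is worth making concrete. Because the processor receives its dependencies through the constructor, a test can inject in-memory fakes. A sketch of such test doubles (the method names mirror the example above; the stored data shapes are assumptions):

```python
class FakeRepository:
    """In-memory stand-in for the database repository."""
    def __init__(self, users=None):
        self.saved = []
        self.users = users or {}

    def create_feedback(self, data):
        self.saved.append(data)
        return len(self.saved)  # sequential ID stands in for lastrowid

    def get_user(self, user_id):
        return self.users.get(user_id)

class FakeEmailService:
    """Records sent emails instead of talking to an SMTP server."""
    def __init__(self):
        self.sent = []

    def send_feedback_confirmation(self, to_email, user_name, rating):
        self.sent.append((to_email, user_name, rating))
```

Passing `FakeRepository()` and `FakeEmailService()` into `FeedbackProcessor` lets a test assert on `saved` and `sent` without a real database or SMTP server.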
Establishing Your Reviewability Standards as Training Constraints
Once you understand what makes code reviewable, you can encode these standards as constraints that guide AI generation. Think of this as creating a reviewability specification that travels with every prompt.
Here's a practical template for encoding reviewability standards:
📋 Reviewability Constraints Template:
STRUCTURE:
- Maximum function length: [N] lines
- Maximum parameters per function: [N]
- Maximum nesting depth: [N]
- One responsibility per function
NAMING:
- Variables: [convention] (e.g., descriptive_snake_case)
- Functions: [convention] (e.g., verb_noun_pattern)
- Classes: [convention] (e.g., PascalCase)
- Constants: [convention] (e.g., UPPER_SNAKE_CASE)
- Avoid abbreviations except: [list]
GRANULARITY:
- Decompose operations into functions of [N-M] lines
- Each function should be testable independently
- Compose complex operations from simple ones
TESTABILITY:
- Inject dependencies (no hardcoded connections)
- Pure functions where possible
- Side effects isolated and explicit
- Return values over exceptions for business logic
DOCUMENTATION:
- Docstring for every public function
- Explain WHY for non-obvious logic
- Reference business rules/requirements
- Include examples for complex functions
Here's how you might apply this to a real prompt:
Task: Create a password reset function
Reviewability Constraints:
- Max 40 lines per function
- Separate validation, token generation, and email sending
- Name functions with verb_noun pattern (validate_email, generate_reset_token)
- Inject email service and token repository
- Document token expiration policy in docstring
- Include example of valid/invalid inputs
The AI will now generate code that satisfies both your functional requirements and your reviewability standards.
💡 Pro Tip: Create a "reviewability checklist" file that you reference in prompts: "Follow the reviewability standards in /docs/code-standards.md". This allows you to maintain consistent standards across all AI-generated code.
The Reviewability Feedback Loop
When AI generates code that doesn't meet your reviewability standards, you have an opportunity to teach it more effectively. Instead of just rejecting the code, provide specific feedback tied to the five pillars:
Your code works but isn't reviewable because:
❌ STRUCTURE: The process_order function does 5 distinct things.
→ Split into: validate_order, calculate_total, reserve_inventory, notify_customer, log_order
❌ NAMING: Variable 'val' doesn't convey meaning.
→ Rename to: validated_order_data
❌ GRANULARITY: 120-line function exceeds our 50-line maximum.
→ Decompose into smaller functions as shown above
❌ TESTABILITY: Hard-coded database connection.
→ Inject as parameter: def process_order(order_data, db_connection)
❌ DOCUMENTATION: Magic number 0.15 has no explanation.
→ Add comment: # 15% premium customer discount (pricing policy v2.1)
Please regenerate following these specific changes.
This feedback teaches the AI why something isn't reviewable and how to fix it, building a more effective model for future generations.
🎯 Key Principle: Specific feedback tied to named principles is more effective than general criticism. "This is too complex" is less useful than "This violates the granularity principle—functions should be under 50 lines."
Measuring Reviewability
While reviewability is partly subjective, you can establish quantifiable metrics that serve as guardrails:
| Metric | Target | Why It Matters |
|---|---|---|
| 🎯 Function length | < 50 lines | Limits cognitive load per unit |
| 🎯 Cyclomatic complexity | < 10 | Reduces testing paths to verify |
| 🎯 Parameter count | < 5 | Keeps mental model manageable |
| 🎯 Nesting depth | < 3 levels | Prevents cognitive stack overflow |
| 🎯 Comment ratio | 10-20% | Balances clarity vs. noise |
You can incorporate these metrics directly into your AI prompts: "Generate a function with cyclomatic complexity under 8." Many code analysis tools can verify these metrics automatically, creating a quantitative reviewability check.
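These checks don't require heavyweight tooling. A minimal sketch using Python's `ast` module to flag functions that break the length and parameter targets (thresholds taken from the table above):

```python
import ast

MAX_FUNCTION_LINES = 50
MAX_PARAMETERS = 5

def check_reviewability(source: str) -> list:
    """Return warnings for functions exceeding the length or parameter targets."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                warnings.append(f"{node.name}: {length} lines (max {MAX_FUNCTION_LINES})")
            if len(node.args.args) > MAX_PARAMETERS:
                warnings.append(
                    f"{node.name}: {len(node.args.args)} parameters (max {MAX_PARAMETERS})")
    return warnings
```

Run against a generated file, this gives an objective first-pass verdict before any human review begins.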
The Ultimate Test: The Five-Minute Rule
The Five-Minute Rule is a practical test for reviewability: Can a competent developer unfamiliar with the code understand its purpose, verify its correctness for the happy path, and identify potential edge cases in five minutes or less?
If the answer is no, the code fails the reviewability test regardless of whether it works. This rule forces you to prioritize clarity and structure over cleverness or brevity.
When teaching AI to generate code, you can explicitly include this constraint: "Generate code that a developer can review in under 5 minutes. This means clear structure, obvious naming, and well-documented business logic."
Reviewable code isn't just easier to verify—it's easier to maintain, extend, and debug. By understanding the anatomy of reviewable code and teaching AI to generate it consistently, you transform AI from a code-generation tool into a collaborative partner that produces work you can trust. The five pillars—structure, naming, granularity, testability, and documentation—provide a common language for communicating quality standards that AI systems can operationalize.
In the next section, we'll explore how to communicate these standards through constraint specification, turning abstract principles into concrete instructions that AI can follow consistently.
Training AI Through Constraint Specification
When you ask an AI to "create a user authentication service," you'll get code. But will you get reviewable code? The difference lies not in what you ask for, but in how precisely you constrain the generation space. Think of it this way: an AI without constraints is like a brilliant but undisciplined artist—capable of greatness but prone to creating sprawling, idiosyncratic work that only they can understand.
Constraint specification is the practice of defining boundaries, patterns, and rules that guide AI code generation toward outputs you can efficiently review, understand, and maintain. This isn't about limiting AI's capabilities—it's about channeling them productively. A well-constrained AI becomes a collaborator that speaks your architectural language and respects your review capacity.
The Constraint Hierarchy: From Structure to Style
Constraints operate at multiple levels, forming a hierarchy from broad architectural decisions down to granular style choices. Understanding this hierarchy helps you apply the right constraints at the right specificity level.
Architectural Layer
↓ (High-level patterns, separation of concerns)
Structural Layer
↓ (File organization, module boundaries, function size)
Interface Layer
↓ (API design, parameter patterns, return types)
Naming Layer
↓ (Convention schemes, verbosity, domain language)
Documentation Layer
↓ (Comment density, docstring format, explanation depth)
Style Layer
↓ (Formatting, whitespace, expression complexity)
Most developers make the mistake of starting at the bottom—asking AI to "use camelCase" or "add comments"—when the real reviewability problems stem from unconstrained architectural and structural decisions. Let's work our way down this hierarchy with practical examples.
Architectural Constraints: Setting the Foundation
🎯 Key Principle: Architectural constraints define how systems are organized at the highest level. They're your first line of defense against reviewing tangled, interdependent code.
When you specify architectural patterns, you're giving AI a blueprint that inherently supports reviewability. Consider this comparison:
Unconstrained prompt:
"Create a blog post management system with database access and API endpoints."
Constrained prompt:
"Create a blog post management system using the repository pattern.
Separate concerns into three layers:
- Controllers (handle HTTP, validate input, return responses)
- Services (business logic only, no framework dependencies)
- Repositories (database access only, return domain objects)
Each layer should be in its own directory. No layer should directly
import from a layer at the same level."
The difference in reviewability is profound. The constrained version produces code you can review layer-by-layer, with clear boundaries and predictable responsibilities. Here's what the constrained output structure might look like:
# repositories/post_repository.py
from typing import List, Optional
from models.post import Post
class PostRepository:
"""Handles all database operations for Post entities."""
def __init__(self, db_connection):
self.db = db_connection
def find_by_id(self, post_id: int) -> Optional[Post]:
"""Retrieve a single post by ID."""
row = self.db.execute(
"SELECT * FROM posts WHERE id = ?",
(post_id,)
).fetchone()
return Post.from_db_row(row) if row else None
def find_published(self) -> List[Post]:
"""Retrieve all published posts."""
rows = self.db.execute(
"SELECT * FROM posts WHERE status = 'published' ORDER BY created_at DESC"
).fetchall()
return [Post.from_db_row(row) for row in rows]
# services/post_service.py
from typing import List, Optional
from repositories.post_repository import PostRepository
from models.post import Post
class PostService:
"""Business logic for post operations."""
def __init__(self, post_repository: PostRepository):
self.repository = post_repository
def get_post_for_display(self, post_id: int) -> Optional[Post]:
"""Get a post if it exists and is published."""
post = self.repository.find_by_id(post_id)
if post and post.status == 'published':
return post
return None
def list_recent_posts(self, limit: int = 10) -> List[Post]:
"""Get recent published posts with limit."""
posts = self.repository.find_published()
return posts[:limit]
# controllers/post_controller.py
from flask import jsonify, abort
from services.post_service import PostService
class PostController:
"""HTTP endpoint handlers for post resources."""
def __init__(self, post_service: PostService):
self.service = post_service
def get_post(self, post_id: int):
"""GET /posts/{id} endpoint."""
if post_id < 1:
abort(400, "Invalid post ID")
post = self.service.get_post_for_display(post_id)
if not post:
abort(404, "Post not found or not published")
return jsonify(post.to_dict())
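The layering also pays off at the composition root, where everything is wired together in one place. A sketch (the stub class bodies below stand in for the fuller classes above, and sqlite3 is an illustrative database choice):

```python
import sqlite3

# Stub layers standing in for the fuller classes shown above
class PostRepository:
    def __init__(self, db):
        self.db = db  # database access only

class PostService:
    def __init__(self, repository):
        self.repository = repository  # business logic only

class PostController:
    def __init__(self, service):
        self.service = service  # HTTP concerns only

def build_post_controller(db_path: str) -> PostController:
    """Composition root: each layer receives its dependency explicitly."""
    connection = sqlite3.connect(db_path)
    return PostController(PostService(PostRepository(connection)))
```

Because each layer is injected, any one of them can be swapped for a fake in tests without touching the others.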
⚠️ Common Mistake: Skipping architectural constraints entirely. Without them, AI often produces monolithic files where controllers directly access databases, mixing HTTP concerns with SQL queries. That makes reviewing the business logic nearly impossible, because you must reason about database transactions and HTTP edge cases simultaneously. Contrast that with how easily each file above can be reviewed in isolation. ⚠️
💡 Pro Tip: Common architectural patterns that dramatically improve reviewability include: Repository Pattern, Service Layer Pattern, Hexagonal Architecture, Command Query Responsibility Segregation (CQRS), and Factory Pattern. Learn their names and constraints—they're your review-enhancing vocabulary.
Structural Constraints: Making Code Scannable
Architectural constraints define the what; structural constraints define the shape. They determine whether you can understand a module at a glance or need to maintain complex mental models while reviewing.
Key structural constraints that enhance reviewability:
🔧 Function size limits: "Keep functions under 20 lines" 🔧 Module cohesion: "Each module should have a single clear purpose" 🔧 File size caps: "Split files that exceed 200 lines into focused submodules" 🔧 Dependency rules: "No circular dependencies between modules" 🔧 Nesting limits: "Maximum 3 levels of indentation"
Let's see how these constraints transform generated code:
Before (unconstrained):
// user_manager.js - 450 lines
class UserManager {
async processUserRegistration(userData) {
// Validate email
if (!userData.email || !userData.email.includes('@')) {
const errorLog = await this.db.query('INSERT INTO error_logs...');
const adminNotification = await this.emailService.send({
to: process.env.ADMIN_EMAIL,
subject: 'Registration Error',
body: `Invalid email: ${userData.email}`
});
if (this.config.slackEnabled) {
await this.slack.postMessage({
channel: '#errors',
text: `Registration failed for ${userData.email}`
});
}
throw new ValidationError('Invalid email format');
}
// Check if user exists
const existingUser = await this.db.query(
'SELECT * FROM users WHERE email = ?',
[userData.email]
);
if (existingUser.length > 0) {
if (existingUser[0].status === 'pending') {
// Resend verification email
const token = crypto.randomBytes(32).toString('hex');
await this.db.query(
'UPDATE users SET verification_token = ? WHERE email = ?',
[token, userData.email]
);
const emailSent = await this.emailService.send({
to: userData.email,
subject: 'Verify Your Email',
body: this.templates.verification({ token })
});
// ... 15 more lines of nested logic
}
}
// ... 380 more lines
}
}
After (with structural constraints):
// validators/email_validator.js
class EmailValidator {
static isValid(email) {
return email && email.includes('@') && email.includes('.');
}
}
// services/error_notifier.js
class ErrorNotifier {
constructor(emailService, slackService, config) {
this.email = emailService;
this.slack = slackService;
this.config = config;
}
async notifyRegistrationError(email, error) {
await this.email.sendToAdmin('Registration Error',
`Invalid email: ${email}`);
if (this.config.slackEnabled) {
await this.slack.postToErrors(
`Registration failed for ${email}`);
}
}
}
// services/user_registration_service.js
class UserRegistrationService {
constructor(userRepository, emailValidator, errorNotifier) {
this.users = userRepository;
this.validator = emailValidator;
this.notifier = errorNotifier;
}
async register(userData) {
this.validateEmail(userData.email);
const existing = await this.users.findByEmail(userData.email);
if (existing) {
return this.handleExistingUser(existing, userData);
}
return this.createNewUser(userData);
}
validateEmail(email) {
if (!this.validator.isValid(email)) {
this.notifier.notifyRegistrationError(email, 'Invalid format');
throw new ValidationError('Invalid email format');
}
}
async handleExistingUser(user, userData) {
if (user.status === 'pending') {
return this.resendVerification(user);
}
throw new ConflictError('User already exists');
}
async createNewUser(userData) {
// Implementation here
}
}
🤔 Did you know? Studies of code review effectiveness show that reviewers catch 60-90% of defects in functions under 20 lines, but only 20-30% in functions over 100 lines. Structural constraints aren't just aesthetic—they directly impact review quality.
The prompt that generates the second version includes explicit structural constraints:
"Create a user registration system with these constraints:
- Maximum function length: 15 lines
- Maximum file length: 100 lines
- Each class should have a single responsibility
- Extract validation logic into separate validator classes
- Extract notification logic into a dedicated notifier service
- Use dependency injection for all services"
Naming and Interface Constraints: Creating Cognitive Anchors
Even well-structured code becomes difficult to review when names are inconsistent or interfaces unpredictable. Naming conventions and interface patterns serve as cognitive anchors—they let you make accurate assumptions without reading implementation details.
Effective naming constraints include:
📋 Quick Reference Card: Naming Constraint Examples
| Category | Constraint | Example |
|---|---|---|
| 🎯 Functions | Verbs for actions, prefix with intent | getUserById(), validateEmail(), calculateTotal() |
| 📦 Classes | Nouns, suffix with role | UserRepository, EmailValidator, PaymentProcessor |
| 🔒 Booleans | Question form or is/has prefix | isValid, hasPermission, canEdit |
| 🧠 Constants | SCREAMING_SNAKE_CASE for immutable | MAX_RETRY_ATTEMPTS, API_BASE_URL |
| ⚙️ Private | Prefix with underscore | _validateInternal(), _connectionPool |
💡 Real-World Example: A development team I worked with reduced review time by 40% simply by enforcing consistent naming for repository methods. Before: get(), fetch(), retrieve(), find() were used interchangeably. After: findById(), findAll(), findBy{Criteria}() became the standard. Reviewers could immediately understand query intent without reading implementations.
Interface pattern constraints work similarly. Consider this specification:
"All repository methods should:
- Return domain objects, never database rows
- Use Optional/Maybe for single-item queries that might fail
- Use List/Array for multi-item queries (empty list if none found)
- Throw exceptions only for infrastructure failures, not missing data
- Accept primitive types or domain objects as parameters, never DTOs"
This produces reviewable code because you can validate correctness by scanning method signatures:
// repositories/product_repository.ts
import { Product } from '../models/product';
import { Optional } from '../utils/optional';
class ProductRepository {
// ✅ Returns Optional - reviewer knows this handles "not found"
findById(id: number): Promise<Optional<Product>> {
// Implementation
}
// ✅ Returns array - reviewer knows this never fails
findByCategory(category: string): Promise<Product[]> {
// Implementation
}
// ✅ Accepts domain object - clear what data is needed
save(product: Product): Promise<Product> {
// Implementation
}
// ✅ Infrastructure failure throws - reviewer understands error handling
deleteById(id: number): Promise<void> {
// May throw DatabaseConnectionError
}
}
Documentation Constraints: Balancing Signal and Noise
AI can generate extensive documentation, but more isn't better—reviewable documentation has the right density and placement. Too little leaves reviewers guessing; too much obscures the code itself.
🎯 Key Principle: Documentation constraints should specify what to document and what to omit, not just "add comments."
Effective documentation constraints:
"Follow these documentation rules:
1. Document WHY for non-obvious decisions, not WHAT (code shows what)
2. Add docstrings to all public functions with: purpose, parameters, returns, exceptions
3. No docstrings for private functions unless complex (name should be self-documenting)
4. Add TODO comments for known limitations
5. Include usage examples for classes with complex initialization
6. No commented-out code—use version control
7. Explain magic numbers with inline comments"
This produces code that's self-documenting where possible, with strategic explanation where needed:
import time

class RateLimiter:
"""Prevents API abuse using a token bucket algorithm.
Each user gets a bucket of tokens that refills over time.
Each request consumes one token. Requests fail when bucket is empty.
Example:
limiter = RateLimiter(capacity=100, refill_rate=10)
if limiter.allow_request(user_id):
# Process request
else:
# Return 429 Too Many Requests
"""
def __init__(self, capacity: int, refill_rate: int):
"""Initialize rate limiter.
Args:
capacity: Maximum tokens per bucket (burst size)
refill_rate: Tokens added per second
"""
self.capacity = capacity
self.refill_rate = refill_rate
self._buckets = {} # user_id -> (tokens, last_update)
def allow_request(self, user_id: str) -> bool:
"""Check if user has available tokens.
Args:
user_id: Unique identifier for the user
Returns:
True if request allowed, False if rate limited
"""
self._refill_bucket(user_id)
tokens, timestamp = self._buckets.get(user_id, (self.capacity, time.time()))
if tokens >= 1:
self._buckets[user_id] = (tokens - 1, timestamp)
return True
return False
def _refill_bucket(self, user_id: str):
# No docstring needed - name is self-documenting
if user_id not in self._buckets:
self._buckets[user_id] = (self.capacity, time.time())
return
tokens, last_update = self._buckets[user_id]
now = time.time()
elapsed = now - last_update
# Calculate tokens to add, but don't exceed capacity
new_tokens = min(
self.capacity,
tokens + (elapsed * self.refill_rate)
)
self._buckets[user_id] = (new_tokens, now)
⚠️ Common Mistake: Asking AI to "add lots of comments" often produces noise like # Increment counter above counter += 1. Instead, constrain documentation to non-obvious intent: "Only comment code where the purpose isn't clear from well-named variables and functions." ⚠️
Building Reusable Constraint Templates
Once you've identified constraints that improve reviewability for your context, codify them into reusable templates. This transforms ad-hoc prompting into a systematic teaching practice.
Template structure:
[PROJECT CONTEXT]
Language: {language}
Framework: {framework}
Purpose: {purpose}
[ARCHITECTURAL CONSTRAINTS]
{pattern specifications}
[STRUCTURAL CONSTRAINTS]
{size and organization rules}
[INTERFACE CONSTRAINTS]
{method signature patterns}
[NAMING CONSTRAINTS]
{convention specifications}
[DOCUMENTATION CONSTRAINTS]
{what to document and how}
[TASK]
{specific generation request}
💡 Pro Tip: Maintain a library of constraint templates for common scenarios: API endpoints, data models, background jobs, CLI tools, test suites. Each template encodes your team's reviewability standards for that scenario.
Example: REST API Endpoint Template
[PROJECT CONTEXT]
Language: Python
Framework: Flask
Purpose: RESTful API endpoint
[ARCHITECTURAL CONSTRAINTS]
- Use three-layer architecture: Controller → Service → Repository
- Controllers handle HTTP only (validation, serialization, status codes)
- Services contain business logic only (no HTTP or database imports)
- Repositories handle database access only (return domain objects)
[STRUCTURAL CONSTRAINTS]
- Maximum 15 lines per function
- Maximum 100 lines per file
- One class per file (except small helper classes)
- Extract validation into separate validator classes
[INTERFACE CONSTRAINTS]
- Controller methods return Flask Response objects
- Service methods return domain objects or raise business exceptions
- Repository methods return domain objects, lists, or None
- All methods have type hints
[NAMING CONSTRAINTS]
- Controllers: {Resource}Controller with methods named after HTTP verbs (get, post, put, delete)
- Services: {Domain}Service with verb methods (create_user, calculate_total)
- Repositories: {Entity}Repository with find/save/delete methods
- Variables: snake_case, descriptive (user_id not uid)
[DOCUMENTATION CONSTRAINTS]
- Docstring for every public method: purpose, parameters, returns, exceptions
- No docstrings for obvious private methods
- Comment non-obvious business rules
- Include usage example in class docstring
[TASK]
{Your specific endpoint request}
When you use this template, AI generates consistently reviewable endpoint code because all structural decisions are predetermined. You're reviewing business logic correctness, not architectural choices.
🧠 Mnemonic: The SAND Framework for constraint templates:
- Structure (how code is organized)
- Architecture (patterns and layers)
- Naming (conventions and clarity)
- Documentation (what to explain)
Constraint Evolution: Learning from Reviews
Your constraint templates should evolve based on review experience. Track which AI outputs require significant revision and add constraints that would have prevented those issues.
❌ Wrong thinking: "AI generated bad code, I'll just rewrite it myself." ✅ Correct thinking: "AI generated bad code because my constraints were incomplete. What constraint would have prevented this? I'll add it to my template."
Example evolution:
Iteration 1: Generate user authentication endpoint Review finding: Password validation logic mixed with database access New constraint: "Extract all validation logic into separate validator classes"
Iteration 2: Generate with validation constraint Review finding: Validators return boolean, error messages lost New constraint: "Validators should return Result<T, ValidationError> with specific error messages"
Iteration 3: Generate with Result constraint
Review finding: No consistent error message format
New constraint: "ValidationError must include field name and user-facing message"
Each iteration makes your template more sophisticated and the generated code more reviewable.
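The iteration-3 constraint can itself be pinned down with a small sketch (the `Result` and `ValidationError` shapes here are illustrative; adapt the names to your codebase):

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass
class ValidationError:
    field: str    # which field failed
    message: str  # user-facing explanation

@dataclass
class Result(Generic[T]):
    value: Optional[T] = None
    error: Optional[ValidationError] = None

    @property
    def is_ok(self) -> bool:
        return self.error is None

def validate_password(password: str) -> Result[str]:
    """Validator returning Result instead of a bare boolean (illustrative policy)."""
    if len(password) < 12:
        return Result(error=ValidationError(
            field="password",
            message="Password must be at least 12 characters"))
    return Result(value=password)
```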
Practical Constraint Application: A Complete Example
Let's see how layered constraints transform a vague request into reviewable code.
Initial request (unconstrained):
"Create a payment processing system."
Enhanced request (with layered constraints):
Create a payment processing system with these specifications:
ARCHITECTURE:
- Use hexagonal architecture (ports and adapters)
- Core business logic has no external dependencies
- Payment gateway integration is an adapter (swappable)
STRUCTURE:
- Core domain in /domain directory
- Adapters in /adapters directory
- Each class maximum 100 lines
- Each method maximum 15 lines
INTERFACES:
- All payment operations return Result<Payment, PaymentError>
- Gateway adapter implements PaymentGateway interface
- Never throw exceptions for business rule violations
NAMING:
- Domain objects: Payment, Transaction, PaymentMethod
- Services: PaymentProcessor, RefundService
- Adapters: StripeGatewayAdapter, PayPalGatewayAdapter
- Methods: process_payment(), refund_payment(), validate_payment_method()
DOCUMENTATION:
- Docstring for all public methods
- Explain business rules inline (e.g., why refund window is 30 days)
- Include example usage in service class docstrings
TASK:
Implement payment processing with credit card support.
Must handle: authorization, capture, refund scenarios.
This produces code organized exactly for efficient review:
# domain/payment.py
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional

class PaymentStatus(Enum):
    PENDING = "pending"
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    REFUNDED = "refunded"
    FAILED = "failed"

@dataclass
class Payment:
    """Represents a payment transaction in the system."""
    id: str
    amount: Decimal
    currency: str
    status: PaymentStatus
    gateway_transaction_id: Optional[str] = None

    def is_refundable(self) -> bool:
        """Check if payment can be refunded."""
        return self.status == PaymentStatus.CAPTURED
# domain/ports/payment_gateway.py
from typing import Protocol

from domain.payment import Payment
from utils.result import Result

class PaymentGateway(Protocol):
    """Interface for payment gateway adapters."""

    def authorize(self, payment: Payment) -> Result[Payment, str]:
        """Authorize payment without capturing funds."""
        ...

    def capture(self, payment: Payment) -> Result[Payment, str]:
        """Capture previously authorized payment."""
        ...
# adapters/stripe_gateway.py
import stripe

from domain.payment import Payment, PaymentStatus
from domain.ports.payment_gateway import PaymentGateway
from utils.result import Result

class StripeGatewayAdapter:
    """Stripe payment gateway adapter.

    Example:
        gateway = StripeGatewayAdapter(api_key="sk_test_...")
        result = gateway.authorize(payment)
        if result.is_success():
            captured = gateway.capture(result.value)
    """

    def __init__(self, api_key: str):
        stripe.api_key = api_key

    def authorize(self, payment: Payment) -> Result[Payment, str]:
        """Authorize payment using Stripe.

        Args:
            payment: Payment object with amount and currency

        Returns:
            Result containing updated Payment or error message
        """
        try:
            intent = self._create_payment_intent(payment)
            return self._build_success_result(payment, intent)
        except stripe.error.CardError as e:
            return Result.failure(f"Card declined: {e.user_message}")

    def _create_payment_intent(self, payment: Payment):
        return stripe.PaymentIntent.create(
            amount=int(payment.amount * 100),  # Stripe uses cents
            currency=payment.currency,
            capture_method='manual'  # Authorization only
        )
Notice how easy this is to review:
- ✅ Clear separation between domain and infrastructure
- ✅ Each file focused on one concern
- ✅ Consistent naming makes scanning easy
- ✅ Documentation explains purpose without noise
- ✅ Short methods keep cognitive load low
The reviewer can verify business logic correctness without understanding Stripe's API details, because the adapter pattern and constraints create clear boundaries.
Moving Forward
Constraint specification is a skill that compounds over time. Each well-specified generation teaches you which constraints matter most for your context. Each review reveals missing constraints that should be added to your templates.
The developers who thrive in an AI-augmented world aren't those who write the most code—they're those who most effectively teach AI to write reviewable code through precise constraint specification. Your constraint library becomes your most valuable asset, encoding years of architectural wisdom into reusable templates.
In the next section, we'll explore how to refine AI's output through iterative feedback loops, turning good constraint specification into excellent generated code through progressive teaching.
Iterative Teaching: The Feedback Loop
The difference between a developer who merely prompts AI and one who teaches AI lies in their approach to iteration. When AI generates code that falls short of your reviewability standards, you face a critical choice: do you accept it, manually rewrite it, or use this moment to teach the AI what you actually need? The most effective developers treat every suboptimal output as a teaching moment—an opportunity to refine the AI's understanding of what constitutes reviewable code in their specific context.
Think of this process as training a junior developer. You wouldn't expect them to write perfect code on the first try, nor would you simply redo their work without explanation. Instead, you'd review their output, identify specific issues, and provide feedback that helps them improve next time. The same principle applies to AI, but with a crucial advantage: AI can internalize and apply your feedback much faster than humans, creating a virtuous feedback loop that compounds improvements across iterations.
The Teaching Moment: Identifying and Translating Issues
When AI generates code that's difficult to review, the problem usually manifests in specific, identifiable ways. Perhaps functions are too long, variable names are opaque, error handling is implicit, or the logic flow is convoluted. Your first task is to diagnose the reviewability issue with precision, then translate it into actionable feedback the AI can use.
Consider this AI-generated function:
def process(data):
    result = []
    for item in data:
        if item.get('status') == 'active':
            val = item.get('value', 0) * 1.2
            if val > 100:
                result.append({'id': item['id'], 'val': val, 'priority': 'high'})
            else:
                result.append({'id': item['id'], 'val': val, 'priority': 'low'})
    return result
As a reviewer, you immediately spot multiple reviewability issues: the function name is vague, the magic numbers (1.2, 100) lack context, the transformation logic is buried in conditionals, and the intent isn't clear. But here's where most developers stumble—they either accept this code or ask the AI to "make it better" without specificity.
Instead, create a specific teaching message that identifies each issue and explains why it matters:
"This function has several reviewability issues:
- The name 'process' doesn't convey what transformation is being applied
- Magic number 1.2 appears to be a markup multiplier but lacks explanation
- The threshold value 100 needs context—what does it represent?
- The priority assignment logic should be extractable and testable separately
- Variable names like 'val' are abbreviated unnecessarily
Please rewrite this function with: descriptive names that reveal intent, named constants for all magic numbers with comments explaining their business meaning, and the priority logic extracted to a separate, testable function."
This teaching approach transforms a vague complaint into a structured lesson. The AI now has concrete criteria to apply not just to this function, but to future generations.
🎯 Key Principle: Specific feedback creates specific improvements. Vague criticism ("this is hard to review") produces vague changes, while precise identification of issues ("magic numbers lack context") produces targeted fixes.
Building on Success: The Reference Implementation Technique
Once you've provided specific feedback and the AI has generated improved code, you've created something invaluable: a reference implementation. This is a concrete example of code that meets your reviewability standards, and it becomes a powerful teaching tool for future requests.
Here's the improved version after teaching:
# Business constants with clear explanations
PREMIUM_MARKUP = 1.2  # 20% markup applied to active item values
HIGH_PRIORITY_THRESHOLD = 100  # Values exceeding $100 require expedited processing

def calculate_premium_value(base_value: float) -> float:
    """Apply premium markup to base value.

    Args:
        base_value: The original item value before markup

    Returns:
        The value with premium markup applied
    """
    return base_value * PREMIUM_MARKUP

def determine_priority(value: float) -> str:
    """Determine processing priority based on value threshold.

    Args:
        value: The calculated value to assess

    Returns:
        'high' if value exceeds threshold, 'low' otherwise
    """
    return 'high' if value > HIGH_PRIORITY_THRESHOLD else 'low'

def enrich_active_items_with_premium_pricing(items: list[dict]) -> list[dict]:
    """Transform active items by applying premium markup and priority assignment.

    Filters for active items, applies premium pricing calculation, and assigns
    processing priority based on the calculated value.

    Args:
        items: List of item dictionaries with 'status', 'value', and 'id' keys

    Returns:
        List of enriched items with calculated value and assigned priority
    """
    enriched_items = []
    for item in items:
        if item.get('status') != 'active':
            continue

        base_value = item.get('value', 0)
        calculated_value = calculate_premium_value(base_value)
        priority = determine_priority(calculated_value)

        enriched_items.append({
            'id': item['id'],
            'calculated_value': calculated_value,
            'priority': priority
        })

    return enriched_items
Now you have a reference. In future conversations, you can tell the AI:
"Generate a function similar to enrich_active_items_with_premium_pricing that processes customer orders. Use the same patterns: descriptive function names, extracted helper functions for business logic, named constants with comments, and comprehensive docstrings."
The AI now understands not just through abstract rules, but through concrete example. This is exponentially more effective than repeating your constraints each time.
💡 Pro Tip: Maintain a personal library of reference implementations from successful AI interactions. When starting new projects or working in new domains, point the AI to similar reference code from your library. This accelerates the teaching process dramatically.
The Iterative Feedback Cycle: Structure and Flow
Effective AI teaching follows a structured feedback cycle that progressively refines output quality. Understanding this cycle helps you recognize where you are in the process and what action to take next.
┌───────────────────────────────────────────────────┐
│                  INITIAL REQUEST                  │
│ (with constraints from previous teaching, if any) │
└─────────────────────────┬─────────────────────────┘
                          ▼
                  ┌────────────────┐
                  │  AI GENERATES  │
                  │      CODE      │
                  └───────┬────────┘
                          ▼
                ┌───────────────────┐
                │    REVIEW FOR     │
                │   REVIEWABILITY   │
                └─────────┬─────────┘
            ┌─────────────┴─────────────┐
            ▼                           ▼
       ACCEPTABLE                 ISSUES FOUND
            │                           │
            │                           ▼
            │               ┌───────────────────────┐
            │               │   IDENTIFY SPECIFIC   │
            │               │  REVIEWABILITY GAPS   │
            │               └───────────┬───────────┘
            │                           ▼
            │               ┌───────────────────────┐
            │               │   PROVIDE TARGETED    │
            │               │   TEACHING FEEDBACK   │
            │               └───────────┬───────────┘
            │                           ▼
            │               ┌───────────────────────┐
            │               │    AI REGENERATES     │
            │               │     WITH LEARNING     │
            │               └───────────┬───────────┘
            ▼                           ▼
     ┌─────────────┐            ┌───────────────┐
     │  STORE AS   │            │   RETURN TO   │
     │  REFERENCE  │            │  REVIEW STEP  │
     └──────┬──────┘            └───────┬───────┘
            └─────────────┬─────────────┘
                          ▼
                ┌───────────────────┐
                │  LEARNING STORED  │
                │ FOR NEXT REQUEST  │
                └───────────────────┘
Each iteration through this cycle should bring you closer to reviewable code. The key is recognizing that each cycle has a purpose: early cycles establish basic structure and naming, middle cycles refine logic and extract complexity, and later cycles polish documentation and edge cases.
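The cycle above can be sketched as a simple driver loop. This is an illustrative skeleton, not a real API: generate and review stand in for whatever AI call and review process you use, and the prompt-augmentation format is an assumption.

```python
def teach_until_reviewable(request, generate, review, max_iterations=5):
    """Drive the generate -> review -> feedback cycle until code passes review.

    generate(prompt) returns code; review(code) returns a list of specific
    reviewability issues (empty when the code is acceptable).
    """
    prompt = request
    code = None
    for _ in range(max_iterations):
        code = generate(prompt)
        issues = review(code)
        if not issues:
            return code  # acceptable: store as a reference implementation
        # Fold the specific findings back into the next prompt, mirroring
        # the "provide targeted teaching feedback" step in the diagram.
        prompt = request + "\nFix these reviewability issues:\n" + "\n".join(
            f"- {issue}" for issue in issues
        )
    return code  # best effort after max_iterations
```

The key detail is that feedback is specific and cumulative: each pass through the loop carries the review findings forward instead of restarting from the bare request.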
Measuring Improvement Across Iterations
How do you know if your teaching is working? You need concrete reviewability metrics that you can track across iterations. These don't need to be automated (though they can be)—even manual assessment gives you valuable feedback on the feedback loop itself.
Consider tracking these dimensions:
📋 Quick Reference Card: Reviewability Metrics
| 📊 Metric | 🎯 What to Measure | ✅ Success Indicator |
|---|---|---|
| 📏 Function Length | Lines per function | Functions under 20-30 lines |
| 🏷️ Naming Clarity | Descriptive vs. abbreviated names | 90%+ names are immediately clear |
| 🔢 Magic Numbers | Unnamed constants in code | Zero magic numbers; all extracted |
| 📝 Documentation | Presence of docstrings and comments | Every public function documented |
| 🧩 Complexity | Cyclomatic complexity | Complexity under 10 per function |
| 🧪 Testability | Logic extraction to pure functions | Business logic is isolated |
After each iteration, quickly assess the generated code against these metrics. If iteration 3 shows improvement over iteration 1, your teaching is effective. If not, you need to adjust your feedback approach.
💡 Real-World Example: A developer working with AI on a data pipeline noticed that across 5 iterations, function lengths decreased from an average of 87 lines to 23 lines, magic numbers dropped from 12 to 0, and cyclomatic complexity fell from 15 to 4. These measurable improvements proved the teaching approach was working.
Using Previous Outputs to Teach Future Generations
One of the most powerful teaching techniques is showing the AI its own evolution. When the AI generates code with issues, don't just describe the problems—show the contrast between what it produced and what it should produce.
Here's a teaching pattern that works exceptionally well:
"You generated this code in your previous response: [paste problematic code]
This version has these specific reviewability issues:
- [specific issue]
- [specific issue]
- [specific issue]
Here's how this should be structured for reviewability: [show corrected version or outline the structure]
Now apply these same improvements to the process_orders function you also generated."
This technique works because it:
🧠 Provides concrete before/after context—the AI sees exactly what changed
🧠 Makes the learning transferable—improvements to one function inform improvements to others
🧠 Creates a conversation history—later in the same session, you can reference "the pattern we established for the enrich_active_items function"
🧠 Builds consistency—the AI learns your specific style through accumulated examples
⚠️ Common Mistake: Treating each AI interaction as isolated. Many developers provide feedback but then start a completely new conversation for their next request, losing all the teaching context they built up. This forces them to reteach the same lessons repeatedly. ⚠️
When to Persist vs. When to Reset
Here's the paradox of iterative teaching: the same conversation context that makes teaching effective can eventually become a liability. How do you know when to continue building on an existing conversation versus starting fresh?
Persist in the same conversation when:
🔧 You're seeing measurable improvement across iterations—each version gets closer to your reviewability standards
🔧 The core requirements remain stable—you're refining implementation, not changing direction
🔧 Reference implementations are accumulating—you're building a library of examples within the conversation
🔧 The AI demonstrates learning transfer—improvements you taught for one function appear automatically in subsequent functions
🔧 You're within the same problem domain—the context and constraints remain relevant
Reset and start fresh when:
🔄 The conversation has become overly long (typically over 20-30 exchanges)—context windows fill up and early teachings may be deprioritized
🔄 You're switching to a significantly different problem—the patterns you taught for data processing don't apply to UI components
🔄 The AI seems "stuck" repeating the same mistakes—sometimes accumulated context creates contradictions that fresh eyes resolve
🔄 You've accumulated conflicting instructions—if you've changed direction multiple times, the AI may be trying to satisfy incompatible constraints
🔄 You need to apply consolidated learnings—start a new conversation with a comprehensive prompt incorporating everything you learned
Here's a practical example of when to reset:
# After 15 iterations in a conversation, you've taught the AI to generate
# highly reviewable data processing functions. The AI now consistently:
# - Uses descriptive names
# - Extracts constants
# - Separates business logic
# - Writes comprehensive docstrings
#
# NOW is the time to start a fresh conversation with a prompt like:
"I need you to generate data processing functions following these patterns: [Include 1-2 reference implementations from the previous conversation]
Key requirements for all functions:
- Descriptive names that reveal intent
- All magic numbers extracted to named constants with explanatory comments
- Business logic separated into pure, testable functions
- Comprehensive docstrings with Args and Returns sections
- Maximum function length of 30 lines
Apply these patterns to create functions for processing customer orders..."
This consolidated teaching prompt captures the lessons from your iterative teaching and makes them immediately available in a fresh context. You've effectively transferred the learning from one conversation to another.
💡 Mental Model: Think of conversation persistence like working memory and conversation reset like consolidating to long-term memory. Sometimes you need to "sleep on it" and consolidate what you learned into a clean, new context.
The Progressive Refinement Pattern
When you decide to persist in a conversation and continue teaching, follow the progressive refinement pattern—a structured approach that builds reviewability in layers rather than demanding everything at once.
Layer 1: Structure and Naming (Iterations 1-2)
First, get the basic structure right. Focus your feedback on:
- Function and variable naming
- Breaking down large functions
- Overall code organization
Don't worry yet about documentation or edge cases. Establish the skeleton first.
Layer 2: Logic Clarity (Iterations 3-4)
Once structure is solid, refine the logic:
- Extract magic numbers to constants
- Separate business logic from I/O
- Simplify conditionals
- Improve error handling
Layer 3: Documentation and Edge Cases (Iterations 5-6)
Finally, polish for reviewability:
- Add comprehensive docstrings
- Include inline comments for non-obvious logic
- Handle edge cases explicitly
- Add type hints where applicable
Layer 4: Test Alignment (If Needed)
If you're also generating tests, ensure:
- Code structure supports easy testing
- Test cases are comprehensive
- Tests serve as documentation
This layered approach prevents overwhelming the AI (and yourself) with too many constraints simultaneously. Each layer builds on the previous one, and the AI can focus its "attention" on specific improvements.
🤔 Did you know? Research on AI code generation shows that iterative refinement with specific feedback produces code that's rated 40-60% more maintainable than single-shot generation, even when the single-shot prompt includes all the same requirements. The progressive focus appears to help AI models prioritize and apply constraints more effectively.
Practical Teaching Dialogues
Let's examine a realistic teaching dialogue that demonstrates effective iteration:
Iteration 1 - Initial Generation:
Developer: "Generate a function that calculates shipping costs based on weight and distance."
AI generates:
function calc(w, d) {
  if (w < 5) return d * 0.5;
  if (w < 20) return d * 0.75;
  return d * 1.2 + 10;
}
Iteration 2 - Teaching Moment:
Developer: "This function has reviewability issues:
- The name 'calc' is too generic—it should describe what is being calculated
- Parameters 'w' and 'd' are abbreviations—use full words
- Magic numbers (5, 20, 0.5, 0.75, 1.2, 10) lack business context
- No documentation explaining the pricing tiers
- No type information
Rewrite this with:
- A descriptive function name
- Full parameter names with types
- Named constants with comments explaining the business rules
- A docstring explaining the pricing structure"
AI regenerates:
// Weight thresholds for pricing tiers (in pounds)
const LIGHT_PACKAGE_MAX_WEIGHT = 5;
const MEDIUM_PACKAGE_MAX_WEIGHT = 20;

// Shipping rates per mile for each weight tier
const LIGHT_PACKAGE_RATE = 0.50;   // Under 5 lbs
const MEDIUM_PACKAGE_RATE = 0.75;  // 5-20 lbs
const HEAVY_PACKAGE_RATE = 1.20;   // Over 20 lbs

// Additional fee for heavy packages
const HEAVY_PACKAGE_SURCHARGE = 10.00;

/**
 * Calculate shipping cost based on package weight and delivery distance.
 *
 * Pricing structure:
 * - Light packages (< 5 lbs): $0.50 per mile
 * - Medium packages (5-20 lbs): $0.75 per mile
 * - Heavy packages (> 20 lbs): $1.20 per mile + $10 surcharge
 *
 * @param {number} weightInPounds - Package weight in pounds
 * @param {number} distanceInMiles - Delivery distance in miles
 * @returns {number} Total shipping cost in dollars
 */
function calculateShippingCost(weightInPounds, distanceInMiles) {
  if (weightInPounds < LIGHT_PACKAGE_MAX_WEIGHT) {
    return distanceInMiles * LIGHT_PACKAGE_RATE;
  }
  if (weightInPounds < MEDIUM_PACKAGE_MAX_WEIGHT) {
    return distanceInMiles * MEDIUM_PACKAGE_RATE;
  }
  return distanceInMiles * HEAVY_PACKAGE_RATE + HEAVY_PACKAGE_SURCHARGE;
}
Iteration 3 - Refinement:
Developer: "Much better! This is now reviewable. The constants and documentation make the business logic clear. For future functions in this module, follow this same pattern: constants at the top with explanatory comments, comprehensive docstrings, and descriptive names throughout.
Now generate a function to calculate delivery time estimates using the same patterns."
Notice what happened: The developer identified specific issues, provided clear teaching, validated the improvement, and then explicitly stated that this pattern should be followed for subsequent code. The AI now has a reference implementation and clear direction.
✅ Correct thinking: "I'll spend two iterations establishing reviewability patterns, then leverage those patterns for subsequent code generation in this session."
❌ Wrong thinking: "I'll describe all my requirements once in detail and expect perfect code immediately."
Recognizing When Teaching Is Working
How can you tell if your iterative teaching is effective? Look for these learning indicators:
🎯 Unprompted application—The AI starts applying patterns you taught for one function to new functions without being reminded
🎯 Fewer iterations per function—Early functions require 3-4 iterations, later ones require 1-2
🎯 Proactive documentation—The AI includes comprehensive docstrings without being asked
🎯 Consistent style—Generated code maintains consistent naming, structure, and organization
🎯 Anticipating requirements—The AI asks clarifying questions about business logic or constraints
When you see these indicators, you know your teaching has created a productive feedback loop. The AI has internalized your reviewability standards for this context.
Conversely, warning signs that teaching isn't working:
⚠️ The same issues recur across iterations
⚠️ Improvements in one area cause regressions in another
⚠️ The AI seems to "forget" patterns you established earlier
⚠️ You're providing the same feedback repeatedly
When you see these warning signs, it's time to either reset the conversation or fundamentally change your teaching approach—perhaps you need to provide more concrete examples, or your constraints are conflicting.
The Compound Effect of Teaching
The true power of iterative teaching reveals itself over time. Each teaching cycle makes the next cycle more efficient. Each reference implementation you create becomes a template for future work. Each pattern you establish reduces the cognitive load of reviewing subsequent code.
Consider the compound effect over a project:
Week 1: You spend significant time teaching the AI your reviewability standards for API endpoints. It takes 4-5 iterations per endpoint.
Week 2: New endpoints require only 2-3 iterations because you reference previous endpoints as examples.
Week 3: You can generate endpoints that are nearly review-ready on first try by pointing to your collection of reference implementations.
Week 4: You start a new module (background jobs) and apply the patterns you established, adapting them to the new context with minimal iteration.
This compounding improvement transforms your relationship with AI-generated code. You move from being a constant editor to being a teacher who occasionally corrects, then finally to being a reviewer who primarily validates.
💡 Remember: The goal of iterative teaching isn't to eventually get perfect code from AI—it's to progressively reduce the review burden while maintaining code quality. Even experienced developers working with human team members don't get perfect code on the first try. The feedback loop is the work.
Creating Your Teaching Playbook
As you iterate and teach across multiple sessions, document patterns that work. Build a personal teaching playbook—a collection of:
📚 Feedback templates for common reviewability issues
📚 Reference implementations organized by type (API handlers, data processing, validation logic, etc.)
📚 Constraint specifications that worked well for different contexts
📚 Success metrics that helped you track improvement
📚 Reset triggers that tell you when to start fresh
This playbook becomes increasingly valuable over time. Instead of reinventing your teaching approach for each new project, you can draw on proven patterns from your playbook, dramatically accelerating the teaching process.
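A playbook doesn't need tooling; even plain data works. A minimal sketch of feedback templates stored as reusable strings (the structure, keys, and wording here are all illustrative):

```python
# A minimal personal teaching playbook as plain data.
PLAYBOOK = {
    "feedback_templates": {
        "magic_numbers": (
            "Magic number {value} lacks business context. Extract it to a named "
            "constant with a comment explaining what it represents."
        ),
        "vague_name": (
            "The name '{name}' doesn't convey intent. Rename it to describe the "
            "transformation or decision it performs."
        ),
    },
    "reset_triggers": [
        "conversation exceeds ~25 exchanges",
        "same feedback given twice with no improvement",
    ],
}

def feedback_for(issue: str, **details) -> str:
    """Fill a stored template so feedback stays consistent across sessions."""
    return PLAYBOOK["feedback_templates"][issue].format(**details)
```

Stored templates mean the feedback you give in month six is as precise as the feedback you refined in month one, without retyping it each time.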
The feedback loop between you and AI is not just about improving code—it's about improving your ability to teach AI to generate reviewable code. Each cycle makes you better at recognizing issues, articulating requirements, and measuring improvement. This meta-skill—teaching AI effectively—becomes your most valuable capability in a world where most code is AI-generated.
In our next section, we'll examine the common pitfalls that derail this feedback loop and how to avoid the mistakes that compound review complexity rather than reducing it.
Common Pitfalls When Training AI for Code Generation
The difference between developers who extract reviewable code from AI and those who drown in technical debt often comes down to avoiding a handful of critical mistakes. These pitfalls aren't obvious at first—they feel like reasonable approaches until you're three months into a project, staring at a codebase that looks like it was written by fifteen different people with no shared context. Let's examine the systematic mistakes that transform AI from a productivity multiplier into a complexity generator.
Pitfall 1: Over-Specifying Implementation Details Instead of Reviewability Constraints
⚠️ Common Mistake 1: The Implementation Dictator ⚠️
When developers first start working with AI for code generation, they often bring the mindset of a junior developer manager—micromanaging every implementation detail. They write prompts like architectural blueprints, specifying which loop constructs to use, what variable names to choose, and exactly how many lines each function should contain.
❌ Wrong thinking: "I need to tell the AI exactly how to implement this, or it will make bad choices."
✅ Correct thinking: "I need to specify what makes code reviewable for me, then let the AI find implementations that satisfy those constraints."
Consider this over-specified prompt:
Create a user authentication function. Use a for loop to iterate through
the users array. Store the result in a variable called 'foundUser'.
Use exactly three if statements to check: first if the username matches,
second if the user is active, third if the password hash matches.
Return the user object if all conditions pass, otherwise return null.
This approach creates multiple problems. First, you're constraining the solution space to a specific implementation that might be suboptimal. Second, you're not teaching the AI what makes code reviewable for you—you're just dictating one specific solution. Third, when the AI generates code for a similar problem later, it won't understand the underlying principles; it only knows the specific instructions you gave for this one case.
Here's what the same request looks like when focused on reviewability:
Create a user authentication function with these reviewability constraints:
- Single responsibility: authentication logic only, no side effects
- Explicit error cases: make it obvious what each failure condition is
- No nested conditionals deeper than 2 levels
- Guard clauses for early returns to reduce cognitive load
- Names that reveal intent without requiring comments
- Separation of validation from business logic
The AI might generate:
function authenticateUser(username, password, userRepository) {
  // Guard clause: user lookup failure
  const user = userRepository.findByUsername(username);
  if (!user) {
    return { success: false, reason: 'USER_NOT_FOUND' };
  }

  // Guard clause: inactive user
  if (!user.isActive) {
    return { success: false, reason: 'USER_INACTIVE' };
  }

  // Separate password verification logic
  const isPasswordValid = verifyPasswordHash(password, user.passwordHash);
  if (!isPasswordValid) {
    return { success: false, reason: 'INVALID_CREDENTIALS' };
  }

  return { success: true, user };
}
Notice how the reviewability constraints led to code that's easy to review: each failure case is obvious, the flow is linear, and responsibilities are clear. You didn't dictate the implementation; you specified what makes it reviewable.
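The generated function delegates to a verifyPasswordHash helper that the excerpt doesn't define. A minimal Python sketch of such a helper using PBKDF2 from the standard library; the function names, iteration count, and the salt-as-prefix storage layout are assumptions for illustration:

```python
import hashlib
import hmac
import os
from typing import Optional

SALT_BYTES = 16          # length of the random salt stored with each hash
PBKDF2_ITERATIONS = 100_000

def hash_password(password: str, salt: Optional[bytes] = None) -> bytes:
    """Derive a salted PBKDF2-SHA256 hash; the salt is stored as a prefix."""
    salt = salt or os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return salt + digest

def verify_password_hash(password: str, stored: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt, digest = stored[:SALT_BYTES], stored[SALT_BYTES:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Keeping hashing in its own helper is itself a reviewability win: the authentication flow stays linear while the cryptographic details live in one small, separately reviewable unit.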
💡 Pro Tip: When you catch yourself specifying how to implement something, pause and ask: "What review principle am I actually trying to enforce here?" Convert the implementation detail into a constraint about reviewability.
🎯 Key Principle: Constraint-based specification teaches AI your review standards. Implementation-based specification just gets you one specific solution without transferable learning.
Pitfall 2: Accepting the First Output Without Teaching AI Your Review Requirements
The second major pitfall is what I call "the vending machine mindset"—you put in a prompt, you get out code, and you move on. This treats AI like a code vending machine instead of a system you need to train.
⚠️ Common Mistake 2: One-and-Done Generation ⚠️
Developers who fall into this trap typically follow this pattern:
Developer → AI: "Create a data validation module"
AI → Developer: [Generates 200 lines of code]
Developer: [Copies into codebase without review feedback]
The problem compounds over time:
Iteration 1: AI generates without knowing your standards
↓
Iteration 2: AI still doesn't know your standards (no feedback loop)
↓
Iteration 3: AI generates in a different style (no consistency)
↓
Iteration N: Your codebase looks like a franken-system
Instead, effective AI training looks like this:
Developer → AI: "Create a data validation module"
AI → Developer: [Generates initial code]
Developer → AI: "This is hard to review because validation rules are
scattered across multiple functions. Consolidate rules
into a declarative structure that I can scan visually."
AI → Developer: [Generates improved version]
Developer → AI: "Better. Now make error messages reference the specific
rule that failed, so debugging is obvious."
AI → Developer: [Generates final version]
Developer: [Reviews and accepts, AI has learned your standards]
💡 Real-World Example: A development team at a fintech company discovered they were spending 60% of code review time just asking "what does this function do?" They started rejecting AI-generated functions that didn't have obvious intent from their signature alone. After two weeks of consistently providing this feedback, their AI began generating self-documenting function signatures by default, and review time dropped by 40%.
The teaching moment happens when you articulate why code isn't reviewable and ask for specific improvements. Each feedback cycle builds the AI's understanding of your review requirements.
🧠 Mnemonic: FIRST stands for Feedback Is Required for Systematic Training. Never accept first output without giving the AI feedback about your review needs.
Pitfall 3: Inconsistent Constraint Application Leading to Style Chaos
This pitfall is insidious because it sneaks up on you. You successfully teach the AI to generate reviewable code for one feature... then forget to apply the same constraints to the next feature. Six months later, your codebase looks like a patchwork quilt sewn by a committee that never met.
⚠️ Common Mistake 3: Context Amnesia ⚠️
Here's how this typically manifests:
Monday: You prompt the AI to create an API endpoint with strict error handling, explicit type checking, and consistent response structures.
Wednesday: You prompt the AI to create another API endpoint but forget to mention error handling standards. The AI generates code with try-catch blocks but inconsistent error response formats.
Friday: Another endpoint, this time you remember error handling but forget about the type checking constraints. The AI generates code with implicit type coercion.
Result: Three endpoints that all "work" but look like they're from different codebases:
// Endpoint 1: Fully constrained
app.post('/api/users', async (req, res) => {
const validation = validateUserInput(req.body);
if (!validation.success) {
return res.status(400).json({
error: 'VALIDATION_FAILED',
details: validation.errors
});
}
// ... consistent pattern
});
// Endpoint 2: Missing type constraints
app.post('/api/products', async (req, res) => {
try {
// Implicit type coercion happening here
const product = await createProduct(req.body);
res.json({ data: product });
} catch (err) {
res.status(500).send(err.message); // Different error format!
}
});
// Endpoint 3: Different error handling style
app.post('/api/orders', async (req, res) => {
if (!req.body.items) {
return res.status(400).send('Missing items'); // String instead of object!
}
// ... inconsistent with both previous endpoints
});
Reviewing this code is cognitively expensive because each endpoint requires context switching. You can't build a mental model and apply it uniformly.
💡 Pro Tip: Create a constraint template that you reuse across similar code generation tasks. This acts as your consistency anchor:
STANDARD API ENDPOINT CONSTRAINTS:
- All errors return JSON objects with 'error' and 'details' keys
- Input validation happens before any business logic
- Type checking is explicit (no implicit coercion)
- Success responses wrapped in { success: true, data: {...} }
- Use guard clauses for early returns
- No try-catch blocks around entire handler (specific error points only)
- Database operations have explicit transaction boundaries
Store this template and include it (or reference it) in every relevant prompt. Better yet, many AI systems allow you to set system-level context that applies to all generations in a session.
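Reusing a template can be as simple as splicing a named constraint block into each prompt. A minimal sketch (the `TEMPLATES` dictionary and `with_constraints` helper are illustrative, not part of any real tool):

```python
TEMPLATES = {
    "api_endpoint": """STANDARD API ENDPOINT CONSTRAINTS:
- All errors return JSON objects with 'error' and 'details' keys
- Input validation happens before any business logic
- Type checking is explicit (no implicit coercion)
- Success responses wrapped in { success: true, data: {...} }
- Use guard clauses for early returns""",
}

def with_constraints(task: str, template_name: str) -> str:
    # Append the reusable constraint block to any generation request,
    # so every similar task carries identical standards.
    return f"{task}\n\n{TEMPLATES[template_name]}"

prompt = with_constraints("Create a POST /api/orders endpoint", "api_endpoint")
```

Because the same block is appended every time, the three endpoints from the earlier example would all be generated against one set of standards.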
📋 Quick Reference Card: Consistency Mechanisms
| 🎯 Mechanism | 📝 Description | ⚡ When to Use |
|---|---|---|
| 🔧 Constraint Templates | Reusable lists of reviewability requirements | Similar code types (all API endpoints, all data models) |
| 🗂️ Session Context | Persistent constraints for an entire work session | Working on related features in one sitting |
| 📚 Reference Commits | Point AI to previous examples as style guides | Complex patterns that are hard to specify abstractly |
| ✅ Validation Checklist | Post-generation review checklist you always apply | Every code generation regardless of type |
Pitfall 4: Failing to Provide Architectural Context
AI doesn't inherently understand your system's architecture. When you ask it to generate a component without architectural context, you get code that might work in isolation but creates integration headaches.
⚠️ Common Mistake 4: The Isolation Chamber ⚠️
Consider this scenario: You're building a microservices system where services communicate through an event bus. You prompt the AI:
"Create a service that processes payment transactions"
The AI generates a perfectly functional payment processor—but it uses direct HTTP calls to other services, synchronous database commits, and no event emission. It works... but it violates your architectural principles. Now during review, you have to either:
- Reject the entire implementation and start over
- Manually refactor to fit your architecture (defeating the purpose of AI generation)
- Accept technical debt and create an architectural outlier
All three options waste time and create review friction.
The root cause? Missing architectural context. Here's the same prompt with proper context:
Create a service that processes payment transactions.
ARCHITECTURAL CONTEXT:
- System: Event-driven microservices architecture
- Communication: Event bus (RabbitMQ) for service-to-service communication
- Data: Event sourcing pattern, all state changes emit events
- Transactions: Saga pattern for distributed transactions
- Dependencies: Payment gateway (external), Account service (internal)
INTEGRATION CONSTRAINTS:
- Consume 'PaymentRequested' events from the event bus
- Emit 'PaymentCompleted' or 'PaymentFailed' events
- No direct HTTP calls to other services
- Implement compensation logic for saga rollback
- All external API calls must be idempotent
Now the AI generates:
class PaymentProcessor {
constructor(eventBus, paymentGateway, logger) {
this.eventBus = eventBus;
this.paymentGateway = paymentGateway;
this.logger = logger;
this.setupEventHandlers();
}
setupEventHandlers() {
// Consume events from the bus (architecture alignment)
this.eventBus.subscribe('PaymentRequested', this.handlePaymentRequest.bind(this));
this.eventBus.subscribe('PaymentRollbackRequested', this.handleRollback.bind(this));
}
async handlePaymentRequest(event) {
const { transactionId, amount, accountId } = event.payload;
try {
// Idempotent external call (integration constraint)
const result = await this.paymentGateway.charge(
transactionId, // Used as idempotency key
amount
);
// Emit success event (event sourcing pattern)
await this.eventBus.publish('PaymentCompleted', {
transactionId,
accountId,
gatewayReference: result.reference,
timestamp: new Date().toISOString()
});
} catch (error) {
this.logger.error('Payment processing failed', { transactionId, error });
// Emit failure event for saga compensation (saga pattern)
await this.eventBus.publish('PaymentFailed', {
transactionId,
accountId,
reason: error.message,
compensationRequired: true
});
}
}
async handleRollback(event) {
// Compensation logic for saga rollback
const { transactionId, gatewayReference } = event.payload;
await this.paymentGateway.refund(gatewayReference);
await this.eventBus.publish('PaymentRolledBack', {
transactionId,
timestamp: new Date().toISOString()
});
}
}
This code integrates cleanly because the AI understood the architectural context. During review, you can verify it follows architectural patterns instead of explaining why direct HTTP calls don't belong here.
🎯 Key Principle: Architectural context turns AI from a code generator into an architecture-aware code generator. The review shifts from "Does this fit our system?" to "Does this correctly implement our patterns?"
💡 Mental Model: Think of architectural context as the "blueprint" and reviewability constraints as the "building code." The AI needs both: the blueprint tells it what to build, the building code tells it how to build it reviewably.
Pitfall 5: The 'Magic Prompt' Fallacy
Perhaps the most seductive pitfall is the belief that somewhere out there exists a "magic prompt"—a perfectly crafted set of instructions that will make AI generate flawless, reviewable code forever. Developers spend hours crafting and refining prompts, seeking this Holy Grail.
⚠️ Common Mistake 5: One-Shot Perfection Seeking ⚠️
❌ Wrong thinking: "If I can just craft the perfect prompt, I'll never have to iterate or provide feedback again."
✅ Correct thinking: "Effective AI training is a system of consistent practices, not a single perfect prompt."
The magic prompt fallacy manifests in several ways:
Symptom 1: Prompt Hoarding Developers collect and save hundreds of prompts, trying to find the "perfect one" for each situation. They spend more time searching their prompt library than actually training the AI.
Symptom 2: Prompt Engineering Rabbit Holes Hours spent tweaking a prompt to get it "just right," adding more and more clauses and conditions, until the prompt itself becomes unmaintainable and hard to review.
Symptom 3: Prompt Cargo Culting Copying prompts from online sources without understanding the underlying principles, then wondering why they don't work in your context.
Here's the reality:
Magic Prompt Belief Reality of AI Training
=================== ======================
One perfect prompt Systematic feedback loop
↓ ↓
Perfect code Initial generation
↓ ↓
Done! Review feedback
↓
Refinement
↓
Learning accumulation
↓
Improving results over time
The difference is profound. The magic prompt approach assumes you can frontload all knowledge into a single prompt. The systematic approach recognizes that teaching happens through iteration.
🤔 Did you know? Research on human learning shows that spaced repetition and feedback are far more effective than trying to absorb everything at once. The same principle applies to training AI systems—consistent reinforcement of principles across multiple interactions beats trying to encode everything in one prompt.
💡 Real-World Example: A senior developer at a SaaS company spent two weeks crafting what he called "the ultimate React component prompt." It was 47 lines long and covered everything from state management to accessibility. It worked brilliantly... for about three components. Then requirements shifted slightly (they needed components with different state patterns), and the prompt generated code that didn't fit. He had optimized for one specific case, not taught transferable principles.
He switched to a simpler approach:
- Basic prompt with core constraints (10 lines)
- Generate initial component
- Provide specific feedback on what makes it hard to review
- Regenerate with improvements
- Document the feedback pattern that worked
Within a month, the AI was generating reviewable components with minimal iteration because it had learned the review principles through consistent feedback.
The Compounding Effect of Multiple Pitfalls
These pitfalls don't exist in isolation—they compound. When you over-specify implementation details (Pitfall 1) AND accept the first output (Pitfall 2) AND apply constraints inconsistently (Pitfall 3) AND omit architectural context (Pitfall 4) AND chase magic prompts (Pitfall 5), you create a complexity cascade:
Week 1: Over-specified implementation
↓
AI learns specific solutions, not principles
↓
Week 2: Inconsistent constraints on new features
↓
Style fragmentation begins
↓
Week 3: Missing architectural context
↓
Integration problems emerge
↓
Week 4: Seeking magic prompt to fix everything
↓
More time prompt engineering than training
↓
Week 6: Review time exceeds generation time savings
↓
AI becomes net negative for productivity
The good news? Each pitfall has a clear antidote:
| ⚠️ Pitfall | ✅ Antidote | 🎯 Core Practice |
|---|---|---|
| 🚫 Over-specification | Constraint-based prompting | Specify review requirements, not implementations |
| 🚫 One-and-done | Feedback loops | Always review and teach on first output |
| 🚫 Inconsistent constraints | Constraint templates | Reuse constraint sets for similar code |
| 🚫 Missing context | Architectural briefings | Include system context in every prompt |
| 🚫 Magic prompt seeking | Systematic training | Build feedback patterns, not perfect prompts |
Recognizing Pitfalls in Real-Time
The key to avoiding these pitfalls is recognizing when you're falling into them. Here are the warning signs:
🚨 Warning Sign 1: Your prompts are getting longer If your prompts keep growing (adding more specific instructions each time), you're likely over-specifying. Solution: Extract the underlying review principle and specify that instead.
🚨 Warning Sign 2: You're copying generated code without conversation If you're not having a back-and-forth dialogue with the AI about what makes code reviewable, you're in vending machine mode. Solution: Make it a rule to provide feedback on at least the first generation of any new code type.
🚨 Warning Sign 3: Each new feature feels like starting from scratch If the AI doesn't seem to "remember" your standards across features, you're applying constraints inconsistently. Solution: Create and reuse constraint templates.
🚨 Warning Sign 4: Integration is always a surprise If generated code frequently needs refactoring to fit your system, you're missing architectural context. Solution: Write a system architecture brief and include relevant parts in every prompt.
🚨 Warning Sign 5: You're searching for better prompts more than training the AI If you spend more time reading prompt engineering guides than actually working with your AI, you're chasing the magic prompt. Solution: Focus on building consistent feedback patterns.
Moving Forward: From Pitfalls to Practices
Avoiding these pitfalls isn't about perfection—it's about awareness and course correction. The developers who succeed with AI code generation aren't the ones who never make these mistakes; they're the ones who recognize them quickly and adjust.
Think of it like learning to drive. At first, you might over-correct when the car drifts, or forget to check mirrors consistently, or try to find the "perfect" way to hold the steering wheel. Over time, you develop muscle memory—practices that become automatic. The same applies to training AI for code generation.
The practices that prevent these pitfalls become second nature:
- You naturally think in terms of reviewability constraints rather than implementation details
- Feedback loops become automatic—you wouldn't dream of accepting first output without review
- Constraint templates become part of your workflow
- Architectural context feels as essential as specifying the programming language
- You focus on building teaching systems rather than seeking magic solutions
In the next section, we'll synthesize these lessons into a practical framework you can implement immediately, transforming these insights into a systematic approach to training AI for reviewable code generation.
Building Your AI Teaching System: Key Takeaways and Next Steps
You've reached the end of this foundational journey into teaching AI to generate reviewable code. Before this lesson, you likely approached AI code generation as a black box—you'd prompt, receive code, and struggle through reviews of sprawling, opaque outputs. Now you understand something fundamentally different: AI code generation is a teaching relationship, not a vending machine transaction. The quality of what AI produces directly reflects the clarity of constraints, standards, and feedback you provide.
This final section consolidates everything you've learned into an actionable framework you can implement immediately. We'll build your personal AI teaching system, complete with checklists, evaluation criteria, and connection points to the advanced techniques you'll need as you scale your AI-augmented development practice.
The Three-Phase Framework: Your Foundation for Reviewable Code
Every interaction with AI for code generation should follow this proven three-phase approach. Think of it as the scientific method for AI-augmented development:
PHASE 1: Define PHASE 2: Specify PHASE 3: Iterate
[Reviewability Standards] [Constraints] [Feedback Loop]
| | |
v v v
"What makes "How should AI "What needs
code reviewable structure the improvement
in this context?" output?" this time?"
| | |
+------------------------+---------------------------+
|
Reviewable AI Code
Phase 1: Define Reviewability Standards
Before you write a single prompt, establish what "reviewable" means for your current task. This varies by context—reviewing a database migration differs from reviewing a React component. Your standards should address:
🎯 Cognitive Load Limits: How much can you hold in working memory? For most developers, this means:
- Functions under 20 lines
- Files under 300 lines
- No more than 3 levels of nesting
- Maximum 4 function parameters
🎯 Verifiability Requirements: What can you verify quickly?
- Pure functions over stateful ones
- Explicit over implicit behavior
- Clear input/output contracts
- Testable units
🎯 Change Impact Visibility: Can you see what else might break?
- Clear dependency declarations
- Obvious side effects
- Explicit error paths
- Documented assumptions
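Limits like these are mechanical enough to check before a human review even starts. A rough sketch using Python's standard `ast` module (the `check_cognitive_load` name and the subset of checks are illustrative; nesting depth is omitted for brevity):

```python
import ast

MAX_FUNC_LINES = 20
MAX_PARAMS = 4

def check_cognitive_load(source: str) -> list[str]:
    """Flag functions that exceed the working-memory limits above."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            length = (node.end_lineno or node.lineno) - node.lineno + 1
            if length > MAX_FUNC_LINES:
                problems.append(f"{node.name}: {length} lines (max {MAX_FUNC_LINES})")
            if len(node.args.args) > MAX_PARAMS:
                problems.append(f"{node.name}: {len(node.args.args)} params (max {MAX_PARAMS})")
    return problems

code = "def f(a, b, c, d, e):\n    return a\n"
issues = check_cognitive_load(code)  # flags the 5-parameter signature
```

Running such a check on AI output before reading it lets you reject over-budget code without spending any review attention on it.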
💡 Real-World Example: When building an authentication system, your reviewability standard might be: "I need to verify security implications in under 5 minutes per file. This requires: explicit permission checks, no hidden state mutations, clear error handling, and obvious data flow from request to response."
Phase 2: Specify Constraints
With standards defined, translate them into explicit constraints that guide AI generation. These constraints form the contract between you and the AI:
Structural Constraints control the shape of code:
## Example constraint specification in your prompt
"""
Generate a user authentication module with these structural constraints:
1. FUNCTION SIZE: Maximum 15 lines per function
2. FILE ORGANIZATION: Separate files for:
- auth_core.py (authentication logic)
- auth_validation.py (input validation)
- auth_storage.py (token management)
3. DEPENDENCY INJECTION: Pass dependencies as parameters, no globals
4. ERROR HANDLING: Explicit return types: Result[User, AuthError]
"""
Stylistic Constraints ensure consistency:
// Example stylistic constraints for React components
/*
Generate components following these style rules:
1. NAMING: PascalCase for components, camelCase for utilities
2. STRUCTURE: Props interface first, component second, exports last
3. HOOKS: Maximum 3 hooks per component, extract more to custom hooks
4. COMMENTS: JSDoc for all exported functions, inline for complex logic only
5. RETURNS: Early returns for error cases, main logic without deep nesting
*/
Architectural Constraints maintain system coherence:
- Layering rules (UI → Service → Data)
- Communication patterns (events vs. direct calls)
- State management approach (local vs. global)
- Error propagation strategy
🎯 Key Principle: The more specific your constraints, the less rework you'll do during review. Vague guidance produces vague code.
Phase 3: Iterate with Feedback
No AI generates perfect code on the first try. The iteration phase is where teaching happens. Each feedback cycle should:
- Identify the gap: What specific aspect makes code hard to review?
- Articulate the problem: Why does this reduce reviewability?
- Provide corrective guidance: How should it be structured instead?
- Verify understanding: Does the next iteration show learning?
// First iteration output from AI - hard to review:
function processUserData(data: any) {
if (data) {
const result = data.users.map(u => {
if (u.active) {
return { ...u, status: u.lastLogin > Date.now() - 86400000 ? 'online' : 'offline' };
}
return u;
}).filter(u => u.status);
return result;
}
}
// Your feedback:
/*
"This function has several reviewability issues:
1. Type safety: 'any' hides the data contract
2. Magic number: 86400000 is unclear
3. Nested logic: Map + conditional + object spread is hard to trace
4. Implicit filtering: The final filter's purpose is unclear
Restructure with:
- Explicit types for input/output
- Named constants for time values
- Separate functions for each transformation
- Clear naming that shows intent
"
*/
// Second iteration after feedback - reviewable:
interface User {
id: string;
active: boolean;
lastLogin: number;
}
interface EnrichedUser extends User {
status: 'online' | 'offline';
}
const ONE_DAY_MS = 86400000;
function isRecentlyActive(lastLogin: number): boolean {
return lastLogin > Date.now() - ONE_DAY_MS;
}
function enrichUserWithStatus(user: User): EnrichedUser {
return {
...user,
status: isRecentlyActive(user.lastLogin) ? 'online' : 'offline'
};
}
function processActiveUsers(data: { users: User[] }): EnrichedUser[] {
return data.users
.filter(user => user.active)
.map(enrichUserWithStatus);
}
💡 Pro Tip: Save your feedback patterns. When you give the same type of feedback repeatedly, convert it into a reusable constraint specification. This builds your teaching library over time.
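That pro tip can be made systematic: tally the feedback you give, and promote anything you repeat into a standing constraint. A sketch assuming a simple in-memory store (`FeedbackLibrary` and its promotion threshold are made up for illustration):

```python
from collections import Counter

class FeedbackLibrary:
    """Turns repeated review feedback into reusable constraints."""

    PROMOTE_AFTER = 2  # same feedback given twice -> make it a constraint

    def __init__(self):
        self.counts = Counter()

    def record(self, feedback: str) -> None:
        self.counts[feedback] += 1

    def promoted_constraints(self) -> list[str]:
        # Feedback you keep repeating belongs in the upfront prompt.
        return [f for f, n in self.counts.items() if n >= self.PROMOTE_AFTER]

lib = FeedbackLibrary()
lib.record("Use named constants for time values")
lib.record("Use named constants for time values")
lib.record("Add explicit return types")
constraints = lib.promoted_constraints()
```

Anything that graduates out of `promoted_constraints` moves from Phase 3 (iteration) back into Phase 2 (upfront constraints), which is exactly how the teaching library grows.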
Evaluation Checklist: Does AI Understand Your Requirements?
How do you know when your teaching is working? Use this evaluative checklist after each generation cycle:
📋 Quick Reference Card: AI Understanding Assessment
| Category | ✅ Pass Criteria | ❌ Fail Indicators |
|---|---|---|
| 🔍 Scanability | Can understand code purpose in 30 seconds | Need to trace execution to understand intent |
| 📏 Size Compliance | Functions/files within specified limits | Exceeds length constraints |
| 🔗 Dependency Clarity | All imports and dependencies explicit | Hidden dependencies or global state |
| 🎯 Error Visibility | Error cases obvious and handled | Errors hidden in nested logic |
| 📝 Documentation | Comments explain "why", code shows "what" | Comments explain obvious code |
| 🧪 Testability | Can mentally simulate test cases | Would need to run code to understand |
| 🏗️ Structure Match | Follows specified architectural patterns | Introduces new patterns without discussion |
| 🔄 Consistency | Matches existing codebase conventions | Uses inconsistent naming/structure |
Scoring Guide:
- ✅ 7-8 passes: AI understands your requirements well
- ⚠️ 5-6 passes: AI needs more specific constraints
- ❌ 0-4 passes: Reset and clarify your standards
⚠️ Critical Point: If you consistently score below 5, the problem isn't the AI—it's the clarity of your constraint specification. Return to Phase 1 and make your standards more concrete.
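If you want the scoring to stay consistent from day to day, the guide is trivial to mechanize. A minimal sketch (the category keys are abbreviations of the table rows above; the function name is illustrative):

```python
def assess_understanding(results: dict[str, bool]) -> str:
    """Map checklist pass/fail results to the scoring guide above."""
    passes = sum(results.values())
    if passes >= 7:
        return "AI understands your requirements well"
    if passes >= 5:
        return "AI needs more specific constraints"
    return "Reset and clarify your standards"

checklist = {
    "scanability": True, "size": True, "dependencies": True,
    "errors": True, "documentation": False, "testability": True,
    "structure": True, "consistency": True,
}
verdict = assess_understanding(checklist)  # 7 passes out of 8
```

Keeping the thresholds in one place also makes it easy to tighten them as your teaching practice matures.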
Practical Exercise: Rate This Generated Code
Let's apply the checklist to a real example:
def process_payment(user_id, amount, payment_method="card"):
user = db.query(f"SELECT * FROM users WHERE id = {user_id}")
if user["balance"] >= amount:
if payment_method == "card":
charge_result = stripe.charge(user["card_token"], amount)
if charge_result:
db.execute(f"UPDATE users SET balance = {user['balance'] - amount}")
send_email(user["email"], "Payment successful")
return True
elif payment_method == "balance":
db.execute(f"UPDATE users SET balance = {user['balance'] - amount}")
return True
return False
Your Assessment:
- 🔍 Scanability: ❌ Multiple concerns mixed (validation, charging, updating, notification)
- 📏 Size: ❌ 14 lines but doing too much
- 🔗 Dependencies: ❌ Hidden dependencies (db, stripe, send_email are globals)
- 🎯 Error Visibility: ❌ Silent failures, unclear what False means
- 📝 Documentation: ❌ No docstring, unclear business rules
- 🧪 Testability: ❌ Can't test without database and Stripe
- 🏗️ Structure: ❌ SQL injection vulnerability, no separation of concerns
- 🔄 Consistency: ? (depends on existing codebase)
Score: 0-1 passes → This needs complete restructuring with explicit constraints.
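For comparison, here is one possible restructuring under those constraints. It is a hedged sketch, not the canonical answer: `PaymentResult`, the injected `db`/`gateway`/`notifier` interfaces, and the `%s` placeholder style are assumptions introduced for illustration. It replaces string-interpolated SQL with parameterized queries, makes every dependency explicit, and makes failure reasons visible:

```python
from dataclasses import dataclass

@dataclass
class PaymentResult:
    success: bool
    reason: str = ""

def process_payment(user_id, amount, db, gateway, notifier, payment_method="card"):
    """Charge a user, debiting their stored balance.

    Dependencies (db, gateway, notifier) are injected so the function
    is testable without a real database or payment provider.
    """
    # Parameterized query: no string interpolation, no SQL injection.
    user = db.query("SELECT * FROM users WHERE id = %s", (user_id,))
    if user is None:
        return PaymentResult(False, "user_not_found")
    if user["balance"] < amount:
        return PaymentResult(False, "insufficient_balance")

    if payment_method == "card":
        if not gateway.charge(user["card_token"], amount):
            return PaymentResult(False, "card_declined")
    elif payment_method != "balance":
        return PaymentResult(False, "unknown_payment_method")

    db.execute(
        "UPDATE users SET balance = balance - %s WHERE id = %s",
        (amount, user_id),
    )
    notifier.send(user["email"], "Payment successful")
    return PaymentResult(True)
```

Against the checklist, this version now passes on dependency clarity, error visibility, and testability, and a reviewer can see every failure path without running the code.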
Building Your Personal Constraint Library
As you iterate with AI, you'll discover constraint patterns that work for your style and domain. Build a personal library by categorizing successful specifications:
🗂️ Library Structure
1. Domain-Specific Constraints
- Web API endpoints
- Database migrations
- UI components
- Data processing pipelines
- Background jobs
2. Language-Specific Patterns
- Python: Type hints, context managers, dataclasses
- JavaScript: Async patterns, error boundaries, hooks
- Go: Error handling, interface design, concurrency
3. Review Context Templates
- Quick review (under 5 minutes)
- Security-critical review
- Performance-sensitive code
- Public API changes
💡 Pro Tip: Store your constraint library as prompt templates in your IDE or as snippets. A well-organized library reduces setup time from 10 minutes to 30 seconds.
Example Template Structure:
## Template: REST API Endpoint (Express.js)
### Context
Generate a REST API endpoint for [RESOURCE] in Express.js
### Reviewability Requirements
- Must verify security in under 3 minutes
- Must trace request flow without running code
- Must identify side effects immediately
### Constraints
#### Structure
- Separate route handler from business logic
- Maximum 3 middleware functions per route
- Request validation in dedicated middleware
- Business logic in service layer
#### Error Handling
- Use custom error classes (ValidationError, NotFoundError, etc.)
- Centralized error middleware
- Explicit status codes (no magic numbers)
#### Testing Hooks
- Inject dependencies (no direct imports of services)
- Pure business logic functions
- Mockable external calls
#### Example Structure
```javascript
// routes/[resource].js
// middleware/validate[Resource].js
// services/[resource]Service.js
// errors/[Resource]Errors.js
```
### Success Criteria
- Can identify all data validation points
- Can trace error paths without running
- Can identify all database operations
- Can verify authorization checks
🤔 Did you know? Developers who maintain constraint libraries report 60% faster review times after just 2-3 weeks of use. The library becomes your personal AI training dataset.
<div class="lesson-flashcard-placeholder" data-flashcards="[{"q":"What score indicates AI understands your requirements well?","a":"7-8 passes"},{"q":"What are the three main constraint categories?","a":"Structural, Stylistic, Architectural"},{"q":"What should comments explain in reviewable code?","a":"Why not what"}]" id="flashcard-set-16"></div>
Measuring Success: Metrics That Matter
You can't improve what you don't measure. Track these key performance indicators for your AI teaching system:
Primary Metrics
1. Review Time Per Generation
- Baseline (no teaching): 45-60 minutes per significant code block
- After 2 weeks teaching: 20-30 minutes per block
- After 1 month teaching: 10-15 minutes per block
- Mature teaching practice: 5-10 minutes per block
📊 Track this weekly. If review time isn't decreasing, your constraints need refinement.
2. Iteration Count to Acceptable Code
- Initial attempts: 4-6 iterations average
- Developing teaching: 2-3 iterations average
- Established patterns: 1-2 iterations average
🎯 Key Principle: Fewer iterations means better teaching. If you're stuck at 4+ iterations, you're not providing clear enough constraints upfront.
3. Confidence Level in Generated Code
Rate your confidence on a 1-10 scale before running tests:
- 1-4: Would need significant testing to trust
- 5-7: Moderately confident, needs validation
- 8-10: High confidence, minimal verification needed
Target: Average confidence of 7+ after 1 month of deliberate teaching practice.
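A simple weekly log makes these numbers concrete instead of anecdotal. A minimal sketch of such a tracker; the `TeachingMetrics` class and its "on target" thresholds are illustrative choices, not a standard:

```python
from statistics import mean

class TeachingMetrics:
    """Running log of the primary metrics described above."""

    def __init__(self):
        self.entries = []  # (review_minutes, iterations, confidence)

    def log(self, review_minutes: float, iterations: int, confidence: int) -> None:
        self.entries.append((review_minutes, iterations, confidence))

    def summary(self) -> dict:
        if not self.entries:
            return {}
        review, iters, conf = zip(*self.entries)
        return {
            "avg_review_minutes": mean(review),
            "avg_iterations": mean(iters),
            "avg_confidence": mean(conf),
            # Targets from the text: confidence 7+, iterations averaging <= 2.
            "on_target": mean(conf) >= 7 and mean(iters) <= 2,
        }

m = TeachingMetrics()
m.log(12, 2, 8)
m.log(8, 1, 7)
stats = m.summary()
```

Even two weeks of entries is enough to see whether review time and iteration count are actually trending down.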
Secondary Metrics
4. Constraint Reuse Rate
What percentage of your prompts reuse constraint templates vs. writing new constraints?
- Week 1: 10% reuse (building library)
- Week 4: 40% reuse (patterns emerging)
- Week 8: 70% reuse (mature library)
Higher reuse means you've identified your core reviewability patterns.
5. Bug Detection Speed
How quickly can you spot issues during review?
- ⏱️ Initial: 15+ minutes to find first issue
- ⏱️ Improving: 5-10 minutes to first issue
- ⏱️ Proficient: 2-5 minutes to first issue
Faster detection means code is structured for reviewability.
6. Production Bug Rate
The ultimate measure—bugs found after deployment:
Target: <2 bugs per 1000 lines of AI-generated code
If your bug rate exceeds this, your reviewability standards aren't catching real issues.
💡 Mental Model: Think of these metrics like a teaching effectiveness scorecard. You're not measuring the AI—you're measuring how well you've taught it.
Connecting to Advanced Techniques
The teaching system you've built here is the foundation for more sophisticated AI-augmented workflows. Here's how today's lesson connects to what you'll learn next:
Context Management (Advanced Topic)
Your constraint specifications are a form of context. As projects scale, you'll need:
- 📚 Context Compression: Distilling 50+ constraints into efficient prompts
- 🔄 Context Persistence: Maintaining AI's understanding across sessions
- 🎯 Context Prioritization: Which constraints matter most for each task?
- 🧩 Context Composition: Combining multiple constraint sets intelligently
Bridge: The personal constraint library you're building now becomes your context management database. Each template is a reusable context block.
Workflow Optimization (Advanced Topic)
Once AI consistently generates reviewable code, you'll optimize:
- ⚡ Generate-Review-Deploy Cycles: From hours to minutes
- 🔀 Parallel Generation: Multiple AI instances working simultaneously
- 🤖 Automated Review Gates: AI reviewing AI with your constraints
- 📊 Quality Dashboards: Real-time monitoring of generation quality
Bridge: The metrics you're tracking now (review time, iterations, confidence) become your optimization targets. You can't optimize workflow without baseline measurements.
Team Collaboration Patterns (Advanced Topic)
Sharing AI teaching approaches across teams requires:
- 📖 Constraint Standardization: Team-wide reviewability standards
- 🔄 Teaching Pattern Libraries: Shared templates and specifications
- 👥 Peer Review of Prompts: Reviewing constraint specs, not just code
- 📈 Collective Learning: Capturing what works across team members
Bridge: Your personal constraint library becomes a team asset. The evaluation checklist becomes your code review rubric.
YOUR CURRENT SKILL NEXT LEVEL SKILLS
|
| [Teaching AI for]
| [Reviewable Code]
|
+------------------+------------------+
| | |
v v v
[Context Mgmt] [Workflow Opt] [Team Collab]
| | |
| | |
v v v
Scaling to Automating Multiplying
Large Systems Your Practice Team Impact
⚠️ Critical Point: Don't jump to advanced techniques until your iteration count averages 2 or less. Optimizing a poorly-taught system just makes you fail faster.
Your Next Steps: The 30-Day Implementation Plan
Here's a concrete roadmap to implement everything you've learned:
Week 1: Foundation Building
🎯 Goal: Establish baseline metrics and create first constraint templates
Daily Actions:
- Day 1-2: Document your current review process and measure baseline review time
- Day 3-4: Create 3 constraint templates for your most common tasks
- Day 5-7: Use the three-phase framework on every AI generation, tracking results
Success Indicator: You have baseline metrics and 3 working templates
Week 2: Pattern Recognition
🎯 Goal: Identify what makes code reviewable in your domain
Daily Actions:
- Review each AI-generated code block using the evaluation checklist
- Document which constraints produce best results
- Add 2-3 new templates to your library
- Begin tracking iteration counts
Success Indicator: Iteration count decreases from baseline by 20-30%
Week 3: Refinement
🎯 Goal: Optimize constraint specifications based on results
Daily Actions:
- Revise constraint templates based on what worked and what didn't
- Combine successful constraint patterns
- Focus on consistency—use the same templates for similar tasks
- Start measuring confidence levels
Success Indicator: Review time decreases by 30-40% from baseline
Week 4: Systematization
🎯 Goal: Make your teaching system automatic
Daily Actions:
- Organize constraint library by task type and context
- Create quick-reference shortcuts for most-used templates
- Document your personal reviewability standards
- Share successful patterns with team members
Success Indicator: 60%+ of prompts use template constraints, confidence averaging 7+
#### Practical Applications Starting Today
**Application 1: Refactor Your Next AI Interaction**
The next time you ask AI to generate code:
1. ⏸️ **Pause** before writing the prompt
2. 📝 **Define** what makes the code reviewable for this specific task
3. 🎯 **Specify** 3-5 concrete constraints
4. 🔄 **Use** the evaluation checklist after generation
5. 📊 **Track** your review time
In practice, this single change alone can cut review time by 25-40%.
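Steps 2 and 3 above can even be enforced mechanically. A minimal sketch — `build_prompt` is a hypothetical helper, not a real library function:

```python
def build_prompt(task: str, constraints: list[str]) -> str:
    """Prepend explicit reviewability constraints to a generation request."""
    if not 3 <= len(constraints) <= 5:
        # Force yourself to define 3-5 concrete constraints before prompting
        raise ValueError("specify 3-5 concrete constraints")
    lines = ["Generate code for the task below under these constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", f"Task: {task}"]
    return "\n".join(lines)

prompt = build_prompt(
    "parse a CSV of user records into dataclasses",
    [
        "functions under 30 lines",
        "raise ValueError on malformed rows; no silent fallbacks",
        "match the codebase's existing snake_case naming",
    ],
)
print(prompt)
```

The guard clause is the point: if you can't list at least three constraints, you haven't yet defined what makes the code reviewable.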
**Application 2: Audit Your Last Week of AI-Generated Code**
Look back at code AI generated in the past week:
- How much time did you spend reviewing?
- How many iterations did each piece require?
- What made review difficult?
- What would have helped?
Extract 2-3 constraint specifications from your findings. These become your first templates.
**Application 3: Create Your First Constraint Template**
Pick your most frequent coding task. Right now, write:
```markdown
## Template: [Task Name]
### Reviewability Requirements
- [What do I need to verify quickly?]
- [What are the highest-risk areas?]
- [What makes this code hard to review?]
### Structural Constraints
- [Size limits]
- [Organization rules]
- [Dependency management]
### Quality Gates
- [ ] [Specific thing I can check in 30 seconds]
- [ ] [Another verifiable criterion]
- [ ] [Third verification point]
```

Fill this out for one task today. You've just created your first reusable teaching tool.
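Once a template is filled out, reusing it is just string splicing. A sketch under stated assumptions — the template contents and `template_to_prompt` are illustrative, not prescribed by this lesson:

```python
# Hypothetical filled-out template, stored as plain markdown text
TEMPLATE = """\
## Template: CSV ingestion
### Structural Constraints
- functions under 30 lines
- stdlib csv only; no new dependencies
### Quality Gates
- [ ] every public function has a docstring
- [ ] malformed rows raise ValueError
"""

def template_to_prompt(template: str, task: str) -> str:
    """Reuse a stored constraint template as the preamble of a prompt."""
    return f"{template}\nApply every constraint above to this task:\n{task}"

print(template_to_prompt(TEMPLATE, "ingest signups.csv into User records"))
```

Keeping templates as plain text files means the same library works with any AI tool you use.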
#### What You've Gained
Let's recap your transformation through this lesson:
##### Before This Lesson, You:
- ❌ Struggled with reviewing sprawling AI-generated code
- ❌ Spent 45-60 minutes per code review, often missing issues
- ❌ Required 4-6 iterations to get acceptable code
- ❌ Treated AI code generation as unpredictable
- ❌ Had low confidence in AI outputs without extensive testing
##### After This Lesson, You:
- ✅ Understand code reviewability as teachable specifications
- ✅ Have a three-phase framework (define, specify, iterate)
- ✅ Can evaluate if AI understands your requirements
- ✅ Know how to build a personal constraint library
- ✅ Can measure teaching effectiveness with concrete metrics
- ✅ See the path to 5-10 minute review times
#### Final Critical Points to Remember
⚠️ **Teaching AI is a skill that compounds.** Every constraint you specify, every feedback cycle you complete, makes future interactions better. Invest in building your constraint library—it's infrastructure for your career.
⚠️ **Reviewability is context-dependent.** The constraints that make a database migration reviewable differ from those for a UI component. Don't seek universal rules; build contextual templates.
⚠️ **Measure religiously, but start simple.** Begin with just review time and iteration count. Add more metrics as your practice matures. Data reveals what works.
⚠️ **Your constraint library is your competitive advantage.** Developers who can consistently teach AI to generate reviewable code in 1-2 iterations can be 5-10x more productive than those who can't. This isn't just efficiency—it's career survival.
⚠️ **Start today, not tomorrow.** Your next AI interaction is an opportunity to practice. Use the three-phase framework once, and you'll immediately see the difference.
#### The Path Forward
You now have the foundational system for surviving—and thriving—as a developer when most code is generated by AI. The three-phase framework, evaluation checklist, and metrics dashboard you've learned here will serve you throughout your career.
But this is just the beginning. As your constraint library grows and your teaching becomes more sophisticated, you'll discover that the bottleneck shifts from code generation to context management. How do you maintain AI's understanding across multiple files? How do you compose constraints for complex, multi-component features? How do you teach architectural patterns that span dozens of files?
These questions lead to the advanced topics: context management strategies, workflow optimization techniques, and team collaboration patterns. With your teaching foundation solid, you're ready to explore how to scale your AI-augmented development practice from individual files to entire systems.
The future of software development isn't about writing less code—it's about teaching better and reviewing smarter. You've taken the first, most crucial step on that journey.
Now go build your constraint library. Your future self—and your code reviewers—will thank you.
🎯 **Your immediate action items**:
- Create your first constraint template today (10 minutes)
- Measure your next AI code review time (baseline data)
- Use the three-phase framework on your next generation task
Welcome to the new paradigm of AI-augmented development. You're ready.