# Field Notes from Shipping Real Code with Claude

Published on June 14, 2025 4:36 PM GMT

## Vibe Coding Isn’t Just a Vibe

*Shimmering Substance - Jackson Pollock*

Think of this post as your field guide to a new way of building software.

Let me take you back to when this all started. Andrej Karpathy tweeted about “vibe-coding”—this idea of letting AI write your code while you just vibe. The developer community had a good laugh. It sounded like the ultimate developer fantasy: kick back, sip coffee, let the machines do the work.

Then Anthropic released Sonnet 3.7 and Claude Code, and something unexpected happened. The joke stopped being funny because it started being… possible? Of course, our trusty friend Cursor had been around for a while, but this new interface finally felt like true vibe coding.

Vibe-coding is more like conducting an orchestra. You’re not playing every instrument—you’re directing, shaping, guiding. The AI provides the raw musical talent, but without your vision, it’s just noise.

There are three distinct postures you can take when vibe-coding, each suited to different phases in the development cycle:

- **AI as First-Drafter:** AI generates initial implementations while you focus on architecture and design. It’s like having a junior developer who can type at the speed of thought but needs constant guidance. Perfect for boilerplate, CRUD operations, and standard patterns.
- **AI as Pair-Programmer:** This is the sweet spot for most development. You’re actively collaborating, bouncing ideas back and forth. The AI suggests approaches, you refine them. You sketch the outline, AI fills in the details. It’s like pair programming with someone who has read every programming book ever written but has never actually shipped code.
- **AI as Validator:** Sometimes you write code and want a sanity check. AI reviews for bugs, suggests improvements, and spots patterns you might have missed. Think of it as an incredibly well-read code reviewer who never gets tired or cranky.

## The Three Modes of Vibe-Coding: A Practical Framework

After months of experimentation and more than a few production incidents, I’ve settled on three distinct modes of operation. Each has its own rhythm, its own guardrails, and its own use cases.

### Mode 1: The Playground

*Lighter Fluid*

**When to use it:** Weekend hacks, personal scripts, proof-of-concepts, and those “I wonder if…” moments that make programming fun.

In Playground Mode, you embrace the chaos. Claude writes 80-90% of the code while you provide just enough steering to keep things on track. It’s liberating and slightly terrifying. Pro tip: check out claude-composer for going full-YOLO mode.

Here’s what Playground Mode looks like: you have an idea for a script to analyze your Spotify history. You open Claude, describe what you want in plain English, and watch as it generates a complete solution. No CLAUDE.md file, no careful prompting—just raw, unfiltered AI-written code.
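The output of such a session might look something like the sketch below. To be clear, this is my illustration rather than verbatim Claude output, and it assumes Spotify’s standard “Account data” export (`StreamingHistory*.json` files with `artistName` and `msPlayed` fields); check those names against your own export.

```python
# spotify_top_artists.py: Playground-grade; no tests, minimal error handling.
import json
from collections import Counter
from pathlib import Path

# Assumption: you've unzipped Spotify's data export into ./my_spotify_data
listening_ms: Counter = Counter()
for path in Path("my_spotify_data").glob("StreamingHistory*.json"):
    for play in json.loads(path.read_text(encoding="utf-8")):
        listening_ms[play["artistName"]] += play["msPlayed"]

print("Top 10 artists by hours played:")
for artist, ms in listening_ms.most_common(10):
    print(f"{artist:<40} {ms / 3_600_000:6.1f} h")
```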
The beauty of Playground Mode is its speed. You can go from idea to working prototype in minutes. The danger is that this cowboy coding style is absolutely inappropriate for anything that matters. Use it for experiments, never for production. Trust me, despite the amazing folks preaching otherwise: good engineering principles still matter, now more than ever.

### Mode 2: Pair Programming

*Compiling*

**When to use it:** Projects under ~5,000 lines of code, side projects with real users, demos you don’t want to break, or well-scoped small services in larger systems.

This is where vibe-coding starts to shine. You need structure, but not so much that it slows you down. The key innovation here is the CLAUDE.md file—custom documentation that Claude automatically reads when invoked. From Anthropic’s Best practices for Claude Code:

> CLAUDE.md is a special file that Claude automatically pulls into context when starting a conversation:
>
> - Common bash commands
> - Core files and utility functions
> - Code style guidelines
> - Testing instructions
> - Repository etiquette (e.g., branch naming, merge vs. rebase, etc.)
> - Other information you want Claude to remember

Instead of repeatedly explaining your project’s conventions, you document them once. Here’s a real example from a recent side project:

```markdown
## Project: Analytics Dashboard

This is a Next.js dashboard for visualizing user analytics.

### Architecture Decisions
- Server Components by default, Client Components only when necessary
- tRPC for type-safe API calls
- Prisma for database access with explicit select statements
- Tailwind for styling (no custom CSS files)

### Code Style
- Formatting: Prettier with 100-char lines
- Imports: sorted with simple-import-sort
- Components: Pascal case, co-located with their tests
- Hooks: always prefix with 'use'

### Patterns to Follow
- Data fetching happens in Server Components
- Client Components receive data as props
- Use Zod schemas for all external data
- Error boundaries around every data display component

### What NOT to Do
- Don't use useEffect for data fetching
- Don't create global state without explicit approval
- Don't bypass TypeScript with 'any' types
```

With this context, Claude becomes remarkably effective. It’s like the difference between explaining your project to a new hire every single day versus having them read the onboarding docs once.

But Pair Programming Mode requires more than just documentation. You need to actively guide the AI with what I call “anchor comments”—breadcrumbs that prevent Claude from wandering into the wilderness:

```tsx
// AIDEV-NOTE: This component uses virtual scrolling for performance
// See: https://tanstack.com/virtual/latest
// Don't convert to regular mapping - we handle 10k+ items
export function DataTable({ items }: DataTableProps) {
  // Claude, when you edit this, maintain the virtual scrolling
  ...
}
```

These comments serve a dual purpose: they guide the AI and document your code for humans. It’s documentation that pays dividends in both directions. The key distinction between such “anchor comments” and regular comments is that these are written, maintained, and meant to be used by Claude itself. Here’s an actual snippet from our project’s CLAUDE.md:

```markdown
## Anchor comments

Add specially formatted comments throughout the codebase, where appropriate,
for yourself as inline knowledge that can be easily `grep`ped for.

### Guidelines:

- Use `AIDEV-NOTE:`, `AIDEV-TODO:`, or `AIDEV-QUESTION:` (all-caps prefix)
  for comments aimed at AI and developers.
- Keep them concise (<= 120 chars).
- **Important:** Before scanning files, always first try to **locate existing
  anchors** `AIDEV-*` in relevant subdirectories.
- **Update relevant anchors** when modifying associated code.
- **Do not remove `AIDEV-NOTE`s** without explicit human instruction.

Example:

# AIDEV-NOTE: perf-hot-path; avoid extra allocations (see ADR-24)
async def render_feed(...):
    ...
```
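Because anchors are just grep-able text, lightweight tooling around them is easy. As an illustration (my own sketch, not part of the codebase described above), here’s a small script that inventories every `AIDEV-*` anchor in a repository:

```python
#!/usr/bin/env python3
"""Inventory AIDEV-* anchor comments across a repository (illustrative sketch)."""
import re
import sys
from pathlib import Path

ANCHOR = re.compile(r"AIDEV-([A-Z]+):\s*(.*)")
SKIP_DIRS = {".git", "node_modules", "__pycache__"}  # assumption: the usual junk

def iter_anchors(root: Path):
    for path in root.rglob("*"):
        if not path.is_file() or SKIP_DIRS & set(path.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if match := ANCHOR.search(line):
                yield path, lineno, match.group(1), match.group(2)

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path, lineno, kind, body in iter_anchors(root):
        print(f"{path}:{lineno}: AIDEV-{kind}: {body}")
```

A plain `grep -rn "AIDEV-" .` does the same job interactively; the script is just the CI-friendly version.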
### Mode 3: Production/Monorepo Scale

*RTFM*

**When to use it:** Large codebases, systems with real users, anything where bugs cost money or reputation.

Claude can generate tremendous amounts of code, but integrating it into a complex system requires careful orchestration.

Let me start with a big caveat: vibe coding at this scale does NOT scale very well, yet. I definitely do see these systems getting significantly better at handling larger codebases, but for them to be effective, significant effort is needed to help them navigate, understand, and safely hack on large codebases without getting lost in a maze. Generally speaking, it’s better to section such codebases into individual services and sub-modules when possible.

As a universal principle, good engineering practices apply to large-scale projects, vibe coded or not. For example, at production scale, boundaries become critical. Every integration point needs explicit documentation:

```python
# AIDEV-NOTE: API Contract Boundary - v2.3.1
# ANY changes require version bump and migration plan
# See: docs/api-versioning.md
@router.get("/users/{user_id}/feed")
async def get_user_feed(user_id: UUID) -> FeedResponse:
    # Claude: the response shape here is sacred
    # Changes break real apps in production
    ...
```

Without these boundaries, Claude will happily “improve” your API and break every client in production.
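One way to give that boundary teeth beyond a comment is a human-owned contract test that pins the response shape, so any “improvement” fails CI before it reaches a client. A minimal sketch; the `client` fixture and the field names here are hypothetical, not the actual schema:

```python
# tests/contract/test_feed_contract.py
# Hypothetical contract test: if the feed response shape changes, CI fails.
import uuid

def test_feed_response_shape(client):  # `client`: your HTTP test-client fixture
    response = client.get(f"/users/{uuid.uuid4()}/feed")
    assert response.status_code == 200

    body = response.json()
    # The contract: exactly these top-level keys, no more, no fewer.
    assert set(body) == {"items", "next_cursor", "generated_at"}
    for item in body["items"]:
        assert {"id", "kind", "payload"} <= set(item)
```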
Bottom line: larger projects should definitely start adopting vibe coding in parts, along with methodologies that enhance the experience, but don’t expect to land large features reliably just yet (as of June 7, 2025 / AI epoch).

## Infrastructure: The Foundation of Sustainable AI Development

### CLAUDE.md: Your Single Source of Truth

Let me be absolutely clear about this: CLAUDE.md is not optional documentation. Every minute you spend updating it saves an hour of cleanup later.

Think of CLAUDE.md as a constitution for your codebase. It establishes the fundamental laws that govern how code should be written, how systems interact, and what patterns to follow or avoid. Organizations that invest in developing the skills and capabilities of their teams get better outcomes—and your CLAUDE.md is that investment crystallized into documentation.

Here’s an abridged version of our production CLAUDE.md structure, refined over thousands of AI-assisted commits:

```markdown
# `CLAUDE.md` - Julep Backend Service

## The Golden Rule

When unsure about implementation details, ALWAYS ask the developer.

## Project Context

Julep enables developers to build stateful AI agents using declarative workflows.

## Critical Architecture Decisions

### Why Temporal?

We use Temporal for workflow orchestration because:
1. Workflows can run for days/weeks with perfect reliability
2. Automatic recovery from any failure point

### Why PostgreSQL + pgvector?

1. ACID compliance for workflow state (can't lose user data)
2. Vector similarity search for agent memory

### Why TypeSpec?

Single source of truth for API definitions:
- OpenAPI specs
- TypeScript/Python clients
- Validation schemas

## Code Style and Patterns

### Anchor comments

Add specially formatted comments throughout the codebase, where appropriate,
for yourself as inline knowledge that can be easily `grep`ped for.

### Guidelines:

- Use `AIDEV-NOTE:`, `AIDEV-TODO:`, or `AIDEV-QUESTION:` (all-caps prefix)
  for comments aimed at AI and developers.
- **Important:** Before scanning files, always first try to **grep for
  existing anchors** `AIDEV-*` in relevant subdirectories.
- **Update relevant anchors** when modifying associated code.
- **Do not remove `AIDEV-NOTE`s** without explicit human instruction.
- Make sure to add relevant anchor comments whenever a file or piece of code is:
  * too complex, or
  * very important, or
  * confusing, or
  * could have a bug

## Domain Glossary (Claude, learn these!)

- **Agent**: AI entity with memory, tools, and defined behavior
- **Task**: Workflow definition composed of steps (NOT a Celery task)
- **Execution**: Running instance of a task
- **Tool**: Function an agent can call (browser, API, etc.)
- **Session**: Conversation context with memory
- **Entry**: Single interaction within a session

## What AI Must NEVER Do

1. **Never modify test files** - Tests encode human intent
2. **Never change API contracts** - Breaks real applications
3. **Never alter migration files** - Data loss risk
4. **Never commit secrets** - Use environment variables
5. **Never assume business logic** - Always ask
6. **Never remove AIDEV- comments** - They're there for a reason

Remember: We optimize for maintainability over cleverness.
When in doubt, choose the boring solution.
```

This document becomes the shared context between you and Claude. It’s like having a senior developer whispering guidance in Claude’s ear throughout the coding session.

### Anchor Comments: Breadcrumbs at Scale

As your codebase grows, CLAUDE.md alone isn’t enough. You need inline guidance—what I call anchor comments. These serve as local context that prevents AI from making locally bad decisions.

Think of your codebase as a city and anchor comments as street signs. Without them, even smart visitors get lost. Here’s how we use them effectively:

```python
# AIDEV-NOTE: Critical performance path - this serves 100k req/sec
# DO NOT add database queries here
def get_user_feed(user_id: UUID, cached_data: FeedCache) -> List[FeedItem]:
    # We need to avoid mutating the cached data
    items = cached_data.items[:]

    # AIDEV-TODO: Implement pagination (ticket: FEED-123)
    # Need cursor-based pagination for infinite scroll

    # AIDEV-QUESTION: Why do we filter private items here instead of in cache?
    # AIDEV-ANSWER: Historical context: privacy rules can change between cache updates
    filtered = [item for item in items if user_has_access(user_id, item)]
    return filtered
```

These comments create a narrative that helps both AI and humans understand not just what the code does, but why it does it that way.

### Git Workflows for AI Development

One of the most underappreciated aspects of AI-assisted development is how it changes your git workflow. You’re now generating code at a pace that can quickly pollute your git history if you’re not careful.

Git worktrees aren’t the most straightforward tool, and they really only pay off on very large codebases, but I recommend using them to create isolated environments for AI experiments:

```bash
# Create an AI playground without polluting main
git worktree add ../ai-experiments/cool-feature -b ai/cool-feature

# Let Claude go wild in the isolated worktree
cd ../ai-experiments/cool-feature
# ... lots of experimental commits ...

# Cherry-pick the good stuff back to main
cd ../main-repo
git cherry-pick abc123  # Just the commits that worked

# Clean up when done
git worktree remove ../ai-experiments/cool-feature
```

Pro tip: read about how to use worktrees, and check out the nifty wt tool.

This approach gives you the best of both worlds: Claude can experiment freely while your main branch history stays clean and meaningful.

For commit messages, we’ve standardized on tagging AI-assisted commits:

```
feat: implement user feed caching [AI]

- Add Redis-based cache for user feeds
- Implement cache warming on user login
- Add metrics for cache hit rate

AI-assisted: core logic generated, tests human-written
```

This transparency helps during code review—reviewers know to pay extra attention to AI-generated code.
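Conventions like this only stick if something enforces them. Purely as a sketch (this hook is hypothetical, not part of the workflow described above), a `commit-msg` hook could reject `[AI]` commits that lack the trailer:

```python
#!/usr/bin/env python3
# .git/hooks/commit-msg (hypothetical): require an "AI-assisted:" trailer
# on any commit whose subject line carries the [AI] tag.
import sys

def main() -> int:
    with open(sys.argv[1], encoding="utf-8") as f:
        message = f.read()
    subject = message.splitlines()[0] if message else ""
    if "[AI]" in subject and "AI-assisted:" not in message:
        print("Commit tagged [AI] must include an 'AI-assisted:' trailer "
              "explaining what was generated vs. human-written.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```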
## The Sacred Rule: Humans Write Tests

Now we come to the most important principle in AI-assisted development. It’s so important that I’m going to repeat it in multiple ways until it’s burned into your memory:

Never. Let. AI. Write. Your. Tests.

Tests are not just code that verifies other code works. Tests are executable specifications. They encode your actual intentions, your edge cases, your understanding of the problem domain. High performers excel at both speed and stability—there’s no trade-off. Tests are how you achieve both.

*Beware…*

Let me illustrate why this matters with an example. Let’s say we asked Claude to implement a rate limiter:

```python
import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        now = time.time()
        user_requests = self.requests[user_id]

        # Clean old requests
        self.requests[user_id] = [
            req_time for req_time in user_requests
            if now - req_time < self.window_seconds
        ]

        if len(self.requests[user_id]) < self.max_requests:
            self.requests[user_id].append(now)
            return True
        return False
```

Looks reasonable, right? Claude even helpfully generated tests:

```python
def test_rate_limiter():
    limiter = RateLimiter(max_requests=3, window_seconds=60)

    assert limiter.is_allowed("user1") == True
    assert limiter.is_allowed("user1") == True
    assert limiter.is_allowed("user1") == True
    assert limiter.is_allowed("user1") == False  # Limit reached
```

But here’s what Claude’s tests missed—what only a human who understands the business requirements would test: Claude’s implementation has a memory leak. Users who hit the API once and never return leave their data in memory forever. The AI-generated tests check the happy path but miss this critical production concern.
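For contrast, here’s the kind of regression test a human might write for that concern; a sketch against the `RateLimiter` above:

```python
def test_rate_limiter_releases_one_off_users():
    # Human-written: encodes the production concern, not the happy path.
    limiter = RateLimiter(max_requests=3, window_seconds=60)

    for i in range(10_000):
        limiter.is_allowed(f"drive-by-user-{i}")

    # Fails against the implementation above: every one-off user leaves an
    # entry in limiter.requests until the process restarts.
    assert len(limiter.requests) < 1_000
```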
*Vibe coding at its best*

This is why humans write tests. We understand the context, the production environment, the edge cases that matter. At Julep, our rule is absolute:

### Testing Discipline

| What | AI CAN Do | AI MUST NOT Do |
|------|-----------|----------------|
| Implementation | Generate business logic | Touch test files |
| Test Planning | Suggest test scenarios | Write test code |
| Debugging | Analyze test failures | Modify test expectations |

If an AI tool touches a test file, the PR gets rejected. No exceptions. Your tests are your specification. They’re your safety net. They’re the encoded wisdom of every bug you’ve fixed and every edge case you’ve discovered. Guard them zealously.

## Scaling Without Drowning: Token Economics and Context Management

One of the most counterintuitive lessons in AI-assisted development is that being stingy with context to save tokens actually costs you more. It’s like trying to save money on gas by only filling your tank halfway—you just end up making more trips to the gas station.

Token budgets matter. Provide focused prompts, reduce diff length, and avoid large-file bloat by summarizing intent in advance. But “focused” doesn’t mean “minimal”—it means “relevant and complete.”

Let me show you the false economy of starved prompts.

Starved prompt attempt:

```
Add caching to the user endpoint
```

Claude’s response implements caching… but:

- Uses an in-memory cache (won’t work with multiple servers)
- No cache invalidation strategy
- No metrics or monitoring
- No consideration of cache stampede

Result: 3 more rounds of fixes, 4x the tokens spent.

Proper context-rich prompt:

```
Add Redis caching to the GET /users/{id} endpoint.

Context:
- This endpoint serves 50k requests/minute
- We run 12 API servers behind a load balancer
- User data changes infrequently (few times per day)
- We already have Redis at cache.redis.internal:6379
- Use our standard cache key pattern: "user:v1:{id}"
- Include cache hit/miss metrics (we use Prometheus)
- Implement cache-aside pattern with 1 hour TTL
- Handle cache stampede with probabilistic early expiration

See our caching guide: docs/patterns/caching.md
```

The lesson? Front-load context to avoid iteration cycles. Think of tokens like investing in good tools—the upfront cost pays for itself many times over. In fact, I recommend that all projects routinely ask Claude to look through recent codebase changes and add context to CLAUDE.md.
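One line in the context-rich prompt above deserves unpacking: “probabilistic early expiration.” The idea (known in the literature as XFetch) is that each reader may voluntarily recompute a value slightly before its TTL runs out, with probability rising as expiry nears, so a single worker refreshes the key instead of thousands hitting the database at once. A minimal sketch, with `fetch_user` standing in as a hypothetical database read and the redis-py client assumed:

```python
import json
import math
import random
import time

import redis  # assumption: redis-py is installed

r = redis.Redis(host="cache.redis.internal", port=6379)
TTL_SECONDS = 3600
BETA = 1.0  # tuning knob: > 1 favors earlier recomputation

def get_user_cached(user_id: str) -> dict:
    """Cache-aside with probabilistic early expiration (XFetch-style sketch)."""
    key = f"user:v1:{user_id}"
    raw = r.get(key)
    if raw is not None:
        entry = json.loads(raw)
        # With probability rising as expiry nears, treat the entry as already
        # expired so one worker refreshes it ahead of the stampede.
        early = entry["delta"] * BETA * math.log(random.random())  # <= 0
        if time.time() - early < entry["expires_at"]:
            return entry["value"]
    started = time.time()
    value = fetch_user(user_id)    # hypothetical: the real database read
    delta = time.time() - started  # how long recomputation takes
    entry = {"value": value, "delta": delta,
             "expires_at": time.time() + TTL_SECONDS}
    r.setex(key, TTL_SECONDS, json.dumps(entry))
    return value
```

The hit/miss metrics and Prometheus wiring from the prompt are left out to keep the sketch short.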
### Fresh Sessions and Mental Models

Here’s another counterintuitive practice: use fresh Claude sessions for distinct tasks. It’s tempting to keep one long-running conversation, but this leads to context pollution.

Think of it like this: you wouldn’t use the same cutting board for vegetables after cutting raw chicken. Similarly, don’t use the same Claude session for database migrations after discussing frontend styling. The context bleeds through in subtle ways.

Our rule: one task, one session. When the task is done, start fresh. This keeps Claude’s “mental model” clean and focused.

## Things Claude Should Never Touch (Carved in Stone)

Let’s be crystal clear about boundaries. These aren’t suggestions—they’re commandments. Violate them at your peril.

### The Sacred List of Never-Touch

❌ **Test Files**

```python
# This is SACRED GROUND
# No AI shall pass
def test_critical_business_logic():
    """This test encodes $10M worth of domain knowledge"""
    pass
```

Tests encode human understanding. They’re your safety net, your specification, your accumulated wisdom. When Claude writes tests, it’s just verifying that the code does what the code does—not what it should do.

❌ **Database Migrations**

```sql
-- migrations/2024_01_15_restructure_users.sql
-- DO NOT LET AI TOUCH THIS
-- One wrong move = data loss = career loss
ALTER TABLE users ADD COLUMN subscription_tier VARCHAR(20);
UPDATE users SET subscription_tier = 'free' WHERE subscription_tier IS NULL;
ALTER TABLE users ALTER COLUMN subscription_tier SET NOT NULL;
```

Migrations are irreversible in production. They require understanding of data patterns, deployment timing, and rollback strategies that AI cannot grasp.

❌ **Security-Critical Code**

```python
# auth/jwt_validator.py
# HUMAN EYES ONLY - Security boundary
def validate_token(token: str) -> Optional[UserClaims]:
    # Every line here has been security-reviewed
    # Changes require security team approval
    # AI suggestions actively dangerous here
    ...
```

❌ **API Contracts Without Versioning**

```yaml
# openapi.yaml
# Breaking this = breaking every client
# AI doesn't understand mobile app release cycles
paths:
  /api/v1/users/{id}:
    get:
      responses:
        200:
          schema:
            $ref: '#/definitions/UserResponse'  # FROZEN until v2
```

❌ **Configuration and Secrets**

```python
# config/production.py
DATABASE_URL = os.environ["DATABASE_URL"]  # Never hardcode
STRIPE_SECRET_KEY = os.environ["STRIPE_SECRET_KEY"]  # Obviously
FEATURE_FLAGS = {
    "new_pricing": False,  # Requires product decision
}
```

### The Hierarchy of AI Mistakes

Not all AI mistakes are equal. Here’s how we categorize them:

**Level 1: Annoying but Harmless**

- Wrong formatting (your linter will catch it)
- Verbose code (refactor later)
- Suboptimal algorithms (profiling will reveal them)

**Level 2: Expensive to Fix**

- Breaking internal APIs (requires coordination)
- Changing established patterns (confuses the team)
- Adding unnecessary dependencies (bloat)

**Level 3: Career-Limiting**

- Modifying tests to make them pass
- Breaking API contracts
- Leaking secrets or PII
- Corrupting data migrations

Your guardrails should be proportional to the mistake level. Level 1 mistakes teach juniors. Level 3 mistakes teach you to update your LinkedIn.

## The Future of Coding: Where This Is Heading

As I write this in 2025, we’re in the awkward adolescence of AI-assisted development. The tools are powerful but clumsy, like a teenager who just hit a growth spurt. But the trajectory is clear, and it’s accelerating.

Good documentation is foundational for successfully implementing DevOps capabilities. The teams that excel will be those who treat documentation as code, who maintain their CLAUDE.md files with the same rigor as their test suites.

What I see coming (roughly in order of arrival):

- Proactive AI that suggests improvements without prompting
- AI that learns your team’s patterns and preferences
- Persistent memory across sessions and projects
- AI that understands entire codebases, not just files

But even as capabilities expand, the fundamentals remain: humans set direction, AI provides leverage. We’re tool users, and these are simply the most powerful tools we’ve ever created.

## The Bottom Line: Start Here, Start Today

If you’ve made it this far, you’re probably feeling a mix of excitement and trepidation. That’s the right response. AI-assisted development is powerful, but it requires discipline and intentionality.

The most important thing? Start. Start small, start carefully, but start. The developers who master this workflow aren’t necessarily smarter or more talented—they’re just the ones who started earlier and learned from more mistakes.