Visual AI's Dirty Secret: Pixels Are a Dead End, Code Is the Future

For years, visual AI has been judged by one metric: how good does it look? Diffusion models changed everything—turning text prompts into photorealistic images and videos that made Photoshop look quaint. But according to a new analysis from Andreessen Horowitz, the most interesting visual AI work happening right now isn't about generating pixels at all. It's about generating code.

The Two Stacks of Visual Generation

The article identifies two fundamentally different approaches to visual generation. Pixel-native systems generate images or videos directly, usually in latent space, and they excel at texture, atmosphere, lighting, and realism. Think Midjourney or Sora. These are great for moodboards, cinematic shots, and exploratory visuals. But there's a ceiling: each output is essentially a dead end. If one curve in a logo is wrong, you're masking, inpainting, regenerating, or redrawing manually.

Code-native systems take a different path entirely. Instead of outputting pixels, they generate representations that get executed by another engine: SVG files, HTML/CSS layouts, React components, Lottie JSON animations, Blender scripts, USD scene graphs, shaders, or game-engine scenes. The visual output is still pixels at the end, but the source of truth is a structured program you can edit, version, reuse, and integrate into your software stack.
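To make the "edit the source, not the pixels" point concrete, here is a minimal sketch in Python using only the standard library. The logo markup, the element ids (swoosh, dot), and the helper name set_path_data are all made up for illustration; the analysis does not describe any particular tool's API.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical logo source: one curve and one dot. All ids are illustrative.
LOGO_SVG = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">'
    '<path id="swoosh" d="M 10 80 Q 50 10 90 80" fill="none" stroke="black"/>'
    '<circle id="dot" cx="50" cy="85" r="5"/>'
    '</svg>'
)


def set_path_data(svg_text: str, element_id: str, new_d: str) -> str:
    """Return a copy of the SVG with the `d` attribute of one <path> replaced."""
    # Keep the default SVG namespace unprefixed when serializing.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    for el in root.iter(f"{{{SVG_NS}}}path"):
        if el.get("id") == element_id:
            el.set("d", new_d)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"no <path> with id={element_id!r}")


# Flatten the curve by moving its control point; everything else is untouched.
fixed = set_path_data(LOGO_SVG, "swoosh", "M 10 80 Q 50 30 90 80")
print(fixed)
```

The fix is a one-attribute change that can be diffed and versioned like any other source edit, whereas the pixel-native equivalent would be a mask-and-regenerate pass.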

Why Code Wins for Production Workflows

The real value emerges when designers need to iterate. A generated image is useful as an output; a generated visual program is useful as an artifact. If the spacing in a UI mockup is wrong, change the CSS. If a logo curve is off, edit the SVG path. If an animation feels sluggish, adjust the timing parameters. This is why platforms like Quiver are already designing logos as editable vector code rather than static exports.

This distinction becomes critical when you consider test-time compute, the inference budget that gets spent after initial generation. In pixel-native systems, more compute typically means sampling more outputs: generate twenty images, pick the best one, maybe try again. Every attempt is essentially a new roll of the dice with imprecise feedback. But code-native generation enables something fundamentally better: Code → Render → Inspect → Revise. The model produces an artifact, renders it, sees what broke, and patches the source directly. Each iteration improves the underlying program itself rather than just producing another sample. This is why visual code generation stands to benefit directly from longer generations and more test-time compute: the agent is debugging a visual program in a closed-loop, verifiable environment.
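The Code → Render → Inspect → Revise loop can be sketched as a toy in Python. Everything here is an assumption made for illustration: the Box layout model, the stand-in renderer, and the deterministic revise step that takes the place of a model proposing a patch. It is not how any product mentioned in the analysis works.

```python
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    height: int
    margin_top: int  # the editable "source" value, like a CSS margin


@dataclass
class Issue:
    box_name: str  # which part of the source to change
    gap: int       # the measured gap that was too small


def render(boxes):
    """Toy renderer: stack boxes vertically, return {name: (top, bottom)}."""
    placed, cursor = {}, 0
    for b in boxes:
        top = cursor + b.margin_top
        placed[b.name] = (top, top + b.height)
        cursor = top + b.height
    return placed


def inspect_layout(boxes, placed, min_gap):
    """Check the rendered result and report issues tied to source elements."""
    issues = []
    for prev, cur in zip(boxes, boxes[1:]):
        gap = placed[cur.name][0] - placed[prev.name][1]
        if gap < min_gap:
            issues.append(Issue(cur.name, gap))
    return issues


def revise(boxes, issues, min_gap):
    """Patch the source in place (a real agent would propose this edit)."""
    by_name = {b.name: b for b in boxes}
    for issue in issues:
        by_name[issue.box_name].margin_top += min_gap - issue.gap


def refine(boxes, min_gap=8, max_iters=5):
    """Run render/inspect/revise until clean; return the number of revisions."""
    for revisions in range(max_iters):
        issues = inspect_layout(boxes, render(boxes), min_gap)
        if not issues:
            return revisions
        revise(boxes, issues, min_gap)
    raise RuntimeError("layout did not converge")


layout = [Box("header", 40, 0), Box("title", 20, 2), Box("body", 100, 8)]
print(refine(layout), layout)
```

The property that matters is that each pass edits the same artifact, and the feedback names the element to change. A sample-and-pick loop would instead discard the whole layout and draw a new one.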

3D: The Next Frontier

While product design and UI are obvious near-term applications, the analysis suggests 3D artifacts stand to benefit most from this reframing. A 2D design can sometimes be useful if it simply looks right. A 3D asset cannot: a rendered image of a chair is not a chair, it's a picture of a chair. For an asset to work in a game, simulation, or editing tool, it needs consistent geometry, materials, part hierarchy, and scene context that holds up across views, edits, and interactions.

Projects like VIGA and Articraft3D illustrate where this is heading. VIGA uses Blender as both rendering engine and feedback environment, giving agents semantic tools for observation and modification plus memory over prior attempts, so they can inspect from better viewpoints, diagnose issues, and make targeted edits to the underlying 3D representation. Articraft3D goes even more directly at asset structure by framing articulated 3D generation as writing programs that define parts, geometry, joints, and behavioral tests.
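As a rough illustration of what "a program that defines parts, joints, and behavioral tests" could look like, here is a small Python sketch. It is not Articraft3D's or VIGA's actual interface; the Part, HingeJoint, and Asset classes, the cabinet example, and the specific checks are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    name: str
    size: tuple  # (width, height, depth) in meters


@dataclass
class HingeJoint:
    name: str
    parent: str     # name of the fixed part
    child: str      # name of the part that rotates
    min_deg: float
    max_deg: float


@dataclass
class Asset:
    parts: dict = field(default_factory=dict)
    joints: dict = field(default_factory=dict)

    def add_part(self, part):
        self.parts[part.name] = part

    def add_joint(self, joint):
        # Structural check: a joint may only connect parts that exist.
        for ref in (joint.parent, joint.child):
            if ref not in self.parts:
                raise ValueError(f"joint {joint.name!r} references unknown part {ref!r}")
        self.joints[joint.name] = joint


def clamp_angle(joint, deg):
    """Angle the joint actually reaches when asked to move to `deg`."""
    return max(joint.min_deg, min(joint.max_deg, deg))


def build_cabinet():
    """Hypothetical asset: a cabinet body with one hinged door."""
    asset = Asset()
    asset.add_part(Part("body", (0.6, 0.8, 0.4)))
    asset.add_part(Part("door", (0.6, 0.8, 0.02)))
    asset.add_joint(HingeJoint("door_hinge", "body", "door", 0.0, 110.0))
    return asset


def check_cabinet(asset):
    """Behavioral tests: return a list of human-readable failures."""
    failures = []
    hinge = asset.joints["door_hinge"]
    if clamp_angle(hinge, 0.0) != 0.0:
        failures.append("door cannot close")
    if clamp_angle(hinge, 120.0) < 90.0:
        failures.append("door cannot open to 90 degrees")
    if asset.parts["door"].size[1] > asset.parts["body"].size[1]:
        failures.append("door taller than body")
    return failures


print(check_cabinet(build_cabinet()))
```

Checks like these hold regardless of camera angle, which is the kind of guarantee a single rendered image cannot give.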

Market Implications and What's Coming

The market is starting to organize around runtimes, the environments where artifacts get executed or rendered: browsers for HTML/CSS, SVG renderers for vectors, Lottie players for motion graphics, Blender or game engines for 3D scenes, and simulators for articulated assets. Each runtime creates a different wedge with its own source representation, feedback loop, and production workflow. The implications are significant: renderers become feedback environments where agents test and improve their work, similar to how coding agents leverage sandboxes today. The quality of iteration context becomes more important than ever, because the model needs to know not just that something looks wrong, but which part of the source to change. Small errors in structure, rendering, or feedback can compound quickly across iterations.

Key Takeaways

Code-native visual generation (SVG, React, Lottie JSON) beats pixel-native models for production workflows requiring iteration and editability
The test-time compute loop is fundamentally different: each retry improves the underlying artifact rather than sampling a new output
3D represents the biggest opportunity—assets need consistent geometry, hierarchy, and behavioral constraints that hold across views and interactions
Projects like VIGA and Articraft3D demonstrate how semantic tooling and memory enable agents to debug visual programs effectively

The Bottom Line

Visual AI is having an identity crisis, and that's a good thing. For too long, the industry conflated 'looks impressive' with 'actually useful.' The real unlock isn't prettier pixels—it's treating design as a coding problem that can be iterated on, versioned, and validated. If you're building in this space, your move is to own the loop: generate, render, inspect, revise. That's where test-time compute actually compounds.
