Can anyone build an AI Agent from scratch?


I’ve been following Thorsten Ball for a while now and recently discovered his YouTube series with Sourcegraph, where he’s working on an AI Agent called Amp. Thorsten did something unique: he shared a video of himself pair programming on the actual Sourcegraph product, walking through his thought process for building the agent. It wasn’t a polished video with fancy editing or transitions, just an authentic look at him coding. While watching, I realized this wasn’t some insanely complex technical feat. Even someone like me could give it a shot (famous last words, I know). Inspired, I started building my own AI agent as a VSCode extension. A few days in, I’m hooked and can’t stop working on it.

Here are some of the things I’ve learned along the way…

  1. AI Agents like Windsurf and Replit AI are not very good at building VSCode extensions (this has much to do with their ability to debug issues; more on this later).
  2. Building the UI was much harder than building the tooling necessary for the LLM to make file edits.
  3. Having an LLM synthesize and make workspace rules around best practices, such as those outlined in Building Effective Agents, really improved the tool chain instructions it wrote.

My AI Agent VSCode Extension

I built a VSCode extension that exclusively uses Claude-3.5 Sonnet (for now) and gives Claude the ability to not only answer your code questions but also read, edit, create, and delete files.
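To give a feel for what "giving Claude the ability to edit files" means in practice, here is a minimal sketch. The tool definitions follow the Anthropic Messages API shape (`name`, `description`, `input_schema`), but the tool names and the in-memory dispatcher are hypothetical, standing in for the extension's real workspace operations:

```typescript
// Hypothetical tool names; the definition shape matches Anthropic's tool-use API.
const fileTools = [
  {
    name: "read_file",
    description: "Return the full contents of a file in the workspace.",
    input_schema: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
  {
    name: "create_file",
    description: "Create a new file with the given content.",
    input_schema: {
      type: "object",
      properties: { path: { type: "string" }, content: { type: "string" } },
      required: ["path", "content"],
    },
  },
  // edit_file and delete_file follow the same pattern.
];

type ToolInput = { path: string; content?: string };

// In-memory stand-in for the VSCode workspace, so the dispatcher is testable
// without the extension host running.
const workspace = new Map<string, string>();

// Executes the tool call Claude requests and returns the result string
// that gets sent back to the model.
function dispatchTool(name: string, input: ToolInput): string {
  switch (name) {
    case "create_file":
      workspace.set(input.path, input.content ?? "");
      return `created ${input.path}`;
    case "read_file": {
      const text = workspace.get(input.path);
      if (text === undefined) throw new Error(`not found: ${input.path}`);
      return text;
    }
    case "edit_file":
      if (!workspace.has(input.path)) throw new Error(`not found: ${input.path}`);
      workspace.set(input.path, input.content ?? "");
      return `edited ${input.path}`;
    case "delete_file":
      workspace.delete(input.path);
      return `deleted ${input.path}`;
    default:
      throw new Error(`unknown tool: ${name}`);
  }
}
```

The real extension replaces the `Map` with `vscode.workspace.fs` calls, but the loop is the same: Claude asks for a tool, the extension runs it, and the result goes back into the conversation.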

I started by trying to get Replit Agent v2 to give me the scaffolding, but it had a hard time setting things up correctly. Plus, since Replit runs in the cloud, the round trip from cloud → GitHub → local → debug made each iteration far too slow. I gave that up and switched to using Windsurf exclusively. Windsurf + Claude 3.7 (thinking) did the best job at setting up boilerplate for a VSCode extension.

The biggest challenge was the feature iteration cycle. I’m sure there are faster ways to iterate, but I found the quickest approach was having the LLM make changes, then starting the debugger to test them. If I ran into errors or unexpected behavior, I’d report those back to the LLM for the next round. Ideally, the LLM could “see” the output in the VSCode Extension and test the updates itself.

This long iteration cycle made getting the UI right the hardest part. Even though my UI is pretty basic, it took significant cycles to get the styling looking right. Once I had the UI looking okay, I could get to the fun part: building the agent.

Building the Agent

The sweet spot for writing code with LLMs, for me, has been to use a mixture of Grok 3 and Claude-3.5 / 3.7 (thinking). When one LLM couldn’t resolve an issue or implement a feature, I switched to the other. This back and forth really helped with working through some difficult bugs.

For the Grok chatbox experience, I use the XML logic from McKay Wrigley’s Takeoff Prompt. I built superpromptor a few weeks ago to help with the iterative process of copying code to a chatbox and quickly inserting the chatbox’s changes back into my repository.

Getting the tool call prompts right was another big obstacle. I found that Claude was too chatty about what tools it was using, and would output way too much clutter in the chatbox. I used the system prompt and tool call commands to tell it to suppress these messages.
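As a rough illustration of the kind of suppression instruction that worked, here is a sketch of a system prompt. The wording is illustrative, not the extension's actual prompt:

```typescript
// Illustrative system prompt; the exact wording is an assumption, not the
// extension's real prompt. The point is to suppress tool-call narration.
const SYSTEM_PROMPT = [
  "You are a coding assistant running inside a VSCode extension.",
  "When you use a tool, do NOT announce or describe the tool call in your reply.",
  "Do not restate file contents you just wrote; summarize the change in one short sentence.",
  "Only explain your reasoning when the user explicitly asks for it.",
].join("\n");
```

Without instructions like these, Claude tends to preface every edit with "I'll now use the edit_file tool to…", which quickly fills the chatbox with noise.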

One really helpful thing was having Grok condense the famous article Building Effective Agents into a rules file for Windsurf to use. Having a quick summary of rules for codifying tool calls significantly improved Claude’s ability to make code changes to my codebase. I’ll include the Grok-generated rules at the bottom of this post.

After many iterations using these methods, I had a pretty good working demo of Claude interacting with my codebase. Not only do I now more fully understand what is going on under the hood, but building it was absolutely addicting. Every time I solved a problem (or rather, the LLM solved it), I could immediately see the next feature I needed to implement. It became an endless cycle where I spent hours on end going from one feature to the next.

There is still so much to add to this Agent, but getting a working demo of Claude editing and improving my codebase from a VSCode extension in a single weekend was much easier than I thought it would be. My extension built multiple basic Go and Python projects and edited and improved a few existing projects I already had, all with just file edits + Claude-3.5. The next step will be to add the ability to run terminal commands, giving it even more power to make changes to my code.

So to answer the question, “Can anyone build an AI Agent from scratch?”: yes, of course. But it will require you to do some homework, understand exactly what you want your agent to do, and figure out which tools the LLM needs in order to complete your requests. I’ve included a few helpful links at the bottom, listing resources I read to prepare for building this agent.

Feel free to reach out to me if you have any questions!


Rules for Building Effective Agents

Here is the Grok output from consuming Building Effective Agents and creating rules for Windsurf to follow. I found it extremely helpful while building the Agent.

## Rules for Building Effective AI Agents

These rules provide guidelines for creating effective, reliable, and maintainable AI agents. They are based on best practices from the "Building Effective Agents" article and are tailored for your code editor AI Agent to follow while helping you develop an AI Agent code helper.

**Note**: Tool development is critical for agent success. Dedicate significant effort to crafting tools that are intuitive for the LLM, with clear documentation and well-tested interfaces.

### General Principles

1. **Start Simple**: Begin with the simplest solution possible. Optimize single LLM calls using retrieval and in-context examples before moving to multi-step agentic systems.
2. **Evaluate Trade-offs**: Before implementing an agent, determine if the performance gain justifies the increased cost and latency.
3. **Understand Frameworks**: If using a framework, ensure you understand its underlying code to avoid problems caused by abstraction layers.
4. **Maintain Simplicity**: Design the agent to be as simple as possible, adding complexity only when it clearly improves results.
5. **Prioritize Transparency**: Ensure the agent's planning steps are explicit and transparent to aid debugging and comprehension.
6. **Customize Capabilities**: Tailor the augmented LLM's capabilities (retrieval, tools, memory) to match the specific needs of your coding agent.

### Workflow Patterns

7. **Prompt Chaining**: Use for tasks that can be broken into sequential steps, such as generating code and then refining it, to boost accuracy.
8. **Routing**: Apply when inputs (e.g., coding queries) can be categorized and directed to specialized processes, like bug fixes vs. feature additions.
9. **Parallelization**: Use to run independent coding subtasks simultaneously (e.g., reviewing different files) or to aggregate multiple outputs for better reliability.
10. **Orchestrator-Workers**: Implement for complex coding tasks where subtasks (e.g., editing multiple files) are determined dynamically by the agent.
11. **Evaluator-Optimizer**: Use for iterative coding improvements, such as refining code based on test feedback, when clear evaluation criteria exist.
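The simplest of these patterns, prompt chaining (rule 7), can be sketched in a few lines. The `Step` type below stands in for a single LLM call; the step functions are hypothetical placeholders, not real model calls:

```typescript
// Prompt chaining sketch: each step's output becomes the next step's input.
// `Step` stands in for one LLM call; these implementations are placeholders.
type Step = (input: string) => string;

function chain(steps: Step[], input: string): string {
  return steps.reduce((text, step) => step(text), input);
}

// Illustrative two-step chain: generate a draft, then refine it.
const generate: Step = (task) => `function for: ${task}`;
const refine: Step = (draft) => `${draft} (refined)`;

const result = chain([generate, refine], "sum two numbers");
```

The other patterns (routing, parallelization, orchestrator-workers) are elaborations of the same idea: decide which LLM calls to make and how to wire their outputs together.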

### Agent Design

12. **Open-Ended Tasks**: Deploy agents for coding problems with unpredictable steps, such as resolving complex GitHub issues, where fixed workflows won't suffice.
13. **Trustworthy Decision-Making**: Ensure the agent's decisions are reliable by testing extensively in sandboxed environments and adding guardrails to limit errors.
14. **Tool Documentation**: Provide clear, detailed documentation for all tools to ensure the LLM can use them correctly during coding tasks.
15. **Adapt Patterns**: Combine and adjust workflow patterns based on performance metrics and the specific demands of your coding project.

### Tool Development

16. **Effective ACI**: Build a clear agent-computer interface (ACI) with well-documented, thoroughly tested tools to support coding tasks.
17. **Intuitive Interfaces**: Design tools with simple parameters and usage patterns that the LLM can easily understand and apply.
18. **Detailed Documentation**: Include examples, edge cases, and distinctions between similar tools in documentation to guide the LLM's usage.
19. **Extensive Testing**: Test tools with the LLM using varied coding scenarios and refine them based on observed performance.
20. **Simplify Formats**: Use straightforward, natural formats for tool inputs and outputs (e.g., markdown over JSON for code) to minimize errors.
21. **Mistake-Proofing**: Design tools to prevent errors, such as requiring absolute file paths instead of relative ones, following poka-yoke principles.
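Rule 21's absolute-path example can be made concrete with a small validator. This is a sketch under the assumption of a POSIX-style workspace root; the function name is hypothetical:

```typescript
import * as path from "path";

// Poka-yoke sketch: reject relative paths outright, and reject absolute paths
// that normalize to somewhere outside the workspace root (e.g. via "..").
function resolveToolPath(workspaceRoot: string, requested: string): string {
  if (!path.isAbsolute(requested)) {
    throw new Error(`tool paths must be absolute, got: ${requested}`);
  }
  const normalized = path.normalize(requested);
  if (!normalized.startsWith(workspaceRoot + path.sep)) {
    throw new Error(`path escapes workspace: ${requested}`);
  }
  return normalized;
}
```

Running every tool's `path` argument through a check like this turns a whole class of LLM mistakes (relative paths, accidental traversal) into immediate, explainable errors the model can correct on the next turn.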