When this site launched in 2023, the big question about AI was: "How good is ChatGPT at writing?" In 2026, the question has shifted: "What can an AI do on its own?" The conversation has moved from AI as a sophisticated autocomplete to AI as an autonomous actor in the world. That shift is what "AI agents" means — and it's worth understanding.

What's an AI Agent, Exactly?

A language model, at its core, takes text in and produces text out. That's it. ChatGPT, Claude, Gemini — underneath all the polish, they're next-token predictors. What makes an agent different is that the model is given tools: the ability to search the web, read and write files, execute code, call APIs, click buttons in a browser. The model decides when and how to use these tools based on a goal you give it.

The result is a system that can plan, take actions in the real world, observe results, and adjust — all without you having to hold its hand through every step. You give it a task, and it figures out the steps.
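That loop — plan, act, observe, adjust — is simple enough to sketch in a few lines. Here's a toy version in Python, with a stubbed-out model standing in for the real LLM call; the names (`run_agent`, `fake_model`, `TOOLS`) are illustrative, not any vendor's actual API:

```python
def calculator(expression: str) -> str:
    """A toy tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_model(goal: str, history: list) -> dict:
    """Stand-in for an LLM: picks the next action from the goal and history."""
    if not history:
        return {"action": "tool", "tool": "calculator", "input": goal}
    # Once a tool result is available, finish with it.
    return {"action": "finish", "answer": history[-1]["result"]}

def run_agent(goal: str, max_steps: int = 5) -> str:
    """The core agent loop: ask the model, run the chosen tool, repeat."""
    history = []
    for _ in range(max_steps):
        decision = fake_model(goal, history)
        if decision["action"] == "finish":
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["input"])
        history.append({"tool": decision["tool"], "result": result})
    return "gave up"

print(run_agent("2 + 3 * 4"))  # → 14
```

Everything interesting in a real agent lives inside `fake_model` — but the surrounding machinery really is this small.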

What Changed to Make This Possible?

Three things came together at roughly the same time:

1. Models got much better at following instructions

Early GPT models were impressive at generating text but unreliable at doing exactly what you asked. The instruction-following models that emerged from RLHF (Reinforcement Learning from Human Feedback) — the same technique that produced ChatGPT — are dramatically more precise. When you tell a modern model "search for X, then summarize the top three results," it actually does that, step by step, rather than just writing about what that might look like.

2. Tool-use became a first-class feature

OpenAI's function-calling API (2023) and Anthropic's tool-use feature for Claude made it possible to define external functions that a model can call as part of generating a response. The model produces a structured call like {"tool": "search", "query": "latest AI news"}, the application runs the actual search, returns the results to the model, and the model continues. It's remarkably clean.
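The application's side of that round trip is a small dispatch layer. Here's a hedged sketch of the idea — the `search` tool and the JSON message shape are illustrative assumptions, not the exact OpenAI or Anthropic wire format:

```python
import json

def search(query: str) -> list:
    """Toy search tool returning canned results."""
    return [f"Result for '{query}' #{i}" for i in range(1, 4)]

TOOLS = {"search": search}

def handle_tool_call(raw_call: str) -> str:
    """Parse the model's structured call, run the tool, return results as JSON."""
    call = json.loads(raw_call)
    tool = TOOLS[call["tool"]]
    result = tool(call["query"])
    # This JSON string is what gets handed back to the model so it can continue.
    return json.dumps({"tool": call["tool"], "result": result})

model_output = '{"tool": "search", "query": "latest AI news"}'
print(handle_tool_call(model_output))
```

The key design point: the model never executes anything itself. It only emits structured requests, and the application decides what actually runs.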

3. Context windows exploded in size

Agents need to hold a lot in mind at once — the original goal, the steps taken so far, results from tool calls, errors encountered. In 2023, context windows were around 8,000 tokens. By 2025, the flagship models had context windows of 128,000 to 200,000 tokens — enough to fit a small codebase, a long research project, or hours of conversation history. That's what makes complex, multi-step tasks tractable.

What Can Agents Actually Do Today?

The honest answer is: quite a lot, with caveats. Here's what's working well:

  • Software development: AI coding agents (GitHub Copilot Workspace, Cursor, Claude Code) can take a task description, write code across multiple files, run tests, read error messages, and iterate until tests pass. This is probably the most mature use case today.
  • Research and synthesis: Agents that can browse the web, pull papers, summarize, and cross-reference sources have meaningfully sped up literature reviews and competitive research.
  • Data analysis: Give an agent a spreadsheet and a question; it can write and execute Python, debug its own errors, and produce an answer with visualizations.
  • Browser automation: Agents can navigate websites, fill out forms, and extract information — tasks that previously required custom scraping scripts.
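The data-analysis pattern in particular boils down to a generate-execute-retry loop: run the model's code, catch the error, feed it back, try again. A toy illustration, with hard-coded candidate snippets standing in for successive model attempts:

```python
def run_with_retries(attempts: list) -> tuple:
    """Execute candidate code snippets in order until one succeeds."""
    last_error = None
    for code in attempts:
        try:
            scope = {}
            exec(code, scope)
            return "ok", scope.get("answer")
        except Exception as exc:
            # In a real agent, this error message would be fed back to
            # the model to prompt a corrected attempt.
            last_error = f"{type(exc).__name__}: {exc}"
    return "failed", last_error

attempts = [
    "answer = sum(rows) / len(rows)",                     # NameError: rows undefined
    "rows = [3, 5, 7]\nanswer = sum(rows) / len(rows)",   # corrected attempt
]
status, value = run_with_retries(attempts)
print(status, value)  # → ok 5.0
```

This is why coding and analysis are the most mature agent use cases: the environment hands back a crisp, machine-readable error signal, which is exactly what the retry loop needs.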

The caveats are real, though. Agents are brittle in unpredictable ways. They can confidently take a wrong turn and barrel down it for many steps before realizing the mistake. Error recovery is still a weak point. Long-horizon tasks that require subtle judgment calls across many steps still require human oversight.

The Shift in How We Think About AI

When AI was primarily a text-generation tool, the mental model was "AI as a very fast writer." You prompt, it writes, you edit. The human is always in the loop because every output goes through human review before anything happens.

AI agents change that model. When an agent is browsing the web, writing to a database, or sending API calls, actions happen. Some of them are irreversible. This is why so much of the current discussion in AI safety has shifted from "is the text accurate" to "is the AI taking the right actions for the right reasons." The consequences of a hallucination are very different when the output is a paragraph versus when it's a database write.
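One common response to that risk is to gate irreversible actions behind explicit human approval while letting reversible ones run freely. A minimal sketch of the pattern — the action names and classification here are assumptions for illustration, not a standard:

```python
# Actions an agent should never take without a human sign-off.
IRREVERSIBLE = {"delete_record", "send_email", "execute_trade"}

def dispatch(action: str, payload: dict, approve) -> str:
    """Run reversible actions directly; ask a human before irreversible ones."""
    if action in IRREVERSIBLE and not approve(action, payload):
        return "blocked: awaiting human approval"
    return f"executed {action}"

# A stand-in approval callback that denies everything.
deny_all = lambda action, payload: False

print(dispatch("read_record", {"id": 7}, deny_all))    # → executed read_record
print(dispatch("delete_record", {"id": 7}, deny_all))  # → blocked: awaiting human approval
```

The hard part, of course, is deciding what belongs in that irreversible set — which is a judgment call, not a coding problem.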

Where This Is Going

The trajectory is clear, even if the destination isn't. Models are getting better, tools are getting richer, and the infrastructure for deploying agents reliably is maturing quickly. The 2023 version of this site asked "What can AI write?" The 2026 version might reasonably ask "What can AI build on its own, end to end?"

I don't think we know the answer yet. But it's the most interesting question in technology right now. And unlike the breathless predictions that AI would write all software by 2025, the reality is more nuanced and more interesting: AI has become a genuine partner in building things, with real strengths, real limits, and a trajectory that's still pointing up.