Where’s the holistic AI productivity data?

For most of my career I ran a very small company. When you run a tiny company your resources (both time and money) are limited, and you want to use them on the things that will have the most impact. You have to quickly stop doing things that aren’t cost-effective, to avoid “throwing good money after bad”. Ideally, you do a small trial of something new and measure the results before rolling it out more widely, to avoid going all in on something untested. As tech news starts to publish stories about how companies are realising that AI costs more than the humans it was supposed to replace, I’m wondering why it took them so long to figure this out. I’ve spent the last two years watching companies large and small diving headlong into AI. Rarely do I see an attempt to measure the actual costs, financial and otherwise, of that decision.

It’s certainly possible for a skilled person to use an agent to speed up certain processes while maintaining quality. I have a number of examples that have been successful, and that enabled improvements across content sets where the work would otherwise have been hard to justify.

However, as I document this work I realise how much its success relies on the things I know. I can spot when the AI tool goes off track, and I review its work in the way I’d review the work of a very junior writer. I couldn’t just hand this stuff off to anyone and expect them to replicate the quality of my end result. When you do that, what you get is something that looks on the surface like the same output, but is a pastiche of the result you get when someone with actual knowledge is behind it.

The same thing seems to be playing out with agentic coding. You can get yourself something that looks like a functioning application. However, without a great deal of knowledge about how to build a functioning application, what you have is often just a reasonably functional mockup. At best you’ve got a handy personal tool that should never escape into production.

Individual productivity enhancements have a ceiling: what you can do with the tools is limited by the need to review the output. While everyone talks about productivity, I’m just not seeing any real research that demonstrates AI is measurably increasing productivity when you take a holistic view.

Individual AI productivity gains

Individually it’s clearly possible to use an LLM to increase productivity. As I’ve already described, a skilled individual can selectively introduce an AI tool to perform specific (usually rote but not quite scriptable) tasks. There are improvements to be had there, but they are similar to the bump you get when you finally figure out how to use a spreadsheet properly, or learn how to automate tasks with some simple coding. If you can already do those things, then AI use can, in some circumstances, automate some additional tasks or make it quicker to create those automations.

This level of improvement is appearing in research data. For example, the London School of Economics found in their report Bridging the Generational AI Gap: Unlocking Productivity for All Generations that professionals using AI save an average of 7.5 hours per week. I have a theory that in many cases, for non-coders, AI has just solved coding’s image problem, and these gains could have been achieved without AI.

However, another way someone might report increased individual productivity is by shifting the work onto someone else. That might be another person or team: writers end up fixing slop drafts and having to correct obvious errors, code reviewers wade through pull requests, and QA teams spend more time dealing with bugs. It also might be your reader or user who now has to wade through paragraphs of slop, is misled by inaccurate documentation, runs into bugs in your app, or finds it inaccessible to them. In this case you might feel more productive, but all you’ve done is move the work around, make someone else’s job or experience measurably worse, and reduce quality.

It’s for this second reason that a holistic approach needs to be taken to truly assess productivity across an organisation. If we look at specific individuals or even teams, we’re likely to miss task reallocation based on AI use.
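As a rough illustration of why the individual view can mislead, here is a minimal sketch in Python. Every figure in it is a made-up placeholder rather than measured data, and the `TimeChange` and `net_change` names are hypothetical. It simply adds up who gains and who loses time when work is moved around.

```python
from dataclasses import dataclass


@dataclass
class TimeChange:
    who: str
    # Negative means hours saved per week, positive means hours added.
    hours_per_week: float


def net_change(changes):
    """Sum the weekly change in hours across everyone affected."""
    return sum(change.hours_per_week for change in changes)


# Hypothetical placeholder numbers, chosen only to show the arithmetic.
changes = [
    TimeChange("writer using AI drafts", -5.0),
    TimeChange("editor fixing those drafts", 3.0),
    TimeChange("QA handling extra bugs", 2.5),
]

individual_view = changes[0].hours_per_week
holistic_view = net_change(changes)

print(f"Individual view: {individual_view:+.1f} hours/week")
print(f"Holistic view:   {holistic_view:+.1f} hours/week")
```

With these invented numbers the writer reports saving five hours a week, while the organisation as a whole spends slightly more time than before. Real figures would need to come from actual measurement across every affected team.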

AI as a forcing function for accessible data

In addition to the issue of task reallocation, there’s another reason why it’s hard to quantify how useful AI actually is. AI tooling has forced a lot of data to become available and easily consumed. This makes it easier to perform non-AI automations.

People who refused to write documentation in the past are now churning out skills, which are documentation. We can use these to easily identify the process needed to achieve tasks. Identifying repetitive processes is the first step of any automation attempt.

Many of my processes are enabled through the easy access to the data required, such as MCP servers, or sites giving me a nice clean markdown export rather than me having to search through messy div soup HTML. This makes more of what I’m doing possible with regular scripting. I’ve found myself moving more things into Python over time, and using the AI tools for more discrete tasks on reliable data returned from a script.
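As an example of the kind of task that clean markdown makes possible with regular scripting, here is a minimal sketch that pulls the heading outline out of a markdown export and checks it for required sections. The function names and the sample document are hypothetical, and it assumes ATX-style (`#`) headings and backtick code fences only.

```python
import re

FENCE = "`" * 3  # marker that opens or closes a fenced code block
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def extract_headings(markdown_text):
    """Return (level, title) pairs for headings outside fenced code blocks."""
    headings = []
    in_fence = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


def missing_sections(markdown_text, required_titles):
    """List the required section titles that the document does not contain."""
    found = {title for _level, title in extract_headings(markdown_text)}
    return [title for title in required_titles if title not in found]


# Hypothetical sample standing in for a site's markdown export.
sample = "\n".join([
    "# Guide",
    "",
    "Intro text.",
    "",
    "## Install",
    FENCE,
    "# a comment inside code, not a heading",
    FENCE,
    "## Usage",
])

print(extract_headings(sample))
print(missing_sections(sample, ["Install", "Usage", "Accessibility"]))
```

A deterministic check like this needs no AI at all. Anything handed to an AI tool afterwards can be limited to a discrete task on data the script has already extracted reliably.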

We can’t justify costs we don’t understand

It’s hard to find anything other than anecdata from individuals telling us how AI has made them individually more productive. If AI really was creating measurable improvements in productivity across entire organisations, wouldn’t we be seeing that data? How can we justify the cost (financial, environmental, and human) of AI, if the reality is a relatively small bump in productivity that could have happened by teaching more people to automate tasks using existing tools or simple coding? Why aren’t businesses encouraging people to use non-AI methods where possible, saving the AI only for where it adds value? Given the societal costs, and the benefit to a business of bringing people on board and training them, perhaps on balance even those tasks where AI is needed are better performed by people.

The lack of rigour disquiets me. I’ve been lucky enough to spend the majority of my life working with people who care. The sort of people who like things to make sense, who want to do the right thing, even if it takes longer. We thrived in an industry that prided itself on being data driven. Now so many of us are burning out. It’s exhausting trying to do the work you’ve spent a lifetime building expertise in when people around you are trying to figure out how to replace you with AI, based on vibes that it should be possible. I worry that by the time this all plays out, many of the experienced people the web needs will have left the industry. I see no evidence that AI can come close to replacing the expertise we’ll lose.
