Luke Davis


Not sure about this AI productivity thing

Filed under: AI | tech

As you might have seen in my posts (on social media or here), I use local LLMs. Not daily, and not for anything serious, but I do still use them (on my MacBook via Ollama). And it seems that a lot of developers use LLMs too, albeit bigger and “better” hosted models like Claude, or agents like Cline. But something I find puzzling is how comfortable they are with these models not being that good, despite saying they help with productivity.

From thinker to tinkerer

I’ve seen a few use cases in videos and in blogs, some more well-meaning than others, and couldn’t get my head around workflows that needed so much iteration just to maybe get things right. And then it hit me: I didn’t have the foundational knowledge they did. They were using models to speed up processes they already knew, which is why they could check the code, reject bad output, and iterate on the prompts.

But that still doesn’t explain where the productivity improvement comes from, since you’re spending unnecessary amounts of time correcting outputs when you could arguably write something better yourself, right? I may never truly know whether it’s confirmation bias or genuine productivity gains at play, but it points to a gap that nobody talks about: the expertise you need before the tool helps at all.

The vibes of life

Sadly, this has brought me back to the dreaded vibe coding thing and how that has just gone and messed up people’s expectations of coding (and someone’s database; honestly, I wish that guy had never posted that damn tweet).

You need to know at least enough about the thing to use an LLM to potentially speed up making the thing, AND know what’s not quite right about the outputted thing so you can correct it and end up with something as close to the thing as you’re happy with. Adding that level of subjectivity to something more objective than, say, creative writing feels weird, and, again, nobody really talks about it within this space; I think it’s important to manage expectations and be transparent. Like vibe coding, you’re presenting a false picture (intentionally or not), people misinterpret it, and then the progenitors start blaming everyone else for their own lack of clarity.

Where are the tests?

Productivity is the word on every AI fan’s lips. But surely we’re more productive than we’ve ever been? How could we become MORE productive? And if we could, why would we need to add tools rather than subtract them? Don’t worry your pretty little head about such things. And please absolutely ignore studies like “The Measurement Imbalance in Agentic AI Evaluation Undermines Industry Productivity Claims” by Kiana Jafari Meimandi, Gabriela Aránguiz-Dias, Grace Ra Kim, Lana Saadeddin, and Mykel J. Kochenderfer, which suggests that an evaluation imbalance overinflates the importance of technical metrics while human-centered, safety, and economic assessments remain peripheral, with only 15% of evaluations incorporating both technical and human dimensions.

You should also not look at this Fortune article titled “Experienced software developers assumed AI would save them a chunk of time. But in one experiment, their tasks took 20% longer”.

Am I cherry-picking? Yeah, probably. But at least these people tested the claims. Why can’t individual developers do the same? The truth is that a lot of the AI buzz is based on unscientific claims, hyperbole, and “trust me, bro”-isms, but underneath it all are genuine machine learning techniques and decades of breakthroughs and findings. We’re doing it all a disservice by pushing claims without backing them up ourselves. If you find something interesting enough to extol its virtues and want others to join in, put some numbers behind it.

Prove me right/wrong

I may never know how much more productive I could be by using an LLM vs. learning to code something and then doing it, mostly because I don’t have strong enough feelings about it to run that experiment. If anything, I wish I had the kind of knowledge that didn’t need these models at all.

I like to automate stuff as much as I can, like making little Python packages out of functions I use repeatedly. I understand the allure of LLMs; they get you there quicker if the task isn’t too taxing (your mileage may vary, of course). My main gripe, though, is with people who allege unmeasured improvements, and I wonder when my compelling reasons to believe them will return from the war.
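For context, here’s the sort of trivially small helper I mean; a minimal sketch where the function name and behaviour are made up for illustration, not lifted from any actual package of mine:

    # A hypothetical example: the kind of tiny, repeatedly-used function
    # I'd rather package once than rewrite in every project.
    from datetime import date
    from pathlib import Path

    def dated_filename(title: str, suffix: str = ".md", folder: str = ".") -> Path:
        """Build a date-prefixed file path like ./2025-01-31-my-post.md."""
        slug = "-".join(title.lower().split())
        return Path(folder) / f"{date.today().isoformat()}-{slug}{suffix}"

Nothing clever, but that’s the point: the payoff is never having to think about it again.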
