Should I Use LLM?
We used to wear clothes that showed craftsmanship, and we used to eat food we planted ourselves. Now we wear polyester and eat food packed in plastic bags. The price is lower, but are the drawbacks worth it?
For 7 months I used no LLM, but for the last 50 days I've been using the cutting-edge models from bigcorps again (mainly Opus from Anthropic), and I have some thoughts.
The evolution of 7 months
The last time around, I was mainly using GPT-4.1 and GPT-5-mini.[1] Now, with Opus, the change is perceptible.
It's not only the models. Previously I mainly used Aider, which didn't have a subagents feature. Now, with Claude Code and OpenCode, the capabilities have grown:
- Parallel subagent dispatching
- Smarter prompt usage (AKA skills)
- Per-subagent specialized prompts
- MCP servers
OK, it's not a list of mind-blowing innovations. Most of the innovation lies in the SaaS inference systems. However, since frontier models are getting very capable, client-side customizations work much better now. In 2024 it wasn't easy to get a local Llama 3.1 8B to use tools correctly, and even frontier models performed better with Aider's editing style (plain-text replace boundaries, like Git conflict markers) than with tool calling (JSON).
I have nothing to complain about regarding the technological advancement; it's really impressive.
I tinkered a little and built some agents and skills, making a setup for reusable prompts across SKILLs/AGENTs with Org Mode that is cross-compatible with both OpenCode and Claude Code, and it even replies the way I want.
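To make the "plain-text replace boundaries" idea concrete, here is a minimal sketch of how a client could apply such an edit. The marker strings and the `apply_edit_blocks` helper are my own assumptions for illustration, loosely modeled on Git conflict markers; this is not Aider's actual parser.

```python
import re

# Assumed block format (hypothetical), modeled on Git conflict markers:
#   <<<<<<< SEARCH / ======= / >>>>>>> REPLACE
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def apply_edit_blocks(source: str, llm_reply: str) -> str:
    """Apply every SEARCH/REPLACE block found in llm_reply to source."""
    for search, replace in BLOCK_RE.findall(llm_reply):
        if source.count(search) != 1:
            # Refuse missing or ambiguous matches instead of guessing.
            raise ValueError(f"search text must match exactly once: {search!r}")
        source = source.replace(search, replace)
    return source


reply = """Here is the change:
<<<<<<< SEARCH
    return a - b
=======
    return a + b
>>>>>>> REPLACE
"""
original = "def add(a, b):\n    return a - b\n"
patched = apply_edit_blocks(original, reply)
print(patched)
```

The model only has to reproduce the text it wants to change, with no JSON escaping involved, which is probably why weaker models handled this style better than structured tool calls.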
The good
Generates code fast
Every language I tried with it worked pretty fluidly:
- Go
- TypeScript
- Python
- Nim
- Bash
- Awk
- jq
- mongosh (JS)
- Org Mode
- YAML
Most of the time its output is usable. Syntax errors are really rare. The ones I remember were mostly in YAML, where it added unquoted strings containing colons:
yq <<<'text: "If quoted: OK"'
yq <<<'text: If unquoted: Syntax error' 2>&1

text: "If quoted: OK"
Error: bad file '-': yaml: mapping values are not allowed in this context
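A cheap guard against this class of error is to quote any value that is risky as a plain scalar before writing it out. The sketch below uses only the standard library; `yaml_scalar` is a hypothetical helper, and its list of risky patterns is deliberately conservative and not exhaustive (it ignores values like `true` or `null`, for example).

```python
import json


def yaml_scalar(value: str) -> str:
    """Return value as a YAML scalar, double-quoting it when plain style is risky."""
    risky = (
        value == ""
        or value != value.strip()  # leading or trailing whitespace
        or ": " in value  # would be read as a nested mapping
        or value.endswith(":")
        or " #" in value  # would start a comment
        or value[:1] in "#&*!|>'\"%@`[]{},-?:"  # YAML indicator characters
    )
    # Assumption: a JSON string is also a valid YAML double-quoted scalar.
    return json.dumps(value) if risky else value


print(f"text: {yaml_scalar('If unquoted: Syntax error')}")
print(f"text: {yaml_scalar('plain text')}")
```

The first line comes out quoted, so it parses; the second is left as a plain scalar.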
However, obvious as it sounds, that doesn't mean you'll deliver features at the speed it generates code.
Really cheap
Considering the real costs of self-hosting LLMs, SaaS inference from bigcorps is ridiculously cheap.
Technical debt is now past
Those 37 TODOs you left in the codebase for future-you can now be solved with low-quality requests:
$ rg TODO: internal/mypackage
refactor this thing
Speed means money
It's certainly a strategic tool for gaining an advantage in the market: prototyping apps, testing new approaches, and refining data.
Most people are using it, so the strategic bonus of using it is much smaller than the loss from not using it.
Subagents can be parallel
I've stuck with this: having the main agent delegate work to parallel subagents is awesome.
The bad
Privacy
This is the most critical point for me, but I underrated it in the previous post.
As someone who picks self-hosted and privacy-friendly alternatives to common bigtech solutions, I now face a new limitation: my computer cannot run an LLM that really replaces theirs.
With LLMs becoming the standard interface between human and computer, the companies that provide them will own a copy of all the projects you work on. It's fair to assume they can misuse the data once you accept their terms. So now your project is shared with them. I even had cases where the LLM read my .env (which contains pass calls) and called pass to build a curl command, writing the secret in plain text. There's a lot of room for improvement in security hardening.
Beyond leaking a project's business rules and env values, there's also personal fingerprinting, which can evolve into personalized advertising/propaganda inside the chatbots.
An alternative is to test models on SaaS GPUs and evaluate whether buying the hardware is worth it.
But if self-hosted models on consumer-grade GPUs are still not enough, there might be some privacy/security mitigations for SaaS LLM inference, e.g.:
- Local small language model to rewrite human text
  - Reduce fingerprinting and filter content.
- Virtualize the agentic application
  - OS-level security and manageable exposure.
- Local data poisoning and redaction
  - Clean sensitive data, provide fake signals. Might bring drawbacks.
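As a rough illustration of the redaction idea, here is a minimal sketch that masks sensitive-looking values in .env-style text before it reaches a remote model. The key pattern, the placeholder, and the `redact_env` helper are all assumptions of mine; a real filter would need a much broader notion of "sensitive".

```python
import re

# Assumption: secrets live in KEY=VALUE lines whose key contains one of these words.
SENSITIVE_LINE = re.compile(
    r"^([ \t]*(?:export[ \t]+)?\w*(?:TOKEN|SECRET|PASSWORD|KEY)\w*[ \t]*=[ \t]*).+$",
    re.IGNORECASE | re.MULTILINE,
)


def redact_env(text: str, placeholder: str = "<REDACTED>") -> str:
    """Replace the value of every sensitive-looking KEY=VALUE line."""
    # A function replacement keeps the placeholder literal (no backslash escapes).
    return SENSITIVE_LINE.sub(lambda m: m.group(1) + placeholder, text)


sample = (
    "API_TOKEN=$(pass show api/token)\n"
    "LOG_LEVEL=debug\n"
    "export DB_PASSWORD=hunter2\n"
)
print(redact_env(sample))
```

Keeping the key names while dropping the values lets the model still reason about the configuration shape. The drawback is that anything it generates from the redacted text needs the real values injected locally afterwards.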
Excessive speed
The speed at which it generates complex code is absurd for humans. This brings anxiety and makes it harder to think about the problem. It's like a momentum that pushes faster than you can think, so you end up just vomiting replies back to the LLM, ignoring even your own typos.
Potential for cognition harm
There's an initial study about the harms of misusing LLMs.[2] Personally, I feel the tendency to avoid thinking on my own and to "brainstorm" with the LLM instead.
It takes good discipline to handle the "vomit-back" behavior and keep track of what's going on during a long session.
It owns the code
When you're the author of the code, you know exactly where to make a change. When the LLM edits it, your changes first require a code review, like we always did when working with others.
With insanely fast code generation, it seems more productive to just ask the LLM to do the edit.
We're adopting a no-code, no-deep-thinking working style, in which engineers become users of a chatbot.
Expensive technology
Frontier LLMs aren't consumer-grade software. You can't self-host one on your own hardware without a good amount of cash, and even then the result is (so far) not comparable to the cutting-edge models.
I think models will get more efficient and/or sufficiently powerful computers will become more accessible.
SSD and RAM in retail
Prices have more than doubled. An SSD I bought for BRL 400 is now BRL 1k on Amazon BR.
Are homelabs under threat? I hope this is temporary.
Considerations
LLMs might get restricted
The money these companies raise from investors can dry up, and since the service requires an immense amount of computational power, its price can rise.[3] Or hardware can get even more expensive.
If this is true, we're in a window of opportunity to generate massive amounts of text.
Conclusion
The two main issues might have a solution:
- Privacy
- Cognition
I think it's worth looking for solutions; its usage is reasonable.
Adapting the workflow toward denser pre-flight planning might reduce the bad habits that harm cognition.
So, the follow-up directions:
- Run frontier OSS models on rented GPU infrastructure as a replacement for Opus 4.8.
- If frontier models are still a game changer, consider some workaround for SaaS LLM inference to reduce fingerprint leakage.
Footnotes:
GPT-5 was excessively slow at its first release.
Thought from this interesting FAQ: https://hledger.org/AI.html#ais-environmental-impact-