thisago's blog


Should I Use LLM?

We used to wear clothes that showed craftsmanship, and we used to eat food we planted ourselves. Now we wear polyester and eat food packed in plastic bags. The price is lower, but are the drawbacks worth it?

For 7 months I used no LLM, but for the last 50 days I've been using the cutting-edge models from bigcorps again (mainly Opus from Anthropic), and I have some thoughts.

The evolution of 7 months

Before this break I was mainly using GPT-4.1 and GPT-5-mini.[1] Now, with Opus, the change is perceptible.

It's not only the models. Previously I mainly used Aider, which didn't have a subagents feature. Now, with Claude Code and OpenCode, the capabilities have increased:

  • Parallel subagent dispatching
  • Smarter prompt usage (AKA skills)
  • Per-subagent specialized prompts
  • MCP servers

OK, not a list of mind-blowing innovations. Most of the innovation lies in the SaaS inference systems. However, since frontier models are getting very capable, client-side customizations work much better now. In 2024 it wasn't easy to get a local Llama 3.1 8B to use tools correctly, and frontier models performed better with Aider's editing style (plain-text replace boundaries, like Git conflict markers) than with tool calling (JSON).
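To make the "plain-text replace boundaries" idea concrete, here is a minimal sketch of how a client could apply such an edit. The marker strings and the apply_edit_blocks helper are my own simplified assumptions in the spirit of Git conflict markers, not a copy of any tool's real parser.

```python
import re

# Assumed, simplified edit-block format modeled on Git conflict markers.
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(?P<search>.*?)\n=======\n(?P<replace>.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def apply_edit_blocks(source: str, llm_reply: str) -> str:
    """Apply every SEARCH/REPLACE block found in llm_reply to source."""
    for match in BLOCK_RE.finditer(llm_reply):
        search, replace = match["search"], match["replace"]
        # Refuse ambiguous or missing matches instead of guessing.
        if source.count(search) != 1:
            raise ValueError(f"SEARCH text must match exactly once: {search!r}")
        source = source.replace(search, replace)
    return source


reply = """Here is the fix:
<<<<<<< SEARCH
print("helo")
=======
print("hello")
>>>>>>> REPLACE
"""
patched = apply_edit_blocks('def main():\n    print("helo")\n', reply)
```

The model only has to reproduce a chunk of the file verbatim, with no JSON escaping of quotes or newlines, which is probably part of why this style was easier for models back then. The sketch ignores edge cases such as empty replacements.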

I have nothing to complain about regarding the technological advancement; it's really impressive.

I tinkered a little and built some agents and skills, making a setup for reusable prompts across SKILLs/AGENTs with Org Mode, cross-compatible with both OpenCode and Claude Code, and it even replies the way I want.

The good

Generates code fast

Every language I tried with it worked pretty fluidly:

  • Go
  • TypeScript
  • Python
  • Nim
  • Bash
  • Awk
  • jq
  • mongosh (JS)
  • Org Mode
  • YAML

Most of the time its output is usable. Syntax errors are really rare. The ones I remember were mostly in YAML, where it added unquoted strings containing colons:

$ yq <<<'text: "If quoted: OK"'
text: "If quoted: OK"
$ yq <<<'text: If unquoted: Syntax error' 2>&1
Error: bad file '-': yaml: mapping values are not allowed in this context
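One way to guard against this when generating YAML from a script is to quote any value that looks risky. A minimal sketch follows; yaml_scalar is a hypothetical helper of mine, and it only covers the colon and comment cases, not every rule for YAML plain scalars.

```python
import json


def yaml_scalar(value: str) -> str:
    """Return value as a YAML scalar, double-quoting it when a plain scalar is risky."""
    # ": " and a trailing ":" look like mapping keys; " #" starts a comment.
    risky = ": " in value or value.endswith(":") or " #" in value
    # A JSON string is also a valid YAML double-quoted scalar.
    return json.dumps(value) if risky else value


line_quoted = f"text: {yaml_scalar('If unquoted: Syntax error')}"
line_plain = f"text: {yaml_scalar('no colon here')}"
```

Here line_quoted gets the same double quotes as the working yq call above, while line_plain stays unquoted.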

However, obvious as it sounds, this doesn't mean you'll deliver features at the speed it can generate code.

Really cheap

Considering the real costs of self-hosting LLMs, SaaS inference from bigcorps is ridiculously cheap.

Technical debt is now in the past

Those 37 TODOs you left in the codebase for future-you can now be solved with low-quality requests:

$ rg TODO: internal/mypackage
refactor this thing

Speed means money

It's certainly a strategic tool for gaining an advantage in the market: prototyping apps, testing new approaches, and refining data.

Most people are using it, so the strategic bonus of using it is far lower than the loss from not using it.

Subagents can be parallel

I stuck with this: having the main agent delegate work to parallel subagents is awesome.
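The dispatching pattern itself is simple. Here is a rough sketch, assuming each subagent is just a blocking function call; run_subagent is a hypothetical stub of mine, where a real one would call an LLM API with its own specialized prompt.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Hypothetical stand-in for a subagent; a real one would call an LLM API."""
    return f"done: {task}"


def dispatch(tasks: list[str]) -> list[str]:
    """Run one subagent per task in parallel and return results in task order."""
    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        # pool.map preserves the order of the input tasks.
        return list(pool.map(run_subagent, tasks))


results = dispatch(["write tests", "update docs", "refactor parser"])
```

Threads fit here because the work is mostly waiting on network I/O; the main agent then only has to merge the returned summaries.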

The bad

Privacy

This is the most critical point for me, but I underrated it in the previous post.

As someone who picks self-hosted and privacy-friendly alternatives to common bigtech solutions, I now face a new limitation: my computer cannot run an LLM that really replaces theirs.

With LLMs becoming the standard interface between human and computer, the companies that provide them will own a copy of every project you work on. It's fair to assume they can misuse the data once you accept their terms. So now your project is shared with them. I even had cases where the LLM read my .env (which contains pass calls) and called pass itself to build a curl command, writing the secret in plain text. There's a lot of room for improvement in security hardening.

Beyond the leakage of a project's business rules and envs, there's also personal fingerprinting, which can evolve into personalized advertising/propaganda inside the chatbots.

An alternative is to test models on SaaS GPUs and evaluate whether buying the hardware is worth it.

But if self-hosted models on consumer-grade GPUs are still not enough, there might be some privacy/security mitigations for SaaS LLM inference, e.g.:

  • Local small language model to rewrite human text: reduces fingerprinting and filters content.
  • Virtualize the agentic application: OS-level security and manageable exposure.
  • Local data poisoning and redaction: clean sensitive data and provide fake signals. Might bring drawbacks.
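As an illustration of the redaction idea, here is a minimal sketch that masks secret-looking values in .env-style text before it reaches the model. The key-name heuristic and the redact_env helper are assumptions of mine; a real filter would need a much broader set of rules.

```python
import re

# Assumed naming convention: keys ending in these words hold secrets.
SECRET_KEY_RE = re.compile(
    r"^(?P<key>[A-Z0-9_]*(?:PASSWORD|SECRET|TOKEN|KEY))=(?P<value>.+)$",
    re.MULTILINE,
)


def redact_env(text: str) -> str:
    """Replace the value of secret-looking KEY=value lines with a placeholder."""
    return SECRET_KEY_RE.sub(lambda m: f"{m['key']}=<REDACTED>", text)


env = "API_TOKEN=$(pass show api/token)\nLOG_LEVEL=debug\nDB_PASSWORD=hunter2\n"
safe = redact_env(env)
```

In this example only LOG_LEVEL keeps its value. Masking even the pass call hides which secret store entry is used, at the cost of the model knowing less about the setup, which is the kind of drawback mentioned above.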

Excessive speed

The speed at which it generates complex code is absurd for humans; this brings anxiety and makes it harder to think about the problem. It's like a momentum that pushes faster than you can think, so you end up just vomiting replies back to the LLM, ignoring even some typos.

Potential for cognition harm

There's an initial study about the harms of misused LLMs.[2] Personally, I feel the tendency to avoid thinking on my own and to "brainstorm" with the LLM instead.

Good discipline is needed to handle the "vomit-back" behavior and keep track of what's going on during a long session.

It owns the code

When you're the author of the code, you know exactly where to change it. When the LLM edits, your changes first require a code review, like we always did when working with others.

With insanely fast code generation, it seems more productive to just ask the LLM to do the edit.

We're adopting a no-code, no-deep-thinking working style, in which engineers become users of a chatbot.

Expensive technology

Frontier LLMs aren't consumer-grade software: you can't self-host them on your own hardware without a good amount of cash, and even then the result is (still) not comparable to the cutting-edge models.

I think they will get more efficient, and/or powerful enough computers will become more accessible.

SSD and RAM in retail

Prices have more than doubled. An SSD I bought for BRL 400 is now BRL 1k at Amazon BR.

Are homelabs under threat? I hope this is temporary.

Considerations

LLMs might get restricted

The money these companies raise from investors can dry up, and since the service requires an immense amount of computational power, its price can rise.[3] Or hardware can get even more expensive.

If this is true, we're in a window of opportunity to generate massive amounts of text.

Conclusion

The two main issues might have a solution:

  • Privacy
  • Cognition

I think it's worth looking for solutions; the usage itself is reasonable.

Adapting the workflow toward denser pre-flight planning might reduce the bad habits that harm cognition.

So, the follow-up directions are:

  1. Run frontier OSS models on rented GPU infrastructure, replacing Opus 4.8.
  2. If frontier models are still a game-changer, consider some workaround for SaaS LLM inference to reduce fingerprint leakage.

Footnotes:

1

GPT-5 was excessively slow at its first release.

3

Thought from this interesting FAQ: https://hledger.org/AI.html#ais-environmental-impact-