I’ll say that during a recent week where I was forced to use an LLM, I found Claude Opus to be extremely poor at referencing this guide: https://mywiki.wooledge.org/BashPitfalls
It took almost an hour to get Claude to write me a shell script which I considered to be of acceptable quality. It completely hallucinated several of the points in that guide, so I had to go read the guide myself to verify that the language model was falsifying information. Writing the script by hand would have taken me about 5 minutes.
I believe that GIGO applies here. 99% of shell scripts on the internet are unsafe and terrible (looking at you, set -euo pipefail), and Claude is much more likely to generate god-awful garbage because of that bias in its training data.
And as for unit tests? IMO, anything other than property-based testing is irrelevant. If you’re using something like Pydantic, you can auto-generate a LOT of your tests using the rich type annotations available in that library along with Hypothesis. I tend to write a testing framework once, and then special-case property tests for things that fall outside of my models. None of this is super helpful for big ugly codebases with a lot of inertia around practices, but that’s not been my environment, thankfully.
Why not just give it shellcheck and have it run that on every script it creates?
Shellcheck, while good, doesn’t capture all best practices in my opinion. There are many items in that doc which shellcheck would happily allow, worst of all being set -euo pipefail.
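For anyone wondering what the objection to set -e looks like in practice, here is a minimal sketch of two standard bash behaviours that make it less protective than it appears. The names case_condition, case_counter and step are made up for this illustration, and each case runs in a fresh bash -c so that the demo file itself is not aborted.

```bash
#!/usr/bin/env bash
# Sketch: two standard bash behaviours that weaken `set -e`.
# case_condition, case_counter and step are hypothetical names for this demo.
# Each case runs in a fresh `bash -c` so errexit in the demo cannot abort this file.

# Case 1: errexit is ignored inside a function used as an `if` condition.
case_condition() {
    bash -c '
        set -e
        step() {
            false                                  # fails, but errexit is suppressed here
            echo "step kept running after false"
        }
        if step; then
            echo "step reported success"           # step returns the status of its last echo
        fi
    '
}

# Case 2: (( i++ )) evaluates to 0, which bash reports as exit status 1.
case_counter() {
    bash -c '
        set -e
        i=0
        (( i++ ))                                  # recent bash exits here under errexit
        echo "never printed"
    '
}

case_condition
case_counter || echo "counter script exited with status $?"
```

On a recent bash (5.x), the first case prints both of its messages even though false failed, and the second prints only "counter script exited with status 1". So one script sails past a real failure while the other dies on a harmless counter increment. Older bash versions may treat the arithmetic case differently, which is part of the problem.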