Low code vibes to chill and relax to

[Image: vibe-coding.jpeg - guy writing code at a computer in a 90s neon fluorescent style]

LLMs have been an intensely controversial topic in recent years; some of the distaste is for good reason, and some seems a little overblown. I've never really engaged with the debate personally, and I still don't intend to. But I did want to share a recent experience of writing an open source project with the help of an LLM.

I've used LLMs as an assistant for several years, generally to ask questions like: hey mr. robot, tell me how to do this thing I have very little knowledge of. The answer is usually wrong to varying degrees, but it also gives me a reasonable enough starting point to begin the journey. Occasionally it nails it and I save myself a whole lot of time.

Roughly a year or two ago I made my first full-blown attempt to take my hands off the wheel and say: hey mr. robot, make me a whole webapp that does some boring stuff. I did so mostly to try out tools like Replit and Aider, which were new and exciting at the time. I didn't have great success; mostly what I created was a mess that barely started. It still kinda felt like the future, just not especially useful.

Time slipped by as it does, and great strides have been made in the LLM space, so I decided to try again with Cursor and, by extension, Claude Sonnet 3.5 / 3.7. This time around I had been thinking about making a docker-socket-proxy: a problem I had been pondering, but not one so critical that it mattered if it never happened.

Well, this time around things went significantly better, as evidenced by the link to a real code repository.

First things first: you can't vibe code without setting the vibe, and this should do. I prompted it with a config schema and a README outlining the expected output, then let it get to work. Within a couple of hours I had a mostly functional prototype. I was pretty impressed!

Here's a rough version of the original schema I started with...

rules:
  acls:
    - match:
        path: "/v1.*/volumes"
        method: "GET"
      action: "deny"
      reason: "Listing volumes is restricted"

  rewrites:
    - match:
        path: "/v1.*/containers/create"
        method: "POST"
      patterns:
        - field: "Env"
          action: "upsert"
          value: "FUN=yes"
        - field: "Env"
          action: "replace"
          match: "DEBUG=true"
          value: "DEBUG=false"
        - field: "Env"
          action: "delete"
          match: "PANTS=*"
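
To make the ACL half of that schema concrete, here's a hedged sketch of how such a rule might be evaluated: the path is treated as a regular expression and the method is compared exactly. The type and method names here are illustrative, not the real project's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// aclRule is a hypothetical in-memory form of one ACL entry from the schema.
type aclRule struct {
	PathPattern string // regex applied to the request path
	Method      string // exact HTTP method match
	Action      string // "allow" or "deny"
	Reason      string
}

// matches reports whether a request hits this rule.
func (r aclRule) matches(method, path string) bool {
	if method != r.Method {
		return false
	}
	ok, err := regexp.MatchString("^"+r.PathPattern+"$", path)
	return err == nil && ok
}

func main() {
	rule := aclRule{
		PathPattern: "/v1.*/volumes",
		Method:      "GET",
		Action:      "deny",
		Reason:      "Listing volumes is restricted",
	}
	// A GET against the volumes endpoint would be denied with the rule's reason.
	fmt.Println(rule.matches("GET", "/v1.41/volumes")) // true
}
```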

I quickly realised that I could make the whole thing more powerful by collapsing the ACL and rewrite functions into a single rules key.

rules:
  - match:
      path: "/v1.*/volumes"
      method: "GET"
    actions:
      - action: "deny"
        reason: "Listing volumes is restricted"

  - match:
      path: "/v1.*/containers/create"
      method: "POST"
    actions:
      - action: "upsert"
        update:
          Env:
            - "FUN=yes"
      - action: "replace"
        contains:
          Env:
            - "DEBUG=true"
        update:
          Env:
            - "DEBUG=false"
      - action: "delete"
        contains:
          Env:
            - "PANTS=.*"
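
To illustrate what those upsert/replace/delete actions might mean at runtime, here's a hedged sketch of applying them to a container's Env list. The function name and exact semantics are my own guesses for illustration, not the project's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// applyEnvAction is a hypothetical helper that applies one rewrite action
// to an Env slice: "upsert" sets a variable (adding it if absent),
// "replace" rewrites entries matching a regex, "delete" removes them.
func applyEnvAction(env []string, action, match, value string) []string {
	switch action {
	case "upsert":
		key := strings.SplitN(value, "=", 2)[0] + "="
		for i, e := range env {
			if strings.HasPrefix(e, key) {
				env[i] = value
				return env
			}
		}
		return append(env, value)
	case "replace":
		re := regexp.MustCompile("^" + match + "$")
		for i, e := range env {
			if re.MatchString(e) {
				env[i] = value
			}
		}
		return env
	case "delete":
		re := regexp.MustCompile("^" + match + "$")
		out := env[:0]
		for _, e := range env {
			if !re.MatchString(e) {
				out = append(out, e)
			}
		}
		return out
	}
	return env
}

func main() {
	env := []string{"DEBUG=true", "PANTS=on"}
	env = applyEnvAction(env, "upsert", "", "FUN=yes")
	env = applyEnvAction(env, "replace", "DEBUG=true", "DEBUG=false")
	env = applyEnvAction(env, "delete", "PANTS=.*", "")
	fmt.Println(env) // [DEBUG=false FUN=yes]
}
```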

Over the next couple of nights I adjusted the configuration schema and added another feature to enable propagating the proxy socket to sub-containers. Things zoomed along until I hit around 1,500 lines of code; that's when it started to show signs of confusing itself and making questionable changes to otherwise working code. It was also starting to generate duplicate code and create golang packages that didn't make an awful lot of sense. This was a bit of a nightmare on top of the fact that the initial code was pretty messy in the first place.

I was able to get things back on track again by...

  1. Manually getting rid of the worst of the duplicate code
  2. Manually refactoring the weird, inconsistent packages it invented
  3. Asking the LLM to spend some time refactoring the most offensive files for better readability
  4. Putting a greater emphasis on tests that ensured we weren't destroying functional behaviour
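
Point 4 in practice might look something like Go's table-driven test style: pin the existing behaviour down with a table of cases before letting the LLM loose on a refactor. This is an illustrative sketch; `ruleDenies` is a hypothetical stand-in for whatever matching function the real project exposes.

```go
package main

import (
	"fmt"
	"strings"
)

// ruleDenies is a hypothetical stand-in for the proxy's real ACL check,
// hard-coded here to the "deny GET on volumes" rule from the schema above.
func ruleDenies(method, path string) bool {
	return method == "GET" && strings.Contains(path, "/volumes")
}

func main() {
	// A table of cases pinning down current behaviour, so any
	// LLM-driven refactor that changes it fails loudly.
	cases := []struct {
		method, path string
		want         bool
	}{
		{"GET", "/v1.41/volumes", true},     // listing volumes is restricted
		{"POST", "/v1.41/volumes", false},   // only GET is denied
		{"GET", "/v1.41/containers", false}, // other endpoints pass through
	}
	for _, c := range cases {
		if got := ruleDenies(c.method, c.path); got != c.want {
			fmt.Printf("FAIL %s %s: got %v, want %v\n", c.method, c.path, got, c.want)
			return
		}
	}
	fmt.Println("all cases pass")
}
```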

Once that was done, I found success once more by being more specific about what context was needed as well as what needed to change. This next part went far slower; so slow, in fact, that I'm not really sure I saved time, except perhaps a little on the basis that I'm out of practice writing Go. All told, I spent about two weeks of evenings here and there plugging away at it.

There's still plenty of gross stuff in the code base, but for now I've got a working tool that is not entirely a war crime. I'm pleased that the LLM was able to make docker-socket-proxy exist, and the jump from where we were a couple of years ago to now with LLM coding is pretty massive.

Some notes I would leave to myself about trying this next time are...

  • You need to let the AI write either the tests or the implementation, but not both. You have to be certain one of those things works as intended first. It's a good argument for TDD.
  • Refactor early and often to try and reduce the number of unconstrained vibes that need reining in.
  • Fight the urge to accept legitimate-looking vibes that you have not adequately code reviewed. Treat LLM changes like you would a PR: apply a reasonable level of scrutiny before accepting them.