I just had a similar experience last week trying out ‘vibe coding’ with Windsurf: “Write Python code that connects to the Plaid API, downloads all my financial account balances, and writes them to my GSheet financial dashboard.” I had actually coded all of this by hand 3 years ago, but since both the Plaid and GSheet APIs and SDKs have evolved in the past 3 years, I figured I’d let AI accomplish this simple task.

I was led on a 2-hour cluster-f where Windsurf (which used Anthropic Claude 3.5 Sonnet) bounced back and forth between the plaid and plaid-python package namespaces, and between Country Code as a number and Country Code as an enum. As an experienced coder and someone who had manually coded a Plaid Python SDK integration before, I could clearly see Claude struggling between two corpora of code examples, the old plaid-python SDK and the newer plaid SDK - AI cannot distinguish the currently relevant SDK from the one that has been deprecated. I even strongly prompted: “You are wrong, you are going down the wrong path, ‘plaid’ is the current SDK namespace.” Claude would apologize, correct all the code it had written, walk down the wrong rabbit hole, and then revert to the same mistakes it made 3-4 iterations ago - there is zero reasoning, memory, or cognitive understanding of what it’s doing. Like true statistical path walking, it is literally flipping a coin and choosing a cluster of code that seems popular for this prompt. If you had minimal coding or Plaid/GSheet integration experience, you would be vibe coding for days through an unproductive walkabout.
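For the record, here is roughly what the working version looks like against the current plaid namespace (the plaid-python package’s post-v9 generated client), with gspread as one common choice for the Google Sheets half. Treat it as a minimal sketch: the credentials, access token, and sheet name are all placeholders, not values from my actual setup.

```python
import plaid
from plaid.api import plaid_api
from plaid.model.accounts_balance_get_request import AccountsBalanceGetRequest
import gspread

# Current SDK: the PyPI package is plaid-python, but the import
# namespace is plaid, and the client is a generated OpenAPI client.
configuration = plaid.Configuration(
    host=plaid.Environment.Sandbox,
    api_key={"clientId": "PLAID_CLIENT_ID", "secret": "PLAID_SECRET"},  # placeholders
)
client = plaid_api.PlaidApi(plaid.ApiClient(configuration))

# Pull real-time balances for an already-linked item.
response = client.accounts_balance_get(
    AccountsBalanceGetRequest(access_token="ACCESS_TOKEN")  # placeholder
)

# Write name/balance pairs into the dashboard sheet via a service account.
rows = [[a["name"], a["balances"]["current"]] for a in response["accounts"]]
sheet = gspread.service_account().open("Financial Dashboard").sheet1  # placeholder name
sheet.update(values=rows, range_name="A1")
```

Twenty-odd lines, once you know which of the two SDK generations you’re actually targeting. That was the whole problem.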
What LLMs cannot do:
1) Understand new SDKs. It has zero insight into the code inside the SDK, and hence just guesses how to integrate with it based on the scant examples it can crawl and find (see the old-vs-new sketch after this list).
2) Simulate a compiler and runtime, yet. It doesn’t understand why the code it produced compiles or fails to compile.
3) Understand systems, integration, dependencies, and architecture. It’s great at building standalone code that has zero dependencies. The moment you introduce dependencies that are new, secure by nature, and/or short on public examples, LLMs behave like a newborn child - except an overconfident newborn child who insists: “I know what the problem is. I’ve updated the dependencies and code. Try this now and tell me if it works.” Over and over again.
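To make the trap from point 1 concrete, here is roughly the fork Claude kept oscillating over. The legacy half is from my memory of the pre-v7 plaid-python library and may not match the final deprecated release exactly; the second half is the current generated-client style.

```python
# Legacy plaid-python (pre-v7, now deprecated): resource-style client,
# plain strings for products and country codes.
from plaid import Client

legacy = Client(client_id="...", secret="...", environment="sandbox")
legacy.LinkToken.create({
    "user": {"client_user_id": "me"},
    "client_name": "Dashboard",
    "products": ["transactions"],
    "country_codes": ["US"],   # plain string
    "language": "en",
})

# Current plaid namespace (v9+): typed request models, CountryCode is an enum.
from plaid.model.link_token_create_request import LinkTokenCreateRequest
from plaid.model.link_token_create_request_user import LinkTokenCreateRequestUser
from plaid.model.country_code import CountryCode
from plaid.model.products import Products

request = LinkTokenCreateRequest(
    user=LinkTokenCreateRequestUser(client_user_id="me"),
    client_name="Dashboard",
    products=[Products("transactions")],
    country_codes=[CountryCode("US")],   # enum, not a bare string
    language="en",
)
# then link_token_create(request) on a plaid_api.PlaidApi client, as above
```

Both shapes are heavily represented in the training corpus, and the model hops between them mid-conversation.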
Unfortunately, LLMs still have extreme hubris and overconfidence, proposing code as if it were the exact answer to what you seek, only to send you in circles, spending more time deciphering their mistakes than it would take to code it yourself. An LLM is an overconfident search engine prone to AI-splaining.
The reports of humanity’s replacement by AI are greatly exaggerated.
as always, Gene, viciously on point and immensely humorous, to boot :)