AI to replace programmers?
Dario Amodei, Anthropic CEO, says that in 12 months all code will be written by bots
Dario Amodei, Anthropic CEO, claims that AI will be writing 90% of the programming code in 3 to 6 months, and writing 100% of the code in 12 months. Students who took this claim seriously might be dissuaded from pursuing education in coding. Business executives who took this claim seriously might be dissuaded from hiring software developers. But this prediction is based on a number of faulty assumptions:
1. Well over 90% of code isn’t already written by AI.
2. The last 10% of a problem is as easy as the first 90%. Once 90% of the code is generated by machines, the remaining 10% can’t be very difficult.
3. All the code that could ever be needed can already be generated by the model.
4. Copying and pasting code is all there is to being a software developer.
Bots already produce well over 90% of the code that runs computers. At least since the 1970s, the vast majority of code has been compiled or interpreted by artificial intelligence, particularly finite state automata, a kind of language model that has been around since the 1940s.
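To make that concrete, here is a minimal sketch (an illustration only, not drawn from any particular compiler) of the kind of finite state automaton a compiler’s lexer uses to recognize tokens, in this case unsigned integer literals:

```python
# A minimal finite state automaton that accepts unsigned integer literals
# (one or more digits). Lexers inside compilers are built from automata
# like this one, scaled up to cover a whole language's token set.

def accepts_integer(text: str) -> bool:
    state = "start"           # states: "start", "digits"
    for ch in text:
        if state == "start" and ch.isdigit():
            state = "digits"  # first digit seen
        elif state == "digits" and ch.isdigit():
            state = "digits"  # keep consuming digits
        else:
            return False      # any other character rejects the input
    return state == "digits"  # accept only if at least one digit was read

print(accepts_integer("42"))   # True
print(accepts_integer("4a2"))  # False
print(accepts_integer(""))     # False
```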
Computers do not run directly on languages like C or Python. You are probably familiar with the fact that computer memory consists of binary strings, as does the machine code used to operate on that memory. “For example, on the Zilog Z80 processor, the machine code 00000101, … causes the CPU to decrement the B general-purpose register” (Wikipedia). A compiler or an interpreter is a complex program that translates the high-level, human-readable language into machine code. One line of Python may call a number of libraries, each of which consists of many lines of code, all of which is eventually translated into bytecode and ultimately into many thousands of machine instructions. The ratio of machine code to human-written code is enormous, so it would not be too far out of line to claim that close to all program code is AI generated, as directed by the human-written high-level code. Now the fantasy is that ambiguous natural language will be able to replace the structured programming languages that developers use today.
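The expansion is easy to see with Python’s built-in dis module, which prints the bytecode the interpreter actually executes for a single human-written line (a rough illustration; the exact opcodes vary by Python version, and each bytecode instruction is in turn implemented by many machine instructions inside the interpreter):

```python
import dis

def greet(name):
    # One human-written line...
    return "Hello, " + name.upper() + "!"

# ...expands into a sequence of interpreter instructions, each of which
# is itself carried out by many machine-code instructions in CPython.
dis.dis(greet)
```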
Amodei claims that GenAI models are nearly at the level where they can produce 90% of the code, and maybe he is right about that, both in the sense described previously and in the sense that most of the code in an application is boilerplate, or at least repetitive. Functions get reused, coding-style conventions require repetitive structures, and so on. Add to that the code that is easily looked up, and only a small percentage of the code in an application is actually original.
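As a trivial illustration of how much of an application is this kind of predictable scaffolding, the hand-written class below is almost entirely boilerplate that Python’s dataclasses module, or an LLM, can generate mechanically (the class itself is invented for the example):

```python
from dataclasses import dataclass

# Hand-written version: every line is predictable from the field names.
class PointManual:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"PointManual(x={self.x}, y={self.y})"

    def __eq__(self, other):
        return isinstance(other, PointManual) and (self.x, self.y) == (other.x, other.y)

# The same thing, with the boilerplate generated automatically.
@dataclass
class Point:
    x: float
    y: float
```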
But even if original code is a small portion of the total, the idea that it is like the repetitive kind is laughable. The majority of the code may be easy to automate, but it is the marginal, novel, innovative code that makes the difference and makes applications marketable. Edison is frequently quoted as saying that genius is 1% inspiration and 99% perspiration. It would be folly to think that a model capable of generating the 99% could supply the remaining 1% using the same approach.
Language models trained on the code in various repositories are well suited to producing the most probable code, but their likelihood of success diminishes as the requirements become more specific. They fail outright on code (or paraphrases of it) that they have never seen before. For example, ChatGPT performed 48% better on problems from a well-known coding benchmark (LeetCode) that were published before the model was trained than on problems added later. Performance on well-known problems is not predictive of performance on unknown ones.
More generally, even more recent models have been found to fail miserably on novel mathematical reasoning tasks when similar tasks were not included in their training. Petrov and his colleagues evaluated several recent reasoning models on problems from the 2025 USAMO (USA Math Olympiad) shortly (within hours) after the problems’ release. Although the tested models had performed well on earlier competitions, Petrov found that they performed poorly on this new one, achieving less than 5% success. The importance of the exact training data used is rarely appreciated. Pretraining on the tests or similar problems may, in fact, be what is needed to achieve success with these models, which does not portend well for replacing coders. These findings support the idea that language models mimic reasoning they have been trained on, without the recited reasoning playing a causal role in their results.
There is more to software development than cutting and pasting code. According to O*NET, these are the activities performed by a software developer.
· Analyze information to determine, recommend, and plan installation of a new system or modification of an existing system.
· Analyze user needs and software requirements to determine feasibility of design within time and cost constraints.
· Confer with data processing or project managers to obtain information on limitations or capabilities for data processing projects.
· Confer with systems analysts, engineers, programmers and others to design systems and to obtain information on project limitations and capabilities, performance requirements and interfaces.
· Consult with customers or other departments on project status, proposals, or technical issues, such as software system design or maintenance.
· Coordinate installation of software system.
· Design, develop and modify software systems, using scientific analysis and mathematical models to predict and measure outcomes and consequences of design.
· Determine system performance standards.
· Develop or direct software system testing or validation procedures, programming, or documentation.
· Modify existing software to correct errors, adapt it to new hardware, or upgrade interfaces and improve performance.
· Monitor functioning of equipment to ensure system operates in conformance with specifications.
· Obtain and evaluate information on factors such as reporting formats required, costs, or security needs to determine hardware configuration.
· Prepare reports or correspondence concerning project specifications, activities, or status.
· Recommend purchase of equipment to control dust, temperature, or humidity in area of system installation.
· Specify power supply requirements and configuration.
· Store, retrieve, and manipulate data for analysis of system capabilities and requirements.
· Supervise and assign work to programmers, designers, technologists, technicians, or other engineering or scientific personnel.
· Supervise the work of programmers, technologists and technicians and other engineering and scientific personnel.
· Train users to use new or modified equipment.
There is clearly a lot more to being a software developer than just copying and pasting code. On the other hand, if a model were capable of replacing coders, then it should be possible for a manager to paste this list into the prompt for a coding model and have it write an application to perform all of these functions. According to Amodei’s boast, an LLM should be able, within about 12 months, to produce such an application. I’m skeptical.
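The experiment is easy enough to set up. A minimal sketch using Anthropic’s Python SDK might look like the following; the model name, token limit, and abbreviated task list are placeholders, and whether the response would amount to a working application is precisely what is in question:

```python
# Sketch of the experiment described above: paste the O*NET task list into a
# coding model and ask for an application that performs every task.
# The model name and max_tokens are placeholders, not recommendations.
import anthropic

ONET_TASKS = """
Analyze information to determine, recommend, and plan installation of a new
system or modification of an existing system.
Analyze user needs and software requirements to determine feasibility of
design within time and cost constraints.
... (the rest of the O*NET list from above) ...
"""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # placeholder model name
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": "Write a complete application that performs all of the "
                   "following software-developer activities:\n" + ONET_TASKS,
    }],
)
print(response.content[0].text)
```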
Beyond the question of whether AI can produce correct code, there is the matter of technical debt. A study by the Software Engineering Institute at Carnegie Mellon (sei.cmu.edu/reports/2024) reports that code generated by AI assistants can introduce up to 35% more technical debt than traditionally written code.
This argues against using AI-generated code without tracking the technical debt it incurs. Technical debt compounded on technical debt will quickly outstrip the benefits.