Why isn't LLM progress the same everywhere?
The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out from the neural-net magic of generalization (fingers crossed), or via weaker means like imitation. This is what's driving the "jagged" frontier of progress in LLMs. Tasks that are verifiable progress rapidly, possibly even beyond the ability of top experts (e.g. math, code, amount of time spent watching videos, anything that looks like a puzzle with a correct answer), while many others lag by comparison (creative and strategic work, tasks that combine real-world knowledge, state, context, and common sense).
Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify.
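To make "automates what you can verify" concrete, here is a minimal generate-and-verify loop (rejection sampling). This is a sketch, not anyone's actual system: `propose()` is a hypothetical stand-in for sampling from a model, and the toy task is chosen so that `verify()` is cheap and exact, which is precisely what makes the loop automatable.

```python
import random

# Toy verifiable task: find an integer root of x^2 - 5x + 6 = 0.

def verify(x: int) -> bool:
    """Exact, cheap check: is x actually a root? This is the 'verifier'."""
    return x * x - 5 * x + 6 == 0

def propose() -> int:
    """Hypothetical stand-in for sampling a candidate answer from an LLM.
    Here it is just a random guess."""
    return random.randint(-10, 10)

def generate_and_verify(max_tries: int = 1000) -> int | None:
    """Keep sampling candidates until one passes the verifier."""
    for _ in range(max_tries):
        candidate = propose()
        if verify(candidate):
            return candidate
    return None  # no verified answer within budget

if __name__ == "__main__":
    print(generate_and_verify())  # prints 2 or 3
```

In a real pipeline `propose()` would be a model call and `verify()` a test suite, proof checker, or compiler. The loop only works when verification is cheap and reliable; without a verifier you are back to hoping generalization or imitation got it right.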
https://x.com/karpathy/status/1990116666194456651?s=61
A really important distinction for where LLMs are a great fit (and one of many reasons software is such a good use case). Even with current tools and methods, if you can verify that an output is correct, the process can be automated.
Common sense can’t be automated, and how someone feels about something can’t be verified. That’s why AI will take a long time to gain a foothold in some places. It’s also why law is such a good fit: an answer is either right or wrong.