philecho 18 hours ago

I wrote an essay outlining why common AI benchmarks are not terribly useful, arguing that we should mostly rely on normal user experience instead.

Key reasons: 1) Most questions do not have simply ‘right’ or ‘wrong’ answers 2) Most user problems are poorly defined 3) Agents are getting popular, and they combine both of these problems.