Samsung's TRUEBench tests AI chatbots against real workplace tasks, yet skepticism persists about whether any benchmark can capture the true complexity of human jobs.