‘Like drinking from a firehose’ – what it’s like to be the human in the AI loop


The government’s promised overhaul of New Zealand’s public service has made much of the potential of artificial intelligence (AI) to streamline operations and compensate for a radically reduced workforce.

This is in keeping with generally utopian visions of generative AI (GenAI) tools unleashing creativity, removing mundane, repetitive work, and “freeing up humans” for more fulfilling tasks. However, this may be naive.

It’s true that GenAI tools can create efficiencies and cost savings for organisations as they become more powerful and their implementation becomes more sophisticated. In this win-win world, organisations and the people who work in them benefit.

But there’s another side to this story as we become more aware of the downsides of GenAI tools – security risks, hallucinations, bias, a “dumbing down” of human input and lack of ethical insight.

One thing that is not debated, however, is the need for human oversight of GenAI work. For legal and reputational reasons, organisations require a “human in the loop” who is responsible for reviewing GenAI outcomes, and has the authority to overturn them.

Easier said than done. As we discovered earlier this year when we held an industry panel discussion on GenAI for business students, being the human in the loop can be a role with great responsibility and pressure.

Faster with fewer people

Humans are expected to check and approve outputs, make decisions in ambiguous situations, provide feedback to improve the performance of GenAI tools, and offer ethical oversight and judgement.

The main reason is that GenAI tools cannot be held accountable for any of their outputs or decisions. GenAI tools are legally considered to be “property”, not “persons”, and they cannot hold rights or incur duties, meaning final accountability falls to humans.

However, exactly which humans can vary. The organisation implementing the GenAI tool is most frequently considered responsible for any of its behaviours and outputs.
In other cases, especially if the tool can be shown to be faulty, the developers or tool vendors may be responsible. If a problem can be traced to incorrect or biased data, the provider of the data may have some responsibility.

Paradoxically, an unexpected negative consequence of GenAI implementation arises from its success. Successful GenAI use means executives and managers expect to get things done faster with fewer people. Tasks that used to be done in days or weeks are expected to be done in hours.

As a senior manager of a large multinational business told us:

“Our goal in the next 18 months is to cut the engineering team down to a quarter of its current size and we need to find out how to leverage AI tools to achieve this.”

The pressures on human reviewers

When the overall volume of outputs is lifted substantially by AI tools, the human in the loop can become a major bottleneck.

Within organisations there are now emerging “content creators” who know how to prompt GenAI tools to quickly generate proposals, reports and presentations, even in domains where they lack expertise. These outputs are then sent to “content reviewers” for “sanity checks”.

Those reviewers are domain experts. They are expected to rectify errors, remove nonfactual hallucinated statements, improve quality and provide accountability and final endorsement.

On one hand, a GenAI-powered “creator” can generate a plausible 50-page report in 15–30 minutes. On the other, the “reviewer” will have to spend hours reading, rectifying and rewriting to make the final report ready for the audience.

This has inverted the workload distribution between “creators” and “reviewers”. At one time a creator would be responsible for around 80% of the total time and effort to produce an advanced draft or prototype, and a reviewer would use the remaining 20% to polish it. Now the creator contributes less than 20% and the reviewer more than 80%.
One of our panellists described this as like “drinking from a firehose”.

Sometimes reviewers have to “let it go”, as they cannot cope with the speed and volume of content coming towards them. But this coping strategy has potentially dire consequences for the organisations they serve.

‘Workslop’ and burnout

There is also a personal cost. Subject-matter experts exposed to unrealistic expectations suffer from burnout, low job satisfaction and high turnover in the organisations we spoke with. They are overloaded while junior colleagues are losing their jobs or aren’t being hired in the first place.

If expert reviewers resign, they may be replaced by more junior colleagues, who are more prepared to trust AI-generated content and sign it off rapidly. This can become a cycle of decreasing quality, and also raises the question of where the next generation of expert reviewers will come from.

Generating “workslop” (content that seems professional but is of uncertain quality) is cheap and fast, while genuine accountability is difficult. Simply having a nominal human in the loop is not enough.

Quality human oversight needs to be designed in, budgeted for, valued and supported by organisational processes and culture.

The authors do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.