transform and roll out —

1960s chatbot ELIZA beat OpenAI’s GPT-3.5 in a recent Turing test study

AI chatbot deception paper suggests that some bots (and people) aren't very persuasive.

GPT-4: fail. Humans: pass?

Ultimately, the study's authors concluded that GPT-4 does not meet the success criteria of the Turing test: it neither reached a 50 percent success rate (that is, better than a 50/50 chance of being judged human) nor surpassed the success rate of the human participants. The researchers speculate that with the right prompt design, GPT-4 or similar models might eventually pass the Turing test, but the challenge lies in crafting a prompt that mimics the subtlety of human conversational style. And like GPT-3.5, GPT-4 has been conditioned not to present itself as human. "It seems very likely that much more effective prompts exist, and therefore that our results underestimate GPT-4’s potential performance at the Turing Test," the authors write.

As for the humans who failed to convince other humans that they were real, that result may say more about the nature and structure of the test, and about the expectations of the judges, than about any particular aspect of human intelligence. "Some human witnesses engaged in ‘trolling’ by pretending to be an AI," write the authors. "Equally some interrogators cited this behavior in reasons for human verdicts. As a consequence, our results may underestimate human performance and overestimate AI performance."

A previous Turing test study, run by AI21 Labs in May, found that humans correctly identified other humans about 73 percent of the time (failing to ID them in 27 percent of encounters); in the new study, human witnesses convinced interrogators only 63 percent of the time. Informally, one might expect humans to pass as human far more often than 63 or 73 percent of the time. Whether we should actually expect higher success rates for humans is unclear, but that 27-37 percent failure gap may have implications for a future in which people deploy AI models to deceive others.

In an unrelated study from November (Miller et al.), researchers found that people judged AI-generated images of humans to look more real than images of actual humans. Given that finding, and allowing for improvements in the technology, an AI model that could surpass the 63-73 percent barrier set by humans might hypothetically come across as more human than an actual human. The future is going to be an interesting place indeed.
