Real humans appeared human 63% of the time in recent “Turing test” AI study

Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what does it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?
Extending this line of thought, the best test for an intelligent AI isn’t whether it can reliably convince humans that it’s human, but whether it can reliably convince them that it’s not intelligent.

Unfortunately we’ve given Skynet a head start by getting accustomed to the idea that LLMs frequently lie to us by accident. How on earth are we going to figure out when they start lying to us on purpose?
 
Upvote
2 (5 / -3)

xizar

Ars Tribunus Militum
1,617
Subscriptor++
I wonder how difficult it'd be to create a GPT-4 chatbot that emulates ELIZA.

edit: I've farted around with Faraday a little (Llama 2) and have managed to "convince" it that it is ELIZA. It will, very verbosely, expound on her terseness and detail how she can only respond with a pre-programmed set of responses.

I also asked a much larger model (mythomax) to create a character prompt for ELIZA; it gave her a personality and emotions, which is kind of odd. This one is also rather chatty and emotional.
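
For anyone who wants to try the same trick against the OpenAI API instead of a local model, here's a rough sketch. The calls are the standard openai Python package; the system prompt wording is just my guess at what keeps the model terse, and I make no promise GPT-4 actually stays in character:

# Rough sketch: pin GPT-4 into an ELIZA persona via the system prompt.
# The prompt wording is a guess; GPT-4 may still break character.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ELIZA_PROMPT = (
    "You are ELIZA, the 1966 pattern-matching therapist program. "
    "Respond with a single short sentence, usually a question that "
    "reflects the user's words back at them. Never volunteer facts, "
    "opinions, or feelings of your own. Never break character."
)

def eliza_reply(user_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0.3,  # low temperature to discourage creative rambling
        messages=[
            {"role": "system", "content": ELIZA_PROMPT},
            {"role": "user", "content": user_text},
        ],
    )
    return response.choices[0].message.content

print(eliza_reply("I'm feeling a bit anxious about my job."))
# Hoped-for ELIZA-style output: "Why do you feel anxious about your job?"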
 
Last edited:
Upvote
6 (9 / -3)
If the humans are scoring less than the high 90s, that just shows that the test setup is insufficient.

And it may be impossible to set up the test with impartial judges now that there is so much public interest in AI.

While this test is clearly insufficient in the modern age to achieve the intent of Turing, and should be replaced, I would nevertheless like to propose a scoring improvement. Namely, filter the scoring so that you only use the scores of the people who correctly identified the humans.

It should be obvious that when the purpose of the test is to measure the ability of the AI to impersonate a human, a test in which the AI can realistically score "more human than human" is itself a failure. Complete, absolute success would be equality between the groups. Filtering the scores in the way that I propose would enforce this correction.
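
Concretely, the correction I have in mind is just this (a toy sketch with made-up record fields, not the study's actual data format):

# Toy sketch of the proposed filter, with made-up record fields.
# Each verdict records who judged, what the witness really was, and
# whether the interrogator called the witness human.
verdicts = [
    {"interrogator": "A", "witness": "human", "judged_human": True},
    {"interrogator": "A", "witness": "ai",    "judged_human": True},
    {"interrogator": "B", "witness": "human", "judged_human": False},
    {"interrogator": "B", "witness": "ai",    "judged_human": False},
]

# Interrogators who correctly identified every human witness.
right = {v["interrogator"] for v in verdicts
         if v["witness"] == "human" and v["judged_human"]}
wrong = {v["interrogator"] for v in verdicts
         if v["witness"] == "human" and not v["judged_human"]}
reliable = right - wrong

# AI pass rate scored only by reliable interrogators. Humans score
# 100% among these judges by construction, so an AI at 100% would be
# exactly "as human as human", never more.
ai_trials = [v for v in verdicts
             if v["witness"] == "ai" and v["interrogator"] in reliable]
pass_rate = sum(v["judged_human"] for v in ai_trials) / len(ai_trials)
print(f"Filtered AI pass rate: {pass_rate:.0%}")  # 100% in this toy data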
 
Upvote
6 (6 / 0)

Juvba Fnakix

Ars Centurion
307
Subscriptor
One of the interesting things about the Turing Test is that when computers were a long way off passing it, it was seen as a pretty good proxy for human-like intelligence. But now that we're seeing very capable LLMs, what the test can actually reveal is coming under increasing scrutiny.
In the '80s ELIZA's high pass rate was considered evidence that most humans were not very computer literate.
 
Upvote
11 (11 / 0)

Juvba Fnakix

Ars Centurion
307
Subscriptor
Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what does it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?
Decades ago I read some sci-fi involving a Turing test. The narrator was applying for a job in a bureaucracy. He got the job because the interrogators could not decide if he was AI or human.
 
Upvote
7 (7 / 0)
No AI will be Turing complete until ChatGPT can drive a car into the ocean while riding on the roof impersonating Elvis holding a beer in each artificial hand and give no rational reason why it did so besides, "I did it because I thought it would make ELIZA think I'm cool."
Turing complete and the Turing test are two very different things.
 
Upvote
6 (7 / -1)
Again, you say they're different but you don't explain what makes them different or why one is not applicable.

This is ridiculous. How can you claim to know the difference but you're not willing to go into details?

Are you for real? Is someone seriously paying you a wage to say this ridiculous shit?

What the fuck is going on?
Google is a thing, you know. However, just for you: the Turing test is designed to see if a machine can mimic human conversation. Turing complete refers to a system that can simulate a Turing machine - which is a universal computer, not a human.
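
To make the contrast concrete, a toy Turing machine simulator fits in a dozen lines of Python. Being able to express something like this (given unbounded memory) is all that "Turing complete" means; there is no conversation anywhere in it:

# A minimal Turing machine simulator: a state, a tape, a head, and a
# transition table. This toy machine walks right, flipping bits until
# it hits a blank cell.
def run_tm(tape, transitions, state="start", accept="halt"):
    cells = dict(enumerate(tape))  # sparse tape; blank cells read "_"
    head = 0
    while state != accept:
        symbol = cells.get(head, "_")
        state, write, move = transitions[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells))

flip_bits = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", "_"): ("halt", "_", "R"),
}

print(run_tm("1011", flip_bits))  # -> "0100_"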
 
Upvote
25 (25 / 0)

10Nov1775

Smack-Fu Master, in training
80
It is remarkable that over a quarter of humans didn't successfully identify other humans!

A forced-choice test where an interrogator interacts with an AI and a human and picks which one is more likely to be human would probably (hopefully!) have a much higher accuracy rate.
This is what I found most remarkable, too.

I'm very curious what the limits on conversational length should look like. Informally, at least, I would expect the rate of correctly identifying another human, or correctly rejecting A.I. as A.I., to approach 100%, limited only by how long the interrogation lasts.

Is that assumption incorrect, I wonder? And if not, how long before we get a seriously reliable ID rate?

The tragedy here is that this is immaterial to most adversarial uses of A.I. in the near future. We already know these models can fool people in short, limited domain interactions where people assume they are speaking with humans...and you would need an ungodly positive ID rate to prevent this, anyway, since most nefarious uses can be deployed at scale. 0.01% of a very large number is also a large number.
 
Upvote
5 (5 / 0)

10Nov1775

Smack-Fu Master, in training
80
The only thing that “Turing test” and “Turing complete” have to do with each other is Alan Turing. You asking that question makes clear you have no clue.
Yeah, what a bizarre thing to say.

It is equivalent to saying: "The game of Minecraft, and nearly all programming languages, among other things, are indistinguishable from a human being if you hold a conversation with them."

These are, of course, all examples of things which are Turing complete, but of dubious ability at the Turing test. Except insofar as a Turing complete system can technically run any program we might want to Turing test...it might take a while to run GPT-4 in Minecraft, though.
 
Upvote
7 (9 / -2)

10Nov1775

Smack-Fu Master, in training
80
I am not surprised. Most human conversations or decisions do not seem driven by a superior intelligence. More routine patterns mixed with spontaneous jumps. ELIZA's conservative approach very well mimics a reserved human. The test may reflect more our own lack of ability to pinpoint how an intelligent being would speak.

ELIZA truly couldn't fool people for very long, though, unless you poison the well by telling people that ELIZA is a therapist...which is notably not a part of the standard Turing test, lol.

Not even the most reserved human being will return nearly everything you say back as a question, or comment solely on how you feel. Human beings are notable, in part, because they reflexively impart value judgements and opinions into nearly everything they say in casual conversation...and they are also known for changing the subject much of the time.

Though granted, the DOCTOR script for the ELIZA program did quite purposefully talk about most people's favorite subject: them, theirs, and themselves. So it might take them a while to notice that it seemed to have no self of its own that it wanted to talk about. 😁
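
The whole trick fits in a few lines, too. This is a toy imitation, not Weizenbaum's actual DOCTOR script: swap the pronouns, then hand the statement back as a question about the speaker. There is no state and no self in it at all:

# Toy imitation of the DOCTOR reflection trick (not Weizenbaum's code):
# swap pronouns, then return the statement as a question.
import re

REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are",
               "you": "I", "your": "my"}

def reflect(text: str) -> str:
    words = re.findall(r"[\w']+", text.lower())
    return " ".join(REFLECTIONS.get(w, w) for w in words)

def doctor(text: str) -> str:
    m = re.search(r"\bi (?:feel|am) (.+)", text.lower())
    if m:
        return f"Why do you feel {reflect(m.group(1))}?"
    return f"Can you tell me more about {reflect(text)}?"

print(doctor("I am worried about my exams"))
# -> "Why do you feel worried about your exams?"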
 
Upvote
9 (10 / -1)

Juvba Fnakix

Ars Centurion
307
Subscriptor
This is what I found most remarkable, too.

I'm very curious what the limits on conversational length should look like. Informally, at least, I would expect the rate of correctly identifying another human, or correctly rejecting A.I. as A.I., to approach 100%, limited only by how long the interrogation lasts.

Is that assumption incorrect, I wonder? And if not, how long before we get a seriously reliable ID rate?

The tragedy here is that this is immaterial to most adversarial uses of A.I. in the near future. We already know these models can fool people in short, limited domain interactions where people assume they are speaking with humans...and you would need an ungodly positive ID rate to prevent this, anyway, since most nefarious uses can be deployed at scale. 0.01% of a very large number is also a large number.
I can give an anecdote, but it is ancient.

Back in the stone age, when computers were made with flint chips, the idea of students having a personal computer was ridiculous. Instead students went to one of the rooms full of terminals connected to a single UNIX computer. UNIX had a 'write' utility that allowed you to send text to be displayed on someone else's terminal (just as annoying as a modern pop-up). I got one of these messages. I recognized the user ID but the content did not match. I wrote back asking who it was. The reply was "Jesus Christ". I accused the writer of lying because Jesus died about 1950 years earlier.

It was not a true Turing test because the interrogator was told he was talking to a computer. I failed to convince him otherwise. Back then most people's experience of computers came from sci-fi. Their expectations often exceeded what a modern LLM can do. There is not much a human can do to counter high expectations.

To be recognised, a human witness has to identify the interrogator's expectations of computers and counter them. Perhaps playing 'too stupid to be a computer' might have worked.
 
Upvote
9 (9 / 0)

JoHBE

Ars Scholae Palatinae
681
The elephant that I see in the room is that no matter HOW good the AI becomes at some genuinely useful/helpful task, long before that point it will have become more than good enough for the Evil Twin: deception, misleading, exploitation. It is unavoidable, as those goals are constrained by less strict requirements. Less need for accuracy, less need for being genuinely grounded, less need for safety, less (no) need for accountability and traceability. Fewer resources needed, fewer capabilities needed, less funding needed. The good uses will always trail behind.

And in the wake of all this, everyone will turn paranoid and deeply distrust almost everything. Not yet, because we're in the honeymoon phase, but it will be interesting to see how long it's going to take before Counterfeit Humans completely poison inter-human (online) communication.
 
Upvote
2 (4 / -2)

JoHBE

Ars Scholae Palatinae
681
What an interesting way to spell inane, vacuous bullshit.
With the arrival of these LLM chatbots, MANY words no longer mean what they used to mean, or are badly in need of additional precision that our vocabulary is simply not ready for. Like: is there a word for SOUNDING helpful that doesn't even imply pretending or faking it? Is there a word for producing an explanation - based on statistical text associations - that makes sense without the kind of internalized knowledge a human uses, that could replace "understanding"? We don't have any of that, and it's one of the factors that will hinder our collective ability to properly handle this phenomenon IMO. If you don't have the right words, you probably also fail to truly understand what's going on, how to assess it and approach it. We risk interacting with something that isn't quite what we "feel" it is, and that's going to lead to unfortunate experiences.
 
Upvote
2 (2 / 0)

Artemis-kun

Smack-Fu Master, in training
74
Here I am, brain the size of a planet, and this “artificial intelligence” wants to know “how I feel.” Call that job satisfaction? I don't.
Can I just give thanks that Alan Rickman was immortalized as Marvin's voice before leaving us? Because I can think of no better voice for the character.
 
Upvote
10 (11 / -1)

techzz

Smack-Fu Master, in training
99
Subscriptor
Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what does it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?
"thought AI-generated images of humans looked more real than actual humans"

Yeah, even in comparisons with AI-generated stuff, the fake models look too perfect, "autotuned" for lack of a better word. Gotta second-guess everything now, I suppose!
 
Upvote
1 (1 / 0)
It would make more sense to me to have these AI bots compete against humans in tasks they are trained to complete, such as customer service or travel agent roles.
In this scenario there is a very high probability that nobody will pass the test. :)
One could spot the AI because, unlike the humans in the call center, it will be mildly useful.
 
Upvote
2 (4 / -2)
I agree with others that the Turing test seemed like sound logic when AI was a distant fantasy. But now that we are in the thick of it, the test is showing how poorly it works even with humans talking to each other. Chatting through text exchanges is such a low form of communication that people get scammed by catfish all the time. We aren't the pinnacle of discernment in that way.

However, I have thought that perhaps as the AI chatbots get better, we have more uncertainty about whether we are actually talking to a human. A Turing test 20 years ago may have shown 90% accuracy with human vs. human, but now that the floor has been raised by better AI, people are more uncertain even about each other.
 
Upvote
4 (5 / -1)
I find this whole article and discussion a little disturbing because it seems we are essentially judging the “success“ of an AI by its ability to deceive humans. Turing called this test “The Imitation Game,” and I think perhaps more people should refer to it as “Turing’s Imitation Game” rather than the “Turing Test.”

The ability of an AI or computer to “trick” a human into thinking it is also a human (i.e., “passing the test”) is NOT a good thing. It’s also not a sign of intelligence. We have a whole legal system to penalize humans for deceiving humans. I’m not sure why we are so keen to teach machines to do it.

Artificial intelligence has the potential to be a very useful tool for solving complex problems in science, medicine, global warming, and so on. But for whatever tabloidy reason, people are obsessed with wanting an AI to pretend it is a human. Humans are incredibly flawed already. The idea of devoting AI development resources toward human emulation seems like a race to the bottom.
 
Upvote
9 (9 / 0)

Northbynorth

Ars Centurion
371
Subscriptor++
ELIZA truly couldn't fool people for very long, though, unless you poison the well by telling people that ELIZA is a therapist...which is notably not a part of the standard Turing test, lol.

Not even the most reserved human being will return nearly everything you say back as a question, or comment solely on how you feel. Human beings are notable, in part, because they reflexively impart value judgements and opinions into nearly everything they say in casual conversation...and they are also known for changing the subject much of the time.

Though granted, the DOCTOR script for the ELIZA program did quite purposefully talk about most people's favorite subject: them, theirs, and themselves. So it might take them a while to notice that it seemed to have no self of its own that it wanted to talk about. 😁
That's for sure, but I have met people who behaved like ELIZA for a pretty long while. Almost impossible to get them to talk about themselves, or answer a question.

Ten years ago or so there was some test on the internet where people could guess if they were talking to a person or a computer program. I was surprised how long it took most people to find out, and how many guessed wrong.

Most people with the slightest experience of ELIZA or her sisters should be able to make the right decision in a few sentences. But I guess some people use a limited set of standard conversational conventions when talking to (possibly human) strangers.
 
Upvote
0 (1 / -1)
Again, you say they're different but you don't explain what makes them different or why one is not applicable.

This is ridiculous. How can you claim to know the difference but you're not willing to go into details?

Are you for real? Is someone seriously paying you a wage to say this ridiculous shit?

What the fuck is going on?

The troll is strong with you for some reason.

Lay off the keyboard for a bit and take a walk outside. When you return, try to contribute rather than shining a light on yourself, which is proving to be less than flattering.
 
Upvote
9 (9 / 0)
Wouldn't it be more accurate if the measurement were achieved via indirect means? It seems to me that just having people be aware of the possibility that they might be interacting with a chatbot creates too much noise in the results (as does using study participants as witnesses).

I'm imagining a study ostensibly about something else, where users would assume they're interacting with a human by default and have no reason to suspect otherwise. Then ask questions afterwards about their experience that could support an inference about whether they had suspicions, such as whether the responses made sense to them and felt authentic.

Some of the study participants should ideally also be subject matter experts in whatever the topic of conversation is, who could perhaps detect hallucinations. Like: can the AI fool a majority of medical doctors reaching out for a collaborative consultation? Get participants by telling them that you're doing user acceptance testing for a new service (with relevant disclaimers specifying that it should not be used for actual clinical advice while in testing). Let the "consultations" run as long and as in-depth as needed for the participant to feel comfortable providing feedback.

I guess this wouldn't quite fit the classical definition of the "Turing Test", but I'm not sure strict adherence to the design matters nearly so much as the intent.
 
Upvote
2 (2 / 0)

piranah

Smack-Fu Master, in training
73
GPT-3.5, the base model behind the free version of ChatGPT, has been conditioned by OpenAI specifically not to present itself as a human, which may partially account for its poor performance.
The study's authors acknowledge the study's limitations, including potential sample bias by recruiting from social media and the lack of incentives for participants, which may have led to some people not fulfilling the desired role.

How does stuff like this get funded? They went in knowing that it is not meant to act human, it's not peer reviewed, and they didn't even bother to give anything to participants.
 
Upvote
1 (2 / -1)

Bondi Surfer

Ars Scholae Palatinae
1,027
Subscriptor++
The study could’ve been more interesting if they’d also done a reversed set-up…

LLMs as the interrogator, and humans and LLMs trying to convince the interrogator that they're human. Then seeing if the interrogators could pick who the human subjects were; and then the subjects choosing whether their interrogator was human or LLM.
 
Upvote
1 (1 / 0)

Mindstatic

Ars Scholae Palatinae
4,518
Subscriptor
No AI will be Turing complete until ChatGPT can drive a car into the ocean while riding on the roof impersonating Elvis holding a beer in each artificial hand and give no rational reason why it did so besides, "I did it because I thought it would make ELIZA think I'm cool."
Got my vote!
 
Upvote
-2 (0 / -2)
The paper that is available on arXiv is ridiculous as a piece of research. The interrogators were explicitly asked from the start to identify whether the witness was an AI. As a result, this became a game of trying to ask trick questions to expose the limitations of the chatbots. In other words, the interrogators were not engaging in a conversation with the witness but simply trying to expose the witness (most questions were about current news and events, or topics that the bots are simply not allowed to answer by design).

The only thing this project demonstrates is that the researchers failed to understand what Turing meant by this behavioral test (which was never meant to be used as an actual test).
 
Upvote
6 (6 / 0)

Mindstatic

Ars Scholae Palatinae
4,518
Subscriptor
Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what does it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?
Twitter happens
 
Upvote
1 (1 / 0)