Real humans appeared human 63% of the time in recent “Turing test” AI study

Geebs · Friday at 10:21 PM

uire said:
Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what's it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?

Extending this line of thought, the best test for an intelligent AI isn’t that it can reliably convince humans that it’s a human, but instead whether it can reliably convince humans that it’s not intelligent.

Unfortunately we’ve given Skynet a head start by getting accustomed to the idea that LLMs frequently lie to us by accident. How on earth are we going to figure out when they start lying to us on purpose?

xizar · Friday at 10:30 PM

I wonder how difficult it'd be to create a GPT4 chatbot that emulates ELIZA.

edit: I've farted around with faraday a little (llama 2) and have managed to "convince" it that it is ELIZA. It will, very verbosely, exposit her terseness and detail how she can only respond with a pre-programmed set of responses.

I also asked a much larger model (mythomax) to create a character prompt for ELIZA; it gave her a personality and emotions, which is kind of odd. This one is also rather chatty and emotional.

AngryEssay · Friday at 10:35 PM

Stuff like these shows two clear conclusions in my opinion: 1) how bad the Turing Test is, and 2) how much Kurzweilian futro-waffling relies on things like these for their paragraphs.

jpera · Friday at 11:14 PM

simonbp said:
How does that make you feel?

Hi, Eliza. I 'm happy to see you posting comments,

RubyPanther · Saturday at 12:03 AM

If the humans are scoring less than the high 90s that just shows that the test setup is insufficient.

And it may be impossible to set up the test with impartial judges now that there is so much public interest in AI.

While this test is clearly insufficient in the modern age to achieve the intent of Turing, and should be replaced, I would nevertheless like to propose a scoring improvement. Namely, filter the scoring so that you only use the scores of the people who correctly identified the humans.

It should be obvious that when the purpose of the test is to measure the ability of the AI to impersonate a human, if it is realistic for the AI to score "more human than human" then the test itself is a failure. Complete, absolute success would be equality between the groups. Filtering the scores in the way that I propose would achieve this correction.

Juvba Fnakix · Saturday at 12:12 AM

SnoopCatt said:
One of the interesting things about the Turing Test is that when computers were a long way off passing it, it was seen as a pretty good proxy indication of human-like intelligence. But now that we're seeing very capable LLMs, what the test can actually reveal is coming under increasing scrutiny.

In the '80s ELIZA's high pass rate was considered evidence that most humans were not very computer literate.

Juvba Fnakix · Saturday at 12:19 AM

uire said:
Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what's it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?

Decades ago I read some Scifi involving a Turing test. The narrator was applying for a job in a bureaucracy. He got the job because interrogators could not decide if he was AI or human.

Vidi Vici Veni · Saturday at 12:55 AM

UserIDAlreadyInUse said:
No AI will be Turing complete until ChatGPT can drive a car into the ocean while riding on the roof impersonating Elvis holding a beer in each artificial hand and give no rational reason why it did so besides, "I did it because I thought it would make ELIZA think I'm cool."

Turing complete and the Turing test are two very different things.

Atterus · Saturday at 1:12 AM

Feels like vindication...

Vidi Vici Veni · Saturday at 1:23 AM

earlyberd said:
Again, you say they're different but you don't explain what makes them different or why one is not applicable.

This is ridiculous. How can you claim to know the difference but you're not willing to go into details?

Are you for real? Is someone seriously paying you a wage to say this ridiculous shit?

What the fuck is going on?

Google is a thing you know. However just for you, a Turing test is designed to see if a machine can mimic human conversation. Turing Complete refers to an algorithm that can simulate a turing machine - which is a universal computer, not a human.

TakeABabyLeaveABaby · Saturday at 1:44 AM

poochyena said:
Honestly, misspelling a word or multiple words would probably be a simple way to prove you are human. Or perform some basic math/logic problems.

"Can you withstand a powerful magnetic field?"

10Nov1775 · Saturday at 2:28 AM

benwaggoner said:
It is remarkable that over a quarter of humans didn't successfully identity other humans!

A forced-choice test where an Interrogator interacts with an AI and a human and picks which one is more likely to be human would probably (hopefully!) have a much higher accuracy rate.

This is what I found most remarkable, too.

I'm very curious what the limits on conversational length should look like. Informally, at least, I would expect the rate of correctly identifying another human, or correctly rejecting A.I. as A.I., to approach 100%, limited only by how long the interrogation lasts.

Is that assumption incorrect, I wonder? And if not, how long before we get a seriously reliable ID rate?

The tragedy here is that this is immaterial to most adversarial uses of A.I. in the near future. We already know these models can fool people in short, limited domain interactions where people assume they are speaking with humans...and you would need an ungodly positive ID rate to prevent this, anyway, since most nefarious uses can be deployed at scale. 0.01% of a very large number is also a large number.

10Nov1775 · Saturday at 2:37 AM

Steve austin said:
The absolute only thing that “Turing test” and “Turing complete” have to do with each other is Alan Turing. You asking that question makes clear you have no clue.

Yeah, what a bizarre thing to say.

It is equivalent to saying: "The game of Minecraft, and nearly all programming languages, among other things, are indistinguishable from a human being if you hold a conversation with them."

These, of course, all being examples of things which are Turing complete, but of dubious ability at the Turing test. Except insofar as a Turing complete system can technically run any program we might want to Turing test...might take awhile to run GPT 4 in Minecraft, though.

10Nov1775 · Saturday at 2:46 AM

[ICODE]
[QUOTE="Northbynorth, post: 42400297, member: 537243"]
I am not surprised.

Most human conversations or decisions do not seem driven by a superior intelligent. More routine patterns mixed with spontaneous jumps.

ELIZA´s conservative approach very well mimics a reserved human. The test may more reflect our own lacking ability to pinpoint how an intelligent being would speak.
[/QUOTE]

[/ICODE]

Eliza truly couldn't fool people for very long, though, unless you poison the well by telling people that Eliza is a therapist...which is notably not a part of the standard Turing test, lol.

Not even the most reserved human being will return nearly everything you say back as question, or comment solely on how you feel. Human beings are notable, in part, because they reflexively impart value judgements and opinions into nearly everything they say in casual conversation...and they are also for changing the subject much of the time.

Though granted, the DOCTOR script for the ELIZA programming language did quite purposefully talk about most people's favorite subject: they, theirs, and themselves. So it might take them awhile to notice that it seemed to have no self of its own that it wanted to talk about.

Juvba Fnakix · Saturday at 4:31 AM

10Nov1775 said:
This is what I found most remarkable, too.

I'm very curious what the limits on conversational length should look like. Informally, at least, I would expect the rate of correctly identifying another human, or correctly rejecting A.I. as A.I., to approach 100%, limited only by how long the interrogation lasts.

Is that assumption incorrect, I wonder? And if not, how long before we get a seriously reliable ID rate?

The tragedy here is that this is immaterial to most adversarial uses of A.I. in the near future. We already know these models can fool people in short, limited domain interactions where people assume they are speaking with humans...and you would need an ungodly positive ID rate to prevent this, anyway, since most nefarious uses can be deployed at scale. 0.01% of a very large number is also a large number.

I can give an anecdote, but it is ancient.

Back in the stone age when computers were made with flint chips the idea of students having a personal computer was ridiculous. Instead students went to one of the rooms full of terminals connected a single UNIX computer. UNIX had a 'write' utility that allowed you to send text to be displayed on someone else's terminal (just as annoying as a modern pop-up). I got one of these messages. I recognized the user ID but the content did not match. I wrote back asking who it was. The reply was "Jesus Christ". I accused the writer of lying because Jesus died about 1950 years earlier.

It was not a true Turing test because the interrogator was told he was talking to a computer. I failed to convince him otherwise. Back then most peoples' experience of computers came from Scifi. Their expectations often exceeded what a modern LLM can do. There is not much a human can do to counter high expectations.

To be recognised, a human witness has to identify the interrogator's expectations of computers and counter them. Perhaps playing 'too stupid to be a computer' might have worked.

wondering22334455 · Saturday at 6:17 AM

Eliza isn't even vaguely human

I said "I feel your breasts" and it varies responses between "why do you "I feel your breasts?" and "its been nice chatting with you. do you often feel my breasts?"

ma-moru · Saturday at 7:32 AM

simonbp said:
How does that make you feel?

Well, I really think you should quit smoking

torp · Saturday at 7:45 AM

researchers found that people thought AI-generated images of humans looked more real than actual humans

Interesting, so people think those Instagram photos of people aren't photoshopped to hell and back?

JoHBE · Saturday at 7:46 AM

The elephant that I see in the room, is that no matter HOW good the AI becomes at some genuinely useful/helpful task, long before that point it will have become more than good enough for the Evil Twin: deception, misleading, exploitation. It is unavoidable, as those goals are constrained by less strict requirements. Less need for accuracy, less need for being genuinely grounded, less need for safety, less (no)need for accountability and traceability. Less resources needed, less capabilities needed, less funds needed. The good uses will always trail behind.

And in the wake of all this, everyone will turn paranoid and deeply distrust almost everything. Not yet, because we're in the honeymoon phase, but it will be interesting to see how long it's going to take before Counterfeit Humans completely poison inter-human (online) communication.

JoHBE · Saturday at 8:02 AM

-Locke- said:
What an interesting way to spell inane, vacuous bullshit.

With the arrival of these LLM chatbots, MANY words no longer mean what they used to mean, or are badly in need for additional precision that our vocabulary is simply not ready for. Like, Is there a word for SOUNDING helpful, that doesn't even imply pretending or faking it? Is there a word for producing an explanation - based an statistical text associations - that makes sense, without the kind of internalized knowledge a human uses for that, that could replace "understanding"? We don't have any of that, and it's one of the factors that will hinder our collective ability to properly handle this phenomenon IMO. If you don't have the right words, you probably also fail to truely understand what's going on, how to asses it and approach it. We risk to interact with something that isn't quite what we "feel" it is, and that's going to lead to unfortunate experiences.

Artemis-kun · Saturday at 8:15 AM

BadSuperblock said:
Here I am, brain the size of a planet, and this “artificial intelligence” wants to know “how I feel.” Call that job satisfaction? I don't.

Can I just give thanks that Alan Rickman was immortalized as Marvin's voice before leaving us? Because I can think of no better voice for the character.

jefito · Saturday at 9:32 AM

earlyberd said:
What the fuck is going on?

Seems to be a concise summation of your understanding of Turing Test vs. Turing complete.

techzz · Saturday at 9:34 AM

uire said:
Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what's it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?

"thought AI-generated images of humans looked more real than actual humans"

Yeah, even comparisons of fake models that look too perfect from AI generated stuff, look "autotuned" for lack of better words , gotta second guess everything now I suppose !

jefito · Saturday at 9:35 AM

Artemis-kun said:
Can I just give thanks that Alan Rickman was immortalized as Marvin's voice before leaving us? Because I can think of no better voice for the character.

Well, Stephen Moore, the voice of Marvin in the original radio and TV series did it pretty well, too...

wolfigor · Saturday at 10:14 AM

unequivocal said:
It would make more sense to me to have these ai bots compete against humans in tasks they are trained to complete such as customer service or travel agent roles..

In this scenario there is a very high probability that nobody will pass the test.

One could spot the AI because, unlike the humans in the call center, they will be mildly useful

J Carr · Saturday at 11:52 AM

Why do the values in the graph not equal the values in the text?

I prefer a human answer this question.

TMilligan · Saturday at 12:05 PM

I agree with others that the Turing test sounded like sound logic when AI was a long distance fantasy. But now that we are in the thick of it, the test is showing how poorly it works even with humans talking to each other. Chatting through text exchanges is such a low form of communication that people get scammed by catfish all the time. We aren't the pinnacle of discernment in that way.

However, I have thought that perhaps as the AI chatbots get better then we have more uncertainty if we are actually talking to a human. A Turing test against humans 20 years ago may have shown 90% accuracy with human vs human, but now since the floor was been raised with better AI people are more uncertain even against ourselves.

SparkE · Saturday at 12:15 PM

I find this whole article and discussion a little disturbing because it seems we are essentially judging the “success“ of an AI by its ability to deceive humans. Turing called this test “The Imitation Game” and I think perhaps more people should refer to it As “Turing’s Imitation Game” rather than “Turing Test”.

The ability for an AI or computer to “trick” a human into thinking it is also a human (IE “Passing the test”) is NOT a good thing. It’s also not a sign of intelligence. We have a whole legal system to penalize humans for deceiving humans. I’m not sure why we are so keen to teach machines to do it.

Artificial intelligence has the potential to be very useful tool for solving complex problems in science, medicine and things like global warming, etc etc. But for whatever tabloidy reason, people are obsessed with wanting an AI to pretend it is a human. Humans are incredibly flawed already. The idea of devoting AI development resources toward human emulation seems like a race to the bottom.

Northbynorth · Saturday at 12:23 PM

10Nov1775 said:
Eliza truly couldn't fool people for very long, though, unless you poison the well by telling people that Eliza is a therapist...which is notably not a part of the standard Turing test, lol.

Not even the most reserved human being will return nearly everything you say back as question, or comment solely on how you feel. Human beings are notable, in part, because they reflexively impart value judgements and opinions into nearly everything they say in casual conversation...and they are also for changing the subject much of the time.

Though granted, the DOCTOR script for the ELIZA programming language did quite purposefully talk about most people's favorite subject: they, theirs, and themselves. So it might take them awhile to notice that it seemed to have no self of its own that it wanted to talk about.

That´s for sure, but I have met people who behaved like ELIZA for a pretty long while. Almost impossible to get them talk about themselves, or answer a question.

Ten years ago or so there was some test on internet where people could guess if they talked to a person or a computer program. I was surprised how long it took most people to find out, and how many who guessed wrong.

Most people, with the slightest experience of ELIZA or her sisters, should be able to make the right decision in a few sentences. But I guess some people use a limited set of standard conversation conventions when talking to (possible) human strangers.

webGLdude · Saturday at 12:34 PM

earlyberd said:
Again, you say they're different but you don't explain what makes them different or why one is not applicable.

This is ridiculous. How can you claim to know the difference but you're not willing to go into details?

Are you for real? Is someone seriously paying you a wage to say this ridiculous shit?

What the fuck is going on?

The troll is strong with you for some reason.

Lay off the keyboard for a bit and take a walk outside. When you return, try to contribute rather than shine a light on yourself that is proving to be less than positive.

McTurkey · Saturday at 12:45 PM

Wouldn't it be more accurate if the measurement were achieved via indirect means? It seems to me that just having people be aware of the possibility that they were interacting with a chatbot created too much noise in the results (as does using study participants as witnesses).

I'm imagining a study for something else where users would be assuming they're interacting with a human by default, and have no reason to suspect otherwise. Then ask questions afterwards about their experience that could provide an inference about whether they had suspicions, such as if the responses made sense to them and felt authentic.

Some of the study participants should ideally also be subject matter experts in whatever the topic of conversation is who could detect hallucinations, perhaps. Like--can the AI fool a majority of medical doctors reaching out for a collaborative consultation? Get participants by telling them that you're doing user acceptance testing for a new service (with relevant disclaimers specifying that it should not be used for actual clinical advice while in testing). Let the "consultations" run as long as needed and as in-depth as needed for the participant to feel comfortable providing feedback.

I guess this wouldn't quite fit the classical definition of the "Turing Test", but I'm not sure strict adherence to the design matters nearly so much as the intent.

Aelix · Saturday at 1:25 PM

dwrd said:
Dr. Sbaitso unavailable for comment.

Holy shit, I haven’t thought of that in decades and I immediately heard the Sound Blaster voice saying “Hello, I am Dr. Sbaitso.” Why can I remember this but not my siblings birthdays.

piranah · Saturday at 1:41 PM

GPT-3.5, the base model behind the free version of ChatGPT, has been conditioned by OpenAI specifically not to present itself as a human, which may partially account for its poor performance.

The study's authors acknowledge the study's limitations, including potential sample bias by recruiting from social media and the lack of incentives for participants, which may have led to some people not fulfilling the desired role.

How does stuff like this get funded? They went in knowing that it is not meant to act human, it's non peer reviewed, and they didn't even bother to give anything to participants.

Bondi Surfer · Saturday at 2:34 PM

The study could’ve been more interesting if they’d also done a reversed set-up…

LLMs as the interrogator, and Humans and LLMs trying to convince the interrogator that they’re human. Then seeing if the interrogators could pick who were the human subjects; and then the subjects choosing if their interrogator was human or LLM

Pluto Pup of Hades · Saturday at 3:37 PM

Dr SBAITSO is not impressed.

Mindstatic · Saturday at 5:19 PM

UserIDAlreadyInUse said:
No AI will be Turing complete until ChatGPT can drive a car into the ocean while riding on the roof impersonating Elvis holding a beer in each artificial hand and give no rational reason why it did so besides, "I did it because I thought it would make ELIZA think I'm cool."

Got my vote!

arc-tu-rus · Saturday at 5:19 PM

The paper that is available on Arxiv is ridiculous as a piece of research. The interrogators were explicitly asked from the start to identify if the witness was an AI. As a result, this became a game of trying to ask trick questions to expose the limitations of the chat bots. In other words, the interrogators were not engaging in a conversation with the witness but simply trying to expose the witness (most questions were about current news and events or topics that the bots are simply not allowed to answer by design).

The only thing this project demonstrates is that the researchers failed to understood what Turing meant by this behavioral test (which was never meant to be used as an actual test).

Mindstatic · Saturday at 5:22 PM

uire said:
Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what's it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?

Twitter happens

Real humans appeared human 63% of the time in recent “Turing test” AI study

Ars Tribunus Militum

Ars Tribunus Militum

Wise, Aged Ars Veteran

Seniorius Lurkius

Ars Scholae Palatinae

Ars Centurion

Ars Centurion

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Scholae Palatinae

Ars Centurion

Smack-Fu Master, in training

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Centurion

Ars Centurion

Smack-Fu Master, in training

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Centurion

Smack-Fu Master, in training

Ars Centurion

Wise, Aged Ars Veteran

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Seniorius Lurkius

Ars Centurion

Wise, Aged Ars Veteran

Ars Scholae Palatinae

Ars Scholae Palatinae

Smack-Fu Master, in training

Ars Scholae Palatinae

Seniorius Lurkius

Ars Scholae Palatinae

Ars Centurion

Ars Scholae Palatinae