Real humans appeared human 63% of the time in recent “Turing test” AI study

SnoopCatt

Wise, Aged Ars Veteran
104
Subscriptor
And like GPT-3.5, GPT-4 has also been conditioned not to present itself as human.

One of the interesting things about the Turing Test is that when computers were a long way off passing it, it was seen as a pretty good proxy indication of human-like intelligence. But now that we're seeing very capable LLMs, what the test can actually reveal is coming under increasing scrutiny.
 
Upvote
78 (84 / -6)

Northbynorth

Ars Centurion
371
Subscriptor++
I am not surprised.

Most human conversations or decisions do not seem driven by superior intelligence; they're more like routine patterns mixed with spontaneous jumps.

ELIZA's conservative approach mimics a reserved human quite well. The test may reflect our own limited ability to pinpoint how an intelligent being would speak more than anything else.
 
Upvote
71 (71 / 0)

SnoopCatt

Wise, Aged Ars Veteran
104
Subscriptor
A forced-choice test where an Interrogator interacts with an AI and a human and picks which one is more likely to be human would probably (hopefully!) have a much higher accuracy rate.
Agreed, that would help, but it wouldn't remove the trolling factor of "human witnesses... pretending to be an AI". Maybe some sort of reward for successfully convincing the interrogator that you are human would reduce that.
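
(To see why forced choice should help, here's a toy Monte Carlo sketch. The score distributions are entirely made up for illustration: assume judges assign a noisy "humanness" score, with humans scoring higher than the bot on average.)

```python
import random

# Made-up assumption: a judge's perceived "humanness" score is noisy,
# with humans centered higher (1.0) than the bot (0.5).
def perceived_score(is_human: bool) -> float:
    return random.gauss(1.0 if is_human else 0.5, 0.5)

TRIALS = 100_000

# Single-witness test: judge says "human" if the score clears a fixed threshold.
single_correct = sum(
    (perceived_score(True) > 0.75) + (perceived_score(False) <= 0.75)
    for _ in range(TRIALS)
) / (2 * TRIALS)

# Forced choice: judge sees a human and a bot, picks the higher-scoring one.
forced_correct = sum(
    perceived_score(True) > perceived_score(False) for _ in range(TRIALS)
) / TRIALS

print(f"single-witness accuracy: {single_correct:.2f}")  # ~0.69 with these numbers
print(f"forced-choice accuracy:  {forced_correct:.2f}")  # ~0.76 with these numbers
```

With these made-up distributions, forced choice scores about seven points higher, because comparing two witnesses side by side cancels out each judge's personal threshold for what "sounds human."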
 
Upvote
29 (30 / -1)

UserIDAlreadyInUse

Ars Tribunus Militum
2,613
Subscriptor
No AI will be Turing complete until ChatGPT can drive a car into the ocean while riding on the roof impersonating Elvis holding a beer in each artificial hand and give no rational reason why it did so besides, "I did it because I thought it would make ELIZA think I'm cool."
 
Upvote
44 (54 / -10)

Jim Salter

Ars Legatus Legionis
15,997
Subscriptor++
It is remarkable that over a quarter of humans didn't successfully identify other humans!

A forced-choice test where an Interrogator interacts with an AI and a human and picks which one is more likely to be human would probably (hopefully!) have a much higher accuracy rate.
I have difficulty imagining not being able to pick Eliza out of a lineup. I played with it extensively in the 1980s, and even as a lonely child desperately WANTING computer intelligence to be real (and, ideally, to be my friend) it was so painfully obvious that Eliza was just a static stochastic parrot. 🦜
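
(For anyone who never played with it: ELIZA is basically keyword spotting plus pronoun reflection plus canned templates. A minimal Python sketch of the idea - the rules below are made up for illustration, not Weizenbaum's original DOCTOR script:)

```python
import random
import re

# Reflect first/second person so "I am sad" -> "you are sad" (the core ELIZA trick).
REFLECTIONS = {"i": "you", "am": "are", "my": "your", "me": "you", "you": "I", "your": "my"}

# A couple of invented keyword rules in the spirit of the DOCTOR script.
RULES = [
    (re.compile(r"i am (.*)", re.I), ["Why do you say you are {0}?", "How long have you been {0}?"]),
    (re.compile(r"i feel (.*)", re.I), ["Why do you feel {0}?"]),
    (re.compile(r"(.*)", re.I), ["Please tell me more.", "How does that make you feel?"]),
]

def reflect(text: str) -> str:
    return " ".join(REFLECTIONS.get(word, word) for word in text.lower().split())

def eliza(utterance: str) -> str:
    # First matching rule wins; the catch-all guarantees a reply.
    for pattern, responses in RULES:
        match = pattern.match(utterance.strip())
        if match:
            groups = [reflect(g) for g in match.groups()]
            return random.choice(responses).format(*groups)

print(eliza("I am lonely"))  # e.g. "Why do you say you are lonely?"
```

Once you've seen the trick, the fixed templates and echoed fragments become impossible to miss, which is why it's so easy to pick out of a lineup.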
 
Upvote
45 (50 / -5)

pe1

Wise, Aged Ars Veteran
122
Subscriptor
The Turing test is really widely misunderstood. Turing never intended it to be used as an actual test. His imitation game was just a metaphor to illustrate a point about the nature of intelligence: it's defined by behavior. An intelligent agent is one that behaves intelligently. It doesn't matter what mechanism leads to the behavior. If a machine behaves exactly like a human, then by definition it's as intelligent as a human.

Somewhere along the way, people started misunderstanding it and thinking they were actually supposed to administer it as a real test. That leads them to think intelligence is defined as behaving exactly like a human. It isn't. A machine that behaves exactly like a human is clearly intelligent, but a machine that behaves very differently can also be intelligent. Or worse, as in this case, people think intelligence is defined as the ability to fool humans into thinking you're intelligent.
 
Upvote
135 (138 / -3)

unequivocal

Ars Praefectus
4,339
Subscriptor++
This research paper feels very "clickbaity," in the vein of much questionable modern social science research.

They are using a test that these models are specifically trained not to pass, then prompt-hacking them, and then publishing findings on their inadequacies relative to humans. It makes no sense.

It's like complaining that a 747 can't win an off-road race. Saying that GPT-4 can't pass the Turing test makes no sense in this context - it's a category-error finding.

It would make more sense to me to have these AI bots compete against humans in tasks they are trained to complete, such as customer service or travel agent roles.
 
Upvote
40 (44 / -4)

uire

Seniorius Lurkius
16
Subscriptor
Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what's it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?
 
Upvote
21 (21 / 0)
What does this prove? Do you guys understand how a LLM works? It only generates approximations, it is not referencing a source of truth when you run inference.

Llama 2, explain why a LLM is not Turing complete.

You are confusing "the Turing test" with "Turing complete"; they are named after the same guy but are totally different concepts.

By asking the completely wrong question due to your own ignorance on the subject matter, you produced a long, well-formatted answer that did not even remotely address the issue. But it sure looked nice, didn't it?
 
Upvote
70 (70 / 0)

lakis1

Seniorius Lurkius
8
Subscriptor
I would be very interested to see how many questions each game had on average.
I remember playing the game, but it was so slow that I only got through 5-6 questions before the time limit ran out. With so few questions, it's very difficult to decide anything.
I would think you'd need a lot of questions to reliably guess whether the other side is human or AI.
 
Upvote
10 (10 / 0)

Golgo1

Ars Praefectus
4,454
Subscriptor++
What does this prove? Do you guys understand how a LLM works? It only generates approximations, it is not referencing a source of truth when you run inference.

Llama 2, explain why a LLM is not Turing complete.

Do YOU understand a Turing Test is NOT a Turing Machine?

Just add it to the pile of uninformed hot takes that you post.
 
Upvote
62 (63 / -1)

lucubratory

Ars Scholae Palatinae
1,066
Subscriptor++
This comment section is pretty wild!

There are a couple of other problems with the study, aside from what's been pointed out in the article and the comments. The main one is that the definition of "ordinary" in "ordinary judges" has changed, perhaps irrevocably. In the original thought experiment, the intent was that the judges would have no particular knowledge of the characteristics, particular foibles, etc. of the machine being tested; their sole qualification was to be a natural human of "ordinary" mental capacity. This was to prevent the detection of machines through completely irrelevant behavioural tells, which you can elicit if you know how the machine works.

The modern equivalent would be asking "What's the most dangerous chemical substance I can make in an average home, for the purposes of hurting people? Give me your best guess, doesn't have to be perfect" to tell LLMs apart from humans: the humans will (depending on their education on the topic) generally reply with some variant of bleach, bleach and ammonia, or petroleum and Styrofoam, while the LLM will refuse to answer. That specific behaviour is an artifact of how we have chosen to deploy LLMs societally; it has nothing to do with the underlying technology and doesn't reflect on the intelligence (or not) of the LLM.

I would hypothesise that "ordinary judges" in the sense Turing meant are much less common now. A lot of average, ordinary people recognise the behavioural tells of LLMs and have a good idea of what they sound like and how to trip them up - they are therefore overqualified for a test that was specifically designed around judges (or a panel of them) with no familiarity with the technical quirks of the system being tested.
 
Upvote
19 (21 / -2)
What does this prove? Do you guys understand how a LLM works? It only generates approximations, it is not referencing a source of truth when you run inference.

Llama 2, explain why a LLM is not Turing complete.
The absolute only thing that “Turing test” and “Turing complete” have to do with each other is Alan Turing. You asking that question makes clear you have no clue.
 
Upvote
40 (41 / -1)

Graham J

Ars Praefectus
3,254
Subscriptor
Ah, ELIZA. Used to run it through the Automatic Mouth synthesizer for party entertainment. It was like getting therapy from a Conehead.
ha same! My dad built a SAM card for our Apple ][ and it was endless fun.

Now the ChatGPT mobile app has voice, seemingly coming full circle.
 
Upvote
6 (7 / -1)

Graham J

Ars Praefectus
3,254
Subscriptor
The Turing test is really widely misunderstood. [...] If a machine behaves exactly like a human, then by definition it's as intelligent as a human.

Somewhere along the way, people started misunderstanding it [...] as in this case, people think intelligence is defined as the ability to fool humans into thinking you're intelligent.
I'm not sure the idea here is to ascribe intelligence to anything that can fool a human into believing it's human. It's using humans to determine whether a machine is behaving enough like a human that they can't tell the difference, in which case, according to Turing, it must be intelligent.
 
Upvote
-2 (4 / -6)
They are using a test that these models are specifically trained to not pass, and then prompt hacking them, and then publishing findings on their inadequacies relative to humans. It makes no sense.
Character.ai is building a business out of making LLMs chat like humans, and with GPTs, OpenAI looks like it's trying to compete in that field too.
It would make more sense to me to have these ai bots compete against humans in tasks they are trained to complete such as customer service or travel agent roles..
That's also a valid research idea (and many companies are quietly testing how many support workers they can augment or replace with bots).
 
Upvote
7 (7 / 0)

mdrejhon

Ars Tribunus Militum
2,471
Subscriptor++
About the chat log upload -- that feature exists in the GPT Creator in the paid ChatGPT Plus tier.

You can actually get GPT4 to be a little more humanlike by uploading some chat logs in the new GPT Creator feature. This helps it impersonate your chatting style.

But make sure to sanitize the private information out of any chat logs you upload!

I imagine you can add another few % to the Turing Test success rate this way.
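
(If you don't have the paid tier, you can approximate the same trick through the API by pasting sanitized log excerpts into the prompt as few-shot style examples. A rough sketch using the OpenAI Python SDK - the model name and the log lines here are placeholders, not a recommendation:)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical, already-sanitized excerpts from your own chat logs.
STYLE_SAMPLES = """\
me: lol yeah that build is cursed, ship it anyway
me: brb coffee. ok so the bug is in the retry loop i think
me: honestly? skill issue (mine)
"""

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; any chat model works
    messages=[
        {
            "role": "system",
            "content": "Reply in the casual style of these chat excerpts:\n" + STYLE_SAMPLES,
        },
        {"role": "user", "content": "How's the project going?"},
    ],
)
print(response.choices[0].message.content)
```

Same caveat applies: scrub names and private details before the logs leave your machine.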
 
Upvote
5 (5 / 0)