Real humans appeared human 63% of the time in recent “Turing test” AI study

Dmytry

Ars Tribunus Angusticlavius
9,402
Bottled knowledge it may (it's a good description of current models). It is currently being iterated to becoming more capable of creating more original outputs than earlier AI's. Example: An original joke on URL being Unusually Rowdy Lemurs, something that doesn't even exist on Google via a quoted search.
A simple thought experiment.

A cache of a billion jokes is discovered, and a sealed letter describing provenance: the jokes either originate from humans, or they originate from any version of ChatGPT.

You may not care much about that letter. The jokes are going to be funny, or not funny, regardless of what the letter says. Burn the letter, it makes little difference.

OpenAI, or any other AI company, in fact would care a great deal about the contents of that letter. If the letter certifies that jokes are human made, they would add jokes to the training dataset. If the letter certifies that jokes are AI generated, they would not. Even as they may not be able to tell it by the jokes themselves, they actually, in a very practical sense, need to know what texts are AI generated and what texts are human written.

The way these models operate is that even seemingly original outputs are harmful to be fed back to training.

edit:
With regards to Sam Altman and recent bwohaha, I think their problem was that they believe they will create AGI with current approach, and he has more tempered expectations. E.g. he was saying this on a recent podcast:

SAM ALTMAN: I don’t think it needs that. But I wouldn’t say any of this stuff with certainty, like we’re deep into the unknown here. For me, a system that cannot go significantly add to the sum total of scientific knowledge we have access to, kind of discover, invent, whatever you want to call it, new fundamental science, is not a super intelligence. And to do that really well, I think we will need to expand on the GPT paradigm in pretty important ways that we’re still missing ideas for. But I don’t know what those ideas are. We’re trying to find them.

This to me displays his, everyday, practical understanding that there's a huge practical difference between humans writing another internet worth of text for his AI to train on (that would be great and improve the AI), and the current AI generating another internet worth of text for next AI to train on (that would not add anything the original internet worth of training text didn't have).

Meanwhile members of the board who tried to out him fervently believe that this is nearing an actual AGI.
 
Last edited:
Upvote
0 (0 / 0)
I'm curious how well chatGPT could identify the humans as an interrogator. has anyone tried any studies like that?

I imagine it could be the same basic setup, but prompt GPT at the start with "you are trying to determine if the participant after this prompt is a human or not. you have up to 5 minutes. what is your first message to them?"
 
Upvote
0 (0 / 0)

mdrejhon

Ars Tribunus Militum
2,471
Subscriptor++
LLMs only transform text. Your prompt is not an instruction you are giving to the model, it is just a vector that the LLM is trying to find the closest match in its training data.

Stop trying to bullshit all of us with this nonsense. This was a bad test and you all should be ashamed of your collective poor grasp of how LLMs work. What a disgrace.
True, but.... we can't be Issac Newton in an era of Einstein.

At OpenAI, LLMs are increasingly only one portion of an increasingly exploding AI algorithm.

Here's why:
  • 5 years ago, we couldn't really naturally chat to an AI.
  • 3 years ago, AI wasn't even helpful enough to do much (e.g. helping with code tutoring)
  • 2 years ago, we couldn't generate usable images.
  • GPT-4 now uses multiple metaphorical "brain centers" (multiple different sets of parametric groups that interacts with each other), as an architecture modification that created a major upgrade to LLMs. Observe that GPT-4 Turbo was shrunk to the same size as GPT-3.5 non-Turbo (total parametric count), yet the GPT-4 multi-center system actually produced a noticeable "intelligence"(sic) upgrade over GPT-3.5
  • The November edition of GPT4-Turbo (paid verson) now has a much larger short term memory (500KB of ASCII text, compressed to 128K token), and they're working on medium term memories (model tuning systems) that is between the hardcoded LLM and the short-term token memory. So the multitiered memory systems are also coming.
  • Now Q* is able to reason/think/plan, there's a new LLM+algorithm architecture, that when allowed to continuously execute, now resembles a continuously thinking brain better than ever before, and can accomplish grade school math without just autocompleting the numbers.
One can now have a multihours long conversation with the November paid edition of GPT4-Turbo, and it still remember lots of details earlier in the chat, unlike the earlier pre-November GPT-4 or the free GPT-3.5

The improvements to the LLM architectures have slowly upgraded it a bit closer and closer to AGI goals. Not quite there, but every step along the way.

So, admittedly (when we're talking about emerging AGIs like what OpenAI or Google is working on) makes the now-outdated classical synonymic "LLM"="AI" an infantile admonishment at least for these emerging proto-AGIs.

This is because there's additional things happening. Yes, LLMs are involved in the AI "algorithm", but LLMs is just increasingly one component of a ginormously fast-growing AI "algorithm" ("brain")(sic). Parrotting knowledge is also a skill human has too (in addition to thinking/reasoning/planning/etc). But, clearly AGI has to go far beyond that. That's why the smart AI companies are modifying the whole AI workflow (the "algorithm" or "brain" or what you prefer to call it) and focussing on that, rather than focussing on just classical LLMs. That's obvious.

See below post. There is an apparent breakthrough that allows LLMs to iterate (think/plan/reason) more native.

LLMs (at OpenAI) are but just one mechanism of a now very giant AI "algorithm"(sic) that involves a giant number of things that helps emerging AGIs function. And they are managing to reproduce intelligence upgrades by tweaking all of this so shockingly rapidly iteratively.
 
Last edited:
Upvote
0 (0 / 0)

mdrejhon

Ars Tribunus Militum
2,471
Subscriptor++
edit:
With regards to Sam Altman and recent bwohaha, I think their problem was that they believe they will create AGI with current approach, and he has more tempered expectations. E.g. he was saying this on a recent podcast:

SAM ALTMAN: I don’t think it needs that. But I wouldn’t say any of this stuff with certainty, like we’re deep into the unknown here. For me, a system that cannot go significantly add to the sum total of scientific knowledge we have access to, kind of discover, invent, whatever you want to call it, new fundamental science, is not a super intelligence. And to do that really well, I think we will need to expand on the GPT paradigm in pretty important ways that we’re still missing ideas for. But I don’t know what those ideas are. We’re trying to find them.

This to me displays his, everyday, practical understanding that there's a huge practical difference between humans writing another internet worth of text for his AI to train on (that would be great and improve the AI), and the current AI generating another internet worth of text for next AI to train on (that would not add anything the original internet worth of training text didn't have).

Meanwhile members of the board who tried to out him fervently believe that this is nearing an actual AGI.
I just want to point out a time error in your post:

You confused the time windows (April podcast versus recent breakthrough + November boardroom).

That's an April 2023 podcast -- far before their recent massive in-house AI breakthrough, whose existence was leaked in November [Google, MANY sources] they led to Q* that caused the fracas. The unreleased brand new AI architecture is apparently able to do thinking/reason/plan, in a new AI architecture alluded to the earlier podcast. Q* is the codename for some kind sequel to GPT-4.

"We're trying to find them" from April podcast = They found at least one line item after April and before November.

Something so big, that is what caused 95% of the company to threaten to quit when Sam Altman got fired. Fired because the board was informed about this seriously huge new AI architecture upgrade. The previous board was spooked by how scary an upgrade that AI was, and the board's mission was to protect humankind from AI dangers.

Sam....fired....because of a massive AI intelligence upgrade.

It's because of the boardroom of OpenAI (parent company) is actually "Protect Humankind" and not "Protect Profits".

A non-profit holding company (with boardroom) was trying to control the for-profit portion of OpenAI. It's an unusual company structure they have.

1701737214753.png

Yes, OpenAI's webpage says that the non-profit OpenAI charity owns a for-profit OpenAI that is the subsidary with the breakthrough that scared the boardroom. Even the word "subsidary" [OpenAI] is there.

The AI architecture upgrade is arguably no ASI, but it extends LLM into a new architecture that includes thinking/reasoning/planning arena, but the "Protect Humankind" boardroom got worried by alarmed staff and fired Sam [Google].

Despite this, the alarmed staff did not expect the board to take such drastic action; and still stayed loyal -- almost the entire OpenAI company was prepared to quit after Sam's firing. Just imagine how exciting the AI intelligence architecture upgrade breakthrough was for the staff.

Still a massive intelligence upgrade, nontheless. While presses is calling it a model, it's actually a new architecture that allegedly modifies the model architecture to includes more active thinking mechanisms, rather than external iteratives / only classic surge-executes / etc. It's actually able to more continually think/plan/reason (when the model+algorithm is allowed to execute continuously), and successfully does grade school math without classical autocomplete.

This is newer information that came to light, that wasn't known by the press earlier, and now all the same press has been publishing this new information.

There's a lot of speculation in the FUD, but it's pretty clear some modicum of breakthrough was involved in the conflict between the non-profit OpenAI non-profit charity parent organization ("Protect Humankind") and the OpenAI for-profit subsidary company ("Improve AI Scary Fast") that led to Sam's firing.

Now, do you understand better how big upgrade the AI architecture it was, for the firing to have happened, YET for the whole company to stay loyal because what the employees are seeing, are so exciting to them.

As I alluded to, we can't afford to be Issac Newton in an era of Albert Einstein. As the post right before this one said, it is no longer true that "AI"="LLM" is synonym anymore. LLM is only one component now, with a rapidly expanding algorithm that tries to copycat the brain (e.g. GPT-4 "multiple brain center" approach by breaking down parameteric groups, etc) and other algorithms like that. There's additional algorithms involved at the big AI companies now, including whatever OpenAI cooked up for Q*. For those stuck on classical LLMs, the left field stuff is very enlightening.

In addition to the Google-Fu in the above links, I suggest this Google-Fu too: Sam Altman fired because of Q* [Google].
 
Last edited:
Upvote
-2 (0 / -2)

Doug K

Smack-Fu Master, in training
70
Subscriptor
no-one ever reads the original Turing paper. Bruce Sterling did, and pointed out that the Turing test as originally written, consists of a human who is trying to determine which of A and B is a woman. One of the two is a woman, the other is the machine pretending to be a woman. This is subtler than the popular version of the test which just uses humans. Perhaps the turingtest.live could retry with the original configuration ?
 
Upvote
0 (0 / 0)
You’ve got a little boy. He shows you his butterfly collection plus the killing jar.
Indeed. ELIZA was very easy to force into a loop. If you kept changing the subject, it could seem more realistic, but was very simple to derail once you dug more deeply and asked more-direct questions.
 
Upvote
0 (0 / 0)
Honestly, misspelling a word or multiple words would probably be a simple way to prove you are human. Or perform some basic math/logic problems.
The bots were told to make some spelling and grammar errors. And looking at the results of the study, the most common reason people realized they were talking to an AI was that it was "too informal", oops.

Also it looks like they failed to give the AIs a unique personality?

Anyway, it's a cute study...

But it sounds like the AIs didn't have the best possible prompts.
 
Last edited:
Upvote
0 (0 / 0)
This comment section is pretty wild!

There are a couple of other problems with the study, aside from what's been pointed out in the article and the comments. The main one is that the definition of "ordinary" in "ordinary judges" has changed, perhaps in irrevocable ways.
Yes, among many other failings the was study like "our results might be biased by male 20-somethings, who spend too much time on the internet, with a post-graduate education, studying LLMs, who were intentionally trolling each other..."
 
Upvote
1 (1 / 0)