Real humans appeared human 63% of the time in recent “Turing test” AI study

Steve austin · Saturday at 6:06 PM

earlyberd said:
If you have a better explanation, why do you not provide it?
And complexity
Do you understand the difference? Is there a difference?

I don't get it. You claim that I don't understand, but how am I wrong? Are you going to explain what it means to be Turing complete in a way that is truthful to your experience and knowledge?

Did the LLM get something wrong? If so, can you explain the reasoning for the error?

I'm sick of the bullshit. Be concise in your explanations, don't just tell me I'm stupid. That's not an arguable point, that's just your opinion.

Edit: You know what, I'll even provide a source:

https://cs.stackexchange.com/questi...to-do-with-turing-completeness-or-turing-mach

In other words, in order to pass a Turing test the machine needs to first be Turing complete.

Get it?

Your response?

The Stackexhange thread you link says there is no relationship. Still…

Turing complete is a description applied to a system that can simulate a Turing machine, which is a formalism developed by Alan Turing for studying computability and complexity theories (which are formal mathematics/computer science fields). It is related to Church’s lambda calculus. As it turns out, all modern digital computers are (aside from finiteness) Turing complete.

The Turing test is an informal “game” that Alan Turing proposed as a way to determine whether a machine is acting intelligently. It has nothing to do with actual machines, but rather with a theory or mind or intelligence - it was definitely not a formal test.

These are completely unrelated. In the grand scheme of things, the former is far more important than the latter. Feel free to check Wikipedia for more info on both - that’s a lot more solid than looking at Stackexchange for that sort of info.

For whatever it’s worth, Alan Turing was a giant in numerous fields and his name is attached to many things that were often unrelated (other than generally having a connection to mathematics or computer science). Again, feel free to check Wikipedia’s article on Turing.

TheShark · Saturday at 6:37 PM

pe1 said:
The Turing test is really widely misunderstood. Turing never intended it to be used as an actual test. His imitation game was just a metaphor to illustrate a point about the nature of intelligence: it's defined by behavior. An intelligent agent is one that behaves intelligently. It doesn't matter what mechanism leads to the behavior. If a machine behaves exactly like a human, then by definition it's as intelligent as a human.

Somewhere along the way, people started misunderstanding it and thinking they were actually supposed to administer it as a real test. That leads them to think intelligence is defined as behaving exactly like a human. It isn't. A machine that behaves exactly like a human is clearly intelligent, but a machine that behaves very differently can also be intelligent. Or worse, as in this case, people think intelligence is defined as the ability to fool humans into thinking you're intelligent.

Who says you can't make it an actual test? Next you're going to tell me I should let my cat out of the box he's in!!!

jimlux · Saturday at 7:42 PM

TheShark said:
Who says you can't make it an actual test? Next you're going to tell me I should let my cat out of the box he's in!!!

But is there really a cat in the box? And is it able to get out of the box, or is it dead?

Dmytry · Saturday at 8:53 PM

In an unrelated study from November (Miller, et al), researchers found that people thought AI-generated images of humans looked more real than actual humans.

That study had a huge (and deliberate) flaw introduced into it - the "actual humans" were selected from the generator's training dataset using another classification AI - for each AI image they picked a real image that the classifier placed closest to the AI image.

Which is bizarre, and they didn't publish the result for the obvious original version of the experiment where the actual humans would be selected on random from the training dataset. Neither a justification nor a measure of the effect of this bizarre procedure was given.

As typical they used images stolen off the social media, i.e. of unknown provenance rather than non-manipulated images taken in a standardized setting. It could just as well be that the training dataset included some heavily retouched images that were more frequently chosen as "matches" to the AI images.

I would imagine that if you used some writing style classification AI to pick "witnesses" who resembled ChatGPT the most, you could end up selecting the above mentioned trolls who deliberately pretend to be an AI.

graylshaped · Saturday at 10:08 PM

Dmytry said:
That study had a huge (and deliberate) flaw introduced into it - the "actual humans" were selected from the generator's training dataset using another classification AI - for each AI image they picked a real image that the classifier placed closest to the AI image.

Which is bizarre, and they didn't publish the result for the obvious original version of the experiment where the actual humans would be selected on random from the training dataset. Neither a justification nor a measure of the effect of this bizarre procedure was given.

As typical they used images stolen off the social media, i.e. of unknown provenance rather than non-manipulated images taken in a standardized setting. It could just as well be that the training dataset included some heavily retouched images that were more frequently chosen as "matches" to the AI images.

I would imagine that if you used some writing style classification AI to pick "witnesses" who resembled ChatGPT the most, you could end up selecting the above mentioned trolls who deliberately pretend to be an AI.

Your post makes little sense, though I think I catch your drift, with which I tend to agree. Transparency on training sources would be a good thing.

PopeBenedict · Saturday at 10:34 PM

If GPT-4 was fine-tuned to beat the Turing test, we might see very different results and that would be a much more significant result.

Dmytry · Saturday at 10:57 PM

graylshaped said:
Your post makes little sense, though I think I catch your drift, with which I tend to agree. Transparency on training sources would be a good thing.

It is rather hard to explain what was done in the [Miller, et al] paper, because what they done makes basically no sense.

So, the face dataset used in the paper comes from another paper, where an AI was trained on a set of real faces, then a set of faces was synthesized using the AI, then rather than having humans rank AI faces versus randomly selected real faces, something utterly bizarre was done:

For each synthesized face, we collected a matching real face (in terms of gender, age, race, and overall appearance) from the underlying face database used in the StyleGAN2 learning stage. A standard convolutional neural network descriptor (15) was used to extract a low-dimensional, perceptually meaningful (16) representation of each synthetic face. The extracted representation for each synthetic face—a 4,096-D real-valued vector —was compared with all other facial representations in the data set of 70,000 real faces to find the most similar face. The real face with representation with minimal Euclidean distance to , and satisfying our qualitative selection criteria, is selected as the matching face. As with the synthetic faces, to reduce extraneous cues, we only included images 1) with a mostly uniform background, 2) with unobstructed faces (e.g., no hats or hands in front of face), 3) in focus and high resolution, and 4) with no obvious writing or logos on clothing. We visually inspected up to 50 of the best matched faces and selected the one that met the above criteria and was also matched in terms of overall face position, posture, and expression, and presence of glasses and jewelry. Shown in Fig. 4 are representative examples of these matched real and synthetic faces.

Emphasis mine.

Basically, real faces used for comparison were cherry picked from the dataset of 70 000 real faces, for "similarity" to AI faces.

No attempt to quantify the effect of that selection is reported. You would think that before going through all the trouble of cherry picking matched faces, one got to try to compare AI generated faces to real faces, non cherry picked but chosen at random (the AI should produce faces with a matching distribution to the training dataset anyway, so there shouldn't be any need to select matching faces).

edit: Also, they justify it as "reducing extraneous clues".

Ultimately the main issue is that AI faces and real faces are not treated equally. AI faces are merely selected a bit to eliminate ones with background and glasses or jewellery. Real faces are then extremely cherry picked to match those AI faces, first with an automatic method, then worse yet, manually.

This face hyperrealism study was absolute garbage. A non garbage way of doing it would be to generate a number of AI faces, compare them to randomly selected real faces, and note the outcome (e.g. the AI faces are more likely to be identified as AI faces). Then you can further form a hypothesis that this occurs for example because of backgrounds, and you can perform another study, having run both sets of faces through identical background-blanking process (just fill all backgrounds with uniform color). That would be actual science - you would find out if AI images are actually hyperrealistic or not, then you would find out if maybe the actual faces in the images are hyperrealistic but backgrounds are not, etc.

You can also classify images into those with jewellery and those without and see if "realism" differs.

Fuzzypiggy · Sunday at 12:34 AM

Dapd Funk said:
For brilliant mind, Turing seems confused on the topic of machines thinking. Read any of his papers on the subject, e.g. in Copeland, The Essential Alan Turing. One almost wonders if he is pulling everyone’s leg in these papers.

I don't think he was confused at all, more that based on his very limited exposure to computers he designed, the test was more a theoretical "conversation starter" about how far computers will take us. He wasn't a CS student, he was a mathematician who helped kick start the CS field. So for his time he was amazing but his time came and went, so now we we must hold him to the time he was living in and the level of technology he was able to imagine, design and build.

Most good quality CS students could run rings around Turing now 70 years later, doesn't make him or his work any less valid, but it's like comparing Hogarth's sketches to something like Pixar's work, similar in nature but lightyears apart ( no pun intended ).

Me Myself And I1 · Sunday at 1:24 AM

Appearance of an uncooperative human vs a prolific fabulator. Humanity 2023.

The 'interpretation by psychology' hints at bias towards defending modern tech which failed to live up to the hype while the results could be easily interpreted as 'random'.

Also, I find this statement (used in a caption at the time of writing), puzzling at best.
"Ancient rules-based ELIZA outperformed GPT-3.5."

GPT is rules-based. There's more rules (and sure, there's data the model can access) but there are pre-defined rules, pre-defined algorithms. GPT is, in its fundamentals, identical to ELIZA.

The only difference is the computational power that can be used (and wasted) for much faster processing (application of pre-defined rules).

Dmytry · Sunday at 1:31 AM

Fuzzypiggy said:
I don't think he was confused at all, more that based on his very limited exposure to computers he designed, the test was more a theoretical "conversation starter" about how far computers will take us.

I think it is simply that at the time, several enabling factors of modern AI were utterly inconceivable to anyone, not even Turing.

One is processing nearly all written texts to create said AI (and the neural network greater in size yet than the entirety of the training data). Other is a possibility of “interpolating” between those texts, in some high dimensional space defined such that interpolation yields valid-ish texts (unlike e.g. simply interpolating every letter which just yields gibberish).

Those two capabilities allowed the creation, in practice, of the “Chinese Room” thought experiment - not the interpretation with a very complex set of rules (where we could say the rules embody an intelligence), but one with a very large library storing an answer for every prompt.

Of course, it is still physically impossible to losslessly store such a library, but the grand breakthrough of modern AI is that a lossy representation can be stored.

The interpolation in question can be indistinguishable from intelligence for a human (who would never be able to read let alone memorize the entirety of the training dataset). It is representing real intelligence - that went into creation of the writings that the AI was trained on.

The other test exists, of course. AI training benefits from addition of more human-written text, but would not benefit from addition of more AI-generated text- that is why they can’t simply create more training data for GPT5 using GPT4. (They can but it would be not as beneficial as another internet worth of human written text).

Northbynorth · Sunday at 2:34 AM

Remarkable that we still use a 70 year old "thought experiment" as some sort of test to prove intelligence of AIs.

As far as I remember, the last 50 years, people have been impressed how "humanlike" computers can behave. I would turn it around and say it is more a question of how routine based we humans to a large extent behave. We excel in doing repetitive predictable tasks, be it at work, farming in computer games or engaged in light conversations. It is not very hard to make a decent simulation of a human acting in these settings, and it has been done for several decades.

Defining intelligence is much harder than that and we are still not sure how it best is done. The Turing test is certainly not the way to do it.

webGLdude · Sunday at 2:44 AM

Steve austin said:
The Stackexhange thread you link says there is no relationship. Still…

Turing complete is a description applied to a system that can simulate a Turing machine, which is a formalism developed by Alan Turing for studying computability and complexity theories (which are formal mathematics/computer science fields). It is related to Church’s lambda calculus. As it turns out, all modern digital computers are (aside from finiteness) Turing complete.

The Turing test is an informal “game” that Alan Turing proposed as a way to determine whether a machine is acting intelligently. It has nothing to do with actual machines, but rather with a theory or mind or intelligence - it was definitely not a formal test.

These are completely unrelated. In the grand scheme of things, the former is far more important than the latter. Feel free to check Wikipedia for more info on both - that’s a lot more solid than looking at Stackexchange for that sort of info.

For whatever it’s worth, Alan Turing was a giant in numerous fields and his name is attached to many things that were often unrelated (other than generally having a connection to mathematics or computer science). Again, feel free to check Wikipedia’s article on Turing.

Don't feed the troll. Look at their historical comments. You're good faith engaging in dialogue with a character you probably should avoid.

volcano.authors · Sunday at 7:39 AM

RubyPanther said:
If the humans are scoring less than the high 90s that just shows that the test setup is insufficient.

And it may be impossible to set up the test with impartial judges now that there is so much public interest in AI.

While this test is clearly insufficient in the modern age to achieve the intent of Turing, and should be replaced, I would nevertheless like to propose a scoring improvement. Namely, filter the scoring so that you only use the scores of the people who correctly identified the humans.

It should be obvious that when the purpose of the test is to measure the ability of the AI to impersonate a human, if it is realistic for the AI to score "more human than human" then the test itself is a failure. Complete, absolute success would be equality between the groups. Filtering the scores in the way that I propose would achieve this correction.

I am not a behavioral psychologist. I do think the test setup has a deep flaw: the incentives it sets up may encourage humans to throw the test. (Case in point: the paper authors acknowledge trolling.)

What if the test did not reveal the full extent of what was being tested until afterwards? Also, I agree with previous comments suggesting a reward for correctly spotting humans.

ghostcarrot · Sunday at 1:05 PM

[boring retread of stuff everyone already knows cut]

ghostcarrot · Sunday at 1:22 PM

10Nov1775 said:
Yeah, what a bizarre thing to say.

It is equivalent to saying: "The game of Minecraft, and nearly all programming languages, among other things, are indistinguishable from a human being if you hold a conversation with them."

These, of course, all being examples of things which are Turing complete, but of dubious ability at the Turing test. Except insofar as a Turing complete system can technically run any program we might want to Turing test...might take awhile to run GPT 4 in Minecraft, though.

This exchange really shows why it is a bad idea for us to be putting LLMs in everything, or at least in searches. Until this comment section, I didn't know anything about Turing machines or Turing completeness. Because these are complex topics, I still, basically, don't know anything. I could do a google search and find a website willing to explain these topics in more-or-less detail in a sort of end-user-y kind of way, the quality of which will depend on the website. Or I could ask an LLM to produce absolute rubbish but, because I don't know any better, and it is well-formatted, it sounds reasonable. And then I go on my merry way "knowing" something that is wrong, and I'll have no chance to correct, unless I say something stupid to someone who knows better, because it doesn't actually impact my life, and I don't have a framework to gauge the truth-value of the statement. In the mean time, I have the opportunity to infect other people with my mistaken understanding of these concepts, because most of the people I know don't really have a reason to know what Turing machines or Turing completeness are. This is the don't-trust-what-you-read-on-social-media problem on steroids.

dwsolberg · Sunday at 3:18 PM

Pro tip: If you want it to seem human, make it opinionated and irrational.

tom-ba · Sunday at 4:09 PM

Old Bitsmasher said:
Ah, ELIZA. Used to run it through the Automatic Mouth synthesizer for party entertainment. It was like getting therapy from a Conehead.

And yet students took it seriously at MIT. Which prompted Professor Weizenbaum to write Computer Power and Human Reason: From Judgment to Calculation.

tom-ba · Sunday at 4:28 PM

Vidi Vici Veni said:
Google is a thing you know. However just for you, a Turing test is designed to see if a machine can mimic human conversation. Turing Complete refers to an algorithm that can simulate a turing machine - which is a universal computer, not a human.

Any useful computer language is Turing Complete. It's actually kind of hard to find a programming system that isn't Turing Complete, although we know that there is a hierarchy of computing models with increasing power (string grammars, push down automatons, etc.).

tom-ba · Sunday at 4:37 PM

jimlux said:
But is there really a cat in the box? And is it able to get out of the box, or is it dead?

You can't know until you open the box. Until then, the cat is both alive and dead.
---
Erwin Schrödinger actually proposed this experiment to show how ridiculous quantum mechanics seemed. It didn't work out the way he intended.

graylshaped · Sunday at 4:48 PM

tom-ba said:
You can't know until you open the box. Until then, the cat is both alive and dead.
---
Erwin Schrödinger actually proposed this experiment to show how ridiculous quantum mechanics seemed. It didn't work out the way he intended.

Respectfully, I think it worked out exactly the way he intended.

The Sheep Look Up · Sunday at 5:37 PM

uire said:
Two things: first, "More human than human" sounds like a good motto for an android company.

Second: if tricking humans into thinking you're human by trying to act like a human is a thing, what's it mean when some humans try to trick humans into thinking they're not human by trying not to act like a human?

First: your premier model should be named Jigsaw Man.

Second: I'm more concerned when some humans trick humans into thinking they are human via mimicry.

Faceless Man · Sunday at 5:48 PM

jefito said:
Well, Stephen Moore, the voice of Marvin in the original radio and TV series did it pretty well, too...

The OG. Rickman was a good choice, but Moore created the character, and is still the one I hear in my head.

The Sheep Look Up · Sunday at 5:49 PM

Juvba Fnakix said:
Perhaps playing 'too stupid to be a computer' might have worked.

I'm remembering an anecdote from a few years back where the bot that "won" a Turing "competition" (had the highest success rate at being judged human) passed itself off as an 8 year old Hungarian kid with limited English. So yeah, perfectly valid strategy.

Faceless Man · Sunday at 6:47 PM

The Sheep Look Up said:
I'm remembering an anecdote from a few years back where the bot that "won" a Turing "competition" (had the highest success rate at being judged human) passed itself off as an 8 year old Hungarian kid with limited English. So yeah, perfectly valid strategy.

Except that, by the "rules" of most attempts at creating a Turing Test, that is explicitly forbidden.

Not the only reason that particular test was questionable.

That said, I very much don't think that the Turing Test is actually a thing. Turing made a comment, essentially, and people got hung up on it. Never mind it's a completely subjective criterion to judge intelligence (much like all the others, I guess), and creating any kind of formal framework is pretty much impossible.

graylshaped · Sunday at 7:16 PM

For better or worse, I am not sure I appear human 63% of the time.

Dmytry · Sunday at 9:15 PM

The other thing is that we can say that ChatGPT is not human-level intelligent, on simply the grounds that everyone in the field knows (and it has been empirically tested with a wide variety of models and probably with ChatGPT at well), that generating another internet worth of ChatGPT outputs and adding it to the training dataset, would be bad for the results, while somehow obtaining another internet worth of human writings would be valuable. It might even be possible to prove this rigorously with theory alone.

In image AI it is often valuable to augment the training data with rotated copies of real images, but not with AI generated images. A random rotation adds - however trivially - to the sum total of available imagery, that a beautiful image from Stable Diffusion doesn't (it would be total poison for image generation AIs to start consuming AI-generated images).

My thinking on this is that ChatGPT and other large language models represent "bottled" human intelligence.

A taste comparison of bottled beer to beer straight from a brewery, can not constitue a proof that a bottle is true artificial yeast, that glassblowers had created artificial life. Likewise, a chat comparison between some "AI" which was trained on such incredibly large dataset of human writings, can not constitute a proof of intelligence beyond that of humans who wrote said dataset.

The Sheep Look Up · Sunday at 9:20 PM

Faceless Man said:
That said, I very much don't think that the Turing Test is actually a thing.

The Imitation Game is just that -- a game. It's fun to play around with but fundamentally tells you little. You can ID most bots by asking them specific personal questions, claiming postulates that run directly contrary to physical reality, memory / recall tests, and just seeing if they'll object to outright gibberish.

I find the Voight-Kampff test far more comprehensive and accurate at differentiating bots from humans.

MNP · Sunday at 9:43 PM

benwaggoner said:
It is remarkable that over a quarter of humans didn't successfully identity other humans!

A forced-choice test where an Interrogator interacts with an AI and a human and picks which one is more likely to be human would probably (hopefully!) have a much higher accuracy rate.

We have less face to face social interaction than ever, Bowling Alone etc. We are probably stiffer at communication than we were a few generations ago.

earlyberd · Sunday at 11:38 PM

Steve austin said:
The Stackexhange thread you link says there is no relationship. Still…

Turing complete is a description applied to a system that can simulate a Turing machine, which is a formalism developed by Alan Turing for studying computability and complexity theories (which are formal mathematics/computer science fields). It is related to Church’s lambda calculus. As it turns out, all modern digital computers are (aside from finiteness) Turing complete.

The Turing test is an informal “game” that Alan Turing proposed as a way to determine whether a machine is acting intelligently. It has nothing to do with actual machines, but rather with a theory or mind or intelligence - it was definitely not a formal test.

These are completely unrelated. In the grand scheme of things, the former is far more important than the latter. Feel free to check Wikipedia for more info on both - that’s a lot more solid than looking at Stackexchange for that sort of info.

For whatever it’s worth, Alan Turing was a giant in numerous fields and his name is attached to many things that were often unrelated (other than generally having a connection to mathematics or computer science). Again, feel free to check Wikipedia’s article on Turing.

But LLMs are neither Turing complete nor are they intelligent! What are you arguing?

What does the difference between all these Turing-related concepts have to do with the fact that the test is flawed? No LLM will ever pass a Turing test, that's not the point or the design of LLMs.

I am thoroughly convinced none of you have a clue what you're talking about.

I'm going to say it again, LLMs need to be Turing complete in order to pass the Turing test! The reason is because the LLMs have no memory of a conversation, and even when you do store a conversation you cannot guarantee that the LLM will remember what you said or adjust its outputs based on prior conversation context. It's basically akin to a random number generator with some guard rails.

LLMs will always fail this test because they have no memory, no intelligence, and aren't even Turing complete. They can't simulate other Turing machines and can't do basic math. They can't remember what you said to them from a previous prompt without the help of external systems. They transform text, that's it.

LLMs only transform text. Your prompt is not an instruction you are giving to the model, it is just a vector that the LLM is trying to find the closest match in its training data.

Stop trying to bullshit all of us with this nonsense. This was a bad test and you all should be ashamed of your collective poor grasp of how LLMs work. What a disgrace.

mdrejhon · Monday at 12:55 AM

Dmytry said:
The other thing is that we can say that ChatGPT is not human-level intelligent

Right, no, it is not.

That being said, we can see that GPT4-Turbo and GPT3.5 (non-Turbo) has roughly the same number of parameters, yet GPT4-Turbo is more intelligent than GPT3.5 (free ChatGPT). And it does manage to line-item converse on some topics better than the average human of 50-percentile skill on the respective topic. And for very long conversations (especially with the new 128K context memory, about 500KB of text worth of short-term memory).

And you've been hearing about the much more intelligent Q*, the very scare that caused the OpenAI fracas. We can't deny the technological pace has, to put it mildly, been very torrid.

Me Myself And I1 said:
GPT is rules-based. There's more rules (and sure, there's data the model can access) but there are pre-defined rules, pre-defined algorithms. GPT is, in its fundamentals, identical to ELIZA.

If the latest GPT is rules-based (digital rules in logic that's operating a neural network, via linking all the parameters in the digital neural network), then the human brain is also defacto rules-based (analog rules from all the linkages of synapses/neurons in the biological neural network).

Regardless, the net result is still quite shockingly more and more similar, as they keep improving the AIs, as seen from the multiple new versions of GPT (the paid one, not the free one).

Dmytry said:
My thinking on this is that ChatGPT and other large language models represent "bottled" human intelligence.

Bottled knowledge it may (it's a good description of current models). It is currently being iterated to becoming more capable of creating more original outputs than earlier AI's. Example: An original joke on URL being Unusually Rowdy Lemurs, something that doesn't even exist on Google via a quoted search.

Various amounts of bottled knowledge plus some of the improved algorithms I talked about in earlier pages of that other comments section, is going to be able to make a 99% Virtuso AGI (under DeepMind's new AGI definitions) by end of the decade in my knowledge.

AI's aren't being designed to be human-level intelligent anymore -- it is starting to line-item surpass, while still greatly inferior than others. By the time we optimized final skills to equal humans, the existing longtime AI skills will be literally ASI-league.

Q* does grade school math better than a 50%-ile human, from what is reported -- the very thing that caused almost 95% of the company to start preparing to move to Microsoft because Sam Altman got fired. They've got something seriously better of an AI that actually thinks/reasons/plans, already in the wings -- that caused the CEO to be fired. Then 95% of the company was getting ready to quit, to move to Microsoft (who offered them all jobs). Until Sam got rehired. Despite being a controversial figure (and all the copyright hoovering controversies), one can't deny that something seriously smarter has been birthed in the lab.

The path to AGI's are a complex mix of ANI, AGI, ASI rather than the yesteryear definitions. It's going to be more like talking to an alien who's hyperintelligent, moreso than 99th-percentile human on 99% of topics, and only fail on a tiny percent. This isn't going to be human-level intelligent. We're witnessing emerging AGI's that are both simultaneously inferior and superior to humans, but the number of skillsets that transfers over to the superior-side, rapidly increase. And that has been my exact experience with the paid AI's (especially the new ones with the new much larger short-term memories).

Averaging out, we're not really averaging to human-level intelligent, so I agree with you there. But it's more nuanced than that, especially witnessing what is happening with GPT 3.5 versus GPT-4 Turbo versus what we all keep hearing about Q*, the much more intelligent AI that caused the board to fire the founder of OpenAI.

I'm predicting that particular one (Q* and variants) will quickly be ranked as "Competent AGI" rather than "Emerging AGI", but not a "Expert AGI" nor "Virtuso AGI" under the new DeepMind AGI definitions (Page 6 of https://arxiv.org/pdf/2311.02462.pdf ) at least when narrowscoped to skills achievable in disembodied abilities (e.g. things that don't yet require a robot body). And even industry's working on that too.

Too many people make assumptions about how dumb AIs still are, based off free ChatGPT or the older version of paid ChatGPT, when the pace behind the scene has been rather crazy fast, and the November 6th version of GPT4 suddenly had literally something like ~15x to ~20x as much short-term memory ready for much longer conversations or even accurately meeting-minuting a multihours-long meeting successfully where it couldn't before November -- they're really improving the paid version of the AI subscription very fast.

graylshaped · Monday at 1:15 AM

mdrejhon said:
Right, no, it is not.

That being said, we can see that GPT4-Turbo and GPT3.5 (non-Turbo) has roughly the same number of parameters, yet GPT4-Turbo is more intelligent than GPT3.5 (free ChatGPT). And it does manage to line-item converse on some topics better than the average human of 50-percentile skill on the respective topic. And for very long conversations (especially with the new 128K context memory, about 500KB of text worth of short-term memory).

And you've been hearing about the much more intelligent Q*, the very scare that caused the OpenAI fracas. We can't deny the technological pace has, to put it mildly, been very torrid.

Bottled knowledge it may (it's a good description of current models). It is currently being iterated to becoming more capable of creating more original outputs than earlier AI's. Example: An original joke on URL being Unusually Rowdy Lemurs, something that doesn't even exist on Google via a quoted search.

Various amounts of bottled knowledge plus some of the improved algorithms I talked about in earlier pages of that other comments section, is going to be able to make a 99% Virtuso AGI (under DeepMind's new AGI definitions) by end of the decade in my knowledge.

AI's aren't being designed to be human-level intelligent anymore -- it is starting to line-item surpass, while still greatly inferior than others. By the time we optimized final skills to equal humans, the existing longtime AI skills will be literally ASI-league.

Q* does grade school math better than a 50%-ile human, from what is reported -- the very thing that caused almost 95% of the company to start preparing to move to Microsoft because Sam Altman got fired. They've got something seriously better of an AI that actually thinks/reasons/plans, already in the wings -- that caused the CEO to be fired. Then 95% of the company was getting ready to quit, to move to Microsoft (who offered them all jobs). Until Sam got rehired. Despite being a controversial figure (and all the copyright hoovering controversies), one can't deny that something seriously smarter has been birthed in the lab.

The path to AGI's are a complex mix of ANI, AGI, ASI rather than the yesteryear definitions. It's going to be more like talking to an alien who's hyperintelligent, moreso than 99th-percentile human on 99% of topics, and only fail on a tiny percent. This isn't going to be human-level intelligent. We're witnessing emerging AGI's that are both simultaneously inferior and superior to humans, but the number of skillsets that transfers over to the superior-side, rapidly increase. And that has been my exact experience with the paid AI's (especially the new ones with the new much larger short-term memories).

Averaging out, we're not really averaging to human-level intelligent, so I agree with you there. But it's more nuanced than that, especially witnessing what is happening with GPT 3.5 versus GPT-4 Turbo versus what we all keep hearing about Q*, the much more intelligent AI that caused the board to fire the founder of OpenAI.

I'm predicting that particular one (Q* and variants) will quickly be ranked as "Competent AGI" rather than "Emerging AGI", but not a "Expert AGI" nor "Virtuso AGI" under the new DeepMind AGI definitions (Page 6 of https://arxiv.org/pdf/2311.02462.pdf ) at least when narrowscoped to skills achievable in disembodied abilities (e.g. things that don't yet require a robot body). And even industry's working on that too.

Too many people make assumptions about how dumb AIs still are, based off free ChatGPT or the older version of paid ChatGPT, when the pace behind the scene has been rather crazy fast, and the November 6th version of GPT4 suddenly had literally something like ~15x to ~20x as much short-term memory ready for much longer conversations or even accurately meeting-minuting a multihours-long meeting successfully where it couldn't before November -- they're really improving the paid version of the AI subscription very fast.

I want my kid to learn to do math. Not rely on a chatbot to do it for him. In my era, people railed against using calculators to help. Times evolve. Tools get better, but the human needs to know what the tool is producing.

mdrejhon · Monday at 1:19 AM

graylshaped said:
I want my kid to learn to do math. Not rely on a chatbot to do it for him. In my era, people railed against using calculators to help. Times evolve. Tools get better, but the human needs to know what the tool is producing.

On that item as a Gen X I agree.

The loss of cursive handwriting too...
The loss of ability to read paper maps...
The loss of...

Nontheless, as a deaf individual and as of past special-needs student back in my day, AI has made a massive impact as being one of the best coding tutors (even for those times I don't want to use its code), as well as the best speech-to-text transcriber. I can now do captioned FaceTime thanks to Apple's adding a neural-based Live Caption feature in iOS 16+. Right Tool for Right Job for me.

graylshaped · Monday at 1:32 AM

mdrejhon said:
On that item as a Gen X I agree.

The loss of cursive handwriting too...
The loss of ability to read paper maps...
The loss of...

Nontheless, as a deaf individual and as of past special-needs student back in my day, AI has made a massive impact as being one of the best coding tutors (even for those times I don't want to use its code), as well as the best speech-to-text transcriber. I can now do captioned FaceTime thanks to Apple's adding a neural-based Live Caption feature in iOS 16+. Right Tool for Right Job for me.

There is no AI. There are decent LLMs. My kid's mom cannot read a map, and has fallen astray when on an international trip. I do not think cursive is on my son's agenda as he moves forward in life. I believe they now call it some version of italics.

ludovic.kuty · Monday at 3:37 AM

You might want to read the CoRecursive episode 78 named "The History and Mystery Of Eliza" to have additional infos about ELIZA.

Northbynorth · Monday at 6:05 AM

The limitation with most AI intelligence tests is that they measure a very limited skill set. With that method humans would just in recent years passed termites in intelligence. Since their built structures rivaled things we humans could achieve.

The great thing with our inherent capability is our adaptability. That is what sets us apart. Yes, we can build machines that do most tasks well enough, but not a single one that can be trained to do more than a tiny slice. The day ChatGPT can travel without help in the jungle and bake a cinnamon bun, my hat is off.

jlinkels · Monday at 8:39 AM

Why does the graph show a success rate of approximately 20% for Eliza while the text states 27%. I am not nitpicking, just questioning the reliability of the data and statements.

walter_white · Monday at 9:53 AM

Every day I watch TV and I see people that I would never identify as human based on what they say if I didn't see and hear the words coming out of their mouths. Does the Turing test take into account the context of the social and political environment in which it is being conducted? What about the social and political leanings and even the intelligence and gullibility of the human test participant? And what about the social and political leanings of the data used to train the AI?

General questions about training an AI: how does early exposure to training data affect subsequent training? If you were to train an AI starting with news stories from Fox, Alex Jones podcasts, and Mein Kampf and then went to Shakespeare, Isaac Newton, and Rachel Maddow, would you get a different AI than if you did the reverse?

Sjoerd Verweij · Monday at 12:52 PM

earlyberd said:
I'm going to say it again, LLMs need to be Turing complete in order to pass the Turing test!

If over a dozen people independently point out that you are wrong, and why, a sane person would think "hmm, maybe it's me".

This person, allegedly, writes LLM policy for his/her company. Sleep tight!

Tim van der Leeuw · Monday at 3:05 PM

10Nov1775 said:
Yeah, what a bizarre thing to say.

It is equivalent to saying: "The game of Minecraft, and nearly all programming languages, among other things, are indistinguishable from a human being if you hold a conversation with them."

These, of course, all being examples of things which are Turing complete, but of dubious ability at the Turing test. Except insofar as a Turing complete system can technically run any program we might want to Turing test...might take awhile to run GPT 4 in Minecraft, though.

I read an article a few years ago claiming the Rust type system is Turing-complete.

However last time I checked the Rust type system did not yet pass a Turing-test and was not (yet) sentient -- and neither is the Rust compiler.

Tim van der Leeuw · Monday at 3:07 PM

Sjoerd Verweij said:
If over a dozen people independently point out that you are wrong, and why, a sane person would think "hmm, maybe it's me".

This person, allegedly, writes LLM policy for his/her company. Sleep tight!

But instead of thinking "maybe it's me", the earlyberd goes on the offensive... making me suspect it's just a troll.

Real humans appeared human 63% of the time in recent “Turing test” AI study

Wise, Aged Ars Veteran

Ars Tribunus Militum

Ars Scholae Palatinae

Ars Tribunus Angusticlavius

Ars Legatus Legionis

Ars Centurion

Ars Tribunus Angusticlavius

Ars Scholae Palatinae

Ars Tribunus Militum

Ars Tribunus Angusticlavius

Ars Centurion

Wise, Aged Ars Veteran

Smack-Fu Master, in training

Ars Centurion

Ars Centurion

Wise, Aged Ars Veteran

Smack-Fu Master, in training

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Legatus Legionis

Smack-Fu Master, in training

Ars Tribunus Angusticlavius

Smack-Fu Master, in training

Ars Tribunus Angusticlavius

Ars Legatus Legionis

Ars Tribunus Angusticlavius

Smack-Fu Master, in training

Ars Scholae Palatinae

Account Banned

Ars Tribunus Militum

Ars Legatus Legionis

Ars Tribunus Militum

Ars Legatus Legionis

Seniorius Lurkius

Ars Centurion

Smack-Fu Master, in training

Ars Centurion

Ars Praefectus

Ars Centurion

Ars Centurion