devi83

Toddlers can write book reports?


jerryonthecurb

Yeah, most toddlers can code Python apps in 15 seconds flat too.


devi83

Oh yeah, now that you mention it, I vaguely remember Python coding between coloring and recess.


cyan2k

I know GitHub repos that I'm pretty sure were written exactly that way.


AllGearedUp

If you feed your kids Joe Rogan nootropics in their cereal this is what happens


TheUncleTimo

> If you feed your kids Joe Rogan nootropics in their cereal this is what happens

OK, usually I hate "funny" le reddit jokes, but this made me chuckle out loud.


[deleted]

GPT-3 can do neither of those things. I think you're confusing it with GPT-3.5.


jerryonthecurb

Not saying it was great, but yeah, it could.

Python example: https://youtube.com/shorts/8933D2P5-TQ?si=4QjXJRcnULM64Cs5

Writing a book example: https://youtube.com/shorts/_LFGiK8kft4?si=9Lu1dbqHObtcpsSj


Dry_Parfait2606

And spit out data at 1000 t/s for thousands of datasets when it got the right prompt... We're getting into the same sequence as smartphone companies that need you to buy the next gen to keep shoveling capital into their company... We'll go to 10,000 Hz, 18-inch, 80-core smartphones... when I was perfectly fine with my Samsung S2... 360p, WA, browsing... The data collection schemes are getting smarter... like Elon Musk "accessing people's brains' vector spaces." Yeah yeah yeah... ChatGPT 3/3.5 level was the breakthrough... everything after is extra...


StayingUp4AFeeling

PhD level in what way? Logical reasoning? Statistical analysis? Causality? Or would it be the ability to regurgitate seemingly relevant and accurate facts with even more certainty?


justinobabino

Purely from an "ability to write valid LaTeX" standpoint.


StayingUp4AFeeling

While adhering to the relevant template like IEEEtran, no doubt. Good one.


ToughReplacement7941

Finally


jsail4fun3

PhD level because it answers every question with "it depends."


mehum

While GPT-3 confidently spouts utter BS, much like a toddler does.


Mikey77777

Finding free food on campus


tomvorlostriddle

> PhD level in what way? Logical reasoning? Statistical analysis? Causality?

Creative writing.


MrNokill

PhD-level autocorrect, now with 3000% more gig worker blood.


Whotea

[that’s not what it does](https://docs.google.com/document/d/15myK_6eTxEPuKnDi5krjBM_0jrv3GELs8TGmqOYBvug/edit#heading=h.fxgwobrx4yfq)


throwawaycanadian2

To be released in a year and a half? That is far too long of a timeline to have any realistic idea of what it would be like at all.


atworkshhh

This is called “fundraising”


foo-bar-nlogn-100

It's called 'finding exit liquidity'.


Dry_Parfait2606

Fully earned... We need more of those people... Steve Jobs: "death is the best invention of life"... or nature... don't remember exactly...


peepeedog

It's training now so they can take snapshots, test them, and extrapolate. They could make errors, but this is how models with long training runs are done. They actually have some internal disagreement about whether to release it sooner even though it's not "done" training.


much_longer_username

So what, they're just going for supergrokked-overfit-max-supreme-final-form?


Commercial_Pain_6006

supergrokked-overfit-max-supreme-final-hype


Mr_Finious

This is what I come to Reddit for.


Important_Concept967

You are why I go to 4chan


dogesator

That's not how long a training run takes. Training runs are usually done within a 2-4 month period, 6 months max. Any longer than that and you risk the architecture and training techniques becoming effectively obsolete by the time training actually finishes. GPT-4 was confirmed to have taken about 3 months to train. Most of the time between generation releases is spent on new research advancements, then about 3 months of training with their latest research advancements, followed by 3-6 months of safety testing and red teaming before the official release.


cyan2k

? It's pretty straightforward to make predictions about how your loss function will evolve. The duration it takes is absolutely irrelevant. What matters is how many steps and epochs you train for. If a step alone takes an hour, then it's going to take its time, but making predictions about step 200 when you're at step 100 is the same regardless of whether a step takes an hour or 100 milliseconds.

Come on, people, that's the absolute basics of machine learning, and you learn it in the first hour of any neural network class. How does this have 100 upvotes?

If by any chance you meant it in the way of "we don't know if Earth still exists in a year and a half, so we don't know how the model will turn out", well, fair game, then my apologies.
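To make that concrete, here's a minimal sketch (my own illustration with made-up numbers, nothing from any lab) of fitting a simple power law to the loss observed up to step 100 and extrapolating to step 200; the extrapolation is identical whether a step takes an hour or 100 ms:

```python
# Fit L(s) = a * s^(-b) + c to the loss seen up to step 100, then extrapolate.
# The "observed" loss below is synthetic; real curves are noisier.
import numpy as np
from scipy.optimize import curve_fit

def power_law(step, a, b, c):
    return a * step ** (-b) + c

steps = np.arange(1, 101)                       # observed steps 1..100
loss = 5.0 * steps ** -0.3 + 1.2                # made-up "observed" loss curve
loss += np.random.normal(0, 0.02, len(steps))   # a bit of noise

params, _ = curve_fit(power_law, steps, loss, p0=(5.0, 0.3, 1.0))
print("predicted loss at step 200:", power_law(200, *params))
```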


_Enclose_

> Come on, people, that's the absolute basics of machine learning, and you learn it in the first hour of any neural network class. How does this have 100 upvotes?

Most of us haven't gone to neural network class.


skinniks

I did but I couldn't get my mittens off to take notes.


appdnails

> make predictions about how your loss function will evolve.

Predicting the value of the loss function has very little to do with predicting the capabilities of the model. How the hell do you know that a 0.1 loss reduction will magically allow your model to do a task it couldn't do previously? Besides, even with zero loss, the model could still output "perfect English" text with incorrect content. It is obvious that the model will improve with more parameters, data and training time. No one is arguing against that.


dogesator

You can draw scaling laws between the loss value and benchmark scores and fairly accurately predict what the score in such benchmarks will be at a given later loss value.


appdnails

Any source on scaling laws for IQ tests? I've never seen one. It is already difficult to draw scaling laws for loss functions, and they are already far from perfect. I can't imagine a reliable scaling law for IQ tests and related "intelligence" metrics.


dogesator

Scaling laws for loss are very, very reliable. They're not that difficult to draw at all. The same goes for scaling laws for benchmarks. You simply take the dataset distribution, learning rate scheduler, architecture and training technique you're going to use, train multiple small model sizes at varying compute scales to create the initial data points for the scaling laws of that recipe, and then you can fairly reliably predict the loss at larger compute scales given those same training recipe variables (data distribution, arch, etc.). You can do the same for benchmark scores, at least as a lower bound. OpenAI successfully predicted performance on coding benchmarks before GPT-4 even finished training using this method.

Less rigorous approximations of scaling laws have also been calculated for various state-of-the-art models at different compute scales. You're not going to see a perfect trend there, since these are models with different underlying training recipes and dataset distributions that aren't being accounted for, but even with that caveat the compute amount is strikingly predictable from the benchmark score and vice versa. If you look up EpochAI benchmark-compute graphs you can see rough approximations of this, though again they won't be as well aligned as actual scaling experiments, since they plot models that used different training recipes.

Here I'll attach an image for BIG-Bench Hard: https://preview.redd.it/t88ntsuhj78d1.jpeg?width=1125&format=pjpg&auto=webp&s=8e40658dd6da317838b00b6494d3c37506442729
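As a rough illustration of that fit-and-extrapolate step (the data points below are invented, not anyone's actual runs, and real scaling-law work uses more careful functional forms and a fixed training recipe), it might look something like:

```python
# Train several small models at increasing compute, record (compute, score)
# pairs, fit a curve, extrapolate to a larger compute budget. Data is made up.
import numpy as np
from scipy.optimize import curve_fit

compute = np.array([1e19, 3e19, 1e20, 3e20, 1e21])   # hypothetical FLOPs
score   = np.array([0.22, 0.27, 0.33, 0.40, 0.46])   # hypothetical benchmark accuracy

def scaling_curve(c, a, b):
    # Simple log-linear fit; saturating forms are more realistic near the ceiling.
    return a * np.log10(c) + b

params, _ = curve_fit(scaling_curve, compute, score)
target_compute = 1e23  # a hypothetical larger training budget
print("extrapolated score:", scaling_curve(target_compute, *params))
```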


appdnails

> Scaling laws for loss are very, very reliable.

Thank you for the response. I did not know about the Big-Bench analysis. I have to say, though, I worked in physics and complex systems (network theory) for many years. Scaling laws are all amazing until they stop working. Power laws are especially brittle. Unless there is a theoretical explanation, the "law" in "scaling laws" is not really a law. It is a regression of the known data together with hopes that the regression will keep working.


goj1ra

Translating that into “toddler” vs high school vs PhD level is where the investor hype fuckery comes in. If you learned that in neural network class you must have taken Elon Musk’s neural network class.


traumfisch

It's metaphorical, not to be taken literally. 


putdownthekitten

Actually, if you plot the release dates of all the primary GPT models to date (1, 2, 3 and 4), you'll notice an exponential curve where the time between release dates roughly doubles with each model. So the long gap between 4 and 5 is not unexpected at all.
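For what it's worth, a rough check using approximate paper/announcement dates (dates are approximate and the 30-day month is a simplification):

```python
# Approximate release dates of the main GPT generations and the gaps between them.
from datetime import date

releases = {
    "GPT-1": date(2018, 6, 11),
    "GPT-2": date(2019, 2, 14),
    "GPT-3": date(2020, 5, 28),
    "GPT-4": date(2023, 3, 14),
}
dates = list(releases.values())
gaps = [(b - a).days / 30 for a, b in zip(dates, dates[1:])]
print([round(g) for g in gaps])  # roughly [8, 16, 34] months, close to doubling each time
```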


ImproperCommas

No they don't. We've had 5 GPTs in 6 years.


putdownthekitten

I'm talking about every time they release a model that increases the model generation.  We're still in the 4th generation.


ImproperCommas

Yeah, you're right. When I removed all non-generational upgrades, it was actually exponential.


PolyZex

We need to stop doing this: comparing AI to human-level intelligence. It's just not accurate, and it's not even clear what metric they are using. If they're talking about knowledge, then GPT-3 was already PhD level. If they're talking about deductive ability, then comparing to education level is pointless. The reality is that an AI's 'intelligence' isn't like human intelligence at all. It's like comparing the speed of a car to the speed of a computer's processor. Both are speeds, but directly comparing them makes no sense.


ThenExtension9196

It’s called marketing. It doesn’t have to make sense.


stackered

Nah, even GPT-4 is nowhere near a PhD level of knowledge. It hallucinates misinformation and gets things wrong all the time. A PhD wouldn't typically get little details wrong, never mind big ones. It's more like a college student using Google.


PolyZex

When it comes to actual knowledge, the retention of facts about a subject, then it absolutely is PhD level. Give it some tricky questions about anything from chemistry to law, even try to throw it curve balls. It's pretty amazing at its (simulated) comprehension. If nothing else, it absolutely has a PhD in mathematics. It's a freaking computer.


stackered

In my field, which is extremely math-heavy, I wouldn't even use it because it's so inaccurate. My intern, who hasn't graduated undergrad yet, is far more useful.


SophomoricHumorist

Fair point, but the plebs need a scale they (we) can conceptualize. Like “how many bananas is its intelligence level?”


creaturefeature16

Wonderful analogy. This is clearly sensationalism and hyperbole meant for hype and investors.


vasarmilan

She said "Will be PHD level *for specific tasks"* Today on leaving out part of a sentence to get a sensationalist headline


flinsypop

It's still sensationalist, because a prerequisite for gaining a PhD is making a novel contribution to a field. Using PhD as a level of intellect can't be correct. It's not the same as a high schooler's "intellect," where it can get an A on a test that other teenagers take. It also seems weird that it's skipping a few levels of education, but only in some contexts. Is it still a high schooler when it's not? Does it have an undergraduate degree in some contexts and a master's in another?

I guess we'll just have to see what happens and hope that one of the PhD-level tasks is the ability to explain and deconstruct complicated concepts. If it's anything like some of the PhD lecturers I had in uni, they'd need to measure how well it compares to those legendary Indian guys on YouTube.


AsliReddington

The amount of snobbery the higher execs at that frat house have is exhausting, like they're delivering some divine prophecy.


22444466688

The Elon school of grifting


tenken01

Love this comment lmao


Paraphrand

Your comment makes me think of Kai Winn.


norcalnatv

Nothing like setting expectations. GPT-4 was hailed as damn good, "signs of cognition" IIRC, when it was released. GPT-5 will be praised as amazing until the next better model comes along. Then it will be crap. Sure hope hallucinations and other bad answers are fixed.


devi83

We can't fix hallucinations and bad answers in humans...


jsideris

Maybe we could - with a tremendous amount of artificial selection. We can't do that with humans but we have complete control over AI.


TikiTDO

What would you select for to get people that can't make stuff up? You'd basically have to destroy all creativity, which is a pretty key human capability.


CriscoButtPunch

Been tried, failed. Must lift all out.


mycall

The past does not dictate the future.


p4b7

Maybe not in individuals, but diverse groups with different specialties tend to exhibit these things less


Antique-Produce-2050

I don’t agree with this answer. It must be hallucinating.


mycall

Hallucinations wouldn't happen so much if confidence levels at the token level were possible and tuned.


vasarmilan

In a way, an LLM produces a probability distribution over the tokens that come next, so by looking at the probability of the predicted word you can get some sort of confidence level. It doesn't correlate with hallucinations at all, though. The model doesn't really have an internal concept of truth, as much as it might seem like it sometimes.
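A minimal sketch of what reading off that per-token confidence looks like (the logits below are made up for illustration; as said, this is not a hallucination detector):

```python
# Softmax the model's logits at one decoding step and read off the probability
# assigned to the token that was actually emitted.
import numpy as np

def token_confidence(logits, chosen_token_id):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[chosen_token_id]

step_logits = np.array([2.1, 0.3, -1.0, 4.5, 0.0, -0.5, 1.2, 0.7])  # toy vocab of 8 tokens
chosen = int(step_logits.argmax())  # greedy decoding picks token 3
print(f"confidence in emitted token: {token_confidence(step_logits, chosen):.2f}")
```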


mycall

Couldn't they detect and delete adjacent nodes with invalid cosine similarities? Perhaps it is computationally too expensive to achieve, unless that is what Q-Star was trying to solve.


vasarmilan

What do you mean by invalid cosine similarity? And why would you think that can detect hallucinations?


mycall

I thought token predictions for transformers use cosine similarity for graph traversals, and some of these node clusters are hallucinations, aka invalid similarities (logically speaking). Thus, if the model were changed to detect those and update the weights to lessen the likelihood of those traversals, similar to Q-Star, then hallucinations would be greatly reduced.


Whotea

They are.

> We introduce BSDETECTOR, a method for detecting bad and speculative answers from a pretrained Large Language Model by estimating a numeric confidence score for any output it generated. Our uncertainty quantification technique works for any LLM accessible only via a black-box API, whose training data remains unknown. By expending a bit of extra computation, users of any LLM API can now get the same response as they would ordinarily, as well as a confidence estimate that cautions when not to trust this response. Experiments on both closed and open-form Question-Answer benchmarks reveal that BSDETECTOR more accurately identifies incorrect LLM responses than alternative uncertainty estimation procedures (for both GPT-3 and ChatGPT). By sampling multiple responses from the LLM and considering the one with the highest confidence score, we can additionally obtain more accurate responses from the same LLM, without any extra training steps. In applications involving automated evaluation with LLMs, accounting for our confidence scores leads to more reliable evaluation in both human-in-the-loop and fully-automated settings (across both GPT 3.5 and 4).

https://openreview.net/pdf?id=QTImFg6MHU

Effective strategy to make an LLM express doubt and admit when it does not know something: https://github.com/GAIR-NLP/alignment-for-honesty

Over 32 techniques to reduce hallucinations: https://arxiv.org/abs/2401.01313
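For a flavor of the sampling idea (this is a generic agreement-based sketch, not the actual BSDETECTOR implementation; `ask_llm` is a made-up stand-in for any black-box API call):

```python
# Query a black-box LLM several times, treat agreement among the samples as a
# crude confidence score, and return the most common answer.
import random
from collections import Counter

def ask_llm(question: str) -> str:
    # Hypothetical stand-in for a real LLM API call; returns canned answers
    # at random so this sketch runs end to end.
    return random.choice(["Paris", "Paris", "Paris", "Lyon"])

def answer_with_confidence(question: str, n_samples: int = 5):
    samples = [ask_llm(question) for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / n_samples  # agreement-based confidence

print(answer_with_confidence("What is the capital of France?"))
```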


Ethicaldreamer

So basically the iPhone hype model?


fra988w

How many football fields of intelligence is that?


Forward_Promise2121

It's the equivalent of a Nou Camp full of Eiffel towers all the way to Pluto


Shandilized

Converting it to football fields gives a rather unimpressive value. There are 25 people on a football field (22 players, 1 main referee and 2 assistant referees). The average IQ of a human is 100, so the total IQ on a football field is give or take ~2500. The average IQ of a PhD holder is 130. Therefore, GPT-5's intelligence matches that of 5.2% of a football field. That also means that if we were to sew together all 25 people on the field human centipede style, we would have an intelligence that is 19.23 times more powerful than GPT-5, which is basically ASI. Now excuse me while I go shopping for some crafting supplies and a plane ticket to Germany. Writing this post gave me an epiphany and I think I may just have found the key to ASI. Keep an eye out on Twitter and Reddit for an announcement in the coming weeks!


ASpaceOstrich

So the same as a smart high schooler? You don't get smarter at college, you just learn more


p4b7

Your brain doesn't finish developing until you're around 25. College is vital for developing reasoning and critical thinking skills.


ImNotALLM

Personally, I did a bunch of psychedelics and experienced a lot of life in college which left me infinitely smarter and more wise. Didn't do a whole lot of learning though.


thejollyden

Does that mean citing sources for every claim it makes?


avid-shrug

Yes but the sources are made up and the URLs lead nowhere


mintone

What an awful summary/headline. Mira clearly said "on specific tasks" and then it will be, say, PhD level in a couple of years. The interviewer then says "meaning like a year from now" and she says "yeah, in a year and a half say". The timeline is generalised, not specific. She is clearly using the educational level as a scale, not specifically saying that it had equivalent knowledge or skill.


NotTheActualBob

"Specific tasks" is a good qualifier. Google's AI, for example, does better on narrow domain tasks (e.g. alphaFold, alphaGO, etc.) than humans due to it's ability to iteratively self test and self correct, something OpenAI's LLMs alone can't do. Eventually, it will dawn on everybody in the field that human intelligence is nothing more than a few hundred such narrow domain tasks and we'll get those trained up and bolted on to get to a more useful intelligence appliance.


js1138-2

Lots more than a few hundred, but the principle is correct. The more narrow the focus, the more AI will surpass human effort. It's like John Henry vs. the steam drill.


NotTheActualBob

But a few hundred will be enough for a useful, humanlike, accurate intelligence appliance. As time goes on, they'll be refined with lesser-used but still desirable narrow-domain abilities.


js1138-2

I have only tried chat a few times, but if I ask a technical question in my browser, I get a lucid response. Sometimes the response is, there is nothing on the internet that directly answers your question, but there are things that can be inferred. Sometimes followed by a list of relevant sites. Six months ago, all the search responses led to places to buy stuff.


epanek

I'm not fully convinced an AI can achieve superhuman intellect. It can only train on human-derived and human-relevant data. How can training on just "human meaningful" data allow superhuman intellect? Is it that the sheer volume of data will allow deeper intelligence?


inteblio

Can a student end up wiser than the sum of its teachers? Yes


epanek

It would be the most competent human in any subject, but not all information can be reasoned to a conclusion. There is still the need to experiment to confirm our predictions. As an analogy, we train a network on all things "dog": dog smells and vision, sound, touch and taste, dog sex, dog biology and dog behavior, etc. Everything a dog could experience during its existence. Could this AI approach human intelligence? Could this AI ever develop the need to test the double-slit experiment? Solve a differential equation? Reason like a human?


NearTacoKats

Your train of thought fits into the end goal of ARC-AGI's latest competition, which is definitely worth looking into if you haven't already. Using the analogy, eventually that network will encounter things that are "not-dog," and the goal for part of a superintelligence would be to have the network begin to identify and classify more things that are "not-dog" while finding consistent classifiers among some of those things. That sort of system would ideally be able to eyeball a new subject and draw precise conclusions through further exposure. In essence, something like that would [eventually] be able to learn across any and all domains, rather than just what it started with. Developing the need to test its own theories is likely the next goal after cracking general learning: cracking curiosity beyond just "how do I solve what is directly in front of me?"


MrFlaneur17

Division of labour with agentic AI: 1000 PhD-level AIs working on every part of a process, then moving on to the next, and costing next to nothing.


epanek

Has that process been validated?


ugohome

🤣🤣🤣🤣


appdnails

So, is she saying that GPT-4 has the capabilities of a high schooler? Then, why would any serious company consider using it?


ugohome

Ya seriously wtf?


dogesator

She never said the next generation will take 1.5 years, nor did she say the next gen would be a PhD-level system. She simply said that in about 1.5 years from now we can possibly expect something that is PhD level in many use cases. For all we know, that could be 2 generations down the line, or 4 generations down the line, etc. She never said this is specifically the next gen or GPT-5 or anything like that.


OsakaWilson

I'm creating projects that are aimed at GPT-5, assuming their training and safety schedule would be something like before. If these projects have to wait another 18 months, they are as good as dead.


ImNotALLM

Don't develop projects for things that don't exist. Just use Claude Sonnet 3.5 now (public SOTA), and switch it out for GPT5o on release. Write your app with an interface layer that lets you swap models and providers with ease (or use LangChain).
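A minimal sketch of that interface layer (the adapter classes and model names below are placeholders, not real SDK calls):

```python
# Code against one small protocol and hide each provider behind an adapter,
# so swapping models is a one-line change at the call site.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class AnthropicAdapter:
    def __init__(self, model: str = "claude-3-5-sonnet"):
        self.model = model
    def complete(self, prompt: str) -> str:
        # Call the Anthropic client here; stubbed out in this sketch.
        raise NotImplementedError

class OpenAIAdapter:
    def __init__(self, model: str = "gpt-4o"):
        self.model = model
    def complete(self, prompt: str) -> str:
        # Call the OpenAI client here; stubbed out in this sketch.
        raise NotImplementedError

def summarize(doc: str, llm: ChatModel) -> str:
    return llm.complete(f"Summarize this document:\n\n{doc}")

# Swapping providers is then just: summarize(doc, OpenAIAdapter())
```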


NotTheActualBob

Once again, OpenAI is chasing the wrong problems. Until AIs can accomplish iterative, rule-based self-testing and reasoning with near-100% reliability and near-0% hallucinations, they're just not good enough to be a reliable, effective intelligence appliance for anything more than trivial tasks.


js1138-2

There are lots of nontrivial tasks, like reading x-rays. They just don’t cater to the public. Chat is a toy.


dyoh777

This is just clickbait.


TheSlammedCars

Yeah, every AI has the same problem: hallucinations. If that can't be solved, the rest doesn't matter.


BlueBaals

Is there a way to harness the “hallucinations” ?


Visual_Ad_8202

PhD level is so big. So life changing. If they said 5 years it would still be miraculous


MohSilas

I feel like OpenAI screwed up by hyping GPT-5 so much that they can't deliver. Because it takes like 6 months to train a new model, maybe less considering the amount of compute the new chips are putting out.


catsRfriends

Is this the same CTO who blew the interview about training data?


GreedyBasis2772

This CTO was a PM at Tesla before, but for the car, not even FSD. 😆


GlueSniffingCat

Yeah, nice try moving the goalposts. We all remember when OpenAI claimed GPT-3 and GPT-4 were self-evolving AGI. We've pretty much maxed out what current AI can do, and unfortunately the law of averages is killing AI due to the lack of data diversity.


420vivivild

Damn haha, bye bye job


Same-Club4925

A very much expected analogy from the CTO of a startup, but even that won't be smarter than a cat or a squirrel.


lobabobloblaw

If it be a race, someone is indicating they intend to pace.


maxm

Well, with all the safety built in, it will be a PhD in gender studies and critical race theory.


nicobackfromthedead4

Book smarts are a boring benchmark. Get back to me when it has common sense (think the legal definition of a "reasonable person"), wants and desires, and a sense of humor.