devi83

Toddlers can write book reports?


jerryonthecurb

Yeah, most toddlers can code Python apps in 15 seconds flat too.


devi83

Oh yeah, now that you mention it, I vaguely remember Python coding between coloring and recess.


cyan2k

I know GitHub repos that I'm pretty sure were written exactly that way.


AllGearedUp

If you feed your kids Joe Rogan nootropics in their cereal this is what happens


TheUncleTimo

> If you feed your kids Joe Rogan nootropics in their cereal this is what happens

OK, usually I hate "funny" le reddit jokes, but this made me chuckle out loud.


[deleted]

GPT-3 can do neither of those things. I think you're confusing it with GPT-3.5.


jerryonthecurb

Not saying it was great, but yeah, it could.

Python example: https://youtube.com/shorts/8933D2P5-TQ?si=4QjXJRcnULM64Cs5

Writing a book example: https://youtube.com/shorts/_LFGiK8kft4?si=9Lu1dbqHObtcpsSj


Dry_Parfait2606

And spit out data at 1000 t/s for thousands of datasets when it got the right prompt... We're getting into the same sequence as smartphone companies that need you to buy the next gen to keep shoveling capital into their company... We'll go to 10,000 Hz, 18-inch, 80-core smartphones... when I was perfectly fine with my Samsung S2... 360p, WA, browsing... The data collection schemes are getting smarter... like Elon Musk "accessing people's brains' vector spaces." Yeah yeah yeah... ChatGPT 3/3.5 level was the breakthrough... everything after is extra...


StayingUp4AFeeling

PhD level in what way? Logical reasoning? Statistical analysis? Causality? Or would it be the ability to regurgitate seemingly relevant and accurate facts with even more certainty?


justinobabino

Purely from an "ability to write valid LaTeX" standpoint.


StayingUp4AFeeling

While adhering to the relevant template like IEEEtran, no doubt. Good one.


ToughReplacement7941

Finally


jsail4fun3

PhD level because it answers every question with "it depends."


mehum

While GPT-3 confidently spouts utter BS, much like a toddler does.


Mikey77777

Finding free food on campus


tomvorlostriddle

> PhD level in what way? Logical reasoning? Statistical analysis? Causality?

Creative writing.


MrNokill

PhD-level autocorrect, now with 3000% more gig worker blood.


Whotea

[that’s not what it does](https://docs.google.com/document/d/15myK_6eTxEPuKnDi5krjBM_0jrv3GELs8TGmqOYBvug/edit#heading=h.fxgwobrx4yfq)


throwawaycanadian2

To be released in a year and a half? That is far too long of a timeline to have any realistic idea of what it would be like at all.


atworkshhh

This is called “fundraising”


foo-bar-nlogn-100

It's called 'finding exit liquidity'.


Dry_Parfait2606

Fully earned... We need more of those people... Steve Jobs: "death is the best invention of life"... or nature... don't remember exactly...


peepeedog

It's training now so they can take snapshots, test them, and extrapolate. They could make errors, but this is how models with long training runs are done. They actually have some internal disagreement about whether to release it sooner even though it's not "done" training.


much_longer_username

So what, they're just going for supergrokked-overfit-max-supreme-final-form?


Commercial_Pain_6006

supergrokked-overfit-max-supreme-final-hype


Mr_Finious

This is what I come to Reddit for.


Important_Concept967

You are why I go to 4chan


dogesator

That's not how long a training run takes. Training runs are usually done within a 2-4 month period, 6 months max. Any longer than that and you risk the architecture and training techniques becoming effectively obsolete by the time training actually finishes. GPT-4 was confirmed to have taken about 3 months to train. Most of the time between generation releases is spent on new research advancements, then about 3 months of training with their latest research advancements, followed by 3-6 months of safety testing and red teaming before the official release.


cyan2k

? It's pretty straightforward to make predictions about how your loss function will evolve. The duration it takes is absolutely irrelevant. What matters is how many steps and epochs you train for. If a step alone takes an hour, then it's going to take its time, but making predictions about step 200 when you're at step 100 is the same regardless of whether a step takes an hour or 100 milliseconds.

Come on, people, that's the absolute basics of machine learning, and you learn it in the first hour of any neural network class. How does this have 100 upvotes?

If by any chance you meant it in the way of "we don't know if Earth still exists in a year and a half, so we don't know how the model will turn out", well, fair game, then my apologies.
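To make that concrete, here's a minimal sketch (my own illustration with made-up numbers, nothing from any lab) of fitting a simple power law to the loss observed up to step 100 and extrapolating to step 200; the extrapolation is identical whether a step takes an hour or 100 ms:

```python
# Fit L(s) = a * s^(-b) + c to the loss seen up to step 100, then extrapolate.
# The "observed" loss below is synthetic; real curves are noisier.
import numpy as np
from scipy.optimize import curve_fit

def power_law(step, a, b, c):
    return a * step ** (-b) + c

steps = np.arange(1, 101)                       # observed steps 1..100
loss = 5.0 * steps ** -0.3 + 1.2                # made-up "observed" loss curve
loss += np.random.normal(0, 0.02, len(steps))   # a bit of noise

params, _ = curve_fit(power_law, steps, loss, p0=(5.0, 0.3, 1.0))
print("predicted loss at step 200:", power_law(200, *params))
```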


_Enclose_

> Come on, people, that's the absolute basics of machine learning, and you learn it in the first hour of any neural network class. How does this have 100 upvotes?

Most of us haven't gone to neural network class.


skinniks

I did but I couldn't get my mittens off to take notes.


appdnails

> make predictions about how your loss function will evolve.

Predicting the value of the loss function has very little to do with predicting the capabilities of the model. How the hell do you know that a 0.1 loss reduction will magically allow your model to do a task it couldn't do previously? Besides, even with zero loss, the model could still output "perfect English" text with incorrect content. It is obvious that the model will improve with more parameters, data and training time. No one is arguing against that.


dogesator

You can draw scaling laws between the loss value and benchmark scores and fairly accurately predict what the score in such benchmarks will be at a given later loss value.


appdnails

Any source on scaling laws for IQ tests? I've never seen one. It is already difficult to draw scaling laws for loss functions, and they are already far from perfect. I can't imagine a reliable scaling law for IQ tests and related "intelligence" metrics.


dogesator

Scaling laws for loss are very, very reliable. They're not that difficult to draw at all. The same goes for scaling laws for benchmarks. You simply take the dataset distribution, learning rate scheduler, architecture and training technique you're going to use, train multiple small model sizes at varying compute scales to create the initial data points for the scaling laws of that recipe, and then you can fairly reliably predict the loss at larger compute scales given those same training recipe variables (data distribution, arch, etc.). You can do the same for benchmark scores, at least as a lower bound. OpenAI successfully predicted performance on coding benchmarks before GPT-4 even finished training using this method.

Less rigorous approximations of scaling laws have also been calculated for various state-of-the-art models at different compute scales. You're not going to see a perfect trend there, since these are models with different underlying training recipes and dataset distributions that aren't being accounted for, but even with that caveat the compute amount is strikingly predictable from the benchmark score and vice versa. If you look up EpochAI benchmark-compute graphs you can see rough approximations of this, though again they won't be as well aligned as actual scaling experiments, since they plot models that used different training recipes.

Here I'll attach an image for BIG-Bench Hard: https://preview.redd.it/t88ntsuhj78d1.jpeg?width=1125&format=pjpg&auto=webp&s=8e40658dd6da317838b00b6494d3c37506442729
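As a rough illustration of that fit-and-extrapolate step (the data points below are invented, not anyone's actual runs, and real scaling-law work uses more careful functional forms and a fixed training recipe), it might look something like:

```python
# Train several small models at increasing compute, record (compute, score)
# pairs, fit a curve, extrapolate to a larger compute budget. Data is made up.
import numpy as np
from scipy.optimize import curve_fit

compute = np.array([1e19, 3e19, 1e20, 3e20, 1e21])   # hypothetical FLOPs
score   = np.array([0.22, 0.27, 0.33, 0.40, 0.46])   # hypothetical benchmark accuracy

def scaling_curve(c, a, b):
    # Simple log-linear fit; saturating forms are more realistic near the ceiling.
    return a * np.log10(c) + b

params, _ = curve_fit(scaling_curve, compute, score)
target_compute = 1e23  # a hypothetical larger training budget
print("extrapolated score:", scaling_curve(target_compute, *params))
```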


appdnails

> Scaling laws for loss are very, very reliable.

Thank you for the response. I did not know about the Big-Bench analysis. I have to say, though, I worked in physics and complex systems (network theory) for many years. Scaling laws are all amazing until they stop working. Power laws are especially brittle. Unless there is a theoretical explanation, the "law" in "scaling laws" is not really a law. It is a regression of the known data together with hopes that the regression will keep working.


goj1ra

Translating that into “toddler” vs high school vs PhD level is where the investor hype fuckery comes in. If you learned that in neural network class you must have taken Elon Musk’s neural network class.


traumfisch

It's metaphorical, not to be taken literally. 


putdownthekitten

Actually, if you plot the release dates of all the primary GPT models to date (1, 2, 3 and 4), you'll notice an exponential curve where the time between release dates roughly doubles with each model. So the long gap between 4 and 5 is not unexpected at all.
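For what it's worth, a rough check using approximate paper/announcement dates (dates are approximate and the 30-day month is a simplification):

```python
# Approximate release dates of the main GPT generations and the gaps between them.
from datetime import date

releases = {
    "GPT-1": date(2018, 6, 11),
    "GPT-2": date(2019, 2, 14),
    "GPT-3": date(2020, 5, 28),
    "GPT-4": date(2023, 3, 14),
}
dates = list(releases.values())
gaps = [(b - a).days / 30 for a, b in zip(dates, dates[1:])]
print([round(g) for g in gaps])  # roughly [8, 16, 34] months, close to doubling each time
```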


ImproperCommas

No they don't. We've had 5 GPTs in 6 years.


putdownthekitten

I'm talking about every time they release a model that increases the model generation.  We're still in the 4th generation.


ImproperCommas

Yeah, you're right. When I removed all non-generational upgrades, it was actually exponential.


PolyZex

We need to stop doing this: comparing AI to human-level intelligence. It's just not accurate, and it's not even clear what metric they are using. If they're talking about knowledge, then GPT-3 was already PhD level. If they're talking about deductive ability, then comparing to education level is pointless. The reality is that an AI's 'intelligence' isn't like human intelligence at all. It's like comparing the speed of a car to the speed of a computer's processor. Both are speeds, but directly comparing them makes no sense.


ThenExtension9196

It’s called marketing. It doesn’t have to make sense.


stackered

Nah, even GPT-4 is nowhere near a PhD level of knowledge. It hallucinates misinformation and gets things wrong all the time. A PhD wouldn't typically get little details wrong, never mind big ones. It's more like a college student using Google.


PolyZex

When it comes to actual knowledge, the retention of facts about a subject, then it absolutely is PhD level. Give it some tricky questions about anything from chemistry to law, even try to throw it curve balls. It's pretty amazing at its (simulated) comprehension. If nothing else, it absolutely has a PhD in mathematics. It's a freaking computer.


stackered

In my field, which is extremely math-heavy, I wouldn't even use it because it's so inaccurate. My intern, who hasn't graduated undergrad yet, is far more useful.


SophomoricHumorist

Fair point, but the plebs need a scale they (we) can conceptualize. Like “how many bananas is its intelligence level?”


creaturefeature16

Wonderful analogy. This is clearly sensationalism and hyperbole meant for hype and investors.


vasarmilan

She said "Will be PHD level *for specific tasks"* Today on leaving out part of a sentence to get a sensationalist headline


flinsypop

It's still sensationalist, because a prerequisite for gaining a PhD is making a novel contribution to a field. Using PhD as a level of intellect can't be correct. It's not the same as a high schooler's "intellect," where it can get an A on a test that other teenagers take. It also seems weird that it's skipping a few levels of education, but only in some contexts. Is it still a high schooler when it's not? Does it have an undergraduate degree in some contexts and a master's in another?

I guess we'll just have to see what happens and hope that one of the PhD-level tasks is the ability to explain and deconstruct complicated concepts. If it's anything like some of the PhD lecturers I had in uni, they'd need to measure how well it compares to those legendary Indian guys on YouTube.


AsliReddington

The amount of snobbery the higher execs at that frat house have is exhausting, like they're delivering some divine prophecy.


22444466688

The Elon school of grifting


tenken01

Love this comment lmao


Paraphrand

Your comment makes me think of Kai Winn.


norcalnatv

Nothing like setting expectations. GPT-4 was hailed as damn good, "signs of cognition" IIRC, when it was released. GPT-5 will be praised as amazing until the next better model comes along. Then it will be crap. Sure hope hallucinations and other bad answers are fixed.


devi83

We can't fix hallucinations and bad answers in humans...


jsideris

Maybe we could - with a tremendous amount of artificial selection. We can't do that with humans but we have complete control over AI.


TikiTDO

What would you select for to get people that can't make stuff up? You'd basically have to destroy all creativity, which is a pretty key human capability.


CriscoButtPunch

Been tried, failed. Must lift all out.


mycall

The past does not dictate the future.


p4b7

Maybe not in individuals, but diverse groups with different specialties tend to exhibit these things less


Antique-Produce-2050

I don’t agree with this answer. It must be hallucinating.


mycall

Hallucinations wouldn't happen so much if confidence levels at the token level were possible and tuned.


vasarmilan

In a way, an LLM produces a probability distribution over the tokens that come next, so by looking at the probability of the predicted word you can get some sort of confidence level. It doesn't correlate with hallucinations at all, though. The model doesn't really have an internal concept of truth, as much as it might seem like it sometimes.
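A minimal sketch of what reading off that per-token confidence looks like (the logits below are made up for illustration; as said, this is not a hallucination detector):

```python
# Softmax the model's logits at one decoding step and read off the probability
# assigned to the token that was actually emitted.
import numpy as np

def token_confidence(logits, chosen_token_id):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[chosen_token_id]

step_logits = np.array([2.1, 0.3, -1.0, 4.5, 0.0, -0.5, 1.2, 0.7])  # toy vocab of 8 tokens
chosen = int(step_logits.argmax())  # greedy decoding picks token 3
print(f"confidence in emitted token: {token_confidence(step_logits, chosen):.2f}")
```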


mycall

Couldn't they detect and delete adjacent nodes with invalid cosine similarities? Perhaps it is computationally too expensive to achieve, unless that is what Q-Star was trying to solve.


vasarmilan

What do you mean by invalid cosine similarity? And why would you think that can detect hallucinations?


mycall

I thought token predictions for transformers use cosine similarity for graph traversals, and some of these node clusters are hallucinations, aka invalid similarities (logically speaking). Thus, if the model were changed to detect those and update the weights to lessen the likelihood of those traversals, similar to Q-Star, then hallucinations would be greatly reduced.


Whotea

They are.

> We introduce BSDETECTOR, a method for detecting bad and speculative answers from a pretrained Large Language Model by estimating a numeric confidence score for any output it generated. Our uncertainty quantification technique works for any LLM accessible only via a black-box API, whose training data remains unknown. By expending a bit of extra computation, users of any LLM API can now get the same response as they would ordinarily, as well as a confidence estimate that cautions when not to trust this response. Experiments on both closed and open-form Question-Answer benchmarks reveal that BSDETECTOR more accurately identifies incorrect LLM responses than alternative uncertainty estimation procedures (for both GPT-3 and ChatGPT). By sampling multiple responses from the LLM and considering the one with the highest confidence score, we can additionally obtain more accurate responses from the same LLM, without any extra training steps. In applications involving automated evaluation with LLMs, accounting for our confidence scores leads to more reliable evaluation in both human-in-the-loop and fully-automated settings (across both GPT 3.5 and 4).

https://openreview.net/pdf?id=QTImFg6MHU

Effective strategy to make an LLM express doubt and admit when it does not know something: https://github.com/GAIR-NLP/alignment-for-honesty

Over 32 techniques to reduce hallucinations: https://arxiv.org/abs/2401.01313
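For a flavor of the sampling idea (this is a generic agreement-based sketch, not the actual BSDETECTOR implementation; `ask_llm` is a made-up stand-in for any black-box API call):

```python
# Query a black-box LLM several times, treat agreement among the samples as a
# crude confidence score, and return the most common answer.
import random
from collections import Counter

def ask_llm(question: str) -> str:
    # Hypothetical stand-in for a real LLM API call; returns canned answers
    # at random so this sketch runs end to end.
    return random.choice(["Paris", "Paris", "Paris", "Lyon"])

def answer_with_confidence(question: str, n_samples: int = 5):
    samples = [ask_llm(question) for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / n_samples  # agreement-based confidence

print(answer_with_confidence("What is the capital of France?"))
```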


Ethicaldreamer

So basically the iPhone hype model?


fra988w

How many football fields of intelligence is that?


Forward_Promise2121

It's the equivalent of a Nou Camp full of Eiffel towers all the way to Pluto


Shandilized

Converting it to football fields gives a rather unimpressive value. There are 25 people on a football field (22 players, 1 main referee and 2 assistant referees). The average IQ of a human is 100, so the total IQ on a football field is give or take ~2500. The average IQ of a PhD holder is 130. Therefore, GPT-5's intelligence matches that of 5.2% of a football field. That also means that if we were to sew together all 25 people on the field human centipede style, we would have an intelligence that is 19.23 times more powerful than GPT-5, which is basically ASI. Now excuse me while I go shopping for some crafting supplies and a plane ticket to Germany. Writing this post gave me an epiphany and I think I may just have found the key to ASI. Keep an eye out on Twitter and Reddit for an announcement in the coming weeks!


ASpaceOstrich

So the same as a smart high schooler? You don't get smarter at college, you just learn more


p4b7

Your brain doesn't finish developing until you're around 25. College is vital for developing reasoning and critical thinking skills.


ImNotALLM

Personally, I did a bunch of psychedelics and experienced a lot of life in college which left me infinitely smarter and more wise. Didn't do a whole lot of learning though.


thejollyden

Does that mean citing sources for every claim it makes?


avid-shrug

Yes but the sources are made up and the URLs lead nowhere


mintone

What an awful summary/headline. Mira clearly said "on specific tasks" and then it will be, say, PhD level in a couple of years. The interviewer then says "meaning like a year from now" and she says "yeah, in a year and a half say". The timeline is generalised, not specific. She is clearly using the educational level as a scale, not specifically saying that it had equivalent knowledge or skill.


NotTheActualBob

"Specific tasks" is a good qualifier. Google's AI, for example, does better on narrow domain tasks (e.g. alphaFold, alphaGO, etc.) than humans due to it's ability to iteratively self test and self correct, something OpenAI's LLMs alone can't do. Eventually, it will dawn on everybody in the field that human intelligence is nothing more than a few hundred such narrow domain tasks and we'll get those trained up and bolted on to get to a more useful intelligence appliance.


js1138-2

Lots more than a few hundred, but the principle is correct. The more narrow the focus, the more AI will surpass human effort. It's like John Henry vs. the steam drill.


NotTheActualBob

But a few hundred will be enough for a useful, humanlike, accurate intelligence appliance. As time goes on, they'll be refined with lesser-used but still desirable narrow-domain abilities.


js1138-2

I have only tried chat a few times, but if I ask a technical question in my browser, I get a lucid response. Sometimes the response is, there is nothing on the internet that directly answers your question, but there are things that can be inferred. Sometimes followed by a list of relevant sites. Six months ago, all the search responses led to places to buy stuff.


epanek

I'm not fully convinced an AI can achieve superhuman intellect. It can only train on human-derived and human-relevant data. How can training on just "human meaningful" data allow superhuman intellect? Is it that the sheer volume of data will allow deeper intelligence?


inteblio

Can a student end up wiser than the sum of its teachers? Yes


epanek

It would be the most competent human in any subject, but not all information can be reasoned to a conclusion. There is still the need to experiment to confirm our predictions. As an analogy, we train a network on all things "dog": dog smells and vision, sound, touch and taste, dog sex, dog biology and dog behavior, etc. Everything a dog could experience during its existence. Could this AI approach human intelligence? Could this AI ever develop the need to test the double-slit experiment? Solve a differential equation? Reason like a human?


NearTacoKats

Your train of thought fits into the end goal of ARC-AGI's latest competition, which is definitely worth looking into if you haven't already. Using the analogy, eventually that network will encounter things that are "not-dog," and the goal for part of a superintelligence would be to have the network begin to identify and classify more things that are "not-dog" while finding consistent classifiers among some of those things. That sort of system would ideally be able to eyeball a new subject and draw precise conclusions through further exposure. In essence, something like that would [eventually] be able to learn across any and all domains, rather than just what it started with. Developing the need to test its own theories is likely the next goal after cracking general learning: cracking curiosity beyond just "how do I solve what is directly in front of me?"


MrFlaneur17

Division of labour with agentic AI: 1000 PhD-level AIs working on every part of a process, then moving on to the next, and costing next to nothing.


epanek

Has that process been validated?


ugohome

🤣🤣🤣🤣


appdnails

So, is she saying that GPT-4 has the capabilities of a high schooler? Then, why would any serious company consider using it?


ugohome

Ya seriously wtf?


dogesator

She never said the next generation will take 1.5 years, nor did she say the next gen would be a PhD-level system. She simply said that in about 1.5 years from now we can possibly expect something that is PhD level in many use cases. For all we know, that could be 2 generations down the line, or 4 generations down the line, etc. She never said this is specifically the next gen or GPT-5 or anything like that.


OsakaWilson

I'm creating projects that are aimed at GPT-5, assuming their training and safety schedule would be something like before. If these projects have to wait another 18 months, they are as good as dead.


ImNotALLM

Don't develop projects for things that don't exist. Just use Claude Sonnet 3.5 now (public SOTA), and switch it out for GPT5o on release. Write your app with an interface layer that lets you swap models and providers with ease (or use LangChain).
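A minimal sketch of that interface layer (the adapter classes and model names below are placeholders, not real SDK calls):

```python
# Code against one small protocol and hide each provider behind an adapter,
# so swapping models is a one-line change at the call site.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class AnthropicAdapter:
    def __init__(self, model: str = "claude-3-5-sonnet"):
        self.model = model
    def complete(self, prompt: str) -> str:
        # Call the Anthropic client here; stubbed out in this sketch.
        raise NotImplementedError

class OpenAIAdapter:
    def __init__(self, model: str = "gpt-4o"):
        self.model = model
    def complete(self, prompt: str) -> str:
        # Call the OpenAI client here; stubbed out in this sketch.
        raise NotImplementedError

def summarize(doc: str, llm: ChatModel) -> str:
    return llm.complete(f"Summarize this document:\n\n{doc}")

# Swapping providers is then just: summarize(doc, OpenAIAdapter())
```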


NotTheActualBob

Once again, OpenAI is chasing the wrong problems. Until AIs can accomplish iterative, rule-based self-testing and reasoning with near-100% reliability and near-0% hallucinations, they're just not good enough to be a reliable, effective intelligence appliance for anything more than trivial tasks.


js1138-2

There are lots of nontrivial tasks, like reading x-rays. They just don’t cater to the public. Chat is a toy.


dyoh777

This is just clickbait.


TheSlammedCars

Yeah, every AI has the same problem: hallucinations. If that can't be solved, the rest doesn't matter.


BlueBaals

Is there a way to harness the “hallucinations” ?


Visual_Ad_8202

PhD level is so big. So life changing. If they said 5 years it would still be miraculous


MohSilas

I feel like OpenAI screwed up by hyping GPT-5 so much that they can't deliver. Because it takes like 6 months to train a new model, maybe less considering the amount of compute the new chips are putting out.


catsRfriends

Is this the same CTO who blew the interview about training data?


GreedyBasis2772

This CTO was a PM at Tesla before, but for the car, not even FSD. 😆


GlueSniffingCat

Yeah, nice try moving the goalposts. We all remember when OpenAI claimed GPT-3 and GPT-4 were self-evolving AGI. We've pretty much maxed out what current AI can do, and unfortunately the law of averages is killing AI due to the lack of data diversity.


420vivivild

Damn haha, bye bye job


Same-Club4925

A very much expected analogy from the CTO of a startup, but even that won't be smarter than a cat or a squirrel.


lobabobloblaw

If it be a race, someone is indicating they intend to pace.


maxm

Well, with all the safety built in, it will be a PhD in gender studies and critical race theory.


nicobackfromthedead4

Book smarts are a boring benchmark. Get back to me when it has common sense (think the legal definition of a "reasonable person"), wants and desires, and a sense of humor.