NF: Yes, I interviewed the creators of Sora on stage at the event the day before yesterday, and it was very interesting to hear their views. I think we see Sora as a media production tool, but that's not their idea; that's a side effect. Their idea is that it's a universal simulator that can actually simulate any kind of behavior in the world, including, say, "Let's make a video with Ben and Daniel and Nat and have them talk about this," and then seeing where the conversation goes. Their second point is that Sora today is at the GPT-1 stage, with not much data and not much compute, so we should expect dramatic improvement as they scale it up. And their third is that there is more video data than text data on the Internet…
And then there's Andrej Karpathy. I was talking to him the other day as well, and he said, "Something strange is happening-"
BT: And a picture is worth a thousand words, by the way, so the number of tokens there is astronomical.
NF: He was exploring the idea that a world model trained on images and video might be better than text models. Ask how a car engine works, or how someone fixes a carburetor, and the level of detail that can be captured is amazing. So maybe we made a mistake by training on text scraped from Common Crawl, and the question is what to do instead. I asked him for his most contrarian research opinion. He said what we should do instead is train on images of web pages, so that when you ask the model a question, it outputs an image of a web page containing the answer, and maybe we'd get more intelligence and better results that way.
That excerpt comes from his interview with Ben Thompson and Daniel Gross, which is behind the paywall but worth paying for.