A New Era of AI-Generated Content
MSAI director Kristian Hammond discusses OpenAI's new Sora model and its ability to create realistic video.
OpenAI, the creator of the innovative and disruptive ChatGPT system, announced in February its new Sora model, which uses artificial intelligence (AI) to generate realistic videos up to one minute in length based on text prompts.
According to the OpenAI website:
"Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world."
Kristian Hammond, director of Northwestern Engineering's Master of Science in Artificial Intelligence (MSAI) program, appeared on CNN to discuss the new model. Afterward, he sat down to talk about its implications for the public and what MSAI students should know about it.
The sample videos OpenAI shared are quite realistic. What should people look for to determine if a video is generated with AI?
Video generation, image generation, and audio generation are now coming together to create a world where it is not clear that we will be able to trust the videos we see, the images we see, and the sound that we hear. No matter how hard the companies work to try to make these things safe, we are now in a world where we need a much more critical eye toward the content that we see.
We have to look for things that seem perfect to us, because those things are probably fake. This is a new era for us, and we should take a lesson from Kim Kardashian. Anything you see of Kim Kardashian has already been edited, photoshopped, and carefully curated. That is the world we are now going to be seeing — one that is edited, generated automatically, and curated for us.
You mentioned safety. When OpenAI announced Sora, the company said it's currently assessing areas for harms or risks. What was your reaction to that announcement?
It's great they're doing it, but it will not be complete. It will never be complete because what is safe and inoffensive to some people might not be to others. And there are things that look benign, but they're false. An image of me walking out of a building where you can clearly see the clock and there's a timestamp that says it was yesterday at 3 p.m. might look benign, but if I say I was someplace else at that time, it could be damaging for me.
They're looking at issues of safety from the perspective of what it is trained on, what queries look like, and how to evaluate the output to make sure it's trained on things that are good. They don't want to let people put in requests that seem untoward, and they can evaluate the output to see if anything untoward is there. The problem is there will be things that don't seem problematic but could become problematic.
What's an example of a prompt that could become problematic?
I could take a short video of President Joe Biden finishing up a press conference and then tell the generator to continue the video with 30 seconds of him looking left and right, looking confused and befuddled, not knowing where to go. That video could be devastating if someone is trying to make a point that he's old.
AI-generated content was in the news recently because of fake explicit images of Taylor Swift. What stuck out to you about that story?
Because of the pornography associated with Taylor Swift, people are noticing it and they're outraged. It's ugly, it's horrible, and it should absolutely be made illegal, but deepfakes have been a problem for years. Taylor Swift is a celebrity billionaire whose reputation was not hurt at all by this, but there are hundreds of thousands of women whose ex-boyfriends have used deepfakes to disparage them, and those women are not protected. We have to attend to them, and that hasn't gotten enough attention. We're talking about Taylor Swift because she's famous, but what about all the other women who have been abused in this way?
What should MSAI students be thinking about with regard to Sora?
Students should be thinking about Sora and all new generative models as tools they have to learn and understand really well. There's a difference between regular people typing something into ChatGPT or any of the image models and someone who knows what the model is doing and can actually shape it. People with that skill are few and far between. Developing that understanding and knowing the best way to use these emerging tools is going to be a huge differentiator for our students.
The other thing students should think about is how these tools could integrate with the rest of the world. How do they integrate into larger information systems? How do they integrate into other kinds of content generation systems? When there's a new tool, you have to figure out how to use it and how to integrate it. That is a valuable skill set for our MSAI students.