Hands-on with Gen-2, the AI model from Runway, the company behind the technology used in "Everything Everywhere All at Once": movie-quality generated video is still a long way off
By Kyle Wiggers
Source: TechCrunch
In a recent interview with Collider, Joe Russo, director of Marvel films such as Avengers: Endgame, predicted that within two years, AI will be able to create a full-fledged movie. I'd call that a fairly optimistic estimate. But we're getting closer.
This week, Google-backed AI startup Runway (which helped develop AI image generator Stable Diffusion) released Gen-2, a model that generates video from text prompts or existing images. (Gen-2 was previously only available via a limited waitlist.) A follow-up to the Gen-1 model Runway launched in February, Gen-2 is one of the first commercially available text-to-video models.
"Commercially available" is an important distinction. Text-to-video, the logical next logical frontier for generative AI after images and text, is becoming a bigger area of focus, especially among the tech giants, some of which have demonstrated text-to-video over the past year Model. But these models are still in the research phase and inaccessible to all but a handful of data scientists and engineers.
Of course, first doesn't mean better.
Out of personal curiosity and as a service to you, dear reader, I ran a few prompts through Gen-2 to see what the model can -- and can't -- accomplish. (Runway currently offers about 100 seconds of free video generation.) There wasn't much of a method to my madness, but I tried to capture a range of angles, types and styles that a professional or amateur director might want to see on a screen (or a laptop, as the case may be).
The limitations of Gen-2 became apparent quickly: the model generates four-second clips at a frame rate so low that, in places, they stutter like a slideshow.
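If you want to check for yourself how short and choppy a generated clip is, a quick inspection with OpenCV will report its frame rate and duration. This is just an illustrative snippet; the filename below is hypothetical and stands in for any clip exported from the Gen-2 web app.

```python
import cv2

# Hypothetical filename for a clip downloaded from the Gen-2 web UI.
cap = cv2.VideoCapture("gen2_clip.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)            # frames per second
frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)  # total frame count
cap.release()

duration = frames / fps if fps else 0.0
print(f"{fps:.1f} fps, {int(frames)} frames, {duration:.2f} s")
```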
Beyond the frame rate, I also found that Gen-2-generated clips tend to share a certain graininess or blurriness, as if some sort of old-fashioned Instagram filter had been applied. Other artifacts show up too, like pixelation around objects when the "camera" (for lack of a better word) circles them or zooms in on them quickly.
Like many generative models, Gen-2 isn't particularly consistent about physics or anatomy. As in something a surrealist might create, Gen-2 produced videos in which people's arms and legs fused together and then separated, objects melted into the floor and disappeared, and shadows were distorted. And, right on cue, human faces could look doll-like, with shiny, emotionless eyes and pale skin reminiscent of cheap plastic.
I tried one prompt -- "a video of an underwater utopia, shot with an old camera, in a 'found footage' film style" -- but Gen-2 generated no such utopia, only something resembling a first-person dive video across an anonymous coral reef. Among my other prompts, Gen-2 also failed to produce a zooming shot for a prompt that specifically asked for a "slow zoom", and it never quite grasped what an average astronaut looks like.
Are these issues related to the Gen-2 training dataset? Maybe.
Gen-2, like Stable Diffusion, is a diffusion model: it learns how to gradually subtract noise from a starting image made entirely of noise, moving closer to the prompt step by step. Diffusion models learn by training on millions to billions of examples; in an academic paper detailing the Gen-2 architecture, Runway says the model was trained on an internal dataset of 240 million images and 6.4 million video clips.
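To make the "gradually subtract noise" idea concrete, here is a minimal, deliberately simplified sketch of a DDPM-style reverse diffusion loop in PyTorch. It is not Runway's implementation; the `denoiser` below is a placeholder for the trained network that, in a real system, would predict the noise at each step conditioned on the text prompt.

```python
# Minimal sketch of a reverse diffusion (denoising) loop, DDPM-style.
# Purely illustrative: the real model is a large trained network, not this stub.
import torch

T = 50                                   # number of denoising steps
betas = torch.linspace(1e-4, 0.02, T)    # noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def denoiser(x_t, t):
    # Stand-in for the learned network that predicts the noise in x_t,
    # conditioned on the prompt. Here it just returns zeros.
    return torch.zeros_like(x_t)

# Start from pure noise (a tiny 8x8 "image" for illustration) and step
# backwards, removing a little predicted noise at each step.
x = torch.randn(1, 3, 8, 8)
for t in reversed(range(T)):
    eps = denoiser(x, t)
    coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
    x = (x - coef * eps) / torch.sqrt(alphas[t])
    if t > 0:
        x = x + torch.sqrt(betas[t]) * torch.randn_like(x)

print(x.shape)  # the final x is the generated sample (one frame, here)
```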
Variety of examples is key. If the dataset doesn't contain many clips of, say, animation, then the model -- lacking reference points -- won't be able to generate animation of reasonable quality. (And animation is a broad field; even if the dataset did include clips of anime and hand-drawn animation, the model wouldn't necessarily generalize well to all types of animation.)
For the prompt "a video of a CEO walking into a conference room," Gen-2 generated videos of men and women (though more men than women) seated around similar-looking conference tables. For "a video of a doctor working in an office," meanwhile, Gen-2 output an Asian woman doctor behind a desk.
The takeaway from all of this, for me, is that Gen-2 is more of a novelty toy than a truly useful tool in any video workflow. Could these outputs be edited into something more coherent? Maybe. But depending on the video, that might be more work than shooting the footage in the first place.
This is not to dismiss the technology. What Runway has done is impressive, effectively beating the tech giants to the text-to-video punch. And I'm sure some users will find uses for Gen-2 that don't require realism or much customizability. (Runway CEO Cristóbal Valenzuela recently told Bloomberg that he sees Gen-2 as a tool for artists and designers to aid their creative process.)
To head off deepfakes, Runway says it uses a combination of AI and human moderation to prevent users from generating videos that include pornography or violence, or that violate copyrights. I can confirm that Gen-2 has a content filter -- an overly aggressive one, in fact. Neither approach is foolproof, and we'll have to see how well they work in practice.
But at least for now, filmmakers, animators, CGI artists and ethicists can rest easy. It will be at least a few iterations before Runway's technology comes close to producing cinematic-quality video -- assuming it gets there.