The surprising evolution of digital avatars

The road to artificial intelligence has many forks. One of them leads to the creation of hyper-realistic videos, a technique that grows more refined with every demonstration from the market giants.

The expressions are almost flawless. Since I know that what I'm seeing is not “real,” I can recognize a “weird” mouth movement, a slightly exaggerated smile, or an “excessive” eye blink. However, if I watched the videos without knowing what I was dealing with, I wouldn't notice anything. They are just ordinary people talking about different topics. Except they're not humans, but digital constructs created by artificial intelligence, capable of reproducing real life with disturbing accuracy.

Microsoft last week unveiled a new artificial intelligence (AI), called VASA-1, that is capable of creating highly realistic human avatars from a photo and an audio file.

This technology can bring our photos to life, adding expressions and syncing the lips to the audio clip. The synchronization is so precise that it even makes Da Vinci's Mona Lisa perform Anne Hathaway's “Paparazzi”, one of the example videos released by Microsoft.

How does VASA-1 work?

According to the researchers, VASA-1 captures the full range of human expressions, including natural head movements, to create truly believable talking avatars. This is made possible by separating elements such as facial features, head position and expressions, which allows detailed control over each attribute and lets each one be edited independently.

Microsoft's AI uses a 3D approach to capture more details about the face and how it moves in 3D space. The diffusion model accepts additional cues, such as main gaze direction and head distance, in addition to emotions. Using the same audio track, VASA-1 can create happy, angry or nervous avatars, each aiming for greater realism.

VASA-1 can produce high-quality videos at a resolution of 512 x 512 pixels and 45 frames per second. The researchers highlighted its efficiency, as the tool can run on a computer equipped with an Nvidia RTX 4090 graphics card.

Microsoft's AI, which is currently only experimental and not available to the public, is not limited to real photos, but can also be applied to illustrations or paintings, such as the Mona Lisa mentioned above. Hyper-realistic avatars could revolutionize the way we interact in the digital world.

Technology is on the rise

If you search the Internet, you will find that there are many free tools for creating avatars using AI, although many of them provide results of questionable quality. However, Microsoft's efforts are not the only ones that have achieved amazing results.

In January, Google introduced Lumiere, an artificial intelligence system for creating videos from text. Lumiere is distinguished by its spatiotemporal architecture, which allows it to create entire clips in a single step, avoiding the temporal inconsistency seen in previous models. This feature is essential for smoothness and consistency in increasingly life-like videos.

Additionally, Lumiere makes video editing easier for users with little technical knowledge, allowing them to modify specific parts of a clip with a simple mask and a text command. It is also possible to create “stylized videos”, an aesthetic that is very fashionable now, from a reference image, which was quite a challenge until now.

Meanwhile, on February 15, Sora arrived, a tool developed by OpenAI that creates realistic videos from a text prompt. With it you can specify the movement, settings and scene transitions of a video up to one minute long.

Like GPT-4 or DALL-E 3, Sora relies on deep learning, using artificial neural networks to learn from large amounts of data and apply what they learn to create completely new things. Sora was trained on numerous videos and descriptions to learn how this type of media works and apply it to its own creations.

VASA-1 takes a different route from Sora and Lumiere: it uses audio rather than text to create its avatars.

Keys to understanding

Two basic concepts for evaluating AI capabilities are training and inference. These terms often go unnoticed, but they are essential to measuring the ability of these systems to perform at their best.

Training is the first aspect that we must consider. It refers to the process of feeding the AI the data from which it learns to create new things. The AI is not building anything from scratch; it is building on what it has learned. This is something humans also experience when we learn how to create new things. The more data the AI is trained on, the more noticeable the improvement in the generated video.

The video was created using OpenAI's Sora AI.

Inference is another essential aspect. Without it, AI would find it very difficult to understand humans and successfully fulfill our requests. Inference is the ability to understand our requests and carry them out successfully. The better the AI understands, the more accurate its results. The engineers behind AI have a clear approach: language models should understand our requests, even if we explain ourselves poorly.

However, Sora also poses a problem: the difficulty of distinguishing between a real video and an AI-generated one, as is already the case with photos. This opens the door to the spread of fake videos on social media, which can show celebrities in situations that never happened. Therefore, regulation and limits on the use of this technology are important.

A looming problem

However, there is also a dangerous side to this type of AI. Highly realistic avatars and videos can be used to deceive users. For this reason, Microsoft announced that it opposes any harmful application, and indicated that it will not release the tool until it is certain that the technology will be used responsibly.

“We oppose any behavior that leads to the creation of misleading or harmful content of real people and are interested in applying our technology to enhance forgery detection. We are committed to developing artificial intelligence responsibly, with the aim of enhancing human well-being,” the company stated.

Despite Microsoft's good intentions, which are also shared by Google and OpenAI, the reality is that controversies have already arisen around models capable of generating images. Recall last year's fake pictures of Donald Trump resisting arrest, of Pope Francis rapping or modeling, or of US President Joe Biden fighting in a street surrounded by explosions and bullets.

They were all created using Midjourney, a tool that eliminated its free access model to curb the spread of fake news. The truth is that in this post-truth era, where we must doubt everything we see, read or hear, at least until we have verified its authenticity through reliable sources, tools like VASA-1, Sora or Lumiere are also a wake-up call about the importance of regulation and about the new frontiers in the use of these technologies.

Rowan Hewitt

The surprising evolution of digital avatars – Juventud Rebelde
