If you have ever watched an AI “digital human” try to talk, you probably noticed that something feels off. The mouth moves and the hands wave around, but the timing rarely matches the meaning. The result is that creepy “uncanny valley” effect where the character looks almost human but not quite. A newly open-sourced project called SentiAvatar is trying to fix that problem.
Developed by SentiPulse alongside researchers from the Gaoling School of Artificial Intelligence at Renmin University of China, the framework is designed to create interactive 3D avatars that can speak, gesture, and react in real time. The idea is fairly simple. Human conversation is not just words. It is body language, facial expression, timing, posture, and subtle cues that signal emotion or intent. Most avatar systems handle those pieces separately, which is why they often look stiff or robotic.

SentiAvatar attempts to coordinate those elements so that speech, facial expression, and body motion stay synchronized during live conversation. The framework reportedly generates six-second motion sequences in about 0.3 seconds, which allows avatars to keep moving naturally while speaking instead of waiting for a full sentence to finish processing. In theory, that should make conversations with digital characters feel far less awkward.
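To put those reported numbers in perspective, here is a quick back-of-the-envelope calculation. It only restates the figures quoted above (six seconds of motion, roughly 0.3 seconds to generate it); the function and variable names are made up for illustration and nothing here was measured independently.

```python
# Figures as reported for SentiAvatar; not independently measured.
CLIP_SECONDS = 6.0        # length of one generated motion sequence
GENERATION_SECONDS = 0.3  # reported time needed to generate that sequence


def real_time_factor(clip_seconds: float, generation_seconds: float) -> float:
    """Seconds of motion produced per second spent generating it."""
    return clip_seconds / generation_seconds


def playback_headroom(clip_seconds: float, generation_seconds: float) -> float:
    """Playback time left over after the next clip has been generated."""
    return clip_seconds - generation_seconds


rtf = real_time_factor(CLIP_SECONDS, GENERATION_SECONDS)
headroom = playback_headroom(CLIP_SECONDS, GENERATION_SECONDS)
print(f"about {rtf:.0f}x faster than real time, {headroom:.1f} s of headroom per clip")
```

If the reported figures hold up, generation runs roughly 20 times faster than playback, which is the margin that would let an avatar keep moving while the next chunk of motion is still being computed.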
The release also includes a motion dataset called SuSuInterActs built around a single avatar character named SUSU. The dataset contains 21,000 clips and roughly 37 hours of multimodal conversation data with synchronized speech, facial expression, and full body motion. For developers and researchers working on conversational avatars, datasets like that can be just as valuable as the models themselves.
Under the hood, the system relies on what the developers call a Plan Then Infill architecture. It first decides what action should happen, such as a nod, shrug, or facial reaction, and then fills in the detailed animation frame by frame. Separating body motion from facial expression allows the model to coordinate gestures with speech while still generating natural-looking movement.
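The two-stage idea is easier to see in a toy example. The sketch below is not SentiAvatar's code or model: the action names, pose values, and function names are all hypothetical, a "pose" is reduced to a single number, and plain linear interpolation stands in for whatever learned model actually does the infilling. It only illustrates the general shape of "decide the coarse actions first, then fill in every frame".

```python
from dataclasses import dataclass


@dataclass
class Keyframe:
    frame: int   # frame index where the planned pose should land
    pose: float  # toy one-dimensional pose, e.g. head pitch in degrees


def plan(actions: list[str], frames_per_action: int) -> list[Keyframe]:
    """Stage 1 (plan): turn coarse action labels into sparse keyframes."""
    # Hypothetical pose targets, chosen only for illustration.
    targets = {"rest": 0.0, "nod": 15.0, "shrug": -5.0}
    return [
        Keyframe(frame=i * frames_per_action, pose=targets[action])
        for i, action in enumerate(actions)
    ]


def infill(keyframes: list[Keyframe]) -> list[float]:
    """Stage 2 (infill): produce a pose for every frame between keyframes.

    Linear interpolation is a stand-in here, not the real infilling model.
    """
    frames: list[float] = []
    for start, end in zip(keyframes, keyframes[1:]):
        span = end.frame - start.frame
        for step in range(span):
            t = step / span
            frames.append(start.pose + t * (end.pose - start.pose))
    frames.append(keyframes[-1].pose)  # include the final keyframe itself
    return frames


keyframes = plan(["rest", "nod", "rest"], frames_per_action=4)
motion = infill(keyframes)
print(motion)
```

The point of the split is that the planning stage can be tied to what is being said, while the infilling stage only has to worry about producing smooth motion between the planned poses.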
The team also trained a motion foundation model on more than 200,000 motion sequences, totaling roughly 676 hours of animation data. That broader training set is meant to help the system understand general movement patterns beyond conversational gestures.
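For a rough sense of scale, the clip counts and hour totals quoted for the two datasets can be turned into average clip lengths. This is simple arithmetic on the article's rounded figures ("roughly 37 hours", "more than 200,000" sequences), so the results are approximate; the helper name is made up for illustration.

```python
def average_clip_seconds(total_hours: float, clip_count: int) -> float:
    """Average clip length in seconds, given total hours and number of clips."""
    return total_hours * 3600 / clip_count


# SuSuInterActs: 21,000 clips, roughly 37 hours (figures as reported).
susu_avg = average_clip_seconds(37, 21_000)

# Motion foundation model training set: more than 200,000 sequences,
# roughly 676 hours (figures as reported, so this average is an upper bound).
foundation_avg = average_clip_seconds(676, 200_000)

print(f"SuSuInterActs: about {susu_avg:.1f} s per clip")
print(f"Foundation set: about {foundation_avg:.1f} s per sequence")
```

On these numbers, a SuSuInterActs clip averages a little over six seconds, which lines up with the six-second motion sequences the framework is reported to generate.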
If the approach actually works outside of demos, the potential applications are fairly obvious. Game developers could build more believable NPCs. Virtual assistants could gain faces and body language. Streamers and VTubers could run expressive avatars that react in real time. Even robotics interfaces could eventually benefit from more natural humanlike communication.
Of course, the digital human space has a long history of promises that never quite deliver. Many projects claim to solve the uncanny valley only to reveal another avatar that still feels stiff and artificial. Until developers start experimenting with the code, it will be hard to know whether SentiAvatar truly improves the situation or simply pushes the problem a little further down the road.
Still, the fact that the framework, dataset, and avatar model are now open source means developers can test those claims themselves. And if someone finally figures out how to make AI avatars gesture and emote like real people during conversation, that could change how digital characters interact with humans across everything from games to AI assistants.
For now, it is another interesting experiment in a field that has been chasing believable digital humans for decades. Whether it becomes a real breakthrough or just another uncanny valley demo will depend on what the open source community manages to do with it.
The project is available on GitHub.