Public figures' voices abused for pranks: application of AI technology questioned for overreach

Recently, a surge of videos mocking a well-known entrepreneur has appeared on various short video platforms. In these clips, the entrepreneur humorously critiques hot topics such as traffic jams and game regulations, even interspersing some colorful language, which has sparked extensive online discussion.

However, the remarks in these videos are not the entrepreneur's own words; they are voiceovers created by users who employed AI software to clone his voice. The clips sound so realistic that many listeners believed they were authentic.

While AI technology has brought many conveniences to daily life, it has also raised significant concerns. AI voice synthesis, for instance, now allows anyone to produce a convincingly lifelike audio clip with just a few clicks. Some videos have crossed the line, misappropriating others' voices to stage pranks or spread misinformation, with negative consequences for society.

Generating an AI audio clip now takes only about 20 seconds. Our investigation of social media found many users sharing tutorials on how to create such videos. Most of the clips are produced with a voice-model application: a creator uploads a segment of someone's audio as training material, the AI learns to clone that voice, and other users can then simply type in text to generate audio in the same voice.
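The article does not name the application these tutorials use, but the workflow it describes, uploading a short reference clip and then typing text to synthesize speech in that voice, matches zero-shot voice cloning as implemented in open-source tools. A minimal sketch using the open-source Coqui TTS library's XTTS v2 model, with hypothetical file names; this illustrates the general technique, not the tool in question:

```python
# Minimal sketch of zero-shot voice cloning with the open-source
# Coqui TTS library (pip install TTS). File names are hypothetical.
from TTS.api import TTS

# Load a multilingual model that can clone a voice from a short sample.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# "reference_clip.wav" stands in for the uploaded training material;
# whatever text is typed comes out in the cloned voice.
tts.tts_to_file(
    text="Any sentence typed here is spoken in the cloned voice.",
    speaker_wav="reference_clip.wav",
    language="en",
    file_path="cloned_output.wav",
)
```

That a handful of lines suffices is exactly why the barrier to the misuse described above is so low.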

Following the instructions on one such site, we selected the voice model of the aforementioned entrepreneur and typed in a text of about 100 words; within roughly 20 seconds, an AI audio clip that closely resembled his voice was ready. As of now, that voice model has been used nearly 800,000 times, generating over 44.5 million characters of audio.

Notably, other public figures have suffered similar misappropriation of their voices. At the end of September, a purported drunken recording of the owner of a certain livestreaming platform circulated online, in which he appeared dismissive toward consumers. The recording caused significant controversy for the platform, but police later determined that the audio had been fabricated with a large AI model.

An industry insider explained that AI technology has become so adept at mimicking audio samples, capturing tone, speed, emotion, accent, and vocal style, that it can produce voices almost indistinguishable from the originals, both to human ears and to automated verification methods.
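The article does not detail those verification methods, but a common one is speaker verification: each clip is mapped to a fixed-length voice embedding, and two clips of the same voice score close to 1 in cosine similarity. A sketch using the open-source resemblyzer library, with hypothetical file names, showing why a convincing clone can pass such a check:

```python
# Sketch of a speaker-similarity check with the open-source resemblyzer
# library (pip install resemblyzer). File names are hypothetical.
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()
genuine = encoder.embed_utterance(preprocess_wav("genuine_speech.wav"))
suspect = encoder.embed_utterance(preprocess_wav("suspect_clip.wav"))

# Embeddings are L2-normalized, so a dot product is cosine similarity;
# a convincing clone scores nearly as high as a real second recording.
print(f"speaker similarity: {float(np.dot(genuine, suspect)):.3f}")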

“From the early days of obvious fakes to today’s indistinguishable forgeries, we have entered an era of deepfakes,” said Liu Xiaochun, associate professor at the University of Chinese Academy of Social Sciences and director of the Internet Rule of Law Research Center. He highlighted that unauthorized use of someone else’s voice for AI-generated products, especially for public figures, can lead to misunderstandings, infringe on personal information security, and disrupt the online ecosystem.

Many users engaging in AI audio synthesis primarily for entertainment are often unaware of the legal risks involved. In the comment section of an “AI Voiceover Tutorial” video, one learner asked, “Isn’t this infringing on rights?” The content creator replied, “Everyone’s doing it; if the platform flags it, just delete it.”

Lawyer Zhang Qingxin of Beijing Yingshan Law Firm analyzed that, like a person's likeness, an individual's voice is unique and integral to their personality rights. Creating and uploading AI audio without consent, whether for commercial gain or entertainment, infringes those rights; and if the generated content is illegal or violates public morals, it may also infringe the person's reputation rights.

In April, the Beijing Internet Court ruled on China's first case of personality rights infringement by an AI-generated voice. A voice artist claimed that their recordings had been misappropriated by a short video platform to generate AI voice products, seriously violating their voice rights. The court ruled in favor of the plaintiff and awarded damages of 250,000 yuan.

During the trial, the defendant argued that AI-generated audio differs from a human voice for the purposes of personality rights, and that the technology typically applies watermarks to distinguish AI voices from human ones. The court, however, held that the AI voice bore a high degree of similarity to the plaintiff's and was likely to prompt listeners to associate it with the plaintiff; the protection of a natural person's voice rights therefore extends to AI-generated audio when the voice is recognizable.

“This ruling is significant in clarifying the boundaries around AI-generated voices,” Liu said.

As technology advances, experts emphasize the need for legal frameworks to keep pace. A report released in April by Tsinghua University’s Center for New Media Research indicated a staggering 99.91% increase in AI-related rumors regarding the economy and businesses. Experts advocate for clear regulations to guide the ethical use of AI technology.

Zhang Qingxin highlighted that AI voice tool providers cannot remain passive. They need to take proactive steps to control the quality of source materials and supervise generated content, improving operational guidelines and cooperating with authorities to trace the origin of illegal audio.

In September, the National Internet Information Office issued a draft guideline proposing that platforms providing online content dissemination services must implement measures to regulate AI-generated content, including necessary identification features and encouraging users to disclose whether their published content includes AI elements.
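The draft does not prescribe a technical format for those identification features, but one straightforward approach is a machine-readable marker embedded in the file itself. A sketch using the mutagen library to stamp an MP3 with a hypothetical "ai_generated" tag; the tag name is our illustrative convention, not one mandated by the draft:

```python
# Sketch: embedding an "AI-generated" marker in an MP3's ID3 metadata
# with the mutagen library (pip install mutagen). The tag name is a
# hypothetical convention, not one set by the draft guideline.
from mutagen.id3 import ID3, ID3NoHeaderError, TXXX

try:
    tags = ID3("generated_clip.mp3")
except ID3NoHeaderError:
    tags = ID3()  # file had no ID3 header yet; start a fresh tag block

tags.add(TXXX(encoding=3, desc="ai_generated", text=["true"]))
tags.save("generated_clip.mp3")
```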

“As content distribution service providers, platforms like short video channels must fulfill their responsibilities,” Liu suggested. Beyond prompting users to label AI content, platforms should establish mechanisms to identify and trace AI-generated materials. If they detect counterfeit content or receive complaints, platform rules should empower them to require timely labeling of such content, enforce deletions if labels are absent, and impose penalties like account bans for severe violations.
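The enforcement ladder Liu describes, labeling first, then deletion, then account penalties, maps naturally onto simple platform logic. A minimal sketch of that flow; the data model and the three-strike threshold are our illustrative assumptions, not part of his proposal:

```python
# Minimal sketch of the enforcement ladder described above. The data
# model and the three-strike threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Upload:
    is_ai_generated: bool   # detector result or upheld complaint
    has_ai_label: bool      # did the uploader disclose AI elements?
    prior_violations: int   # account's history of unlabeled AI content

def moderate(upload: Upload) -> str:
    if not upload.is_ai_generated:
        return "publish"
    if upload.has_ai_label:
        return "publish with visible AI label"
    if upload.prior_violations >= 3:  # severity threshold is assumed
        return "delete and ban account"
    return "delete until the uploader adds the required label"
```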