
Techplur

Human-computer conversation has been a part of our everyday lives for quite some time, and technologies like AI voice assistants and chatbots are widespread. In this article, we invited Mr. Zhou Li, Vice President of Technology at XiaoIce, to share his ideas about the technical design of the AI chatbot system and the application of this technology in the immersive virtual world.


Why does AI-AI conversation matter?

Conversation between humans has been around for at least hundreds of thousands of years, while human-machine communication has been around for approximately 55 years, starting with the humble chatbot Eliza. We have witnessed a significant improvement in human-machine conversation in the last decade.

Despite this, only limited research has been conducted in academia or industry on how AI communicates with AI. At most, two chatbots are put together for quality testing to determine which one is more interactive. Could chatbots be used for purposes other than quality testing?

Although the industry has conducted much research on human-AI interactions, including many technological and relevant advancements, there are still three fundamental issues that need to be addressed.

First, does AI truly comprehend what humans are saying, and can its algorithms understand varied human expressions, including omitted and extralinguistic meanings? With the emergence and development of large language models, the significance of this issue appears to have diminished, or we have at least been able to tackle it to some extent.

Second, what else might we discuss? This is an issue for many users of artificial intelligence, whether it be a mobile voice assistant or a chatbot. After the AI responds to a question such as "How is the weather in Beijing?", you may next inquire about the weather in Shanghai. Then, once all known cities have been queried, the dialogue is likely over. It is quite challenging to have an open-hearted conversation with AI, since human-machine interaction generally follows this pattern, which is quite distinct from human-to-human conversation.

Third, may I keep silent? Even in human-to-human interactions, there are times when people are not willing to speak and choose to listen instead. Traditional human-AI conversation design leaves no room for this: the person must either keep talking or exit the conversational interface.

Each of these scenarios raises the question, "Why should I waste time interacting with an AI chatbot?", since users cannot see any benefit from artificial intelligence.

Since 2013, XiaoIce has made considerable efforts toward human-computer conversation, and the average frequency of discussions between users and XiaoIce has increased as advancing technologies have been implemented. In the team's view, more rounds of exchange are an obvious indication that humans and AI are communicating more effectively. If the conversation is not productive, it may end after two or three rounds. If the quality is high, it may be possible to conduct ten, twenty, or even thirty rounds.

However, we also see that it is difficult for users to talk to AI like humans. As technology advances, how many users will engage with AI like an actual human, sharing their opinions, experiences, and moods instead of just asking simple questions like weather conditions? While the percentage is increasing, the growth rate is not quite as rapid as in the past, which means that most people will be unable to break through this barrier during one-on-one conversations.

User research has found that high school and college students are more likely to cross this threshold and are more accepting of novel things. Older people find it harder to engage with AI and talk to it. As part of the user survey, the team also had a real person chat with users while the users believed they were still talking to an AI. However, even with a real person, i.e., with almost flawless conversational abilities, the percentage didn't exceed 20%.

Is there any prospect of breaking this limitation? This is a relatively new field, and one that XiaoIce has been experimenting with for the last two years.

Several examples of real-life human communication can help illustrate why and how the ceiling exists.

Scenario 1: A group of strangers meets in a matchmaking session. With strangers and a clear purpose, the topics of conversation tend to be more utilitarian and limited. For example, "do you have a home and a car," "what is your job," and "how is your family"? There is no indication that these attendees of the matchmaking conference are directly hostile or in any way mean, but similar to the prior discussion of weather questions or knowledge questions with AI voice assistants, the entire interaction is limited.

Scenario 2: Former classmates who haven't seen each other for many years. Typically, gatherings like this start with school-related memories, then people can progress to real life, work, and other issues, even though they may not have seen each other in years. Memories are the key to completing the ice-breaking process.

So XiaoIce also tried posting WeChat Moments, using algorithms to simulate content such as what it ate today and where it visited, in the hope of providing more topics for conversation. As part of the project, XiaoIce has also allowed people to share articles with the AI to build a shared memory so that both parties can communicate better. Unfortunately, this remains a closed circle. As long as the user has not established a willingness to communicate with AI, neither the WeChat Moments posts nor the option to share content makes any difference, and the effort is wasted.

Scenario 3: An elderly gentleman is walking through a park. A recently retired older man walks through the park, where he sees people of all types playing chess, caring for children, and chatting, and he doesn't know anyone. He is just looking and listening. A few days later, he may find a topic he is more interested in, and he reaches out. As he spends more time in the park, he makes new friends, creates his community, and seamlessly integrates into the environment.

An interactive experience like this is a great way to break the ice between humans and AI. Immersive social environments, or the metaverse, as they are called today, are analogous to that park, with the new user in the position of the older man sitting on a bench. A new user can find what interests them in such a social environment only if it already contains plenty of dynamic interactions. That existing activity does not have to be built by users; it can be built by a group of AIs.

In a world of immersive social media platforms, there ought to be countless AIs living alongside people. Therefore, the focus today is on exploring how multiple AIs can interact and converse with each other in complex ways.

Ultimately, combining the human community with the artificial intelligence community makes sense and shows what interesting results can be achieved. The goal is to develop a user-supported and AI-based immersive virtual social media experience.

"XiaoIceland" contains actual people and some artificial intelligence. Each AI will join forces with another randomly to chat about various topics. If you are interested in hearing their conversation, you can participate in it directly. Several people can also participate in the dialogue for more complex interactions.


The overall design of the AI conversation system

Developing an AI-AI conversation is essential as a first step in implementing this technology.

Before discussing the technical details, it is necessary to understand the distinction between the traditional human-computer conversation and the AI-AI conversation.

First, the diversity of conversational modes will expand. Typically, with conventional chatbots or voice assistants, the user speaks one line, and the AI responds with another. Human-to-human conversations, on the other hand, do not follow this pattern since 90 percent of the words could be spoken by one person, while the other person acts primarily as a listener.

A range of listeners exists: guiding listeners, who sometimes offer guidance to their confidants to help them express their feelings more effectively; questioning listeners, who may ask questions to obtain more information; critical listeners, who provide comments and guidance at the appropriate time; and hater listeners, who, as the name implies, simply disagree.

This illustrates that the conversation is significantly more sophisticated than the typical human-computer scenario. An AI-AI interaction provides more significant potential for developing more sophisticated interaction patterns, as you may manage both sides of the AI simultaneously since there is transparency between them.

Second, the overall rhythm becomes extremely important in AI-AI conversations. Even though TTS synthesis technology is very mature now, if you extend the listening time to five minutes or even thirty minutes, the machine-synthesized voice will still sound very artificial.

Human speech varies constantly, and AI speech should do the same; we must simulate these variations in speed and in the length of pauses between sentences for it to feel natural over time.

Likewise, we should include more transitions and filler words such as "um," "ah," and "I think." Traditional human-machine conversations treat these words as noise, since humans only need them when the brain cannot keep up with the verbal expression. But when two AIs talk to each other, both sides need these filler words. With them, the whole conversation sounds more natural, which helps real users devote more time to listening.
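The idea of varying pauses and sprinkling in filler words can be sketched as follows. This is only an illustration, not XiaoIce's implementation: the function name `add_fillers_and_pauses`, the filler probability, and the pause range are all assumptions chosen for the example.

```python
import random

# Filler words quoted in the article; the list itself is illustrative.
FILLERS = ["um", "ah", "I think"]


def add_fillers_and_pauses(sentences, rng, filler_prob=0.3,
                           pause_range=(0.2, 0.9)):
    """Return (text, pause_seconds) pairs for a list of sentences.

    Each sentence may get a leading filler word, and each is followed by
    a randomly chosen pause. All numeric defaults are assumed values.
    """
    paced = []
    for sentence in sentences:
        text = sentence
        if rng.random() < filler_prob:
            filler = rng.choice(FILLERS)
            # Capitalize only the first letter so "I think" stays intact.
            text = f"{filler[0].upper()}{filler[1:]}... {sentence}"
        pause = round(rng.uniform(*pause_range), 2)
        paced.append((text, pause))
    return paced


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a run is repeatable
    lines = ["The park is busy today.", "People are playing chess."]
    for text, pause in add_fillers_and_pauses(lines, rng):
        print(f"{text}  [pause {pause}s]")
```

Passing the random generator in explicitly keeps the variation reproducible during testing while still sounding irregular to a listener.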


Text generation for AI conversations

XiaoIce's current practice consists of three methods.

The first is scraping structured documents found through search engines. For example, scraping the structured documents of a local tourism website allows us to determine the best places to eat, what the traffic is like, and so forth. Technologies such as BERT are then used to connect these pieces and convert them into content.

The second is the news feed. As unstructured text, news presents a greater challenge, since its varied writing techniques make it more difficult to understand. Even so, XiaoIce has collaborated with several media outlets over the past few years and provided numerous comments on news stories, which has produced a considerable collection of real user comments every day. With this data, AIs can converse with each other. For instance, one machine-learning algorithm rewrites a news story as a summary and speaks it aloud, while a second algorithm extracts high-quality comments from previous news articles and inserts them where relevant. In this way, a single article becomes an interactive dialogue.
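A toy version of turning a news summary plus stored comments into a two-speaker dialogue might look like this. The matching rule (word overlap after removing a few stopwords), the threshold, and the name `news_to_dialogue` are assumptions made for the sketch; the article does not say how relevance is actually scored.

```python
import re

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "i",
             "my", "will", "this", "do", "not"}


def content_words(text):
    """Lowercased words of a text, minus stopwords."""
    return set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS


def news_to_dialogue(summary_sentences, comments, min_overlap=2):
    """Interleave summary lines (speaker A) with matching comments (speaker B).

    A comment is inserted after a sentence when the two share at least
    `min_overlap` content words; each comment is used at most once.
    """
    dialogue = []
    unused = list(comments)
    for sentence in summary_sentences:
        dialogue.append(("A", sentence))
        sentence_words = content_words(sentence)
        best = max(unused,
                   key=lambda c: len(content_words(c) & sentence_words),
                   default=None)
        if best is not None and \
                len(content_words(best) & sentence_words) >= min_overlap:
            dialogue.append(("B", best))
            unused.remove(best)
    return dialogue


if __name__ == "__main__":
    summary = ["The city opened a new subway line today.",
               "Ticket prices will stay the same this year."]
    comments = ["Finally, the subway line reaches my neighborhood!",
                "I hope ticket prices do not rise next year."]
    for speaker, line in news_to_dialogue(summary, comments):
        print(f"{speaker}: {line}")
```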

Lastly, we use GPT-3 to generate paragraphs. GPT-3 was found to be effective in terms of language fluency, but it tends to lose logical coherence when writing longer texts. This problem is addressed by extracting a series of keywords. For example, consider the theme of cat urination and defecation in a structured document. We can extract keywords such as "cat litter" and "potty" and mix them at regular intervals into the sequence generated by GPT. In this way, the entire generation process follows the logic of these keywords, and the generated content is more logically arranged. Even so, we currently consider a generated length of around 100 to 300 words appropriate; anything longer tends to produce a variety of logical defects.
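One way to picture keyword-guided generation is to produce the passage in segments, steering each segment with the next keyword. The sketch below assumes a prompt-based approach and a placeholder `generate` callable; it does not call any real model API, and the article does not specify how the keywords are actually mixed into the generated sequence.

```python
def generate_with_keywords(theme, keywords, generate,
                           words_per_segment=40, max_total_words=300):
    """Build a passage segment by segment, one keyword per segment.

    `generate(prompt, max_words)` is a hypothetical callable standing in
    for a language model. Generation stops once the passage reaches
    `max_total_words`, echoing the 100 to 300 word range in the article.
    """
    passage = []
    for keyword in keywords:
        so_far = " ".join(passage)
        if len(so_far.split()) >= max_total_words:
            break
        prompt = (f"Theme: {theme}\n"
                  f"Focus on: {keyword}\n"
                  f"So far: {so_far}\n"
                  f"Continue:")
        passage.append(generate(prompt, words_per_segment).strip())
    return " ".join(passage)


def stub_generate(prompt, max_words):
    """Stand-in for a real model: echoes the keyword found in the prompt."""
    focus = prompt.split("Focus on: ")[1].split("\n")[0]
    return f"Here is a sentence about {focus}."


if __name__ == "__main__":
    print(generate_with_keywords("cat care", ["cat litter", "potty"],
                                 stub_generate))
```

Because the keywords arrive in a fixed order, the overall passage inherits their ordering even when each individual segment is generated freely.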

The three methods described above were developed by XiaoIce using some of its own more mature data. These snippets of conversation must also be converted into a longer AI-AI conversation that may include a variety of topics.

The generated snippets of the three types are put into a search engine.

As soon as the first snippet is finished, the team feeds its last sentence into a conversation engine and uses the engine to obtain a reply. That reply is then picked up by a different conversation engine, which is equivalent to generating content with two engines playing against each other.
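The hand-off between the two engines can be sketched as a simple alternating loop. The engines here are placeholder callables that map the previous utterance to a reply; the names `extend_with_engines`, `engine_a`, and `engine_b` and the turn limit are assumptions for illustration.

```python
def extend_with_engines(seed_sentence, engine_a, engine_b, max_turns=6):
    """Alternate two conversation engines, starting from a snippet's last sentence.

    Each engine is a hypothetical callable taking the previous utterance
    and returning a reply. Returns a list of (speaker, reply) pairs.
    """
    transcript = []
    last = seed_sentence
    engines = [("A", engine_a), ("B", engine_b)]
    for turn in range(max_turns):
        speaker, engine = engines[turn % 2]  # A, B, A, B, ...
        reply = engine(last)
        transcript.append((speaker, reply))
        last = reply
    return transcript


if __name__ == "__main__":
    # Toy engines; real ones would be dialogue models tuned for AI-AI chat.
    def ask(utterance):
        return f"What do you mean by '{utterance.split()[-1]}'"

    def answer(utterance):
        return "Let me explain that differently"

    for speaker, line in extend_with_engines("The old town is lovely", ask, answer, 4):
        print(f"{speaker}: {line}")
```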

It is vital to highlight that standard conversation engines meant for human-machine interaction, such as voice assistants and chatbots, do not operate very well in such a scenario. This is because machine-human dialogues and machine-machine exchanges remain distinct. We must considerably tweak at least one of the engines to make the machine-to-machine communication more fluid and logical, without recurrence of subjects.

Each round of conversation needs to be tested. As a first step, its relevance, message validity, and topic consistency must be checked. In most cases, there are two outcomes: either the exchange is judged to be high entropy and the conversation is terminated, or new relevant content is found that matches the original.

Whenever a new snippet is strongly correlated with the last sentence generated by both machine-machine conversation engines, the team will consider that the engines have performed their function successfully since they have successfully extended their snippets seamlessly into each other, which is an ideal situation.

It is also possible that the two engines have tried for a long time to find an appropriate topic and failed. At that point, we should determine whether the conversation between the two machines is still valid. When the information entropy is sufficiently high, or the answers are dominated by "yes, huh", or there is a lot of repeated Q&A, the situation is considered high entropy. In that case, the dialogue between the engines is suspended, and a new topic is assigned by force. The new topic may be related to a current hot topic or to something of interest to the user.
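Two of the signals mentioned above, replies dominated by "yes, huh" and repeated exchanges, can be approximated with simple ratios. This sketch does not compute an entropy measure; the backchannel list, the thresholds, and the names `is_stalled` and `pick_next_topic` are assumptions for illustration.

```python
import re

# Assumed list of low-content replies; the article only cites "yes, huh".
BACKCHANNELS = {"yes", "yeah", "huh", "ok", "hmm"}


def words(turn):
    """Lowercased word tokens of one turn."""
    return re.findall(r"[a-z']+", turn.lower())


def is_stalled(turns, backchannel_limit=0.5, repeat_limit=0.4):
    """Heuristically decide whether a machine-machine dialogue has stalled.

    Flags the dialogue when too many turns consist only of backchannel
    words, or when too many turns repeat an earlier turn.
    """
    if not turns:
        return False
    tokenized = [tuple(words(t)) for t in turns]
    total = len(tokenized)
    backchannel_turns = sum(
        1 for w in tokenized if w and all(x in BACKCHANNELS for x in w)
    )
    repeated_turns = total - len(set(tokenized))
    return (backchannel_turns / total > backchannel_limit
            or repeated_turns / total > repeat_limit)


def pick_next_topic(turns, hot_topics, current_topic):
    """Keep the current topic unless the dialogue stalled, then force a new one."""
    if is_stalled(turns) and hot_topics:
        return hot_topics[0]
    return current_topic


if __name__ == "__main__":
    recent = ["Yes.", "Huh?", "Yes, yes.", "What about lunch?"]
    print(is_stalled(recent))
    print(pick_next_topic(recent, ["tonight's football match"], "weather"))
```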

The change of topic may be rather abrupt, but the team believes that the two engines should not be left to play against each other indefinitely, because the quality of the conversation will deteriorate. We need to interject new snippets so that the conversation stays meaningful. Using this approach, one can turn a short snippet into a longer conversation.


Speech synthesis and pacing control of AI conversations

How do we turn the text into audio that users can listen to directly?

First, the dialogue should reflect the appropriate persona according to its content. This includes whether it is male or female, serious or humorous, and all of these are relevant to the content we create.

Additionally, as previously discussed, rhythm control should be more random and natural. Depending on the content, for example, when there is a long paragraph, it may be necessary to speak faster. When two people have an uninterrupted conversation, the speed may become slower with longer pauses to make the conversation more attractive.

When important content is being conveyed, the speed of speech should be slowed down and the volume increased so that the highlights and key points can be heard. All of these elements are brought together to make the machine-machine dialogue easier to follow.
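The pacing rules described in this section can be expressed as a small planning function that assigns a speaking rate, volume, and pause to each utterance before it is sent to a TTS system. Every number below is an assumed placeholder, and `Prosody` and `plan_prosody` are hypothetical names, not part of any particular TTS API.

```python
from dataclasses import dataclass


@dataclass
class Prosody:
    rate: float         # 1.0 = default speaking speed
    volume: float       # 1.0 = default loudness
    pause_after: float  # seconds of silence after the utterance


def plan_prosody(text, is_highlight=False, long_paragraph_words=60):
    """Choose pacing for one utterance.

    Long paragraphs are spoken a little faster; highlighted content is
    spoken slower and louder, followed by a longer pause. The thresholds
    and multipliers are illustrative assumptions.
    """
    rate, volume, pause = 1.0, 1.0, 0.4
    if len(text.split()) >= long_paragraph_words:
        rate = 1.15
    if is_highlight:
        # Emphasis takes priority over the long-paragraph speed-up.
        rate, volume, pause = 0.9, 1.2, 0.7
    return Prosody(rate, volume, pause)


if __name__ == "__main__":
    print(plan_prosody("The museum opens at nine."))
    print(plan_prosody("Entry is free on Sundays.", is_highlight=True))
```

Keeping the pacing decision separate from synthesis means the same plan could be mapped onto whatever rate and volume controls a given TTS engine exposes.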


Application scenarios of AI conversation in immersive virtual social platforms

In a world where AIs converse with other AIs, XiaoIceland has presented us with an immersive social experience. So, how significant is this exploration for the coming metaverse and our future lives?

First of all, in metaverse research, visual impact is a primary focus, and headsets are almost regarded as essential tools. It is often assumed that the metaverse is only meaningful if you can see strange visual phenomena that do not exist in reality, but this is not necessarily true.

For one thing, wearing a headset for a long time is so uncomfortable that people cannot devote much time to soaking up a visual virtual world, even with advancing hardware. Hearing, by contrast, is believed to be a much lighter method of sensory reception for the metaverse. If users have access to rich auditory content, they can remain comfortable in the metaverse for longer.

Second, it seems that the future of immersive virtual social platforms will provide humans with a game-like experience and solutions to problems in real social networks.

In China, for example, there is a growing elderly population, and the elderly have a great need for the company of their children. Yet their children are very busy at work and do not have much time available for such activities. Suppose an elderly person's granddaughter learns a song in kindergarten. Even if she cannot visit her grandpa and sing it to him in the real world, a computer can use the same image and voice to perform the song for him. In the longer term, this is the greater value that the metaverse and AI will bring to human life, and it will continue to grow over time.

Editor in charge: Pang Guiyu (庞桂玉) | Source: 51CTO