Recent advancements in voice AI, led by companies like Google, OpenAI, and Microsoft, showcase potential yet reveal key challenges, including interruptions and lack of personalisation.
Voice conversations with AI chatbots are becoming increasingly common, as OpenAI, Google, and Microsoft have each expanded their offerings in the voice-controlled AI space.
Google has launched Gemini Live, a voice AI feature that is now available free of charge to all Android users, positioning itself as a market leader by being the first to offer this functionality broadly. Meanwhile, OpenAI’s ChatGPT Advanced Voice mode, while groundbreaking, offers only a limited amount of free voice interaction per month. Microsoft has revamped Copilot, making its voice interaction capabilities freely available to users.
The introduction of voice-activated chatbots fulfils a long-held science fiction vision of conversing with computers, echoing the dynamic exchanges depicted in shows like Star Trek. However, despite the significant technological strides, these AI systems still face challenges, as a series of tests conducted over two weeks revealed.
One of the main issues with these systems is how they handle interruptions. In theory, the feature lets users cut in on an ongoing response from the AI, making for more dynamic exchanges. In practice, however, interruptions are often ignored or misinterpreted, resulting in disjointed conversations. Users can find themselves exasperated, repeatedly telling the AI to stop speaking in order to regain control of the conversation.
There is also a noticeable gap in the local information available through these voice AI companions. Of the current options, only Google’s Gemini Live can provide local recommendations, such as places to eat. The feature is missing from ChatGPT Advanced Voice mode and Copilot, which do not yet support web browsing. This limits how useful those systems are for locally relevant questions.
Personalisation is another area where ChatGPT Advanced Voice mode, Gemini Live, and Copilot fall short. These AIs currently cannot integrate fully with personal applications such as calendars and email. As a result, users cannot rely on them for personalised schedule management or reminders, which hinders their potential as comprehensive virtual assistants.
Despite these limitations, voice AIs are proving effective in specific scenarios. They are well suited to exploring research topics and generating ideas. Even in specialised areas such as Brazilian Jiu-Jitsu, the chatbots can sustain detailed discussions, demonstrating the breadth of their knowledge. During testing, Microsoft’s Copilot was noted for providing reliable answers, while Gemini occasionally produced less accurate responses.
In terms of user interface, ChatGPT Advanced Voice mode has been commended for its interactive design, featuring a responsive swirling orb that visually indicates active listening. In contrast, Gemini Live’s interface, with its predominantly dark screen, lacks a visually engaging element, detracting from the user experience.
While these voice AI systems have come a long way, they still appear somewhat incomplete and require further integration with everyday smartphone functions. Apple, which currently lags behind with Siri and the anticipated Apple Intelligence, has yet to join the voice AI market fully, a development expected next year.
In conclusion, while current voice AI technology offers exciting possibilities, it does not yet converse as seamlessly as a human or function as a true virtual assistant. For now, the promise remains enticing, but there is still considerable room for improvement.
Source: Noah Wire Services