The road to “generalizable intelligence”—what many consider sci-fi—starts with ambient intelligence. And that future is unfolding now.
“We live in the golden realm of AI, where dreams and sci-fi become reality,” said Rohit Prasad, senior vice president and chief scientist for Alexa at Amazon.
Prasad spoke today at re:MARS, the Amazon conference on machine learning, automation, robotics and space, on the evolution of ambient intelligence into generalizable intelligence (GI).
Ambient intelligence, Prasad said, is when the underlying AI is readily available, helping people when they need it — and also learning to anticipate needs — and then taking a backseat when it’s not needed.
A prime example and a significant step toward GI, Prasad said, is Amazon’s Alexa, which he described as a “personal assistant, advisor, companion.”
The virtual assistant is equipped with 30 ML systems that process various sensory signals, he explained. It receives more than 1 billion requests per week in 17 languages across dozens of countries. It will also be flown to the moon as part of the unmanned Artemis-1 mission, due to launch in August.
A future Alexa feature will be able to synthesize short audio clips into longer speech. Prasad gave the example of a deceased grandmother reading a bedtime story to a grandson.
“This required inventions where we had to learn to produce a high-quality voice from less than a minute of recording instead of hours of recording,” he said. The key, he added, was to frame the problem “as a voice conversion task and not as a speech generation task.”
Ambient intelligence: reactive, proactive, anticipatory
As Prasad explained, ambient intelligence is both reactive (responding to direct requests) and proactive (anticipating needs). This is achieved through numerous sensing technologies: image, sound, ultrasonic, depth, mechanical and atmospheric sensors. These signals are then processed with deep learning and natural language processing (NLP). Ambient intelligence “agents” are also self-supervised and self-learning, allowing them to generalize what they learn and apply it to new contexts.
Alexa’s self-learning mechanism, for example, automatically corrects tens of millions of errors a week, he said, both customer errors and errors in its own natural language understanding (NLU) models.
He described this as the “most practical” route to GI, or the ability for AI entities to understand and learn any intellectual task humans can.
Ultimately, therefore, “the path from ambient intelligence leads to generalizable intelligence,” Prasad said.
What do GI agents actually do?
Generalizable intelligence has three attributes. GI “agents” can multitask, adapt to changing environments, and learn new concepts and actions with minimal external human input.
GI also requires a significant dose of common sense. Alexa already shows this, he said: For example, if a user asks to set a reminder for the Super Bowl, it will identify the date of the big game, convert it to the user’s time zone, and then remind them before kickoff. It also suggests routines and detects anomalies through its “hunches” feature.
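The reminder behavior described above can be sketched with Python’s standard `zoneinfo` module. The kickoff time, the time zones and the 30-minute lead time below are illustrative assumptions, not Alexa’s actual values:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Hypothetical kickoff time, listed in US Eastern time.
kickoff_et = datetime(2023, 2, 12, 18, 30, tzinfo=ZoneInfo("America/New_York"))

# Convert to the user's local time zone (assumed here to be Pacific).
kickoff_local = kickoff_et.astimezone(ZoneInfo("America/Los_Angeles"))

# Schedule the reminder with some lead time before the game begins.
reminder_at = kickoff_local - timedelta(minutes=30)

print(kickoff_local.strftime("%Y-%m-%d %H:%M %Z"))
print(reminder_at.strftime("%Y-%m-%d %H:%M %Z"))
```

The conversion step is the part Prasad highlights: the assistant resolves a named event to a concrete date, then re-expresses it in the user’s own time zone before scheduling.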
Still, he emphasized, GI is not an “all-knowing, all-capable” technology that can do any job.
“We humans are still the prime example of generalization,” he said, “and the standard that AI should aspire to.”
GI is already being realized, he pointed out: Transformer-based large language models, trained with self-supervision, enable many tasks with far less manually labeled data than ever before. One example is Amazon’s Alexa Teacher Model, which distills knowledge across NLU, speech recognition, dialogue prediction and visual scene understanding.
The goal is to take automated reasoning to new heights, with the first milestone being the “ubiquitous use” of common sense in conversational AI, he said.
To work toward this, Amazon has released a commonsense dataset of more than 11,000 newly collected dialogues to aid research into open-domain conversations.
The company also invented a generative approach that it calls “think-before-you-speak.” Here, the AI agent learns to externalize implicit commonsense knowledge (“think”) using a large language model together with a commonsense knowledge graph, such as the freely available ConceptNet, and then uses that knowledge to generate responses (“speak”).
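The think/speak split can be illustrated with a minimal sketch. The tiny in-memory triple store below is a stand-in for a real commonsense graph such as ConceptNet, and the templated reply is a stand-in for a large language model; both the facts and the phrasing are invented for illustration:

```python
# Toy stand-in for a commonsense knowledge graph: (subject, relation, object)
# triples keyed by a trigger word. Real systems would query ConceptNet or
# a learned model; these entries are illustrative only.
KNOWLEDGE = {
    "umbrella": [("umbrella", "UsedFor", "staying dry"),
                 ("rain", "CausesDesire", "umbrella")],
    "coffee": [("coffee", "UsedFor", "waking up")],
}

def think(utterance: str) -> list[tuple[str, str, str]]:
    """'Think' step: surface implicit commonsense facts relevant to the input."""
    return [t for key, triples in KNOWLEDGE.items()
            if key in utterance.lower() for t in triples]

def speak(utterance: str) -> str:
    """'Speak' step: condition the reply on the retrieved facts. A production
    system would feed these facts into a large language model instead."""
    facts = think(utterance)
    if not facts:
        return "Tell me more."
    subj, rel, obj = facts[0]
    return f"Since a(n) {subj} is {rel} {obj}, that sounds like a good idea."

print(speak("Should I bring an umbrella today?"))
```

The point of the pattern is that the retrieved facts are made explicit before generation, so the final response is grounded in them rather than produced directly from the input.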
Amazon also trains Alexa to answer complex queries that require multiple inference steps, and enables “conversational exploration” on ambient devices so users don’t have to pull out their phones or laptops to explore the web.
Prasad said that this capability required deep learning to predict dialogue flow, web-scale neural information retrieval, and automated summarization that can distill information from multiple sources.
The Alexa Conversations dialogue manager helps Alexa decide what actions to take based on the interaction, the dialogue history and the current inputs, using query-guided and self-attention mechanisms. Neural information retrieval pulls information across different modalities and languages, drawing on billions of data points. Transformer-based models, trained with a multistage paradigm optimized for different data sources, semantically match search queries with relevant information. Deep learning models then summarize that information for users while retaining the key facts.
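The semantic matching step can be sketched in miniature. The bag-of-words vectors below are a crude stand-in for the dense transformer embeddings described above, and the documents and query are invented; only the retrieval-by-similarity shape is the point:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a transformer sentence encoder: a bag-of-words vector.
    # Real systems use dense embeddings trained on query-document pairs.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "the super bowl kickoff time and date",
    "how to bake sourdough bread at home",
    "weather forecast for seattle this weekend",
]

query = "when is the super bowl"
# Rank documents by similarity to the query and keep the best match.
best = max(docs, key=lambda d: cosine(embed(query), embed(d)))
print(best)
```

A learned encoder would also match the query to documents that share no surface words at all, which is exactly what the bag-of-words stand-in cannot do.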
Prasad described the technology as multitasking, multilingual and multimodal, allowing for “more natural, human-like conversations.”
The ultimate goal is to make AI not only useful for customers in their daily lives, but also intuitive and easy to use, something they want to use and can even rely on: AI that thinks before it speaks, is grounded in commonsense knowledge graphs, and can explain the answers it generates, even for questions that aren’t always straightforward.
Ultimately, GI is becoming more viable by the day as “AI can generalize better than before,” Prasad said.
For retail, AI is learning to just walk out
Amazon is also using ML and AI to “reinvent” physical retail through features like futuristic palm scanning and smart shopping carts in its Amazon Go stores. This enables the “just walk out” capability, explained Dilip Kumar, vice president of physical retail and technology.
The company opened the first of these physical stores in January 2018. They have grown from 1,800-square-foot convenience-style formats to 40,000-square-foot grocery-style stores, Kumar said. The company built on this with its Dash Cart in the summer of 2020 and with Amazon One in the fall of 2020.
Advanced computer vision capabilities and machine learning algorithms allow people to scan their palms when entering a store, pick up items, place them in their shopping cart, and then walk out.
Palm scanning was chosen because the gesture needed to be intentional and intuitive, Kumar explained. Palms are linked to the customer’s credit or debit card information, and accuracy is achieved in part through subsurface imaging of vein patterns.
This allows for accuracy on “an order of magnitude greater than what facial recognition can achieve,” Kumar said.
Carts, on the other hand, are equipped with weight sensors that identify specific items and their quantities. Advanced algorithms can also handle the increased complexity of “picks and returns” (when a customer changes their mind about an item) and filter out sensor noise.
These algorithms run on-premises in the store, in the cloud, and at the edge, Kumar explained. “We can mix and match depending on the environment,” he said.
The goal, Kumar said, is “to completely take this technology into the background,” so customers can focus on shopping. “We’ve hidden all this complexity from customers,” he said, so they can “immerse themselves in their shopping experience, their mission.”
Similarly, the company opened its first Amazon Style store in May 2022. Upon entering the store, customers can scan items on the floor, which are automatically sent to fitting rooms or pickup desks. They will also be offered suggestions for additional purchases.
Ultimately, Kumar said, “We’re still at the very beginning of our exploration, we’re pushing the boundaries of ML. We have a lot of innovations ahead of us.”
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Learn more about membership.