A powerful, affordable machine learning infrastructure accelerates innovation in the cloud


MIT Technology Review

Artificial intelligence and machine learning (AI and ML) are key technologies that help companies find new ways to increase their sales, reduce costs, streamline business processes and better understand their customers. AWS helps customers accelerate their AI / ML adoption by providing high-performance compute, high-speed networking, and scalable, high-performance storage options as needed for any machine learning project. This lowers the barrier to entry for companies looking to use the cloud to scale their ML applications.

Developers and data scientists are pushing the boundaries of technology and increasingly relying on deep learning, a type of machine learning based on neural network algorithms. Deep learning models keep getting larger and more sophisticated, which increases the cost of running the underlying infrastructure needed to train and deploy them.

To enable customers to accelerate their AI / ML transformation, AWS builds powerful and affordable machine learning chips. AWS Inferentia is the first machine learning chip designed by AWS from the ground up for the most cost-effective machine learning inference in the cloud. In fact, Inferentia-based Amazon EC2 Inf1 instances offer 2.3 times the performance and up to 70% lower cost for machine learning inference than current-generation GPU-based EC2 instances. AWS Trainium is the second machine learning chip from AWS. It is purpose-built for training deep learning models and will be available in late 2021.
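The relationship between throughput and cost per inference can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the hourly prices and throughput figures are hypothetical placeholders, not AWS prices or benchmark results. It shows that an instance delivering 2.3 times the throughput at roughly 69% of the hourly price works out to about 70% lower cost per inference.

```python
def cost_per_million_inferences(hourly_price_usd, inferences_per_second):
    """Cost in USD of serving one million inferences at full utilization."""
    inferences_per_hour = inferences_per_second * 3600
    return hourly_price_usd / inferences_per_hour * 1_000_000


def relative_saving(baseline_cost, new_cost):
    """Fraction by which new_cost is lower than baseline_cost."""
    return 1 - new_cost / baseline_cost


# Hypothetical numbers chosen for illustration only (not real prices).
gpu_cost = cost_per_million_inferences(hourly_price_usd=1.00,
                                       inferences_per_second=1000)
# Assumed: 2.3x the throughput at 69% of the hourly price.
inf1_cost = cost_per_million_inferences(hourly_price_usd=0.69,
                                        inferences_per_second=2300)

saving = relative_saving(gpu_cost, inf1_cost)
print(f"GPU-based:  ${gpu_cost:.4f} per million inferences")
print(f"Inf1-based: ${inf1_cost:.4f} per million inferences")
print(f"Saving:     {saving:.0%}")
```

The calculation assumes both instances run at full utilization. In practice, real savings depend on the model, batch size, and how busy the instances are kept.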

Customers across industries have deployed their ML applications in production on Inferentia and have seen significant performance improvements and cost savings. Airbnb's customer support platform, for example, enables intelligent, scalable and extraordinary service experiences for its community of millions of hosts and guests around the world. Airbnb used Inferentia-based EC2 Inf1 instances to deploy the natural language processing (NLP) models that support its chatbots. This resulted in a twofold improvement in performance compared to GPU-based instances.

With these innovations in silicon, AWS enables its customers to easily train and execute their deep learning models in production with high performance and throughput at significantly lower costs.

Machine learning challenges are speeding the shift to cloud-based infrastructure

Machine learning is an iterative process that requires teams to build, train, and deploy applications quickly, and to retrain and experiment frequently to increase the predictive accuracy of their models. When deploying trained models in their business applications, organizations must also scale those applications to serve new users around the world. They need to be able to serve multiple simultaneous requests with near real-time latency to ensure a superior user experience.

Emerging use cases such as object recognition, natural language processing (NLP), image classification, conversational AI and time series analysis are based on deep learning technology. Deep learning models are growing exponentially in size and complexity, from millions of parameters to billions within a few years.

Training and deploying these complex and sophisticated models results in significant infrastructure costs. Costs can quickly become prohibitive as companies scale their applications to deliver near real-time experiences to their users and customers.

This is where cloud-based machine learning infrastructure services can help. The cloud provides on-demand access to compute, high-performance networks and large data stores that seamlessly combine with ML operations and higher-level AI services so organizations can get started right away and scale their AI / ML initiatives.

How AWS Helps Customers Accelerate Their AI / ML Transformation

AWS Inferentia and AWS Trainium aim to democratize machine learning and make it accessible to developers regardless of experience and company size. Inferentia’s design is optimized for high performance, throughput, and low latency, making it ideal for deploying ML inference on a large scale.

Each AWS Inferentia chip contains four NeuronCores that implement a powerful systolic array matrix multiply engine that massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache that helps reduce external memory access, reduce latency and increase throughput.
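To give a feel for what a systolic array matrix multiply engine does, here is a minimal cycle-by-cycle simulation in plain Python. It is a sketch of the general technique (an output-stationary systolic array), not a description of how NeuronCores are actually implemented; the function name and the data layout are assumptions made for illustration. Each processing element (PE) holds one accumulator, receives a value of A from its left neighbor and a value of B from its upper neighbor on every cycle, multiplies them, and passes them on.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated n x m systolic array.

    Returns (product, cycles). Rows of `a` enter from the left edge and
    columns of `b` enter from the top edge, each skewed by one cycle per
    row/column so that matching operands meet in the right PE.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    if any(len(row) != k_dim for row in a):
        raise ValueError("inner dimensions of a and b do not match")

    acc = [[0] * m for _ in range(n)]    # one accumulator per PE
    a_reg = [[0] * m for _ in range(n)]  # A values flowing left to right
    b_reg = [[0] * m for _ in range(n)]  # B values flowing top to bottom

    cycles = n + m + k_dim - 2           # time for the last operands to meet
    for t in range(cycles):
        # Visit PEs from bottom-right to top-left so that each PE reads its
        # neighbors' values from the previous cycle before they are updated.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j > 0:
                    a_in = a_reg[i][j - 1]
                else:  # left edge: row i is delayed by i cycles
                    a_in = a[i][t - i] if 0 <= t - i < k_dim else 0
                if i > 0:
                    b_in = b_reg[i - 1][j]
                else:  # top edge: column j is delayed by j cycles
                    b_in = b[t - j][j] if 0 <= t - j < k_dim else 0
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += a_in * b_in
    return acc, cycles


product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, cycles)
```

The point of the design is data reuse: each operand is read from memory once and then handed from PE to PE, which is why this style of engine pairs well with on-chip memory to reduce external memory access.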

AWS Neuron, the software development kit for Inferentia, natively supports leading ML frameworks like TensorFlow and PyTorch. Developers can continue to use the same frameworks and lifecycle development tools they know and love. Many trained models can be compiled and deployed on Inferentia by changing a single line of code, with no additional changes to the application code.

The result is high-performance inference delivery that scales easily while keeping costs under control.

Sprinklr, a software-as-a-service company, has an AI-powered unified platform for customer experience management that enables companies to collect real-time customer feedback across multiple channels and translate it into actionable insights. This translates into more proactive problem solving, improved product development, improved content marketing, and better customer service. Sprinklr used Inferentia to deploy its NLP and some of its computer vision models and saw significant performance improvements.

Several Amazon services also deploy their machine learning models on Inferentia.

Amazon Prime Video uses computer vision ML models to analyze the video quality of live events to ensure the best viewing experience for Prime Video members. It deployed its image classification ML models on EC2 Inf1 instances and achieved a fourfold increase in performance and up to 40% cost savings compared to GPU-based instances.

Another example is the AI- and ML-based intelligence of Amazon Alexa, powered by Amazon Web Services and available on more than 100 million devices today. Alexa's promise to customers is that it will keep getting smarter, more conversational, more proactive and more enjoyable. To keep this promise, machine learning response times and infrastructure costs must be continuously improved. Deploying Alexa's text-to-speech ML models on Inf1 instances has reduced inference latency by 25% and cost per inference by 30%, improving the service experience for the millions of customers who use Alexa each month.

Unleashing new machine learning functions in the cloud

As companies race to future-proof their business by offering the best in digital products and services, none can afford to fall short in deploying sophisticated machine learning models to reinvent their customer experiences. In recent years, the applicability of machine learning has grown tremendously across a variety of use cases, from personalization and churn prediction to fraud detection and supply chain forecasting.

Fortunately, machine learning infrastructure in the cloud is unlocking capabilities that weren't possible before and making them far more accessible to non-experts. That's why AWS customers are already using Inferentia-powered Amazon EC2 Inf1 instances to provide the intelligence behind their recommendation engines and chatbots, and to generate actionable insights from customer feedback.

With AWS cloud-based machine learning infrastructure options suitable for different skill levels, it is clear that any business can accelerate innovation and cover the entire machine learning lifecycle on a large scale. As machine learning becomes more prevalent, organizations are now able to fundamentally transform the customer experience – and the way they do business – with low-cost, high-performance, cloud-based machine learning infrastructure.

Learn more about how the AWS machine learning platform can help your business innovate here.

This content was created by AWS. It was not written by the editorial staff of the MIT Technology Review.
