From the course: AI Show: Building NLP models with Azure ML AutoML

Introduction to AutoML for NLP - Azure Tutorial

You're not going to want to miss this episode of the AI Show. We're going to talk about AutoML in Azure ML, specifically building NLP models. Make sure you tune in.

Hello and welcome to this episode of the AI Show. We're talking all about building NLP models with Azure ML AutoML. There's a lot of A, A. Why don't you do some introductions here. We'll start with you, Wenxin. Why don't you tell us who you are and what you do?

Yeah, hi everyone. My name is Wenxin and I'm a PM in Azure Machine Learning. I'm driving the AutoML NLP feature and I'm happy to share with everyone the new NLP offerings in AutoML.

Fantastic. Lifeng?

Yeah, my name is Lifeng and I come from Qingdao. I work as a data scientist, so I handle the technical details and build the training pipelines for these NLP models.

This is awesome. So let's first set the ground, so everyone can understand from the ground floor: what is AutoML for NLP?

So I'm going to share some background. With AutoML NLP, we provide end-to-end NLP training using the latest pre-trained large models such as BERT, and we support three different scenarios. The first one is multi-class text classification, where each text sample is classified into exactly one category. The second is multi-label text classification, where you can assign multiple labels to a single text sample. And the third scenario is named entity recognition (NER), where you can extract domain-specific entities from unstructured text such as contracts or financial documents. For all three scenarios we also support multilingual data, so you can benefit from over 100 languages -- we actually support 104 languages across all these scenarios.

That's amazing. Let me just make sure I'm catching this. So: classification of text with a single label, multiple labels on text, as well as NER, named entity recognition. Am I getting those right?

Yeah.

And 104 languages to boot.

Yeah. That's awesome.
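To make the three scenarios Wenxin describes concrete, here is a minimal sketch of how the labeled data differs between them. This is illustrative only, not Azure ML API code; the sample texts, label names, and the `task_type` helper are all hypothetical.

```python
# Illustrative sketch (not the Azure ML API): the three AutoML NLP task
# types differ in the shape of their labels. All data below is made up.

# Multi-class: each text sample gets exactly one category.
multiclass_sample = {
    "text": "Quarterly revenue grew 12% year over year.",
    "label": "finance",
}

# Multi-label: each sample can carry several labels at once.
multilabel_sample = {
    "text": "New GPU drivers boost both gaming and ML training speed.",
    "labels": ["hardware", "machine-learning"],
}

# NER: each token in the text gets an entity tag (CoNLL-style BIO tags).
ner_sample = [
    ("Contoso", "B-ORG"),
    ("signed", "O"),
    ("the", "O"),
    ("contract", "O"),
    ("in", "O"),
    ("March", "B-DATE"),
]

def task_type(sample):
    """Identify which of the three scenarios a labeled sample belongs to."""
    if isinstance(sample, list):               # token/tag pairs
        return "named-entity-recognition"
    if isinstance(sample.get("labels"), list):  # list of labels per sample
        return "multi-label-classification"
    return "multi-class-classification"

print(task_type(multiclass_sample))   # multi-class-classification
print(task_type(multilabel_sample))   # multi-label-classification
print(task_type(ner_sample))          # named-entity-recognition
```

The key practical difference is in the label column you prepare: one category per row for multi-class, a list of categories per row for multi-label, and per-token tags for NER.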
So you were saying -- I'm sorry to interrupt you. Yeah, I'm going to continue. To accelerate your training, you can also benefit from the distributed training we support: you can use multi-GPU or multi-node runs for the NLP models, which will largely reduce the training time needed. And since this is within Azure ML, you also get all the goodness of the AutoML Python SDK. You can do training and inferencing at the same time, and there is seamless integration with the Azure ML data labeling service. So you can either bring your own text data and use the data labeling service to label it for training, or you can bring already-labeled data and use AutoML NLP for training directly. And you can continue your training and connect it to production with the MLOps capabilities of Azure ML.

This is all cool stuff. I love the idea of being able to do this automatically, but one of the things I wanted to ask about is: how is it able to handle multiple languages, Lifeng?

Yeah, I think I can come to the details. There are many models that are pre-trained on different language sources -- for English, for German, for Japanese, for Chinese, et cetera. We run experiments on different kinds of datasets and compare their speed, final accuracy, and other metrics to decide which models are the best for different scenarios. Then we store these models and make them available for download in our system. Later, when you're using them, you can just pass your language code and we will choose the right model for you depending on the language code you pass. So we handle all the details for you, and you only need to tell us what language you want to use.
And you can specify not only a single language; you can also just tell us this is a multilingual dataset, meaning it contains more than one language, and we handle that as well.

That's really cool. It sounds awesome, but what does the user experience look like? We'll go to you, Wenxin.
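The routing Lifeng describes -- pass a language code, get the right pre-trained model -- can be sketched roughly as below. The actual model catalog AutoML uses is an internal detail; the checkpoint names and the `pick_model` helper here are hypothetical stand-ins (the codes follow ISO 639, where "mul" denotes multiple languages).

```python
# Hypothetical sketch of language-code -> pre-trained-model routing.
# The real mapping inside AutoML NLP is internal; these checkpoint
# names are illustrative examples of per-language BERT variants.
PRETRAINED_BY_LANGUAGE = {
    "eng": "bert-base-cased",               # English
    "deu": "bert-base-german-cased",        # German
    "mul": "bert-base-multilingual-cased",  # multilingual (100+ languages)
}

def pick_model(dataset_language: str) -> str:
    """Return a pre-trained checkpoint for the given ISO 639 language code,
    falling back to the multilingual model for any unlisted language."""
    return PRETRAINED_BY_LANGUAGE.get(
        dataset_language, "bert-base-multilingual-cased"
    )

print(pick_model("deu"))  # bert-base-german-cased
print(pick_model("jpn"))  # falls back to the multilingual checkpoint
```

The point of the fallback is exactly what Lifeng describes: if you mark the dataset as multilingual, or use a language without a dedicated model, a single multilingual model covers it.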
