Efficient Text Classification
Fast, Flexible Transformer-Based Classification with GLiClass & FastAPI
Introduction
Text classification is a critical component of many modern applications, from hate speech detection, topic classification, and sentiment analysis to reranking in RAG (retrieval-augmented generation) pipelines.
The GLiClass architecture (from Knowledgator) is a flexible transformer-based sequence classification model that can be fine-tuned for any domain. It provides much of the semantic understanding of large language models while being significantly smaller and more memory-efficient.
Its strengths are its light weight (a small parameter count relative to LLMs) and its flexibility, since labels can be hot-swapped at inference time.
But how do you integrate GLiClass into a scalable, efficient system? In this guide, we’ll walk through deploying GLiClass in a FastAPI microservices architecture, complete with Celery for task management, Redis for caching, and Flower and Locust for monitoring and load testing.
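Before diving in, here is a rough sketch of the caching idea: identical requests can be keyed by a deterministic hash of their contents, so Redis can serve repeat predictions without re-running the model. The key scheme below is illustrative only, not the repository’s exact implementation.

```python
import hashlib
import json


def cache_key(inputs: list[str], labels: list[str], classification_type: str) -> str:
    """Derive a deterministic Redis key for a prediction request.

    Sorting the labels makes the key order-insensitive, so requests
    that differ only in label order hit the same cache entry.
    """
    payload = json.dumps(
        {
            "inputs": inputs,
            "labels": sorted(labels),
            "classification_type": classification_type,
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"gliclass:predict:{digest}"
```

A worker would then check `redis.get(key)` before running inference and store the result with `redis.setex(key, ttl, result)` afterwards.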
FastAPI
FastAPI acts as the REST gateway that pushes tasks to the Celery workers.
We can call the /predict endpoint with our batch of text inputs (in this case just one) along with the candidate labels (the classes we want to detect).
curl -X 'POST' \
'http://localhost:8080/predict' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"inputs": [
"Hablo español, ¿cuánto cuesta el vino tinto?"
],
"labels": [
"English",
"Spanish",
"French",
"German",
"Italian",
"Portuguese",
"Russian",
"Turkish",
"Vietnamese"
],
"classification_type": "single-label"
}'
{
"task_id": "7869541c-ccc0-48df-b0fd-6218d0329eea",
"status": "Processing"
}
Then we can call the /result endpoint with the task ID (in this implementation, starting a task is decoupled from waiting for its result).
curl -X 'GET' \
'http://localhost:8080/result/7869541c-ccc0-48df-b0fd-6218d0329eea' \
-H 'accept: application/json'
This returns the predicted label and a score reflecting the model’s confidence in its result.
[
[
{
"label": "Spanish",
"score": 0.7295899391174316
}
]
]
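Because submission and retrieval are decoupled, a client typically polls /result until the task finishes. Here is a small, framework-agnostic poller; the `fetch` function is injected (in practice it would GET /result/&lt;task_id&gt; and return the decoded JSON), so the names here are illustrative.

```python
import time
from typing import Callable


def wait_for_result(
    fetch: Callable[[str], object],
    task_id: str,
    timeout: float = 30.0,
    interval: float = 0.5,
):
    """Poll fetch(task_id) until the response is no longer 'Processing'.

    The gateway returns {"status": "Processing"} while the task runs;
    anything else is treated as the finished result.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = fetch(task_id)
        if not (isinstance(response, dict) and response.get("status") == "Processing"):
            return response
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```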
Celery/Flower
Flower provides a web dashboard for monitoring the Celery cluster (the Redis broker is what actually queues tasks for the workers). We can look up past tasks or check the progress of currently running ones.
Locust
Also included in the repo is a locustfile for load testing with Locust. In the example below, you can see that the service handles concurrent requests, and the number of workers can be scaled to handle higher traffic loads.
Streamlit
Finally, I’ve also included a Streamlit application for experimenting with the model’s predictions.
For the code, please see the repository here:
Conclusion
Thanks! I hope you learned something. This was a fun project. Machine learning operations (MLOps) is one of my favorite subjects.
Please follow me for more on software engineering, data science, machine learning, and artificial intelligence.
If you enjoyed reading, please consider supporting my work.
I am also available for 1:1 mentoring & data science project review.
Thanks! 👋🏻
References
- GLiNER — ArXiv
- GLiClass — GitHub
- GLiClass Models — Hugging Face
- knowledgator/gliclass-small-v1.0 — Hugging Face
- EmergentMethods/gliner_medium_news-v2.1 — Hugging Face
In Plain English 🚀