Using LLMs for Automated Request Classification


Summary

Using large language models (LLMs) for automated request classification means harnessing advanced AI to quickly sort and understand incoming queries, identifying their intent and routing them to the right process or agent. This approach streamlines everything from customer support to cybersecurity by combining reasoning capabilities and efficient workflows for smarter, faster results.

  • Build smart workflows: Set up a hybrid system where a lightweight model handles simple classifications and only sends tougher cases to an LLM, saving time and computing resources.
  • Guide step-by-step reasoning: Use prompts that encourage the model to outline its logic before giving a final answer, making the AI's decision process clearer and easier to validate.
  • Streamline information routing: Deploy an LLM router that analyzes requests, categorizes them, and directs each one to the most suitable model or workflow for efficient handling.
Summarized by AI based on LinkedIn member posts
  • View profile for Faizan J.

    Data Science & AI/ML for Healthcare, E-commerce/Retail, HRTech

    7,185 followers

    Intent detection enables search and chat systems to understand and respond to user queries accurately. For example, in e-commerce a query like "I want to return a TV I bought last week" would be a "Return request" intent. Correct intent detection can guide users with specific instructions on processing returns. Intent systems often use supervised classification or similarity-based models, such as sentence transformers (SetFit).

    The paper "Intent Detection in the Age of LLMs" explores using large language models (LLMs) for intent detection, employing in-context learning (ICL) and Chain-of-Thought (CoT) prompting.
    - In ICL, a pre-trained LLM can be bootstrapped to solve specific tasks by observing some examples. E.g., ask an LLM to classify some text and provide examples of categories for the LLM to learn on the fly: "Classify this text: Patient has sore throat and difficulty breathing. Is it (a) respiratory, (b) cardiovascular, or (c) gastrointestinal? Examples: Respiratory: 'The patient has a persistent cough and shortness of breath.' (etc.)"
    - In CoT prompting, LLMs are guided through step-by-step reasoning to enhance their reasoning capabilities. E.g., Prompt: "The customer wants a book that's a mystery and has good reviews. Let's think step by step: Step 1: The book should be in the mystery genre. Step 2: The book should have high ratings and positive reviews. Conclusion: Based on these criteria I recommend 'The Silent Patient', a popular mystery novel with excellent reviews."

    The LLM intent system:
    1. Offline: from the training data (intent, query 1..n), store SetFit embeddings of the queries in a vector DB, and create intent descriptions using an LLM.
    2. Inference: for a query, retrieve the k most similar queries, and create a CoT prompt using those queries and the intent descriptions.

    Experiments with the Claude and Mistral LLM families show better performance than the SetFit models. However, the high compute and latency costs of LLMs make them challenging to use at scale. Hence the approach uses a hybrid system that routes a query to the LLM intent system only when the SetFit model is uncertain about it. This hybrid architecture balances performance and cost. Link: https://lnkd.in/e7KgPZjU
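The hybrid "cheap classifier first, LLM only when uncertain" flow described above can be sketched in a few lines. This is a toy illustration, not the paper's code: a keyword scorer stands in for the SetFit model, `call_llm` is a hypothetical stand-in for an LLM API call, and the confidence threshold is illustrative.

```python
# Hybrid intent routing sketch: a lightweight classifier handles confident
# predictions; uncertain queries fall through to a CoT-prompted LLM.
# `lightweight_intent` and `call_llm` are illustrative stand-ins.

CONFIDENCE_THRESHOLD = 0.7  # illustrative cutoff

def lightweight_intent(query: str) -> tuple[str, float]:
    """Toy stand-in for a SetFit classifier: returns (intent, confidence)."""
    keywords = {
        "return_request": ["return", "refund", "exchange"],
        "order_status": ["where is", "track", "shipped"],
    }
    for intent, words in keywords.items():
        if any(w in query.lower() for w in words):
            return intent, 0.9
    return "unknown", 0.3

def build_cot_prompt(query: str, similar_examples: list[str]) -> str:
    """Assemble a Chain-of-Thought prompt from retrieved similar queries."""
    examples = "\n".join(similar_examples)
    return (f"Examples:\n{examples}\n\n"
            f"Query: {query}\n"
            "Let's think step by step before naming the intent.")

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call (assumption).
    return "llm_decided_intent"

def classify(query: str, similar_examples: list[str]) -> str:
    intent, confidence = lightweight_intent(query)
    if confidence >= CONFIDENCE_THRESHOLD:
        return intent                  # cheap path: SetFit is confident
    prompt = build_cot_prompt(query, similar_examples)
    return call_llm(prompt)            # expensive path, only when uncertain
```

Only low-confidence queries pay the LLM's latency and compute cost, which is the balance the hybrid architecture aims for.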

  • View profile for Marie Stephen Leo

    Data & AI Director | Scaled customer facing Agentic AI @ Sephora | AI Coding | RecSys | NLP | CV | MLOps | LLMOps | GCP | AWS

    15,996 followers

    Few-shot text classification predicts the label of a given text after training on just a handful of labeled examples. It's a powerful technique for real-world situations where labeled data is scarce. SetFit is a fast, accurate few-shot NLP classification model, perfect for intent detection in GenAI chatbots.

    In the pre-ChatGPT era, intent detection was an essential aspect of chatbots like Dialogflow. Chatbots would respond only to intents or topics the developers explicitly programmed, ensuring they stuck closely to their intended use and preventing prompt injections. OpenAI's ChatGPT changed that with its incredible reasoning abilities, which allowed an LLM to decide how to answer users' questions on various topics without an explicitly programmed flow for each one. You just "prompt" the LLM on which topics to respond to and which to decline, and let the LLM decide. However, numerous examples in the post-ChatGPT era have repeatedly shown how finicky a purely prompt-based approach is.

    In my journey working with LLMs over the past year+, one of the most reliable methods I've found to restrict LLMs to a desired domain is a 2-step approach I've spoken about in the past (https://lnkd.in/g6cvAW-T):
    1. Preprocessing guardrail: an LLM call plus heuristic rules to decide if the user's input is on an allowed topic.
    2. LLM call: the chatbot logic, such as Retrieval-Augmented Generation.

    The downside of this approach is the significant latency added by the additional LLM call in step 1. The solution is simple: replace that LLM call with a lightweight model that detects whether the user's input is on an allowed topic. In other words, good old intent detection! With SetFit, you can build a highly accurate multi-label text classifier with as few as 10-15 examples per topic, making it an excellent choice for label-scarce intent detection problems.

    Following the documentation in the links below, I could train a SetFit model in seconds and get an inference time of <50ms on CPU! If you're using an LLM as a few- or zero-shot classifier, I recommend checking out SetFit instead!
    📝 SetFit Paper: https://lnkd.in/gy88XD3b
    🌟 SetFit Github: https://lnkd.in/gC8br-EJ
    🤗 SetFit Few Shot Learning Blog on Huggingface: https://lnkd.in/gaab_tvJ
    🤗 SetFit Multi-Label Classification: https://lnkd.in/gz9mw4ey
    🗣️ Intents in DialogFlow: https://lnkd.in/ggNbzxH6
    Follow me for more tips on building successful ML and LLM products! Medium: https://lnkd.in/g2jAJn5 X: https://lnkd.in/g_JbKEkM #generativeai #llm #nlp #artificialintelligence #mlops #llmops
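To make the few-shot idea concrete, here is a stdlib-only toy: classify a query by its similarity to per-intent centroids built from a handful of examples. SetFit does the same thing with contrastively fine-tuned sentence-transformer embeddings and a trained head; this sketch only shows the shape of the pipeline, not SetFit's actual API (see the linked docs for that).

```python
# Toy few-shot intent classifier: nearest class centroid over
# bag-of-words vectors. Illustrative only; SetFit replaces `embed`
# with fine-tuned sentence embeddings.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Bag-of-words 'embedding' (toy stand-in for a sentence transformer)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(examples: dict[str, list[str]]) -> dict[str, Counter]:
    """Sum the vectors of the few labeled examples per intent."""
    centroids: dict[str, Counter] = {}
    for intent, texts in examples.items():
        c = Counter()
        for t in texts:
            c.update(embed(t))
        centroids[intent] = c
    return centroids

def predict(centroids: dict[str, Counter], query: str) -> str:
    v = embed(query)
    return max(centroids, key=lambda i: cosine(centroids[i], v))

# "Training" with a handful of examples per intent, as in few-shot setups.
centroids = train({
    "return_request": ["I want to return my order", "refund this item"],
    "order_status": ["where is my package", "track my order status"],
})
```

With real sentence embeddings in place of the bag-of-words vectors, the same nearest-centroid structure scales to 10-15 examples per topic as the post describes.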

  • View profile for Nina Fernanda Durán

    AI Architect · Ship AI to production, here’s how

    58,510 followers

    Stop obsessing over which LLM is better. It does not matter if your architecture is weak. A junior dev optimizes prompts. A senior dev optimizes flow control. If you want to move from "demo" to "production", you need to master these 4 agentic patterns:

    1. Chain of Thought (CoT)
    This is your debugging layer for logic. Standard models fail at complex math or reasoning because they predict the answer token immediately.
    The implementation: do not just ask for the result. In your system prompt, explicitly instruct the model to "think step-by-step" or to output its reasoning inside specific XML tags (e.g., <reasoning>...</reasoning>) before the final answer. You can then parse and validate the reasoning steps programmatically before showing the final result to the user.

    2. RAG (Retrieval-Augmented Generation)
    This is your dynamic context injection. The context window is finite; your data is not.
    The implementation:
    ◼️ Ingest: chunk your documents and store them as vector embeddings (using Pinecone, Milvus, or pgvector).
    ◼️ Retrieve: on user query, perform a cosine similarity search to find the top-k chunks.
    ◼️ Inject: concatenate these chunks into the context string of your prompt before sending the request to the LLM.

    3. ReAct (Reason + Act Loop)
    This is how you break out of the text box. It turns the LLM into a controller for your own functions.
    The implementation: you need a while loop in your code:
    1. Call the LLM with a list of defined tools (JSON Schema).
    2. Check if the finish_reason is tool_calls.
    3. Execute: run the requested function locally (e.g., fetch_weather(city)).
    4. Observe: append the function's return value to the message history.
    5. Loop: send the history back to the LLM to generate the final natural-language response.

    4. Router (The Classifier)
    This is your switch statement powered by semantic understanding. Using a massive model for every trivial task is inefficient and slow.
    The implementation: use a lightweight, fast model (like GPT-4o-mini or a local Llama 3 8B) as the entry point. Its only job is to classify the user intent into a category ("Coding", "General Chat", "Database Query"). Based on this classification, your code routes the request to the appropriate specialized prompt or agent.

    I'm Nina. I build with AI and share how it's done weekly. #aiagents #llm #softwaredevelopment #technology
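The ReAct while loop in pattern 3 can be sketched as below. The `llm` stub stands in for a chat-completion API that either requests a tool call or returns a final answer; the message format, `finish_reason` field, and `fetch_weather` tool are illustrative assumptions, not a specific vendor's SDK.

```python
# Minimal ReAct-style tool loop: call the model, execute any requested
# tool, append the observation, and loop until a final answer arrives.

def fetch_weather(city: str) -> str:
    return f"22C and sunny in {city}"          # stubbed tool

TOOLS = {"fetch_weather": fetch_weather}

def llm(messages: list[dict]) -> dict:
    """Stub model: requests the weather tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"finish_reason": "tool_calls",
                "tool_call": {"name": "fetch_weather",
                              "args": {"city": "Paris"}}}
    observation = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"finish_reason": "stop", "content": f"It is {observation}."}

def react_loop(user_msg: str) -> str:
    messages = [{"role": "user", "content": user_msg}]
    while True:                                 # the loop from step 5
        reply = llm(messages)
        if reply["finish_reason"] != "tool_calls":   # step 2: check
            return reply["content"]             # final natural-language answer
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["args"])         # step 3: Execute
        messages.append({"role": "tool", "content": result}) # step 4: Observe
```

Swapping the stub for a real client keeps the control flow identical: the loop, not the prompt, is what makes the agent reliable.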

  • View profile for Daniel Chernenkov

    Co-Founder, CTO | 2x Post-Exits. Staying Foolish, Building the Future of AI.

    6,670 followers

    LLM routers are a critical component in modern AI pipelines, acting as intelligent dispatchers that streamline complex workflows and maximize the capabilities of large language models.

    📚 How they work:
    - Task identification: LLM routers analyze incoming requests, including text prompts, context, or structured data, to determine the most appropriate LLM for the task.
    - Intelligent routing: based on factors like task type, LLM specialization, resource availability, and cost considerations, the router dynamically selects the optimal LLM and routes the request accordingly.
    - Load balancing: routers distribute requests across multiple LLMs, ensuring efficient resource utilization and preventing overload.
    - Fallback mechanisms: when the primary LLM is unavailable or unable to process a request, the router can seamlessly switch to alternative LLMs.

    🗝 Key technical benefits:
    - Enhanced performance: by matching tasks to specialized LLMs, routers unlock the full potential of each model, resulting in faster and more accurate responses.
    - Cost optimization: routers minimize unnecessary API calls to expensive LLMs by utilizing smaller, more efficient models for simpler tasks.
    - Scalability: the ability to distribute workloads across multiple LLMs allows for seamless scaling as demand increases.
    - Modularity & flexibility: routers can integrate with a wide range of LLMs, both open-source and proprietary, and support custom routing logic for specific use cases.

    LLM routers are revolutionizing a wide range of AI applications:
    - Chatbots: delivering more contextually relevant and personalized responses.
    - Content generation: automating the creation of high-quality articles, summaries, and creative content.
    - Data analysis: accelerating complex data extraction, cleaning, and analysis tasks.
    - Research & development: streamlining workflows for scientific research, drug discovery, and more.

    By optimizing LLM usage, reducing costs, and ensuring scalable performance, LLM routers are paving the way for a new generation of intelligent applications across industries.
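The task-identification, routing, and fallback behaviors above can be sketched together. Everything here is illustrative: the model names, the `backends` registry of callables, and the keyword-based `classify_task` all stand in for a real classifier and real API clients.

```python
# Router sketch: classify the request, pick a model by category, and
# fall back to the next backend if the primary one fails.

ROUTES = {
    "code": ["code-model-large", "general-model"],   # ordered: primary, fallback
    "chat": ["small-chat-model", "general-model"],
}

def classify_task(request: str) -> str:
    """Toy task identification; a real router would use a small model."""
    return "code" if "def " in request or "bug" in request else "chat"

def route(request: str, backends: dict) -> str:
    category = classify_task(request)                # task identification
    for model in ROUTES[category]:                   # intelligent routing
        handler = backends.get(model)
        if handler is None:
            continue                                 # model not deployed
        try:
            return handler(request)
        except RuntimeError:
            continue                                 # fallback mechanism
    raise RuntimeError("no backend available")
```

Load balancing would slot into the same loop, e.g. by rotating among several handlers registered for the same model name.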

  • View profile for Dylan Williams

    Co-Founder - Spectrum Security

    16,082 followers

    New research introduces IntelEX. The paper shows how LLMs can help automate the pipeline from unstructured threat reports to detection rules, reducing a tedious and error-prone workflow for security analysts. Tested on 1,769 newly crawled reports, it identified 3,591 attack techniques and includes an LLM Judge module to reduce hallucinations. Some tips we can borrow from the paper:
    - When processing CTI reports with LLMs, chunk documents by linking sentences around IoCs rather than feeding in the full document.
    - They used a vector DB of MITRE ATT&CK tactics/techniques for retrieval.
    - They used separate LLM instances for extraction, retrieval, and verification to avoid context contamination (it looks like they deliberately did not pass context from previous steps downstream).
    - A step explicitly reasons about why a detection/classification is valid or invalid.
    - Extra steps improve semantic understanding of threat intel via attack-variant generation and attack reconstruction.
    - The authors found GPT-4-mini achieved accuracy similar to GPT-4 at 1/20th the cost.
    IntelEX: A LLM-driven Attack-level Threat Intelligence Extraction Framework: 🔗 https://lnkd.in/eH9tkgbt
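The IoC-centered chunking tip can be illustrated with a short sketch: keep each sentence containing an indicator plus its neighbors, and drop the rest. The IoC regexes below cover only IPv4 addresses and MD5 hashes and the naive sentence splitter is for illustration; these are assumptions, not IntelEX's actual extraction rules.

```python
# Chunk a report around IoC-bearing sentences instead of feeding
# the full document to the LLM.
import re

IOC_PATTERN = re.compile(
    r"\b(?:\d{1,3}\.){3}\d{1,3}\b"      # IPv4 address
    r"|\b[a-f0-9]{32}\b"                # MD5 hash
)

def chunk_around_iocs(report: str, window: int = 1) -> list[str]:
    """Keep sentences containing an IoC plus `window` neighbors each side."""
    sentences = [s.strip(" .") for s in re.split(r"\.\s+", report)
                 if s.strip(" .")]
    keep: set[int] = set()
    for i, s in enumerate(sentences):
        if IOC_PATTERN.search(s):
            lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
            keep.update(range(lo, hi))
    return [sentences[i] for i in sorted(keep)]
```

Linking the surrounding sentences preserves the context an LLM needs to map the indicator to a technique, while keeping prompts short.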

  • View profile for Rachitt Shah

    AI at Accel, Former Applied AI Consultant

    29,740 followers

    Understanding LLM Routing

    What is LLM routing? LLM routing is a technique used to dynamically direct user queries to the most appropriate Large Language Model (LLM) based on the complexity and specificity of the query. The primary goal is to balance response quality and computational cost by leveraging both high-quality closed LLMs (e.g., GPT-4) and cost-effective open-source LLMs (e.g., Mixtral-8x7B).

    Key points on building an LLM router, by Anyscale (h/t: Amjad Almahairi):
    1. Data collection and labeling:
    - Anyscale collected a diverse set of queries from the Nectar dataset, which includes responses from various models, including GPT-4.
    - Queries were labeled on a 1-5 scale based on the quality of Mixtral-8x7B's responses, with higher scores indicating better quality.
    2. Model selection:
    - GPT-4 was chosen as the closed LLM for its superior response quality.
    - Mixtral-8x7B was selected as the open-source LLM for its cost-effectiveness.
    3. Causal LLM classifier:
    - A Llama3-8B model was finetuned as a causal LLM classifier to route queries based on their complexity.
    - The classifier was trained to predict the quality score of Mixtral-8x7B's response to a given query.
    4. Training process:
    - Training involved full-parameter finetuning of the Llama3-8B model using Anyscale's API.
    - The dataset was balanced to ensure the model was not biased towards any specific label.
    5. Evaluation:
    - Offline evaluations were conducted using benchmarks such as MT Bench and GSM8K.
    - The performance of the LLM router was compared against random routing and other public LLM routing systems.
    6. Routing decision:
    - The router directs "simple" queries to Mixtral-8x7B when the predicted score is high (4-5), maintaining response quality while reducing costs.
    - More complex queries are routed to GPT-4 to ensure high-quality responses.
    7. Results:
    - The LLM router achieved significant cost reductions while maintaining response quality.
    - Evaluations showed the router could achieve up to a 70% cost reduction on MT Bench and a 40% cost reduction on GSM8K compared to using GPT-4 alone.

    Advantages of LLM routing:
    - Cost efficiency: by routing simpler queries to cost-effective models, LLM routing significantly reduces computational costs.
    - High-quality responses: complex queries are directed to high-quality models like GPT-4, ensuring response quality is not compromised.
    - Scalability: the system can handle a high volume of queries by efficiently distributing load between models.
    - Flexibility: the routing framework can be adapted to include new models and updated as performance metrics evolve.
    - Optimized resource utilization: balances computational resources, ensuring high-cost models are used only when necessary.
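The routing decision in steps 3 and 6 reduces to a threshold check: a classifier predicts how well the cheap model would answer (1-5), and the query goes to the expensive model only when the predicted score is low. In this sketch, `predict_weak_score` is a toy stand-in for the fine-tuned Llama3-8B classifier; the length heuristic is purely illustrative.

```python
# Score-threshold routing sketch: high predicted quality for the weak
# model keeps the query on the cheap path.

STRONG, WEAK = "gpt-4", "mixtral-8x7b"
THRESHOLD = 4   # predicted scores of 4-5 stay on the cheap model

def predict_weak_score(query: str) -> int:
    """Toy proxy for the trained classifier: short queries are 'simple'."""
    return 5 if len(query.split()) <= 8 else 2

def route(query: str) -> str:
    return WEAK if predict_weak_score(query) >= THRESHOLD else STRONG
```

Raising or lowering THRESHOLD is the knob that trades response quality against cost, which is what the MT Bench and GSM8K evaluations sweep.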

  • View profile for Manjeet Singh

    Sr Director, Agentforce and AI platform @Salesforce| Evals, Observability, Multi-Agents Orchestration | Ex VP IRM, ServiceNow | AI Startups Advisor | Athlete

    14,740 followers

    RouteLLM is an incredible concept for managing the trade-off between model performance and computational cost 💰 This approach allows efficient use of resources by reserving powerful models for challenging tasks while routing simpler queries to more economical options (with 2x or more cost savings, as claimed in the paper).
    🗞 Paper: "RouteLLM: Learning to Route LLMs with Preference Data". arxiv.org/abs/2406.18665

    One of the key innovations of RouteLLM is its use of human preference data for training the router. The researchers leveraged data from the Chatbot Arena, a platform where users compare responses from different LLMs, to create a rich dataset of human preferences. This data provides valuable insight into the relative strengths and weaknesses of various models across different types of queries.

    The RouteLLM framework employs several sophisticated routing techniques:
    1. Similarity-weighted (SW) ranking
    2. Matrix factorization
    3. BERT-based classifier
    4. Causal LLM classifier

    Interestingly, the RouteLLM approach demonstrates strong transfer learning capabilities: the routers maintained their performance even when the underlying strong and weak models were changed at test time, suggesting a robust and generalizable solution for LLM deployment.
