I'm excited to share our latest theoretical work, which formally proves an interesting property of large language models: base transformer models can approximate fine-tuned capabilities using only inference-time techniques like in-context learning.

The core question we investigated: can specialized behaviors typically acquired through expensive supervised fine-tuning be elicited from base models without any parameter updates?

Our theoretical contribution: a formal proof, grounded in the Turing completeness of transformers, showing that this is indeed possible under certain assumptions. The work establishes mathematical bounds on the minimal dataset sizes needed for approximation.

Key theoretical results:
- For text generation tasks: O(mV/ε²) examples suffice (where m = number of contexts, V = vocabulary size, ε = error tolerance)
- For linear classification: O(d/ε) examples (where d = input dimension)
- Extensions to finite-context scenarios with practical bounds

This work helps explain why techniques like few-shot prompting, retrieval-augmented generation, and in-context learning work so effectively in practice. It bridges formal computer science theory with empirical observations about modern language models. While the assumptions are idealized (unbounded computational resources, full dataset access), the results provide mathematical foundations for understanding inference-time adaptation strategies that are increasingly important in AI deployment.

#MachineLearning #AI #Research #Transformers #TheoreticalCS
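The scaling of these bounds can be made concrete with a tiny calculator. This is an illustrative sketch only: big-O notation suppresses constants, so the functions below return the bound's leading term, not a real sample-size requirement.

```python
import math

def text_generation_examples(m: int, V: int, eps: float) -> int:
    """Leading term of the O(m*V/eps^2) bound for text generation:
    m contexts, vocabulary size V, error tolerance eps."""
    return math.ceil(m * V / eps**2)

def linear_classification_examples(d: int, eps: float) -> int:
    """Leading term of the O(d/eps) bound for linear classification
    with input dimension d."""
    return math.ceil(d / eps)

# e.g. 10 contexts, a 50k-token vocabulary, 10% error tolerance
n_gen = text_generation_examples(10, 50_000, 0.1)   # 50,000,000
n_cls = linear_classification_examples(768, 0.1)    # 7,680
print(n_gen, n_cls)
```

The quadratic dependence on 1/ε is why the text-generation bound grows so much faster than the classification one as the tolerance tightens.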
Using Multi-Dimensional Context in Large Language Models
Summary
Using multi-dimensional context in large language models (LLMs) means enabling these AI systems to understand and process information from several different sources, types, or perspectives at the same time—such as text, images, numbers, and real-world relationships. This approach helps LLMs grasp complex meanings, adapt to specialized tasks, and continually learn from new context, making them smarter and more flexible in practical applications.
- Integrate varied data: Combine information from sources like text, images, and measurements so your language model can capture the full picture rather than relying on just one type of input.
- Apply structured context: Use tools like vector embeddings and knowledge graphs to give your AI a more precise understanding of words, relationships, and meanings—helping it distinguish between similar terms in different scenarios.
- Build for adaptation: Design your workflows so the AI can remember new information and refine its recommendations based on evolving rules, feedback, or changing data patterns over time.
Excited to share a major advance in multimodal representation learning: UniME (Universal Multimodal Embedding), a new two-stage framework that breaks through the limitations of traditional vision-language models.

What's the challenge? Popular models like CLIP have powered cross-modal retrieval and clustering, but they're held back by text truncation, isolated image-text encoding, and poor compositionality. Even recent Multimodal Large Language Models (MLLMs) excel at reasoning but struggle to learn truly universal, discriminative embeddings.

How does UniME work? Developed through a collaboration between top academic and industrial labs, UniME rethinks how MLLMs learn multimodal representations.

Stage 1: Textual Discriminative Knowledge Distillation
- UniME uses a state-of-the-art LLM-based embedding model as a teacher (e.g., NV-Embed V2).
- The language component of the MLLM is trained to align its embeddings with the teacher, using prompts to compress information and KL divergence to align distributions.
- Only text data and minimal parameters are updated, making training highly efficient.

Stage 2: Hard Negative Enhanced Instruction Tuning
- UniME addresses the issue of false negatives by filtering candidates based on similarity thresholds.
- For each query, multiple hard negatives (samples most similar to the query but incorrect) are selected, forcing the model to refine its discriminative power.
- Task-specific prompts enable adaptation to a wide range of tasks, from retrieval to visual question answering.

Under the hood:
- UniME leverages a unified MLLM architecture with a vision tower, projection layer, and LLM backbone, supporting both unimodal and multimodal inputs.
- Efficient parameter tuning is achieved with QLoRA and memory-saving techniques like GradCache, allowing training on large batches with limited hardware.
- The framework is evaluated on the MMEB benchmark (36 datasets across 4 meta-tasks) and multiple retrieval tasks, including both short and long caption retrieval as well as compositional benchmarks.

Results:
- UniME delivers consistent and significant improvements over strong baselines (like CLIP, E5-V, and VLM2Vec) across all evaluated tasks.
- It demonstrates robust discriminative and compositional capabilities, outperforming prior models especially in handling long captions and compositional retrieval scenarios.

This work highlights the power of combining knowledge distillation, hard negative mining, and instruction tuning to unlock the full potential of MLLMs for universal embedding learning. Looking forward to seeing how this framework shapes the next generation of multimodal AI applications!
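The Stage 2 hard-negative selection described above can be sketched in a few lines. This is a toy illustration, not UniME's implementation: the candidate indices, similarity scores, and false-negative threshold are invented for the example.

```python
def select_hard_negatives(query_sim, positive_idx, k=2, false_neg_threshold=0.95):
    """query_sim: list of (candidate_idx, cosine similarity to the query).
    Drop the positive and any candidate above the threshold (too similar,
    likely a false negative), then keep the k most similar remaining
    candidates as hard negatives."""
    candidates = [
        (idx, s) for idx, s in query_sim
        if idx != positive_idx and s < false_neg_threshold
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in candidates[:k]]

# Candidate 0 is the positive; candidate 1 is filtered as a likely
# false negative (0.97 > 0.95); candidates 2 and 3 become hard negatives.
sims = [(0, 0.99), (1, 0.97), (2, 0.80), (3, 0.75), (4, 0.10)]
print(select_hard_negatives(sims, positive_idx=0))  # [2, 3]
```

The key idea is the two-sided filter: too-similar candidates are excluded as false negatives, while the most similar survivors are kept precisely because they are hard to distinguish from the positive.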
-
Large language models (LLMs) can improve their performance not just by retraining but by continuously evolving their understanding through context, as shown by the Agentic Context Engineering (ACE) framework.

Consider a procurement team using an AI assistant to manage supplier evaluations. Instead of repeatedly inputting the same guidelines or losing specific insights, ACE helps the AI remember and refine past supplier performance metrics, negotiation strategies, and risk factors over time. This evolving "context playbook" allows the AI to provide more accurate supplier recommendations, anticipate potential disruptions, and adapt procurement strategies dynamically. In supply chain planning, ACE enables the AI to accumulate domain-specific rules about inventory policies, lead times, and demand patterns, improving forecast accuracy and decision-making as new data and insights become available.

This approach results in up to 17% higher accuracy in agent tasks and reduces adaptation costs and time by more than 80%. It also supports self-improvement through feedback like execution outcomes or supply chain KPIs, without requiring labeled data. By modularizing the process—generating suggestions, reflecting on results, and curating updates—ACE builds robust, scalable AI tools that continuously learn and adapt to complex business environments.

#AI #SupplyChain #Procurement #LLM #ContextEngineering #BusinessIntelligence
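The generate-reflect-curate loop behind ACE can be caricatured as a tiny playbook update. Everything here (the field names, the outcome score, the keep threshold) is a hypothetical stand-in for the framework's much richer machinery.

```python
def curate_playbook(playbook, suggestion, outcome_score, keep_threshold=0.5):
    """One cycle of the modular loop: a generator proposes a rule,
    reflection scores it against execution feedback (e.g. a supply
    chain KPI), and curation keeps or discards it. No labeled data
    is required, only the outcome signal."""
    if outcome_score >= keep_threshold:
        playbook[suggestion["topic"]] = suggestion["rule"]
    return playbook

# A toy evolving context playbook for procurement.
playbook = {"lead_time": "Order 14 days ahead for overseas suppliers."}
suggestion = {"topic": "safety_stock",
              "rule": "Hold 20% extra stock for single-source parts."}
playbook = curate_playbook(playbook, suggestion, outcome_score=0.8)
print(sorted(playbook))  # ['lead_time', 'safety_stock']
```

The point of the modular split is that each stage can improve independently: better reflection signals tighten curation without touching generation.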
-
Heterogeneous datasets are pervasive today, existing in various domains. Objects within these complex datasets are often represented from different perspectives, at different scales, or through multiple modalities, such as images, sensor readings, language sequences, and compact mathematical statements. Such datasets have been analyzed in the past using Multi-View Learning (MVL), Multi-Task Learning (MTL), and Tensor Learning (TL). In recent years, Multi-Modal Learning (MML) has also been employed. MML is a Machine Learning (ML) approach that integrates and processes information from multiple types of data with different "perspectives" or "modalities", such as text, images, audio, video, or sensor data. The goal of MML is to leverage the complementary strengths of these modalities to improve model performance and enable richer understanding and predictions.

Precision medicine and personalized clinical decision support system (CDSS) tools have long aimed to leverage multimodal patient data to better capture complex, high-dimensional patient states and provider responses. This data ranges from free-form text notes and semi-structured electronic health records (EHR) to high-frequency physiological signals. While the advent of transformer architectures has enabled deeper insights from merging modalities, it has also required meticulous feature engineering and alignment. In patient monitoring, effectively analyzing diverse physiological signals within CDSS is highly challenging. #MedicalInformatics

To address the challenges of analyzing multimodal patient data, the authors of [1] introduce MedTsLLM, a general multimodal large language model (LLM) framework that effectively integrates time series data and rich contextual information in the form of text. This framework performs three clinically relevant time-series tasks that enable deeper analysis of physiological signals and can provide actionable insights for clinicians:
• semantic segmentation
• boundary detection
• anomaly detection

At a high level, boundary detection splits signals into periods like breaths or beats. Semantic segmentation further splits time series into distinct, meaningful segments. Anomaly detection identifies periods within the signals that deviate from normal. MedTsLLM utilizes a reprogramming layer to align embeddings of time series patches with a pretrained LLM's embedding space, making effective use of raw time series in conjunction with textual context. The authors additionally tailored the text prompt to include patient-specific information. Their experiments showed that MedTsLLM outperforms state-of-the-art baselines, including deep learning models, other LLMs, and clinical methods, across multiple medical domains, specifically electrocardiograms (ECG) and respiratory waveforms. Links to their preprint [1] and #Python GitHub repository [2] are shared in the comments.
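The reprogramming idea, mapping a time-series patch embedding into a text embedding space, can be sketched as attention over a frozen set of text prototype embeddings. This is a simplified stand-in for the paper's layer, not its actual implementation; the vectors and dimensions are toy values.

```python
import math

def reprogram_patch(patch_vec, text_protos):
    """Map a time-series patch embedding into the LLM's text embedding
    space as a softmax-weighted mixture of frozen text prototype
    embeddings. Dot-product scores play the role of attention logits."""
    scores = [sum(p * t for p, t in zip(patch_vec, proto))
              for proto in text_protos]
    mx = max(scores)                       # stabilize the softmax
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(text_protos[0])
    return [sum(w * proto[j] for w, proto in zip(weights, text_protos))
            for j in range(dim)]

# Two toy 2-d text prototypes; the patch points toward the first one,
# so the output leans toward that prototype.
protos = [[1.0, 0.0], [0.0, 1.0]]
out = reprogram_patch([2.0, 0.0], protos)
print(out)
```

Because the output is a convex combination of text prototypes, it always lands inside the LLM's embedding space, which is what lets the frozen language model consume raw signal patches.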
-
An LLM might know what "Apple" means. But how can it know whether you're talking about a fruit or a tech company? If we want to move deeper into AI-enabled workflows, that's the kind of nuance these models are going to need to handle in localization.

Our field is drowning in unstructured data (support tickets, product specs, marketing copy), and while large language models can process it, they still need structure, context, and precision to make sense of it all. That's where vector embeddings and knowledge graphs can make a difference.

Vector embeddings allow machines to understand meaning numerically. They help detect semantic similarities, flag hallucinations, and improve machine translation quality. Knowledge graphs, on the other hand, model relationships. They organize information about entities (people, brands, terms) and how they relate to each other. They don't just help LLMs understand language better. They help them reason through it.

Benjamin Loy, Principal Engineer at Smartling, laid this out beautifully in our recent LocDiscussion interview. He's not just thinking about these technologies in theory; he's already using them. Vector embeddings, as he explains, have been instrumental for hallucination detection. They're fast, cheap, and effective when you're comparing a source and target sentence. But he also sees the promise of knowledge graphs as the future of terminology 2.0, especially when you're dealing with problems like multilingual named entities, pronoun disambiguation, and complex domain-specific knowledge.

It's rare to find someone who can articulate the trade-offs so clearly. Ben doesn't oversell the shiny new thing; he talks candidly about ROI, token cost, storage, and where the real gains might be. It's the kind of pragmatic, forward-looking perspective that localization needs more of right now.

So here's the question: if you were redesigning your multilingual technology infrastructure from scratch, would you prioritize vectors or graphs? What's your bet on the future?
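The embedding-based hallucination check described above can be sketched as a cosine-similarity gate between source and target sentence embeddings. The vectors and threshold here are toy values; a real pipeline would get the embeddings from a multilingual encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def flag_hallucination(src_emb, tgt_emb, threshold=0.7):
    """Flag a translation whose embedding drifts too far from the
    source. Cheap and fast: one dot product per sentence pair."""
    return cosine(src_emb, tgt_emb) < threshold

print(flag_hallucination([1.0, 0.0], [0.9, 0.1]))  # False: meanings align
print(flag_hallucination([1.0, 0.0], [0.1, 1.0]))  # True: likely drift
```

This is exactly the "fast, cheap, and effective" property mentioned in the post: no model call is needed at check time, only a comparison of two precomputed vectors.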
-
Context engineering is the art of giving AI systems the right information at the right time. Most engineers focus on initial retrieval, that is, finding relevant documents from a large corpus. But retrieval is just the first step. The real challenge is prioritization.

A reranker sits between retrieval and generation, making critical decisions about which pieces of context deserve the model's attention. This is essential regardless of context window length. Enterprise documents can contain conflicting information across sources over time. Recent work on context rot demonstrates that LLMs don't process long input contexts uniformly, and best practices still require precise optimization of context at each step. You might retrieve 50 relevant documents with similar content, but your model can only meaningfully process 10. The reranker decides which 10 matter most.

Sometimes, with simple problems, you can use metadata filtering instead of reranking. However, metadata filters quickly hit limitations when dealing with semantic understanding, relative importance, or multi-dimensional relevance criteria that require actual comprehension of content. Instruction-following rerankers go further: they can be configured at the agent/application level for consistent behavior, or dynamically at the query level for specific tasks. These rerankers excel at complex, nuanced requirements that metadata alone can't capture: "Prioritize documents from the last 6 months and internal sources. However, if dealing with foundational concepts, include older authoritative external sources." They handle semantic nuance and soft filters.

Good context engineering recognizes that relevance isn't binary; it's instructable and contextually complex. Rerankers are a critical control point that transforms raw retrieval results into precisely curated context for optimal model performance.
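The quoted instruction (prefer recent internal documents, but admit older authoritative sources for foundational concepts) can be caricatured as a scoring rule. A real instruction-following reranker is a trained model conditioned on the instruction text; the document fields and weights below are invented for illustration.

```python
from datetime import date

def rerank(docs, today, top_k=2):
    """Toy reranker implementing the instruction as soft score bonuses
    rather than hard filters, so an old-but-authoritative foundational
    source can still outrank a mediocre recent one."""
    def score(doc):
        s = doc["relevance"]
        if (today - doc["date"]).days <= 182:        # last ~6 months
            s += 0.3
        if doc["internal"]:
            s += 0.2
        if doc.get("foundational") and doc.get("authoritative"):
            s += 0.4                                  # the soft override
        return s
    return sorted(docs, key=score, reverse=True)[:top_k]

docs = [
    {"id": "a", "relevance": 0.5, "date": date(2024, 1, 5), "internal": False},
    {"id": "b", "relevance": 0.6, "date": date(2024, 6, 1), "internal": True},
    {"id": "c", "relevance": 0.5, "date": date(2020, 3, 1), "internal": False,
     "foundational": True, "authoritative": True},
]
top = rerank(docs, today=date(2024, 7, 1))
print([d["id"] for d in top])  # ['b', 'c']
```

Note that "c" survives despite being four years old and external: soft scoring is what lets an instruction override a metadata filter that would have discarded it.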
-
When Context Becomes Computation: From Luhmann's Polycontextuality to the Age of Hypercontext

In the 1990s, Neil Postman argued that we lived in a time of context with no context: a media ecology where information moved faster than the frameworks that once gave it meaning. He warned that television, and later the total information environment, created a landscape where facts multiplied while significance thinned, every message drifting free of the conditions that once anchored understanding. A decade earlier, Barthes had traced this dissolution in language, watching the signifier detach from its origin and wander through cultural codes. The death of the author, he wrote, was a structural event in which meaning dispersed across innumerable readings, a counterpart to Postman's context of no context, each describing the loss of a stable frame to connect information to meaning.

What we have now may be an inversion of that condition. In language models, every word is generated through overlapping contexts: patterns drawn from fragments of human expression that the system recombines in real time. Each token arises within a field shaped by associations across the archive. Context here is swarm-like, emergent from intersections of others that have appeared near it. The model builds internal contexts of relation, generating provisional zones of meaning through its own operations. Each new configuration gives rise to others in a continual expansion of possibility.

Postman's desert has become Barthes's jungle and then something stranger still: a condition in which context itself becomes generative. Each word now occupies a position within a multidimensional field that recalibrates as new connections form. The model compresses linguistic histories into latent associations, then reactivates them through evolving patterns in its representational space. It operationalizes context by translating linguistic proximity into measurable relation, encoding the associations among words as vectors that continuously adjust to one another. In this sense, context is no longer merely a structural condition of meaning but part of the computational substrate through which language is generated.

Luhmann described modernity as polycontextural, a world of distinct meaning systems, each internally consistent yet incompatible with the rest. Hypercontextuality describes the way language models transform that plurality into a shared field where contexts interpenetrate and continually reshape one another. Here, meaning arises through distributed alignment, as relations across vast domains interact and recombine. The structure of meaning itself shifts from sequence to simultaneity, from the linear unfolding of relations over time to their continuous recombination within a shared representational space. Luhmann's differentiated world folds into a hypercontextural one, where every context exists through its relation to every other.
-
Over the past year, I'm sure many of us have spent a lot of time working with large language models 🤖. Like many others, I started by learning how to write better prompts — and as I went deeper, I realized something important:

👉 Prompt engineering is just the beginning. The real magic happens when you start thinking about context engineering ✨.

🔹 Prompt engineering is about crafting the right question or instruction. It's what we all first learned — choosing clear language, being specific, maybe even using a few tricks like "act as a…". It's useful, especially for one-off tasks or simple completions. But when you're building complex use cases, full applications, or trying to create consistent, reliable AI behavior across sessions, prompts alone aren't enough. That's where context engineering comes in 🧠.

🔹 Context engineering is about everything that surrounds the prompt. It's about structuring the entire input to the model to guide its behavior — not just what you say, but how you set the scene. Think:
🗣️ System instructions – set the model's role and style
💾 Memory – helps the model remember past info
🧩 Examples – teach by showing, not telling
🧱 Metadata – highlight key info only
🔄 Context windows – update what matters as chat continues

That shift — from focusing on the prompt to curating the context — has completely changed how we build with LLMs. And honestly, I think this is where the field is going. 🚀

Would love to hear how others are thinking about this shift. Are you seeing the same evolution in your work?

#AI #innovation #promptengineering #contextengineering #biotech #pharma #AIdevelopment
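The pieces listed above (system instructions, examples, memory) can be assembled into a single structured model input. A minimal sketch, assuming a chat-style message format; the roles and example content are illustrative.

```python
def build_context(system_role, memory, examples, user_prompt, max_turns=3):
    """Assemble the full model input, not just the prompt: a system
    role, few-shot examples (teach by showing), a sliding window of
    recent memory, then the user's turn."""
    messages = [{"role": "system", "content": system_role}]
    for example_in, example_out in examples:
        messages.append({"role": "user", "content": example_in})
        messages.append({"role": "assistant", "content": example_out})
    for turn in memory[-max_turns:]:          # keep only what matters
        messages.append(turn)
    messages.append({"role": "user", "content": user_prompt})
    return messages

ctx = build_context(
    system_role="You are a regulatory-affairs assistant.",
    memory=[{"role": "user", "content": "We filed with the EMA last week."}],
    examples=[("Summarize: trial met endpoint.", "Primary endpoint met.")],
    user_prompt="Draft a status update for the team.",
)
print(len(ctx))  # 5 messages: system, example pair, memory, user
```

The `max_turns` slice is a crude version of the "context windows" point: as the chat continues, older turns fall out so the input stays focused.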
-
Up until now, much of domain-specific knowledge injection for LLMs has answered the question: "How do we get the right context INTO the window?" But with the success of coding agents and recursive language models, that question has changed to: "How do we let the model NAVIGATE context itself?"

Large language models have a limited context window, a maximum number of tokens that can be input as the model's entire context. This is a hard constraint resulting from the transformer architecture itself, and while modern models have pushed context windows into the hundreds of thousands (even millions) of tokens, more context doesn't always mean better results. Research has shown that model performance actually degrades as input length increases, a phenomenon known as context rot, where models struggle to reliably use information buried deep in long sequences, especially when surrounded by similar but irrelevant content.

The solution up until now has been Retrieval Augmented Generation (RAG): chunking and embedding documents into vector databases, then retrieving the most relevant pieces via semantic similarity. This works, but it frames context management purely as a search problem, and scaling it starts to feel more like building a search engine than an AI system.

What coding agents like Claude Code, Cursor, and Codex stumbled into was a different approach entirely: give the LLM a terminal and let it explore. Filesystem-based context navigation lets models directly explore, preview, and selectively load content using tools they already understand. Instead of engineering a pipeline to deliver the right context, the model finds it itself.

Recursive Language Models (RLMs) formalize this further, with a slight distinction: in a coding agent, opening a file or running a tool dumps results back into the context window. RLMs instead store the prompt and all sub-call results as variables in a code environment, only interacting with them programmatically. Recursion happens during code execution, meaning the model can spawn arbitrarily many sub-LLM calls without polluting its own context, orchestrating understanding of 10M+ tokens without ever having to look at all of it at once.

This gives us two differently motivated options: RAG gives you fast, narrow retrieval, great for latency-sensitive apps like chatbots. RLM-style frameworks trade speed for deeper exploration, better suited when thorough analysis matters more than response time. To learn more about context rot, how coding agents changed context delivery, and how recursive language models are formalizing it all, check out my latest video here: https://lnkd.in/ehszSKV7
From Retrieval to Navigation: The New RAG Paradigm
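The recursive decomposition idea can be sketched in a few lines: the full text lives in a variable, and any single model call only ever sees a bounded slice of it. `llm` here is a hypothetical stand-in callable, not a real model API; a toy truncating function is used just to show the call structure.

```python
def recursive_summarize(text, llm, chunk_chars=1000):
    """Sketch of the RLM pattern: the prompt is held as a variable in
    the code environment, and the model issues sub-LLM calls over
    slices of it, so no call's context grows with the input size."""
    if len(text) <= chunk_chars:
        return llm(f"Summarize: {text}")
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partials = [recursive_summarize(c, llm, chunk_chars) for c in chunks]
    # The combine step also sees only short partial summaries.
    return llm("Combine these partial summaries: " + " ".join(partials))

# Toy "model" that just truncates its prompt, to trace the structure.
fake_llm = lambda prompt: prompt[:40]
result = recursive_summarize("x" * 2500, fake_llm, chunk_chars=1000)
print(result)
```

Because recursion happens in code rather than in the context window, the number of sub-calls can grow with the input while each individual call stays small, which is the claimed route to working over 10M+ tokens.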
-
Prompt engineering is so old school. Think now about context engineering.

Prompt engineering helped us unlock early capabilities of large language models (LLMs). It centered on crafting instructions, phrasing, examples, and formatting, all to guide output. But as LLM infrastructure and our understanding of model capacities evolved, prompt tuning alone became insufficient.

Now, context engineering is taking center stage. It's not just about what you say to the model but everything the model sees and knows at inference time:
- System roles and instructions
- Dialogue state and conversation memory
- Retrieved knowledge
- Tool schemas, function calls, API outputs
- User profiles
- Output constraints and formatting guides

The model becomes a reasoning engine, embedded within a larger orchestration system. Success hinges on how you manage and structure context, not just prompt phrasing.

Key context engineering practices include:
- Prioritizing critical content placement
- Compressing histories with LLM summarization
- Optimizing retrieval (vector/hybrid search)
- Tool output integration into context
- Modular prompt construction with clear roles and boundaries

Prompt engineering still matters, but it's now one piece of the broader toolkit. Context engineering is how we can build reliable, scalable, and enterprise-ready LLM systems. And as context windows grow, so will its importance.
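One of the practices above, compressing histories with LLM summarization, can be sketched as follows. The `summarizer` is a stand-in for an actual LLM call, and the message format is illustrative.

```python
def compress_history(turns, summarizer, keep_recent=2):
    """Summarize old conversation turns into one compact message and
    keep only the most recent turns verbatim, trading fidelity of the
    distant past for room in the context window."""
    if len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarizer(" ".join(t["content"] for t in old))
    return [{"role": "system",
             "content": f"Conversation so far: {summary}"}] + recent

turns = [{"role": "user", "content": f"message {i}"} for i in range(5)]
compressed = compress_history(turns, summarizer=lambda s: s[:20], keep_recent=2)
print(len(compressed))  # 3: one summary message plus the last two turns
```

The same shape works with any summarizer, from a cheap extractive heuristic to a dedicated LLM call; what matters is that the history's token footprint stops growing linearly with conversation length.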