The way people search for and discover personalized content is fundamentally changing.
Most Google searches now end without a click, and AI Overviews appear on up to a quarter of all queries. Users don't want ten blue links — they want answers, synthesized in context, personalized to their history.
We're entering the era of conversational search and agentic personalization.
RAG grounds LLM outputs in real-time knowledge from trusted sources. Agentic architectures decompose complex queries into multi-step reasoning chains, coordinating retrieval across documents, databases, APIs, and knowledge graphs. Memory systems make the personalization persistent: short-term memory tracks reasoning within a session, while long-term memory carries user preferences across sessions in natural language you can actually inspect and reason about.
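To make the shape of this concrete, here is a toy sketch of RAG with a session memory and a long-term memory. Everything is illustrative: the knowledge dict, the keyword matching (standing in for embedding retrieval), and the function names are assumptions, not any specific framework's API.

```python
# Toy RAG-with-memory sketch. Keyword overlap stands in for vector retrieval,
# and a formatted string stands in for the LLM generation step.

KNOWLEDGE = {
    "returns": "Orders can be returned within 30 days.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

long_term_memory = {"preferred_topic": "shipping"}  # persists across sessions
session_memory = []                                 # short-term reasoning trace

def retrieve(query: str) -> str:
    # Keyword match stands in for embedding similarity search.
    for key, doc in KNOWLEDGE.items():
        if key in query.lower():
            return doc
    # No match: fall back to the user's stored long-term preference.
    return KNOWLEDGE[long_term_memory["preferred_topic"]]

def answer(query: str) -> str:
    context = retrieve(query)
    session_memory.append((query, context))  # track within-session state
    # A real system would hand `context` to an LLM; here we just ground the reply.
    return f"Based on our docs: {context}"

print(answer("How do returns work?"))
```

The point of the sketch: generation is conditioned on retrieved context plus two memory scopes, which is what makes the answer both grounded and personal.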
This changes what personalization means. Instead of selecting from a fixed catalog, systems now synthesize, generating novel responses, explanations, and recommendations conditioned on your full interaction history. GEPA (2025) shows how prompts across compound AI pipelines can be automatically evolved using reflective natural-language feedback, outperforming reinforcement-learning baselines with up to 35x fewer rollouts.
Meta's GEM makes this real at scale: the largest RecSys foundation model in the industry, trained at LLM scale across thousands of GPUs. It combines deep learning sequence modeling with LLM-inspired scaling laws, then transfers knowledge to hundreds of serving models via distillation. 5% conversion lift on Instagram. 3% on Facebook Feed. The roadmap: a unified model ranking both organic content and ads across text, image, audio, and video — with inference-time compute scaling and agentic automation.
But here's the nuance that gets lost in the GenAI hype: none of this works without the deep learning foundation built over the past decade.
Two-tower models (DSSM, YouTube DNN) made retrieval personalized by construction, learning user and item embeddings in a shared space that serves billions of requests daily. Multi-objective architectures (MMoE, PLE, ESMM) taught us to jointly optimize competing signals like CTR, conversion, and satisfaction. Sequential models (SASRec, BERT4Rec) captured that your intent right now isn't your intent five minutes ago. LLM-ESR (NeurIPS 2024) bridges both worlds, fusing LLM semantic embeddings with collaborative signals without adding inference overhead.
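The two-tower idea fits in a few lines. A minimal sketch, with untrained random projections standing in for the learned user and item MLP towers, so the numbers are meaningless but the serving pattern (precompute item embeddings offline, one dot product per item online) is real:

```python
# Minimal two-tower retrieval sketch. Towers are random projections here,
# purely illustrative; in production they are learned neural networks.
import numpy as np

rng = np.random.default_rng(0)
DIM, N_ITEMS = 8, 5

user_tower = rng.normal(size=(4, DIM))   # maps 4-dim user features -> embedding
item_tower = rng.normal(size=(6, DIM))   # maps 6-dim item features -> embedding

item_features = rng.normal(size=(N_ITEMS, 6))
item_embs = item_features @ item_tower                      # precomputed offline
item_embs /= np.linalg.norm(item_embs, axis=1, keepdims=True)

def retrieve_top_k(user_features: np.ndarray, k: int = 2) -> list:
    u = user_features @ user_tower       # user embedding, computed per request
    u /= np.linalg.norm(u)
    scores = item_embs @ u               # cosine similarity to every item
    return np.argsort(-scores)[:k].tolist()

print(retrieve_top_k(rng.normal(size=4)))
```

Because the two towers only interact through a dot product, item embeddings can live in an approximate-nearest-neighbor index, which is what makes this personalized by construction at billion-request scale.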
The most effective GenAI systems don't replace this stack. They augment it. Generate offline, serve efficiently. Synthesize on top of what embeddings retrieve.
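"Generate offline, serve efficiently" reduces to a simple pattern: the expensive LLM call runs in a batch job, and the online path is a cache lookup on whatever the embedding retriever returns. A hypothetical sketch; the function and cache names are made up for illustration:

```python
# Sketch of the generate-offline, serve-online pattern.

def expensive_generate(item_id: str) -> str:
    # Stand-in for an offline LLM call that writes a personalized explanation.
    return f"Why you'll like {item_id}: it matches your recent activity."

# Offline batch job: precompute generations for the retrievable catalog.
explanation_cache = {item: expensive_generate(item) for item in ["song_1", "song_2"]}

def serve(retrieved_item: str) -> str:
    # Online path: embeddings retrieve the item, the cache serves the text.
    return explanation_cache.get(retrieved_item, "default explanation")

print(serve("song_1"))
```

The LLM never sits on the request path, so serving latency stays at retrieval-stack levels while the user still sees synthesized, personalized text.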
The frontier isn't choosing between eras. It's building the full stack.
Just wrote a blog on this. ↓
https://lnkd.in/gc28Wggx
#GenerativeAI #ConversationalSearch #Personalization #RecommendationSystems #AgenticAI #LLM #RAG #MachineLearning #DeepLearning #RecSys