Visual search is quietly killing keywords.

The visual search market: $41.7B in 2024, projected to hit $151.6B by 2032. Google Lens handles 12 billion+ searches every month. Pinterest Lens hit 600M monthly searches. The next SEO revolution is image understanding.

In 2025, the brands winning are the ones who realized: Google Lens understands "leather handbag in burgundy" better than keywords ever could. Pinterest's multimodal AI generates the right keywords from images automatically. Your customer doesn't want to type "green sofa." They want to grab a photo of a sofa they like and say, "Find me this vibe."

When I audit brand websites for visual search readiness, I find:
- 84% have product images named IMG123.jpg (AI can't understand them)
- 71% have zero image sitemaps
- 0% have structured data for visual search

Not because they're lazy. Because nobody told them it mattered. But here's what kills me: these same brands are spending $50K/month on Google Ads trying to rank for keywords that matter less every quarter. Meanwhile, visual discovery is just sitting there.

WHAT I'D DO DIFFERENTLY:

Week 1: Audit your product photography. Are images high-resolution? Do file names use keywords? Do you have image sitemaps? (Most brands: 0/3)

Week 2: Rename your product images. Sounds basic, but black-leather-handbag.jpg beats IMG123.jpg by 340% in visual search visibility.

Week 3: Build image sitemaps. Takes 2 hours (a minimal sketch follows this post). ROI? A 30% engagement boost within 3 months, based on early data.

Week 4: Implement AR try-on for your top 5 SKUs. Expected payoff: a 200% conversion increase. (I'm not exaggerating; PerfectCorp data backs this.)

Week 5: Set up Pinterest Lens optimization. 600M monthly searches. 36% of your audience starts there. If you're not there, you're dead.

By Month 2, you'll start seeing visual search traffic. By Month 3, you won't go back to keyword-only strategies. AI is getting so good at this that text-based search feels like a flip phone in a smartphone world.

IF YOU TAKE ONE THING FROM THIS:

Stop asking, "How do we rank for keywords?"
Start asking, "How do we get discovered when customers snap a photo of what they want?"

The answer is visual search. And if you're not ready, your competitor will be.

#Marketingexpert #visuals #GTMLeader
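To make that Week 3 step concrete, here is a minimal sketch of a Google image sitemap. Every URL below is a placeholder, not a real catalog entry:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <!-- One <url> block per product page -->
  <url>
    <loc>https://example.com/bags/black-leather-handbag</loc>
    <!-- One <image:image> entry per image shown on that page -->
    <image:image>
      <image:loc>https://example.com/img/black-leather-handbag-front.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://example.com/img/black-leather-handbag-side.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```

Point crawlers at the file with a Sitemap: line in robots.txt, or submit it in Google Search Console.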
Visual Search Enhancement
Explore top LinkedIn content from expert professionals.
Summary
Visual search enhancement is the process of making images and visual content on websites easier for search engines and AI systems to understand and find, so people can search with photos instead of just typing keywords. As more users snap pictures to discover products or information, getting your visuals ready for this shift can dramatically increase online visibility and engagement.
- Use descriptive visuals: Make sure your images have clear, meaningful file names and alt text so AI and visual search tools can understand what’s in each photo (see the example after this list).
- Structure your image data: Add image sitemaps and structured data to your website, so search engines and AI models can easily find and interpret your visual content.
- Keep visuals high-quality: Use crisp, professional photos and ensure fast loading for all images, since clear pictures and quick access matter for both AI discovery and user experience.
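As a minimal illustration of the first bullet, the same product image before and after; the file names and alt text are hypothetical:

```html
<!-- Before: nothing for a visual search engine to work with -->
<img src="/img/IMG123.jpg" alt="">

<!-- After: the file name and alt text both describe what is in the photo -->
<img src="/img/burgundy-leather-handbag-front.jpg"
     alt="Burgundy leather handbag with a gold buckle, front view"
     width="1200" height="1200">
```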
-
Your thumbnail might legitimately outperform your title in AI search results.

While everyone obsesses over text optimization, we discovered something unexpected: clients with strong visual assets are getting cited more often in multimodal AI responses. Google I/O 2025 confirmed what we've been seeing: enhanced multimodal capabilities are making visual search a primary discovery channel.

The shift makes sense when you think about user behavior. People are uploading screenshots to ChatGPT, asking Claude to analyze images, and using Google Lens for everything from product identification to problem-solving. But most companies are completely unprepared for visual AI optimization. They're still thinking about images as decoration instead of discoverable content that AI systems can parse and cite.

What's actually driving visual AI citations:
• Images that directly answer queries at a glance work best. Structure visual content to solve specific problems or demonstrate clear outcomes rather than using generic stock photos or logos.
• Proper image schema markup. ImageObject schema with detailed alt text, captions, and structured data helps LLMs understand and cite visual content accurately (a minimal example follows this post).
• Consistent visual authority through unified branding and professional quality across all visual assets. AI systems recognize and favor brands with a coherent visual identity.
• Context-rich visuals that work standalone while supporting surrounding text. LLMs prefer content that provides clear, actionable information whether viewed independently or with accompanying text.
• Systematic visual performance tracking to monitor how images appear in AI responses and search features, then optimize based on actual citation patterns.

The opportunity is massive because so few companies are thinking about visual AI optimization yet. The brands that nail this early will dominate multimodal discovery in their categories. Visual content optimized for AI comprehension dramatically increases citation chances in multimodal search results.

How are you thinking about visual content for AI discovery? Are you seeing any of your images get referenced in LLM responses yet?
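To ground the schema-markup bullet, a minimal ImageObject sketch in JSON-LD. The URLs and text are placeholders, and on a real page the object would typically be nested inside a Product or Article entity:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/img/burgundy-leather-handbag-front.jpg",
  "name": "Burgundy leather handbag, front view",
  "caption": "Hand-stitched burgundy leather handbag with a gold buckle",
  "creditText": "Example Brand studio"
}
</script>
```

The name and caption give an AI system text it can quote when citing the image, which is exactly the behavior the post describes.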
-
Visual Search Readiness: is your site optimized for Google Lens and image-first discovery?

More people now search with Google Lens instead of typing. Snap a product, landmark, recipe, or object, and Lens finds matches. But here’s the catch: visual search isn’t just about having images. It’s about whether your site is Lens-ready:
- Images in <img> tags (not hidden in CSS/JS).
- Descriptive filenames + meaningful alt text.
- Image sitemaps & structured data (Product / ImageObject).
- High-quality, clear photos from multiple angles.
- Fast, responsive images (WebP/AVIF, srcset); a complete example follows this post.

Google looks at pixels + page context. If your images aren’t discoverable, fast, and semantically described, you’re invisible in this search flow.

Question for you: do you still treat image SEO as “secondary,” or is it already part of your core strategy?

----------------------------
Stop letting your images get lost. Make them Lens-ready and discoverable.
© Muhammad Usman, WordPress Developer | Website Strategist | SEO Specialist
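Pulling that checklist together, a sketch of a Lens-friendly image element; the file names, sizes, and breakpoints are illustrative, not a recommendation for any specific site:

```html
<picture>
  <!-- Modern formats first; the browser uses the first type it supports -->
  <source type="image/avif"
          srcset="/img/green-velvet-sofa-800.avif 800w,
                  /img/green-velvet-sofa-1600.avif 1600w"
          sizes="(max-width: 600px) 100vw, 50vw">
  <source type="image/webp"
          srcset="/img/green-velvet-sofa-800.webp 800w,
                  /img/green-velvet-sofa-1600.webp 1600w"
          sizes="(max-width: 600px) 100vw, 50vw">
  <!-- JPEG fallback stays in a crawlable <img> tag with descriptive alt text -->
  <img src="/img/green-velvet-sofa-800.jpg"
       srcset="/img/green-velvet-sofa-800.jpg 800w,
               /img/green-velvet-sofa-1600.jpg 1600w"
       sizes="(max-width: 600px) 100vw, 50vw"
       alt="Green velvet three-seater sofa with walnut legs"
       width="800" height="600">
</picture>
```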
-
Mini-o3, ByteDance: How AI Models Learn to Think Through Complex Visual Problems

What if an AI could examine an image the way a detective investigates a crime scene: taking multiple looks, backtracking when confused, and gradually piecing together the answer? Most vision AI models today take one quick glance and guess. But researchers at ByteDance and the University of Hong Kong just demonstrated something different: an AI system that thinks through visual problems step by step, sometimes taking 30+ reasoning turns to solve challenging tasks.

👉 Why This Matters
Current vision models struggle with complex visual search tasks. Show them a busy street scene and ask "what's written below the parking sign?" and they often fail, because they can't systematically explore the image or recover from initial mistakes. The problem isn't just accuracy. It's that these models use rigid, shallow reasoning patterns that don't scale to harder problems.

👉 What They Built
Mini-o3 introduces three key innovations:
- Visual Probe Dataset: thousands of intentionally difficult visual search problems with small targets, distractors, and high-resolution images that require trial-and-error exploration.
- Cold-start training pipeline: diverse multi-turn reasoning trajectories collected by prompting existing models to mimic exploratory behaviors like depth-first search and self-correction.
- Over-turn masking: a training technique that avoids penalizing incomplete responses, allowing the model to learn longer reasoning chains without being biased toward quick answers.

👉 How It Works
The system operates in thought-action-observation loops. At each turn, it:
- generates internal reasoning (thought);
- takes an action (zooms into an image region or provides a final answer);
- receives new visual information (observation);
- continues until the task is solved or a turn limit is hit.
During training, interactions were capped at 6 turns for efficiency. But at test time, the model naturally scales to 30+ turns, with accuracy improving as more turns are allowed.

👉 The Results
On their hardest visual search benchmark, Mini-o3 achieved 48% accuracy, compared to 35% for the previous best models. More importantly, it showed genuine test-time scaling: performance continued improving with more reasoning steps, even though it was trained on much shorter sequences. This suggests a path toward AI systems that can tackle increasingly complex visual reasoning tasks by simply allowing more thinking time.

The complete training recipe and Visual Probe dataset are being released to help advance research in this direction.
-
Google’s latest AI shopping update adds a visual “fan-out” mode that surfaces product images directly within search. It’s yet another clear sign of where shopping is headed: all signs point away from keyword-based and towards context-based discovery.

The KEY for consumer brands: these new visual experiences only work when your underlying product data is clean, structured, and trusted. AI can only match “that striped blue shirt with organic cotton” if the product feed actually includes structured attributes for color, pattern, and material, tied to verified sources. Without that metadata, the model doesn’t surface you.

At Novi, this is exactly what we’re solving: helping brands and retailers turn product attributes, claims, and certifications into verified, machine-readable data that AI systems (like Google’s) can interpret correctly. It’s the key to preventing hallucinated search results that erode consumer trust fast.

Visual search makes product storytelling feel effortless, and behind every effortless experience is hard data work. First-mover advantage, folks! The brands investing in structured, trustworthy, authoritative product data now will be the ones whose products surface first in AI-powered discovery.
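As a sketch of what machine-readable attributes can look like, here is schema.org Product markup carrying the color, pattern, and material fields the post mentions. Every value is hypothetical, and claims or certifications would need properties beyond this minimal example:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Striped organic cotton shirt",
  "image": "https://example.com/img/blue-striped-organic-cotton-shirt.jpg",
  "color": "Blue",
  "pattern": "Striped",
  "material": "Organic cotton",
  "brand": { "@type": "Brand", "name": "Example Brand" },
  "offers": {
    "@type": "Offer",
    "price": "49.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>
```

With attributes like these in the feed, a query such as “striped blue shirt in organic cotton” has concrete fields to match against instead of guesswork.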
-
Google just announced the visual fan-out technique‼️

Visual Search Fan-Out marks a significant leap in how AI interprets and interacts with images in search. Instead of simply identifying objects, the system now understands images in full context, combining visual details with natural language to deliver richer and more relevant results.

✅ Interactive visual exploration
You can describe what you’re looking for in plain language, and the AI turns vague ideas into clear visual results. It learns from your follow-up questions to refine the images it shows.

✅ Context-aware shopping
Finding products is easier. Instead of using complicated filters, you just describe what you want, like style, fit, or color, and the AI shows matching items using up-to-date product data.

✅ Advanced image understanding
The AI looks at the main subject of an image as well as smaller details and background elements, combining all of this information to give more accurate and relevant results.

✅ Flexible, multimodal input
You can start your search with text, a saved image, or a photo you just snapped. The AI blends these inputs to guide you to the most useful results.
-
Search isn’t about typing anymore. It’s about snapping. 📸

Visual search is transforming how people discover products online:
- Google Lens handles 12 billion searches per month; Pinterest Lens sees 600 million.
- Younger generations, especially Gen Z, prefer visual-first discovery over text.
- Smartphone cameras + AI = instant product recognition.
- Visual search drives faster discovery, better decisions, and higher confidence.
- Google Lens and Pinterest Lens are shaping the future of SEO beyond keywords.
- Brands with weak visuals or missing structured data risk being invisible.
- Optimized images, alt text, and product context can boost e-commerce discovery 30%+.
- Retailers using visual search see 48% faster discovery and 25% higher conversion rates.
- By 2028, some forecasts say 50% of searches will be visual or voice-driven.

Early optimization means brands dominate tomorrow’s AI-powered shopping landscape. Are your images ready for the future of search?
-
We’ve all been there during the holidays: trying to find the perfect gift based on a vague idea, like “something with a vintage vibe” or a “cozy sweater in a trending color,” but the right keywords just don’t exist. With a major new update to AI Mode in Google Search, you can now simply show or tell Google what you’re thinking and get rich, visual results that you can instantly shop. This is a game changer for everything from designing a room, to completing an outfit, to figuring out the right gift for that person who’s hard to shop for. Just in time for the holiday shopping season.

This breakthrough visual search experience is rooted in combining our visual understanding (Lens and Image Search) with Gemini 2.5’s advanced multimodal capabilities.

For advertisers, this means two critical shifts:
- Richer intent signals: consumers are moving from vague keywords to detailed, natural-language descriptions of their intent (“I want more ankle length”). This gives marketers significantly richer data signals to optimize campaigns and ensure their ads are perfectly relevant.
- Visual content is your new keyword: your product images, high-quality visuals, and associated metadata are now more important than ever. The new “visual search fan-out” technique allows AI Mode to understand subtle details within an image, meaning brands must prioritize comprehensive, structured content to ensure their products are discoverable when the customer is searching by image or “vibe.”

AI is making search more intuitive and shoppable than ever. Read the full announcement here: https://bit.ly/46RlgAH

#GoogleSearch #AIMode #HolidayShopping