Research · June 2, 2026 · 9 min read
Visual AI Search Is About to Happen: How Customers Will Find Local Businesses With a Camera
Two announcements at Google I/O 2026 (May 19) point to the same near future. The first: Gemini Omni, a multimodal model that takes any input — text, image, audio, video — and produces high-quality video output, with image and text outputs coming later. The second: intelligent eyewear with Gemini, launching this fall via partners including Samsung, Gentle Monster, and Warby Parker.
Stack the two and the picture is clear. By the end of 2026, a customer could be walking down a street, glance at your storefront, and ask Gemini through their glasses “is this place any good?” Or they could point their phone at a piece of furniture and ask “who in town makes one like this?” Visual AI search is moving from novelty to channel.
This post covers what was actually announced (every fact verified against blog.google), why visual search changes the local-discovery equation, and the playbook for businesses that want to be on the “point and ask” shortlist.
What was announced
Gemini Omni — multimodal video model
Source: blog.google — Introducing Gemini Omni
Gemini Omni combines images, audio, video, and text as input and generates high-quality videos grounded in Gemini's real-world knowledge. Image and text outputs are planned over time. Gemini Omni Flash is rolling out to Google AI Plus, Pro, and Ultra subscribers globally via the Gemini app and Google Flow. It is free for YouTube Shorts and YouTube Create users starting May 19, 2026.
Intelligent eyewear with Gemini — launching fall 2026
Source: blog.google — Android XR I/O 2026
Two types announced: audio glasses (spoken help in your ear) launching first this fall, and display glasses (with visible UI) later. Partners include Samsung, Gentle Monster, and Warby Parker. Capabilities at launch include multi-step background tasks (Google's example: 'order coffee on DoorDash while your phone stays in your pocket'), turn-by-turn directions, finding nearby restaurants based on user preferences, hands-free calls and texts, and real-time translation.
Search box redesign with multimodal input
Source: blog.google — Google Search I/O 2026 updates
The biggest Search box upgrade in 25+ years. Multimodal input includes text, images, files, videos, and Chrome tabs. Rolling out everywhere AI Mode is available, starting May 19, 2026. This is the desktop-and-mobile bridge between today's typed search and tomorrow's point-and-ask glasses search.
Why visual search is different from text search for local businesses
Text search starts with a query someone has to formulate (“best Italian restaurant in Lincoln Park”). Visual search starts with something the user is looking at — a place, a product, a sign, a dish. Three implications follow:
- Photo metadata becomes a ranking signal. How AI interprets an image depends heavily on what you've told it about that image. Alt text, EXIF data, and ImageObject schema are no longer optional.
- Original photography stops being a polish item. The same stock photo appears on thousands of sites, so it gives AI nothing that is specific to your business. A real photo of your storefront, dish, or product is what AI uses to match a customer's glance to your business.
- Geo-tagged photos become a discovery vector. A user pointing their glasses at a building wants the AI to identify what's in that building. Photos with embedded GPS coordinates teach AI which storefront is where.
The visual AI search playbook
Audit your homepage and About page for original, recent photography
If your hero image is a stock photo or six years old, replace it. AI parses image quality and recency. A real photo of your team, location, or signature product taken in the last 12 months is what visual search will match against.
Write descriptive alt text on every image
'Storefront photo' is not enough. 'Linnea Bakery storefront, 245 NE Alberta St, Portland, with handmade signage and outdoor seating' is what AI needs to confirm a match when a user points their camera.
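One way to find the images that need better alt text is a quick audit script. The sketch below uses only Python's standard library and flags any img tag whose alt text is missing or shorter than a word-count threshold. The five-word threshold and the function name audit_alt_text are assumptions chosen for illustration, not a rule any search engine publishes.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Collect <img> tags whose alt text is missing or very short."""

    def __init__(self, min_words: int = 5):
        super().__init__()
        self.min_words = min_words  # assumed threshold, tune to taste
        self.total = 0              # number of <img> tags seen
        self.weak = []              # (src, alt) pairs that need attention

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total += 1
        attr = dict(attrs)
        alt = (attr.get("alt") or "").strip()
        if len(alt.split()) < self.min_words:
            self.weak.append((attr.get("src") or "", alt))


def audit_alt_text(html: str, min_words: int = 5):
    """Return (total image count, list of images with weak alt text)."""
    parser = AltTextAudit(min_words)
    parser.feed(html)
    parser.close()
    return parser.total, parser.weak


if __name__ == "__main__":
    page = (
        '<img src="a.jpg" alt="Storefront photo">'
        '<img src="b.jpg" alt="Linnea Bakery storefront, 245 NE Alberta St, '
        'Portland, with handmade signage and outdoor seating">'
        '<img src="c.jpg">'
    )
    total, weak = audit_alt_text(page)
    print(f"{len(weak)} of {total} images need better alt text")
    for src, alt in weak:
        print(f"  {src}: {alt!r}")
```

A word count is only a rough proxy. It catches empty and two-word alt text, but a human still needs to confirm that the longer descriptions name the business, the address, and what is actually in the frame.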
Add ImageObject schema to your most important photos
ImageObject JSON-LD with contentUrl, name, description, geo coordinates (where applicable, typically nested under a contentLocation Place), and a representativeOfPage flag tells AI exactly what each photo represents. Storefront photos, product photos, and dish photos benefit most.
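As a rough illustration, the markup described above could be generated like this. The helper name image_object_jsonld, the example.com URL, and the coordinates are placeholders invented for the sketch, and the geo data is nested under contentLocation because that is where schema.org attaches a Place to a creative work. Treat it as a starting point to validate with a structured-data testing tool, not a guaranteed ranking recipe.

```python
import json


def image_object_jsonld(content_url, name, description,
                        latitude=None, longitude=None,
                        representative=False):
    """Build an ImageObject JSON-LD string for one photo."""
    data = {
        "@context": "https://schema.org",
        "@type": "ImageObject",
        "contentUrl": content_url,
        "name": name,
        "description": description,
        "representativeOfPage": representative,
    }
    # Only add location data when both coordinates are known.
    if latitude is not None and longitude is not None:
        data["contentLocation"] = {
            "@type": "Place",
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": latitude,
                "longitude": longitude,
            },
        }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Placeholder URL and coordinates, for illustration only.
    markup = image_object_jsonld(
        content_url="https://example.com/images/storefront.jpg",
        name="Linnea Bakery storefront",
        description=(
            "Linnea Bakery storefront, 245 NE Alberta St, Portland, "
            "with handmade signage and outdoor seating"
        ),
        latitude=45.559,
        longitude=-122.663,
        representative=True,
    )
    print('<script type="application/ld+json">')
    print(markup)
    print("</script>")
```

The printed script tag can be pasted into the page that hosts the photo. Reusing the same description for the alt text and the JSON-LD keeps the two signals consistent.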
Keep EXIF geo data on storefront and team photos
Many CMS platforms strip EXIF data on upload. Confirm yours doesn't, or re-upload originals with metadata intact. A geotagged photo of your shop signals to AI exactly where the physical business is — useful for the 'is this place any good?' query.
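To confirm whether your CMS strips metadata, compare the original file with the copy your site actually serves. The sketch below is a minimal, standard-library check that walks the header segments of a JPEG and reports whether an Exif block is still present. The function name has_exif_segment is an assumption for illustration, and the check only proves that an Exif block exists, not that the GPS tags inside it survived. A full EXIF parser would be needed for that.

```python
import struct


def has_exif_segment(jpeg_bytes: bytes) -> bool:
    """Return True if the JPEG data contains an APP1 segment with an Exif header."""
    if not jpeg_bytes.startswith(b"\xff\xd8"):  # missing JPEG start marker
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return False  # not positioned on a marker, give up
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):
            return False  # image data or end of file reached, no Exif found
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False


def compare_files(original_path: str, served_path: str) -> str:
    """Compare an original photo with the copy downloaded from the site."""
    with open(original_path, "rb") as f:
        before = has_exif_segment(f.read())
    with open(served_path, "rb") as f:
        after = has_exif_segment(f.read())
    if before and not after:
        return "EXIF was stripped on upload"
    if before and after:
        return "EXIF block survived upload"
    return "Original has no EXIF block to preserve"
```

Calling compare_files with the camera original and the image downloaded from your live page gives a quick yes or no. If the answer is that EXIF was stripped, look for a metadata setting in the CMS or its image optimization plugin before re-uploading.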
Submit photos to Google Business Profile and Apple Maps
Both directories are the verified-image sources Gemini eyewear and AI Mode draw from. Photos of your storefront, interior, team, and signature products on Google Business Profile and Apple Business Connect (the tool for managing your Apple Maps listing) are what AI matches against the user's camera input.
Use Gemini Omni Flash for product and storefront video
Gemini Omni Flash is free for YouTube Shorts and Create users. The same cost curve that made photography affordable for small businesses 20 years ago is now hitting video. Short, high-quality video assets per service or product feed both visual search and the YouTube ad surface.
What changes when glasses launch
Smart glasses don't replace phone search overnight. Even at scale they'll be a minority of queries for a few years. But three behavioral shifts will start appearing in your analytics this fall and grow through 2027:
- More queries about specific places the user is looking at — your storefront, a competitor's sign, a dish at another restaurant
- More queries from walking and transit users — Google's announced capabilities include turn-by-turn navigation and restaurant discovery based on preferences
- More multilingual interaction — real-time translation in the glasses removes a barrier to local discovery for non-English speakers in U.S. cities and for English speakers traveling abroad
The bottom line
Visual AI search isn't a 2027 problem. It's a fall-2026 problem, and the work that prepares your business for it (real photos, descriptive alt text, ImageObject schema, geo-tagged metadata, complete Google Business Profile and Apple Maps listings) pays back immediately in text-based AI search too. There is no “just optimize for this later” option that won't cost you in the meantime.
See where your image and structured data stand — free
Our free scan checks alt text coverage, ImageObject schema, structured data, and the signals visual AI agents will use to match users' cameras to your business. Under 10 seconds, no signup, no credit card.
Run my free scan →
Sources: Introducing Gemini Omni, Intelligent eyewear with Gemini, Google Search — I/O 2026 updates.
Written by the team at Kesem Marketing, a digital agency helping small businesses get found in the AI-first era.