Minh / blog

Visual content search should be visual

12th Apr 2023



Imagine you want to look for a specific template for a project you're working on. What keywords would you use to search for it? The content of an image is easy to describe, and we do this often in our day-to-day conversations. But describing stylistic properties is often much harder. Describing a style requires you to establish a common vocabulary with your audience (in this case, the search engine). Not everyone describes style the same way, and not everyone is even aware of the different styles. In fact, the most popular keyword in most template marketplaces is "minimal", used to describe a wide variety of different templates: what you consider "minimal" might not be my definition of "minimal". In addition to this, not everyone can express a concept succinctly, or at all, in their language.

Herein lies the problem with using text to search for visual content: the translation between the two media is malleable and unreliable. Searching for visual content with text requires you to translate the concepts in your head from image form to textual form, a process that will likely not only reduce the fidelity of what you are describing but also turn it into a set of terms that aren't clearly defined.

Current approaches often converge on reverse image search, where one searches using an image. However, this requires users to first have one (or multiple) reference images, and searching this way is often one-shot, giving users only one opportunity to find what they are looking for. Perhaps a better solution is to think of what we're searching for as a map and the act of searching as navigating this map.
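To make the "one-shot" nature of reverse image search concrete, here is a minimal sketch of how such a lookup might work once every image has been turned into an embedding vector. The catalog, its item names, and the three-dimensional vectors are all made up for illustration; a real system would use embeddings from a vision model with hundreds of dimensions and an approximate nearest-neighbor index rather than a full sort.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def reverse_image_search(query, catalog, k=3):
    """Return the ids of the k catalog items most similar to the query embedding."""
    ranked = sorted(
        catalog,
        key=lambda item_id: cosine_similarity(query, catalog[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings, invented for this example.
catalog = {
    "red_apple": [0.9, 0.1, 0.0],
    "green_apple": [0.7, 0.6, 0.1],
    "banana": [0.1, 0.9, 0.2],
    "poster": [0.0, 0.2, 0.9],
}

# One query in, one ranked list out: the user gets a single shot.
results = reverse_image_search([1.0, 0.2, 0.0], catalog, k=2)
print(results)
```

The query goes in once and a ranked list comes out. If none of the results are right, the only way to continue is to find a different reference image and start over.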

Look for this apple

The images above were vectorized using ResNet, and the resulting embeddings were reduced to two dimensions using UMAP. The resulting search path is visualized as a graph.

When using a map, we have the ability to navigate through it and make adjustments as we go along. Similarly, visual content search should allow users to refine their search by exploring different styles and adjusting their preferences as they go. By using image search, users can find what they're looking for based on visual cues rather than relying on text-based descriptions.

For example, if a user is looking for a template that has a minimalist design but with pops of color, they could use an image of a minimalist design as their reference image and then explore different options that have the added element of color. This allows them to refine their search without having to put a label on the specific style they are looking for.
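The "minimalist with pops of color" example can be sketched as a small navigation loop. This is only an illustration under stated assumptions: the template names and 2D coordinates are invented, the two axes are pretended to mean (visual busyness, colorfulness) purely so the toy data is readable, and the `choose` callback stands in for a person clicking on the result they like best. Real embedding axes have no such tidy meaning.

```python
import math


def nearest_neighbors(current, items, visited, k=3):
    """Ids of the k unvisited items closest to the current item on the map."""
    candidates = [item_id for item_id in items if item_id not in visited]
    candidates.sort(key=lambda item_id: math.dist(items[current], items[item_id]))
    return candidates[:k]


def navigate(start, items, choose, steps=3, k=3):
    """Walk the map: show nearby options, let the user pick one, repeat."""
    path = [start]
    for _ in range(steps):
        options = nearest_neighbors(path[-1], items, set(path), k)
        if not options:
            break
        path.append(choose(options))
    return path


# Hypothetical 2D map positions (busyness, colorfulness), invented for this example.
templates = {
    "plain_minimal": (0.0, 0.0),
    "minimal_grey": (0.1, 0.1),
    "minimal_accent": (0.0, 0.4),
    "minimal_bold_color": (0.1, 0.8),
    "busy_collage": (0.9, 0.7),
    "busy_mono": (0.9, 0.1),
}


def prefers_color(options):
    """Simulated user: always clicks the most colorful of the options shown."""
    return max(options, key=lambda item_id: templates[item_id][1])


path = navigate("plain_minimal", templates, prefers_color, steps=2, k=2)
print(path)
```

Because each step only offers nearby items, the simulated user drifts toward more colorful templates while staying in the minimalist neighborhood, and never has to name the style they are after.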

By relying on visual cues rather than text-based descriptions, image search enables users to more accurately translate stylistic concepts into their search queries. It also allows for greater flexibility in refining searches based on personal preferences and individual interpretations of certain styles.


- Minh