Herein lies the problem with using text to search for visual content: the translation between the two media is malleable and unreliable. Searching for visual content with text requires you to translate concepts in your head from image form to textual form, a process that not only reduces the fidelity of what you are describing but also turns it into a set of terms that aren't clearly defined.
Current approaches often converge on reverse image search, where one searches using an image. However, this requires users to first have one or more reference images, and searching this way is often one-shot, giving users only one opportunity to find what they are looking for. Perhaps a better solution is to think of what we're searching for as a map and the act of searching as navigating this map.

Look for this apple






The images above were vectorized using ResNet, and the resulting embeddings were flattened into 2 dimensions using UMAP. The resulting search path is visualized in a graph.

When using a map, we have the ability to navigate through it and make adjustments as we go along. Similarly, visual content search should allow users to refine their search by exploring different styles and adjusting their preferences as they go. By using image search, users can find what they're looking for based on visual cues rather than relying on text-based descriptions.
For example, if a user is looking for a template that has a minimalist design but with pops of color, they could use an image of a minimalist design as their reference image and then explore different options that have the added element of color. This allows them to refine their search without having to put a label on the specific style they are looking for.
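To make the navigation idea concrete, here is a minimal sketch of one refinement step, not a description of any particular product's implementation. It assumes every image has already been turned into an embedding vector. The 2-D vectors, the template names, and the helpers `nearest` and `step_toward` are all hypothetical and chosen purely for illustration; real embeddings would have many more dimensions, and their axes would not map neatly onto "minimalism" and "colorfulness".

```python
import math


def nearest(query, catalog, k=2, exclude=()):
    """Return the names of the k catalog items closest to the query point."""
    candidates = [
        (name, math.dist(query, vector))  # Euclidean distance (Python 3.8+)
        for name, vector in catalog.items()
        if name not in exclude
    ]
    candidates.sort(key=lambda pair: pair[1])
    return [name for name, _ in candidates[:k]]


def step_toward(query, target, rate=0.5):
    """Move the query part of the way toward the item the user picked."""
    return [(1 - rate) * q + rate * t for q, t in zip(query, target)]


# Hypothetical toy embeddings: axis 0 ~ "minimalism", axis 1 ~ "colorfulness".
catalog = {
    "minimal_mono": [1.0, 0.0],
    "minimal_pastel": [0.9, 0.3],
    "minimal_color_pop": [0.8, 0.7],
    "busy_colorful": [0.1, 1.0],
    "busy_mono": [0.1, 0.1],
}

# Start from a minimalist reference image.
query = catalog["minimal_mono"]
suggestions = nearest(query, catalog, k=2, exclude={"minimal_mono"})
print("first suggestions:", suggestions)

# The user picks the suggestion with more color; the query drifts toward it.
query = step_toward(query, catalog["minimal_color_pop"], rate=0.6)
print("refined suggestions:", nearest(query, catalog, k=2))
```

Each pick nudges the query point across the map instead of restarting the search, so the user never has to name the style they are converging on.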
By relying on visual cues rather than text-based descriptions, image search enables users to more accurately translate stylistic concepts into their search queries. It also allows for greater flexibility in refining searches based on personal preferences and individual interpretations of certain styles.