Deep Lake is a unique Vector Store because it supports storage of various data types including images, video, and audio. In this tutorial we show how to use Deep Lake to perform similarity search for images.
Creating the Vector Store
We will use the ~5k images in the COCO validation set (val2017) as a source of diverse images. First, let's download the data.
!wget -O "<download_path>" http://images.cocodataset.org/zips/val2017.zip
# MAC !curl -o "<download_path>" http://images.cocodataset.org/zips/val2017.zip
We must unzip the images and specify their parent folder below.
images_path = <download_path>
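As a minimal sketch of the unzip step, the standard-library zipfile module can extract the archive and locate the folder that holds the .jpg files. The helper name unzip_images and its arguments are illustrative and not part of Deep Lake. It assumes all images sit in a single folder inside the archive.

```python
import zipfile
from pathlib import Path


def unzip_images(zip_path, extract_dir):
    """Extract zip_path into extract_dir and return the folder that holds the .jpg files.

    Hypothetical helper: assumes every image lives in a single folder in the archive.
    """
    extract_dir = Path(extract_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)

    # Collect every folder that directly contains at least one .jpg file
    image_dirs = sorted({p.parent for p in extract_dir.rglob("*.jpg")})
    if not image_dirs:
        raise FileNotFoundError(f"No .jpg files found in {zip_path}")
    return str(image_dirs[0])
```

The returned string can then be assigned to images_path.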
Next, let's define a ResNet18 PyTorch model to embed the images based on the output from the second-to-last layer. We use the torchvision feature extractor to return the output of the avgpool layer under the embedding key, and we run on a GPU if available. (Note: the DeepLakeVectorStore class is deprecated but can still be used. The new class for Deep Lake's Vector Store is VectorStore.)
from deeplake.core.vectorstore.deeplake_vectorstore import VectorStore
import os
import torch
from torchvision import transforms, models
from torchvision.models.feature_extraction import create_feature_extractor
from PIL import Image
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
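# On torchvision >= 0.13, pretrained=True is deprecated in favor of weights=models.ResNet18_Weights.DEFAULT
model = models.resnet18(pretrained=True)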
return_nodes = {
    'avgpool': 'embedding'
}
model = create_feature_extractor(model, return_nodes=return_nodes)
model.eval()
model.to(device)
Let's define an embedding function that will embed a list of image filenames and return a list of embeddings. A transformation must be applied to the images so they can be fed into the model, including handling of grayscale images.
tform = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    # Repeat the single channel of grayscale images so every input has 3 channels
    transforms.Lambda(lambda x: torch.cat([x, x, x], dim=0) if x.shape[0] == 1 else x),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
def embedding_function(images, model = model, transform = tform, batch_size = 4):
    """Creates a list of embeddings based on a list of image filenames. Images are processed in batches."""

    # Accept a single filename as well as a list of filenames
    if isinstance(images, str):
        images = [images]

    # Process the embeddings in batches, but return everything as a single list
    embeddings = []
    for i in range(0, len(images), batch_size):
        batch = torch.stack([transform(Image.open(item)) for item in images[i:i+batch_size]])
        batch = batch.to(device)
        with torch.no_grad():
            # The avgpool output has shape (N, C, 1, 1); index away the two trailing dimensions
            embeddings += model(batch)['embedding'][:,:,0,0].cpu().numpy().tolist()

    return embeddings
Now we can create the Vector Store and add the data to it. Because image data does not fit the default configuration of text, embedding, and metadata tensors, we use the tensor_params input to define the structure of the Vector Store.
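A minimal sketch of this step, assuming the Deep Lake v3 VectorStore API (the path and tensor_params constructor arguments, and an add method that accepts embedding_function and embedding_data). The helper create_image_store, its store_cls parameter, the IMAGE_TENSOR_PARAMS name, and the jpg compression setting are illustrative assumptions. vector_store_path is whatever local or cloud path you choose.

```python
import os


# One entry per tensor. The id tensor is assumed to be created automatically.
IMAGE_TENSOR_PARAMS = [
    {"name": "image", "htype": "image", "sample_compression": "jpg"},
    {"name": "embedding", "htype": "embedding"},
    {"name": "filename", "htype": "text"},
]


def create_image_store(vector_store_path, images_path, embedding_function, store_cls=None):
    """Create a Vector Store with image, embedding, and filename tensors and fill it."""
    if store_cls is None:
        # Imported lazily so the helper can also be exercised with a stand-in class
        from deeplake.core.vectorstore.deeplake_vectorstore import VectorStore as store_cls

    vector_store = store_cls(path=vector_store_path, tensor_params=IMAGE_TENSOR_PARAMS)

    # Full paths of all .jpg files in the image folder
    image_fns = sorted(
        os.path.join(images_path, name)
        for name in os.listdir(images_path)
        if os.path.splitext(name)[-1].lower() == ".jpg"
    )

    # The same filenames are used three ways: stored as images, stored as text,
    # and passed through embedding_function to compute the embeddings.
    vector_store.add(
        image=image_fns,
        filename=image_fns,
        embedding_function=embedding_function,
        embedding_data=image_fns,
    )
    return vector_store
```

Passing embedding_function together with embedding_data lets the store compute the embeddings during ingestion, so they do not need to be precomputed and held in memory.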
We observe in the automatically printed summary that the Vector Store has tensors for the images, their filenames, their embeddings, and an id, with 5000 samples each. This summary is also available via vector_store.summary().
Images can be quite large, and we may not want to return them as numpy arrays, so we use return_tensors to specify that only the filename and id tensors should be returned.
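The call might look like the sketch below, assuming the Deep Lake v3 search method with its embedding_data, embedding_function, k, and return_tensors arguments. The wrapper name search_similar_filenames and the query path passed to it are hypothetical.

```python
def search_similar_filenames(vector_store, image_path, embedding_function, k=4):
    """Search for the k stored images most similar to image_path.

    Only the id and filename tensors are requested, so no image arrays are returned.
    """
    return vector_store.search(
        embedding_data=image_path,          # query image, embedded on the fly
        embedding_function=embedding_function,
        k=k,
        return_tensors=["id", "filename"],  # skip the large image tensor
    )
```

The result is expected to be a dictionary keyed by tensor name, so the matching filenames would be read from its filename entry.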
Instead of returning the results of the similarity search directly, we can use return_view = True to get the Deep Lake dataset view, which is a lazy pointer to the underlying data that satisfies the similarity search (no data is retrieved locally).
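A sketch of the same search returning a view, again assuming the Deep Lake v3 search signature. The wrapper name search_as_view is hypothetical.

```python
def search_as_view(vector_store, image_path, embedding_function, k=4):
    """Run the similarity search but return a lazy dataset view instead of the data."""
    return vector_store.search(
        embedding_data=image_path,
        embedding_function=embedding_function,
        k=k,
        return_view=True,  # return a pointer to the matching samples, not the samples
    )
```

Because the view is lazy, samples should only be fetched when individual tensors of the view are accessed.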