LogoLogo
API ReferenceGitHubSlackService StatusLogin
v3.9.16
v3.9.16
  • 🏠Deep Lake Docs
  • List of ML Datasets
  • 🏗️SETUP
    • Installation
    • User Authentication
      • Workload Identities (Azure Only)
    • Storage and Credentials
      • Storage Options
      • Setting up Deep Lake in Your Cloud
        • Microsoft Azure
          • Configure Azure SSO on Activeloop
          • Provisioning Federated Credentials
          • Enabling CORS
        • Google Cloud
          • Provisioning Federated Credentials
          • Enabling CORS
        • Amazon Web Services
          • Provisioning Role-Based Access
          • Enabling CORS
  • 📚Examples
    • Deep Learning
      • Deep Learning Quickstart
      • Deep Learning Guide
        • Step 1: Hello World
        • Step 2: Creating Deep Lake Datasets
        • Step 3: Understanding Compression
        • Step 4: Accessing and Updating Data
        • Step 5: Visualizing Datasets
        • Step 6: Using Activeloop Storage
        • Step 7: Connecting Deep Lake Datasets to ML Frameworks
        • Step 8: Parallel Computing
        • Step 9: Dataset Version Control
        • Step 10: Dataset Filtering
      • Deep Learning Tutorials
        • Creating Datasets
          • Creating Complex Datasets
          • Creating Object Detection Datasets
          • Creating Time-Series Datasets
          • Creating Datasets with Sequences
          • Creating Video Datasets
        • Training Models
          • Splitting Datasets for Training
          • Training an Image Classification Model in PyTorch
          • Training Models Using MMDetection
          • Training Models Using PyTorch Lightning
          • Training on AWS SageMaker
          • Training an Object Detection and Segmentation Model in PyTorch
        • Updating Datasets
        • Data Processing Using Parallel Computing
      • Deep Learning Playbooks
        • Querying, Training and Editing Datasets with Data Lineage
        • Evaluating Model Performance
        • Training Reproducibility Using Deep Lake and Weights & Biases
        • Working with Videos
      • Deep Lake Dataloaders
      • API Summary
    • RAG
      • RAG Quickstart
      • RAG Tutorials
        • Vector Store Basics
        • Vector Search Options
          • LangChain API
          • Deep Lake Vector Store API
          • Managed Database REST API
        • Customizing Your Vector Store
        • Image Similarity Search
        • Improving Search Accuracy using Deep Memory
      • LangChain Integration
      • LlamaIndex Integration
      • Managed Tensor Database
        • REST API
        • Migrating Datasets to the Tensor Database
      • Deep Memory
        • How it Works
    • Tensor Query Language (TQL)
      • TQL Syntax
      • Index for ANN Search
        • Caching and Optimization
      • Sampling Datasets
  • 🔬Technical Details
    • Best Practices
      • Creating Datasets at Scale
      • Training Models at Scale
      • Storage Synchronization and "with" Context
      • Restoring Corrupted Datasets
      • Concurrent Writes
        • Concurrency Using Zookeeper Locks
    • Deep Lake Data Format
      • Tensor Relationships
      • Version Control and Querying
    • Dataset Visualization
      • Visualizer Integration
    • Shuffling in Dataloaders
    • How to Contribute
Powered by GitBook
On this page

Was this helpful?

Edit on GitHub
  1. Examples
  2. Tensor Query Language (TQL)

TQL Syntax

How to properly format TQL queries

Query syntax for the Tensor Query Language (TQL)

CONTAINS and ==

-- Exact match, which generally requires that the sample
-- has 1 value, i.e. no lists or multi-dimensional arrays
select * where tensor_name == 'text_value'   # If value is text
select * where tensor_name == numeric_value  # If values is numeric

select * where contains(tensor_name, 'text_value')

Tensor or group names with special characters should be wrapped with double-quotes:

select * where contains("tensor-name", 'text_value')

select * where "tensor_name/group_name" == numeric_value

Make sure to wrap double-quotes with escape characters in Python:

select * where contains(\"tensor-name\", 'text_value')

SHAPE

select * where shape(tensor_name)[dimension_index] > numeric_value 
select * where shape(tensor_name)[1] > numeric_value # Second array dimension > value

LIMIT

select * where contains(tensor_name, 'text_value') limit num_samples

AND, OR, NOT

select * where contains(tensor_name, 'text_value') and NOT contains(tensor_name_2, numeric_value)
select * where contains(tensor_name, 'text_value') or tensor_name_2 == numeric_value

select * where (contains(tensor_name, 'text_value') and shape(tensor_name_2)[dimension_index]>numeric_value) or contains(tensor_name, 'text_value_2')

UNION and INTERSECT

(select * where contains(tensor_name, 'value')) intersect (select * where contains(tensor_name, 'value_2'))

(select * where contains(tensor_name, 'value') limit 100) union (select * where shape(tensor_name)[0] > numeric_value limit 100)

ORDER BY

-- Order by requires that sample is numeric and has 1 value, 
-- i.e. no lists or multi-dimensional arrays

-- The default order is ASCENDING (asc)

select * where contains(tensor_name, 'text_value') order by tensor_name asc

ANY, ALL, and ALL_STRICT

all adheres to NumPy and list logic where all(empty_sample) returns True

all_strict is more intuitive for queries so all_strict(empty_sample) returns False

select * where all(tensor_name==0) # Returns True for empty samples

select * where all_strict(tensor_name[:,2]>numeric_value) # Returns False for empty samples

select * where any(tensor_name[0:6]>numeric_value)

IN and BETWEEN

Only works for scalar numeric values and text references to class_names

select * where tensor_name in (1, 2, 6, 10)

select * where class_label_tensor_name in ('car', 'truck')

select * where tensor_name between 5 and 20

LOGICAL_AND and LOGICAL_OR

select * where any(logical_and(tensor_name_1[:,3]>numeric_value, tensor_name_2 == 'text_value'))

REFERENCING SAMPLES IN EXISTING TENORS

# Select based on index (row_number)
select * where row_number() == 10

# Referencing values of of a tensor at index (row_number)
select * order by l2_norm(<tensor_name> - data(<tensor_name>, index))
# Finds rows of data with embeddings most similar to index 10
select * order by l2_norm(embedding - data(embedding, 10)) 

SAMPLE BY

select * sample by weight_choice(expression_1: weight_1, expression_2: weight_2, ...)
        replace True limit N
  • weight_choice resolves the weight that is used when multiple expressions evaluate to True for a given sample. Options are max_weight, sum_weight. For example, if weight_choice is max_weight, then the maximum weight will be chosen for that sample.

  • replace determines whether samples should be drawn with replacement. It defaults to True.

  • limit specifies the number of samples that should be returned. If unspecified, the sampler will return the number of samples corresponding to the length of the dataset

EMBEDDING SEARCH

Deep Lake supports several vector operations for embedding search. Typically, vector operations are called by returning data ordered by the score based on the vector search method.

select * from (select tensor_1, tensor_2, <VECTOR_OPERATION> as score) order by score desc limit 10

-- THE SUPPORTED VECTOR_OPERATIONS ARE:

l1_norm(<embedding_tensor> - ARRAY[<search_embedding>]) # Order should be asc

l2_norm(<embedding_tensor> - ARRAY[<search_embedding>]) # Order should be asc

linf_norm(<embedding_tensor> - ARRAY[<search_embedding>]) # Order should be asc

cosine_similarity(<embedding_tensor>, ARRAY[<search_embedding>]) # Order should be desc

VIRTUAL TENSORS

Virtual tensors are the result of a computation and are not tensors in the Deep Lake dataset. However, they can be treated as tensors in the API.

-- "score" is a virtual tensor
select * from (select tensor_1, tensor_2, <VECTOR_OPERATION> as score) order by score desc limit 10

-- "box_beyond_image" is a virtual tensor
select *, any(boxes[:,0])<0 as box_beyond_image where ....

-- "tensor_sum" is a virtual tensor
select *, tensor_1 + tensor_3 as tensor_sum where ......

When combining embedding search with filtering (where conditions), the filter condition is evaluated prior to the embedding search.

GROUP BY AND UNGROUP BY

Group by creates a sequence of data based on the common properties that are being grouped (i.e. frames into videos). Ungroup by splits sequences into their individual elements (i.e. videos into images).

select * group by label, video_id # Groups all data with the same label and video_id in to the same sequence

select * ungroup by split # Splits sequences into their original pieces

EXPAND BY

Expand by includes samples before and after a query condition is satisfied.

select * where <condition> expand by rows_before, rows_after 
PreviousTensor Query Language (TQL)NextIndex for ANN Search

Was this helpful?

📚