Query Syntax
How to properly format TQL queries
Query syntax for the Tensor Query Language (TQL)
CONTAINS and ==
# Exact match, which generally requires that the sample
# has 1 value, i.e. no lists or multi-dimensional arrays
select * where tensor_name == 'text_value' # If value is text
select * where tensor_name == numeric_value # If value is numeric
select * where contains(tensor_name, 'text_value')
Any special characters in tensor or group names should be wrapped with double-quotes:
select * where contains("tensor-name", 'text_value')
select * where "tensor_name/group_name" == numeric_value
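When query strings are assembled programmatically, the quoting rule above can be applied automatically. The following is a minimal sketch, not part of Deep Lake: `quote_tensor_name` and `contains_query` are hypothetical helpers, and the sketch assumes that any name containing characters other than letters, digits, and underscores counts as "special". It does not escape quotes inside names or values.

```python
import re

# Hypothetical helper (not a Deep Lake API): names made only of letters,
# digits, and underscores are left bare; anything else is double-quoted.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_tensor_name(name: str) -> str:
    """Wrap a tensor or group name in double quotes if it has special characters."""
    if _PLAIN_NAME.fullmatch(name):
        return name
    return f'"{name}"'


def contains_query(tensor_name: str, text_value: str) -> str:
    """Build a CONTAINS query string; text values use single quotes."""
    return f"select * where contains({quote_tensor_name(tensor_name)}, '{text_value}')"


print(contains_query("tensor-name", "text_value"))
print(contains_query("tensor_name", "text_value"))
```

The first call produces the quoted form shown above, while the second leaves the plain name untouched.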
SHAPE
select * where shape(tensor_name)[dimension_index] > numeric_value
select * where shape(tensor_name)[1] > numeric_value # Second array dimension > value
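For readers who think in NumPy terms, the zero-based dimension index in the SHAPE queries can be illustrated locally. This is only an analogy under the assumption that `shape(tensor_name)[i]` behaves like `array.shape[i]` for each sample; the sample data below is made up for illustration.

```python
import numpy as np

# Made-up samples standing in for one tensor: three images of different sizes.
samples = [
    np.zeros((480, 640, 3)),
    np.zeros((1080, 1920, 3)),
    np.zeros((720, 1280, 3)),
]
numeric_value = 1000

# Analogue of: select * where shape(tensor_name)[1] > numeric_value
matches = [i for i, sample in enumerate(samples) if sample.shape[1] > numeric_value]
print(matches)  # indices of samples whose second dimension exceeds numeric_value
```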
LIMIT
select * where contains(tensor_name, 'text_value') limit num_samples
AND, OR, NOT
select * where contains(tensor_name, 'text_value') and NOT contains(tensor_name_2, numeric_value)
select * where contains(tensor_name, 'text_value') or tensor_name_2 == numeric_value
select * where (contains(tensor_name, 'text_value') and shape(tensor_name_2)[dimension_index]>numeric_value) or contains(tensor_name, 'text_value_2')
UNION and INTERSECT
(select * where contains(tensor_name, 'value')) intersect (select * where contains(tensor_name, 'value_2'))
(select * where contains(tensor_name, 'value') limit 100) union (select * where shape(tensor_name)[0] > numeric_value limit 100)
ORDER BY
# ORDER BY requires that each sample is numeric and has 1 value,
# i.e. no lists or multi-dimensional arrays
# The default order is ASCENDING (asc)
select * where contains(tensor_name, 'text_value') order by tensor_name asc
ANY, ALL, and ALL_STRICT
select * where all_strict(tensor_name[:,2]>numeric_value)
select * where any(tensor_name[0:6]>numeric_value)
all adheres to NumPy and list logic, where all(empty_sample) returns True.
all_strict is more intuitive for queries, so all_strict(empty_sample) returns False.
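The difference on empty samples can be reproduced locally with NumPy. The sketch below is an emulation for illustration only: the Python function `all_strict` is a hypothetical stand-in written to match the behavior described above, not Deep Lake's implementation.

```python
import numpy as np


def all_strict(mask: np.ndarray) -> bool:
    """Stand-in for the described behavior: an empty sample yields False."""
    return mask.size > 0 and bool(np.all(mask))


empty_sample = np.array([])
filled_sample = np.array([3, 5, 8])
numeric_value = 2

# NumPy and built-in list logic: all() over nothing is vacuously True.
numpy_all_empty = bool(np.all(empty_sample > numeric_value))
list_all_empty = all([])

# The strict variant rejects empty samples but agrees on non-empty ones.
strict_empty = all_strict(empty_sample > numeric_value)
strict_filled = all_strict(filled_sample > numeric_value)

print(numpy_all_empty, list_all_empty, strict_empty, strict_filled)
```

In a filter query, this means `all(...)` would also return samples that contain no values at all, which is rarely what is wanted.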
LOGICAL_AND and LOGICAL_OR
select * where any(logical_and(tensor_name_1[:,3]>numeric_value, tensor_name_2 == 'text_value'))
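The query above combines two element-wise conditions before reducing them with `any`. A NumPy analogue may make the evaluation order clearer. This is a sketch that assumes TQL's element-wise semantics mirror NumPy's here; the data is invented for illustration.

```python
import numpy as np

# Invented sample: three rows of four values each, and one text label per row.
tensor_name_1 = np.array([[0, 0, 10, 5], [0, 0, 10, 50], [0, 0, 10, 80]])
tensor_name_2 = np.array(["cat", "dog", "dog"])
numeric_value = 40

# Element-wise AND of the two per-row conditions ...
mask = np.logical_and(tensor_name_1[:, 3] > numeric_value, tensor_name_2 == "dog")

# ... then the sample matches if any row satisfies both conditions.
sample_matches = bool(np.any(mask))
print(mask.tolist(), sample_matches)
```

Note the contrast with the plain `and` operator, which combines two conditions that have already been reduced to a single truth value per sample.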
SAMPLE BY
select * sample by weight_choice(expression_1: weight_1, expression_2: weight_2, ...)
replace True limit N
weight_choice resolves the weight that is used when multiple expressions evaluate to True for a given sample. Options are max_weight and sum_weight. For example, if weight_choice is max_weight, then the maximum weight will be chosen for that sample.
replace determines whether samples should be drawn with replacement. It defaults to True.
limit specifies the number of samples that should be returned. If unspecified, the sampler will return the number of samples corresponding to the length of the dataset.
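The weight resolution can be sketched in plain Python. This is an illustration of the rules described above, not Deep Lake's sampler: `resolve_weight` is a hypothetical helper, the matched weights are made up, and giving a weight of 0 to samples that match no expression is an assumption of this sketch.

```python
import random


def resolve_weight(matched_weights, weight_choice="max_weight"):
    """Combine the weights of all expressions that evaluated to True for a sample."""
    if not matched_weights:
        return 0.0  # Assumption of this sketch: unmatched samples are never drawn.
    if weight_choice == "max_weight":
        return max(matched_weights)
    if weight_choice == "sum_weight":
        return sum(matched_weights)
    raise ValueError(f"unknown weight_choice: {weight_choice}")


# Made-up data: for each sample, the weights of the expressions that matched it.
matched = [[1.0], [1.0, 5.0], [], [5.0]]

weights_max = [resolve_weight(w, "max_weight") for w in matched]
weights_sum = [resolve_weight(w, "sum_weight") for w in matched]

# replace True with no limit: draw with replacement, one draw per dataset sample.
rng = random.Random(0)
drawn = rng.choices(range(len(matched)), weights=weights_max, k=len(matched))
print(weights_max, weights_sum, drawn)
```

The second sample shows the difference between the two options: it matches both expressions, so it gets weight 5.0 under max_weight and 6.0 under sum_weight.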
EMBEDDING SEARCH
Deep Lake supports several vector operations for embedding search. Typically, a vector operation is computed as a score in an inner query, and the outer query returns the data ordered by that score.
select * from (select tensor_1, tensor_2, <VECTOR_OPERATION> as score) order by score desc limit 10
# THE SUPPORTED VECTOR_OPERATIONS ARE:
l1_norm(<embedding_tensor> - ARRAY[<search_embedding>]) # Order should be asc
l2_norm(<embedding_tensor> - ARRAY[<search_embedding>]) # Order should be asc
linf_norm(<embedding_tensor> - ARRAY[<search_embedding>]) # Order should be asc
cosine_similarity(<embedding_tensor>, ARRAY[<search_embedding>]) # Order should be desc
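Since the search embedding has to be written inline as an `ARRAY[...]` literal, these queries are usually generated from code. The sketch below builds the query string following the template above and picks the sort order for each operation. `embedding_search_query` is a hypothetical helper, and the tensor names are placeholders. The NumPy lines at the end use the standard textbook definitions of each score, which are assumed (not confirmed by this page) to match Deep Lake's functions.

```python
import numpy as np


def embedding_search_query(embedding_tensor, search_embedding,
                           columns=("tensor_1", "tensor_2"),
                           operation="cosine_similarity", limit=10):
    """Build an embedding search query string (hypothetical helper)."""
    array_literal = "ARRAY[" + ", ".join(str(float(x)) for x in search_embedding) + "]"
    if operation == "cosine_similarity":
        # Similarity: larger is closer, so order descending.
        score, order = f"cosine_similarity({embedding_tensor}, {array_literal})", "desc"
    elif operation in ("l1_norm", "l2_norm", "linf_norm"):
        # Distance: smaller is closer, so order ascending.
        score, order = f"{operation}({embedding_tensor} - {array_literal})", "asc"
    else:
        raise ValueError(f"unsupported operation: {operation}")
    return (f"select * from (select {', '.join(columns)}, {score} as score) "
            f"order by score {order} limit {limit}")


query = embedding_search_query("embedding", [0.5, 1.0, 2.0])
print(query)

# Standard definitions of the four scores for one stored vector.
stored = np.array([1.0, 2.0, 2.0])
search = np.array([1.0, 0.0, 0.0])
diff = stored - search
l1 = float(np.abs(diff).sum())
l2 = float(np.sqrt((diff ** 2).sum()))
linf = float(np.abs(diff).max())
cosine = float(stored @ search / (np.linalg.norm(stored) * np.linalg.norm(search)))
```

The three norms measure distance, so the closest matches have the smallest scores, while cosine similarity is largest for the closest matches. That is why the sort order differs between the operations.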