Quickstart
A jump-start guide to using Deep Lake.
How to Get Started with Activeloop Deep Lake in Under 5 Minutes
Installing Deep Lake
Deep Lake can be installed using pip. By default, Deep Lake does not install dependencies for video, google-cloud, and other features. Details on all installation options are available here.
$ pip3 install deeplake
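For example, optional dependencies can be pulled in with pip extras. A minimal sketch, assuming your Deep Lake version exposes the av (audio/video), gcp (google-cloud), and all extras; check the installation options page for the exact names:
$ pip3 install "deeplake[av]"   # audio/video support
$ pip3 install "deeplake[gcp]"  # google-cloud support
$ pip3 install "deeplake[all]"  # all optional dependencies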
Fetching Your First Deep Lake Dataset
Let's load the Visdrone dataset, a rich dataset with many object detections per image. Datasets hosted by Activeloop are identified by the host organization id followed by the dataset name: activeloop/visdrone-det-train.
import deeplake
dataset_path = 'hub://activeloop/visdrone-det-train'
ds = deeplake.load(dataset_path) # Returns a Deep Lake Dataset but does not download data locally
Reading Samples From a Deep Lake Dataset
Data is not immediately read into memory because Deep Lake operates lazily. You can fetch data by calling the .numpy() or .data() methods:
# Indexing
image = ds.images[0].numpy() # Fetch the first image and return a numpy array
labels = ds.labels[0].data() # Fetch the labels in the first image
# Slicing
label_list = ds.labels[0:100].numpy(aslist=True) # Fetch 100 labels and store them as a list of numpy arrays
Other metadata, such as the mapping between numerical labels and their text counterparts, can be accessed using:
labels_list = ds.labels.info['class_names']
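For example, the numeric class ids returned by .numpy() can be mapped back to their text names. A brief sketch, assuming the labels tensor returns an integer array per sample:
first_label_ids = ds.labels[0].numpy() # Numeric class ids for the first sample
print([labels_list[int(i)] for i in first_label_ids.flatten()]) # Text class names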
Visualizing a Deep Lake Dataset
Deep Lake enables users to visualize and interpret large datasets. The tensor layout for a dataset can be inspected using:
ds.summary()
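If you prefer to inspect the layout programmatically, here is a minimal sketch, assuming ds.tensors behaves like a dictionary mapping tensor names to tensors:
for name, tensor in ds.tensors.items():
    print(name, tensor.htype, len(tensor)) # Tensor name, htype, and number of samples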
The dataset can be visualized in the Deep Lake UI, or using an iframe in a Jupyter notebook:
ds.visualize()
Creating Your Own Deep Lake Datasets
You can access all of the features above and more with your own datasets! If your source data conforms to a supported format such as COCO annotations or classification folders, you can ingest it directly with one line of code. The ingestion functions support source data from the cloud, as well as creation of Deep Lake datasets in the cloud.
For example, a COCO format dataset can be ingested using:
dataset_path = 's3://bucket_name_deeplake/dataset_name' # Destination for the Deep Lake dataset
images_folder = 's3://bucket_name_source/images_folder'
annotations_files = ['s3://bucket_name_source/annotations.json'] # Can be a list of COCO jsons.
ds = deeplake.ingest_coco(images_folder, annotations_files, dataset_path, src_creds={...}, dest_creds={...})
To create datasets that do not conform to one of these formats, you can use our methods for manually creating datasets and tensors and populating them with data.
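For example, here is a minimal sketch of manual dataset creation; the local path, tensor names, class names, and the random image are all placeholders:
import deeplake
import numpy as np

ds = deeplake.empty('./quickstart_ds') # Create an empty dataset at a local path

with ds:
    ds.create_tensor('images', htype='image', sample_compression='jpeg')
    ds.create_tensor('labels', htype='class_label', class_names=['cat', 'dog'])

    # Populate with a synthetic image and its label
    ds.images.append(np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8))
    ds.labels.append(0) # 0 -> 'cat'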
Authentication
In order to use features in the Python API that require authentication (Activeloop storage, connecting your cloud dataset to the Deep Lake UI, etc.), you should register in the Deep Lake App and authenticate on your machine using one of the following methods:
Log in via the CLI using your username and password, or your API token:
activeloop login -u <your_username> -p <your_password>
OR
activeloop login -t <your_token>
Pass the API token directly to the Python method that requires authentication:
ds = deeplake.load('hub://org_name/dataset_name', token=<your_token>)
Next Steps
Check out our Getting Started Guide for a comprehensive walk-through of Deep Lake. Also check out tutorials on Running Queries, Training Models, and Creating Datasets, as well as Playbooks about powerful use cases enabled by Deep Lake.
Congratulations, you've got Deep Lake working on your local machine! 🤓