Step 2: Creating Deep Lake Datasets

Creating and storing Deep Lake Datasets.

How to Create Datasets in Deep Lake Format

This guide creates Deep Lake datasets locally. To create datasets in the Activeloop cloud instead, register, create an API token, and replace the local paths below with the path to your Deep Lake organization: hub://organization_name/dataset_name.

You don't have to upload datasets after you've created them. Deep Lake automatically synchronizes them with the location where they are stored.

Let's follow along with the example below to create our first dataset manually. First, download and unzip the small classification dataset below called animals.

animals dataset (338KB)

The dataset has the following folder structure:

_animals
|_cats
    |_image_1.jpg
    |_image_2.jpg
|_dogs
    |_image_3.jpg
    |_image_4.jpg

Now that you have the data, you can create a Deep Lake dataset and initialize its tensors. Running the following code will create a Deep Lake dataset inside the ./animals_deeplake folder.
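A minimal sketch of this step, assuming the Deep Lake 3.x Python API, in which deeplake.empty creates a new, empty dataset at the given path. The helper name create_local_dataset is illustrative only and is not part of the library.

```python
def create_local_dataset(path="./animals_deeplake"):
    """Create an empty Deep Lake dataset at a local path (Deep Lake 3.x API assumed)."""
    # Imported inside the helper so the package is only needed when it is called.
    import deeplake

    # deeplake.empty returns a dataset object that has no tensors yet.
    return deeplake.empty(path)
```

Calling ds = create_local_dataset() would give the dataset object used in the following steps.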

Next, let's inspect the folder structure for the source dataset './animals' to find the class names and the files that need to be uploaded to the Deep Lake dataset.
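One way to do this inspection with only the Python standard library is sketched below. It assumes the layout shown earlier (one subfolder per class, images directly inside it); the helper name find_classes_and_files is hypothetical.

```python
import os


def find_classes_and_files(dataset_folder="./animals"):
    """Return (class_names, files_list) for a folder with one subfolder per class."""
    # Keep only subfolders, which skips stray files such as .DS_Store on macOS.
    class_names = sorted(
        item
        for item in os.listdir(dataset_folder)
        if os.path.isdir(os.path.join(dataset_folder, item))
    )

    # Collect the path of every non-hidden file inside each class subfolder.
    files_list = []
    for class_name in class_names:
        class_dir = os.path.join(dataset_folder, class_name)
        for dirpath, _dirnames, filenames in os.walk(class_dir):
            for filename in sorted(filenames):
                if not filename.startswith("."):
                    files_list.append(os.path.join(dirpath, filename))
    return class_names, files_list
```

The class names are sorted so that the mapping from class name to label index stays the same between runs.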

Next, let's create the dataset tensors and upload metadata. Check out our page on Storage Synchronization for details about the with syntax below.
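The sketch below shows what this step might look like, assuming the Deep Lake 3.x API (create_tensor, htype, and info.update). The tensor names images and labels, the metadata values, and the helper name create_animal_tensors are illustrative choices.

```python
def create_animal_tensors(ds, class_names):
    """Create image and label tensors on an open Deep Lake 3.x dataset `ds`."""
    # Inside the with block, updates are pushed to long-term storage when the
    # block exits (or the local cache fills) rather than after every call.
    with ds:
        # Tensor names are free to choose; the htype describes the kind of data.
        ds.create_tensor("images", htype="image", sample_compression="jpeg")
        ds.create_tensor("labels", htype="class_label", class_names=class_names)

        # Optional free-form metadata on the dataset and on a single tensor.
        ds.info.update(description="My first Deep Lake dataset")
        ds.images.info.update(camera_type="SLR")
```

For the animals example, this would be called as create_animal_tensors(ds, ["cats", "dogs"]).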

Finally, let's populate the data in the tensors.
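A sketch of the population step, assuming the Deep Lake 3.x API (deeplake.read and ds.append) and tensors named images and labels. Each label is derived from the name of the folder that contains the file and is appended as a plain integer index into class_names; the helper name populate_dataset is hypothetical.

```python
import os


def populate_dataset(ds, files_list, class_names):
    """Append every file in files_list to `ds` as an image plus a numeric label."""
    # Imported inside the helper so the package is only needed when it is called.
    import deeplake

    with ds:
        for file in files_list:
            # The parent folder name is the class, e.g. ./animals/cats/image_1.jpg -> "cats".
            label_text = os.path.basename(os.path.dirname(file))
            label_num = class_names.index(label_text)

            # deeplake.read references the file without decoding it into an array first.
            ds.append({"images": deeplake.read(file), "labels": label_num})
```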

Check out the first image from this dataset. More details about Accessing Data are available in Step 4.
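For example, the first image could be loaded as an array as sketched below, assuming the Deep Lake 3.x API and a tensor named images. The helper name first_image_array is hypothetical.

```python
def first_image_array(ds):
    """Return the first sample of the images tensor as an array."""
    # Indexing selects one sample; .numpy() loads its pixel data into memory.
    return ds.images[0].numpy()
```

In a notebook, the returned array can be rendered with Pillow, for instance with Image.fromarray(first_image_array(ds)).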

Dataset inspection

You can print a summary of the dataset structure by calling ds.summary().

Congrats! You just created your first dataset! 🎉

Automatic Creation (Classification Datasets Only)

The animals dataset above can also be converted to Deep Lake format automatically using one line of code:
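A sketch of the automatic route, assuming the Deep Lake 3.x function deeplake.ingest_classification, which takes a source folder and a destination path. The destination ./animals_deeplake_auto and the helper name ingest_animals are illustrative; the single call inside the helper is the one line that does the conversion.

```python
def ingest_animals(src="./animals", dest="./animals_deeplake_auto"):
    """Convert a folder-per-class image dataset into Deep Lake format."""
    # Imported inside the helper so the package is only needed when it is called.
    import deeplake

    # Class names are inferred from the subfolder names under src.
    return deeplake.ingest_classification(src, dest)
```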

Creating Tensor Hierarchies

It is often important to create tensors hierarchically, because information in different tensors may be inherently coupled, such as bounding boxes and their corresponding labels. A hierarchy can be created using tensor groups:
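The sketch below groups bounding boxes with their labels, assuming the Deep Lake 3.x API (create_group, and create_tensor with a group/tensor path). The group name annotations, the tensor names boxes and labels, and the helper name create_annotation_group are hypothetical.

```python
def create_annotation_group(ds):
    """Create a tensor group holding bounding boxes and their labels."""
    with ds:
        # Option 1: create the group explicitly, then add a tensor to it.
        ds.create_group("annotations")
        ds.annotations.create_tensor("boxes", htype="bbox")

        # Option 2: a "group/tensor" path places the tensor in the group directly.
        ds.create_tensor("annotations/labels", htype="class_label")
```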

Tensors in groups are accessed via:
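Both access styles are sketched below, assuming the Deep Lake 3.x API and a dataset that already contains a group named annotations with a tensor named boxes (hypothetical names, as is the helper get_boxes_tensor).

```python
def get_boxes_tensor(ds):
    """Return the same grouped tensor via attribute access and via path access."""
    # Attribute access: the group first, then the tensor inside it.
    by_attribute = ds.annotations.boxes

    # Path access: group and tensor joined with "/".
    by_path = ds["annotations/boxes"]
    return by_attribute, by_path
```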

For more detailed information regarding accessing datasets and their tensors, check out Step 4.
