Storage Options

How to authenticate using Activeloop storage, AWS S3, Microsoft Azure, and Google Cloud Storage.


Deep Lake datasets can be stored locally or on several cloud storage providers, including Deep Lake Storage, AWS S3, Microsoft Azure, and Google Cloud Storage. Datasets are accessed by choosing the correct prefix for the dataset path that is passed to methods such as deeplake.load(path) and deeplake.empty(path). The path prefixes are:

Storage Location                        Path                                                Notes
Local                                   /local_path
Deep Lake Storage                       hub://org_id/dataset_name
Deep Lake Managed DB                    hub://org_id/dataset_name                           Specify runtime = {"tensor_db": True} when creating the dataset
AWS S3                                  s3://bucket_name/dataset_name
Microsoft Azure (Gen2 DataLake Only)    azure://account_name/container_name/dataset_name
Google Cloud                            gcs://bucket_name/dataset_name
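
As a quick illustration of these prefixes, the snippets below open datasets in a few of the locations above. This is a minimal sketch; the org, bucket, and dataset names are placeholders, and the hub:// paths assume you are already authenticated with Activeloop (see below).

import deeplake

# Local dataset
ds_local = deeplake.load('./my_local_dataset')

# Dataset in Deep Lake Storage (requires Activeloop authentication)
ds_hub = deeplake.load('hub://<org_id>/<dataset_name>')

# Dataset in AWS S3 (credential options are covered in the sections below)
ds_s3 = deeplake.load('s3://<bucket_name>/<dataset_name>')

# New dataset in the Deep Lake Managed DB (note the runtime parameter)
ds_managed = deeplake.empty('hub://<org_id>/<dataset_name>', runtime = {"tensor_db": True})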

Connecting Deep Lake datasets stored in your own cloud via Deep Lake Managed Credentials is required for accessing enterprise features, and it significantly simplifies dataset access.

Authentication for each cloud storage provider:

Activeloop Storage and Managed Datasets

In order to access datasets stored in Deep Lake, or datasets in other clouds that are managed by Activeloop, users must register and authenticate using the steps in the link below:

User Authentication
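
As a minimal sketch of what authenticated access looks like in code (the token value and dataset path below are placeholders; see User Authentication for the full registration steps), an Activeloop API token can be supplied via the ACTIVELOOP_TOKEN environment variable or the token parameter:

import os
import deeplake

# Option A: export the Activeloop API token so Deep Lake picks it up automatically
os.environ['ACTIVELOOP_TOKEN'] = '<your_activeloop_token>'
ds = deeplake.load('hub://<org_id>/<dataset_name>')

# Option B: pass the token explicitly
ds = deeplake.load('hub://<org_id>/<dataset_name>', token = '<your_activeloop_token>')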

AWS S3

Authentication with AWS S3 has 4 options:

  1. Use Deep Lake on a machine in the AWS ecosystem that has access to the relevant S3 bucket via AWS IAM, in which case there is no need to pass credentials in order to access datasets in that bucket.

  2. Configure AWS through the CLI using aws configure. This creates a credentials file on your machine that is automatically accessed by Deep Lake during authentication.

  3. Save the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN (optional) in environment variables of the same name, which are loaded as default credentials if no other credentials are specified (a minimal sketch follows the code block below).

  4. Create a dictionary with the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN (optional), and pass it to Deep Lake using:

    Note: the dictionary keys must be lowercase!

# Vector Store API
vector_store = VectorStore('s3://<bucket_name>/<dataset_name>', 
                           creds = {
                               'aws_access_key_id': <your_access_key_id>,
                               'aws_secret_access_key': <your_aws_secret_access_key>,
                               'aws_session_token': <your_aws_session_token>, # Optional
                               }
                               )

# Low Level API
ds = deeplake.load('s3://<bucket_name>/<dataset_name>', 
                   creds = {
                       'aws_access_key_id': <your_access_key_id>,
                       'aws_secret_access_key': <your_aws_secret_access_key>,
                       'aws_session_token': <your_aws_session_token>, # Optional
                       }
                       )
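
For options 1-3 above, no creds argument is needed, because Deep Lake falls back to the default AWS credentials. A minimal sketch, assuming the aws configure credentials file or the environment variables are already in place:

import deeplake

# No creds passed: the credentials file created by `aws configure`, or the
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables, are used automatically.
ds = deeplake.load('s3://<bucket_name>/<dataset_name>')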

Custom Storage with S3 API

In order to connect to other object storages supporting an S3-like API, such as MinIO, StorageGrid, and others, simply add endpoint_url to the creds dictionary:

# Vector Store API
vector_store = VectorStore('s3://...', 
                           creds = {
                               'aws_access_key_id': <your_access_key_id>,
                               'aws_secret_access_key': <your_aws_secret_access_key>,
                               'aws_session_token': <your_aws_session_token>, # Optional
                               'endpoint_url': 'http://localhost:8888'
                               }
                               )

# Low Level API
ds = deeplake.load('s3://...', 
                   creds = {
                       'aws_access_key_id': <your_access_key_id>,
                       'aws_secret_access_key': <your_aws_secret_access_key>,
                       'aws_session_token': <your_aws_session_token>, # Optional
                       'endpoint_url': 'http://localhost:8888'
                       }
                       )

Microsoft Azure

Authentication with Microsoft Azure has 3 options:

  1. Log in from your machine's CLI using az login.

  2. Save the AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_KEY, or other credentials in environment variables of the same name, which are loaded as default credentials if no other credentials are specified (a minimal sketch follows the code block below).

  3. Create a dictionary with the ACCOUNT_KEY or SAS_TOKEN and pass it to Deep Lake using:

    Note: the dictionary keys must be lowercase!

# Vector Store API
vector_store = VectorStore('azure://<account_name>/<container_name>/<dataset_name>', 
                           creds = {
                               'account_key': <your_account_key>,
                               # OR
                               'sas_token': <your_sas_token>,
                               }
                               )

# Low Level API
ds = deeplake.load('azure://<account_name>/<container_name>/<dataset_name>', 
                   creds = {
                       'account_key': <your_account_key>, 
                        # OR
                       'sas_token': <your_sas_token>,
                       }
                       )
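
For options 1 and 2 above, no creds dictionary is needed. A minimal sketch, assuming az login has been run or the environment variables are already set:

import deeplake

# No creds passed: the az login session or the AZURE_STORAGE_ACCOUNT /
# AZURE_STORAGE_KEY environment variables are used automatically.
ds = deeplake.load('azure://<account_name>/<container_name>/<dataset_name>')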

Google Cloud Storage

Authentication with Google Cloud Storage has 2 options:

  1. Create a service account, download the JSON file containing the keys, and then pass that file to the creds parameter in deeplake.load('gcs://.....', creds = 'path_to_keys.json') (a full snippet of this form appears after this list). It is also possible to manually pass the information from the JSON file into the creds parameter using:

    # Vector Store API
    vector_store = VectorStore('gcs://.....', 
                               creds = {<information from the JSON file>}
                               )
    
    # Low Level API
    ds = deeplake.load('gcs://.....', 
                       creds = {<information from the JSON file>}
                       )
  2. Authenticate through the browser using the steps below. This requires that the project credentials are stored on your machine, which happens after gcloud is initialized and logged in through the CLI. Afterwards, creds can be switched to creds = 'cache'.

    # Vector Store API
    vector_store = VectorStore('gcs://.....', 
                               creds = 'browser' # Switch to 'cache' after doing this once
                               )
    
    # Low Level API
    ds = deeplake.load('gcs://.....', 
                       creds = 'browser' # Switch to 'cache' after doing this once
                       )
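
For completeness, the file-path form from option 1 as a full snippet. This is a minimal sketch; the VectorStore import path is an assumption based on recent deeplake releases, and the key-file and dataset paths are placeholders:

import deeplake
from deeplake.core.vectorstore import VectorStore  # assumption: import path for VectorStore

# Vector Store API: pass the path to the service account JSON key file
vector_store = VectorStore('gcs://<bucket_name>/<dataset_name>',
                           creds = 'path_to_keys.json')

# Low Level API
ds = deeplake.load('gcs://<bucket_name>/<dataset_name>',
                   creds = 'path_to_keys.json')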

Datasets in AWS S3, Microsoft Azure, and Google Cloud Storage can also be connected to Deep Lake via Managed Credentials.
