Concurrent Writes

Concurrent writes in Deep Lake

How to Write Data Concurrently to Deep Lake Datasets

Deep Lake offers three solutions for writing data concurrently, depending on the required scale of the application. Concurrency is not native to the Deep Lake format, so these solutions use locks and queues to schedule and linearize the write operations to Deep Lake.

Concurrency Using External Locks

Concurrent writes can be supported using an in-memory database that serves as the locking mechanism for Deep Lake datasets. Tools such as Zookeeper or Redis are highly performant and reliable and can be deployed using a few lines of code. External locks are recommended for small-to-medium workloads.
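
As a minimal sketch of this pattern, a writer process can acquire a Redis lock before appending to the dataset. The Redis host, lock name, dataset path, and tensor names below are illustrative assumptions, not part of the Deep Lake API:

import deeplake
import redis

client = redis.Redis(host="localhost", port=6379)  # assumed Redis server

# Lock name and dataset path are hypothetical placeholders.
# blocking_timeout bounds how long this writer waits to acquire the lock;
# timeout auto-releases the lock if the writer crashes mid-write.
with client.lock("deeplake-write-lock", timeout=60, blocking_timeout=30):
    ds = deeplake.load("s3://my-bucket/my-dataset")
    with ds:  # buffer writes and flush once when the block exits
        ds.append({"text": "hello world", "embedding": [0.1] * 768})

Because the lock is held for the full duration of the append, concurrent writers are serialized and never modify the dataset simultaneously.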

Managed Concurrency

COMING SOON. Deep Lake will offer a Managed Tensor Database that supports read (search) and write operations at scale. Deep Lake ensures the operations are performant by provisioning the necessary infrastructure and executing the underlying user requests in a distributed manner. This approach is recommended for production applications that require a separate service to handle the high computational loads of vector search.

Concurrency Using Deep Lake Locks

Deep Lake datasets internally support file-based locks. File-based locks are generally slower and less reliable than the other listed solutions, and they should only be used for prototyping.

Default Behavior

By default, Deep Lake datasets are loaded in write mode and a lock file is created. This can be avoided by specifying read_only = True to APIs that load datasets.

An error will occur if the Deep Lake dataset is locked and the user tries to open it in write mode. To specify a waiting time for the lock to be released, you can pass lock_timeout = <timeout_in_s> to APIs that load datasets, as in the sketch below.
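
As a brief illustration, both options can be passed to dataset-loading APIs such as deeplake.load (the dataset path here is a hypothetical placeholder):

import deeplake

# Open the dataset read-only, so no lock file is created
ds = deeplake.load("s3://my-bucket/my-dataset", read_only=True)

# Or wait up to 120 seconds for another writer's lock to be released
ds = deeplake.load("s3://my-bucket/my-dataset", lock_timeout=120)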

Manipulating Locks

Locks can be set or released manually using:

from deeplake.core.lock import lock_dataset, unlock_dataset

# Release a dataset's lock, e.g. after a crashed writer left it locked
unlock_dataset(<dataset_path>)

# Manually acquire the lock on a dataset
lock_dataset(<dataset_path>)
