LogoLogo
API ReferenceGitHubSlackService StatusLogin
v3.4.0
v3.4.0
  • Deep Lake Docs
  • List of ML Datasets
  • Quickstart
  • Dataset Visualization
  • Storage & Credentials
    • Storage Options
    • User Authentication
    • Managed Credentials
      • Enabling CORS
      • Provisioning Role-Based Access
  • API Reference
  • Enterprise Features
    • Querying Datasets
      • Sampling Datasets
    • Performant Dataloader
  • EXAMPLE CODE
  • Getting Started
    • Step 1: Hello World
    • Step 2: Creating Deep Lake Datasets
    • Step 3: Understanding Compression
    • Step 4: Accessing and Updating Data
    • Step 5: Visualizing Datasets
    • Step 6: Using Activeloop Storage
    • Step 7: Connecting Deep Lake Datasets to ML Frameworks
    • Step 8: Parallel Computing
    • Step 9: Dataset Version Control
    • Step 10: Dataset Filtering
  • Tutorials (w Colab)
    • Deep Lake Vector Store in LangChain
    • Creating Datasets
      • Creating Complex Datasets
      • Creating Object Detection Datasets
      • Creating Time-Series Datasets
      • Creating Datasets with Sequences
      • Creating Video Datasets
    • Training Models
      • Training an Image Classification Model in PyTorch
      • Training Models Using MMDetection
      • Training Models Using PyTorch Lightning
      • Training on AWS SageMaker
      • Training an Object Detection and Segmentation Model in PyTorch
    • Updating Datasets
    • Data Processing Using Parallel Computing
  • Playbooks
    • Querying, Training and Editing Datasets with Data Lineage
    • Evaluating Model Performance
    • Training Reproducibility Using Deep Lake and Weights & Biases
    • Working with Videos
  • API Summary
  • Technical Details
    • Best Practices
      • Creating Datasets at Scale
      • Training Models at Scale
      • Storage Synchronization and "with" Context
      • Restoring Corrupted Datasets
    • Data Layout
    • Version Control and Querying
    • Tensor Relationships
    • Visualizer Integration
    • Shuffling in dataloaders
    • How to Contribute
Powered by GitBook
On this page

Was this helpful?

  1. Technical Details

Version Control and Querying

Understanding Deep Lake's Version control and Querying Layout

PreviousData LayoutNextTensor Relationships

Last updated 2 years ago

Was this helpful?

Understanding the Interaction Between Deep Lake's Versions, Queries, and Dataset Views.

Version control is the core of the Deep Lake data format, and it interacts with queries and view as follows:

  • Datasets have commits and branches, and they can be traversed or merged using Deep Lake's Python API.

  • Queries are applied on top of commits, and in order to save a query result as a view, the dataset cannot be in an uncommitted state (no changes were performed since the prior commit).

  • Each saved view is associated with a particular commit, and the view itself contains information on which dataset indices satisfied the query condition.

This logical approach was chosen in order to preserve data lineage. Otherwise, it would be possible to change data on which a query was executed, thereby potentially invalidating the saved view, since the indices that satisfied the query condition may no longer be correct after the dataset was changed.

Please check out our Getting Stated Guide to learn how to use the Python API to version your data, run queries, and save views.

An example workflow using version control and queries is shown below.