# Storage Options

**Deep Lake datasets can be stored locally or on several cloud storage providers, including Activeloop Storage, AWS S3, and Google Cloud Storage.** Datasets are accessed by choosing the correct prefix for the dataset `path` passed to methods such as `deeplake.load(path)`, `deeplake.dataset(path)`, and `deeplake.empty(path)`. The path prefixes are:

<table data-header-hidden><thead><tr><th width="222.76694359979138">Storage</th><th>Path</th><th>Notes</th></tr></thead><tbody><tr><td><strong>Storage Location</strong></td><td><strong>Path</strong></td><td><strong>Notes</strong></td></tr><tr><td><strong>Local</strong></td><td><code>/local_path</code></td><td></td></tr><tr><td><strong>Deep Lake Storage</strong></td><td><code>hub://org_id/dataset_name</code></td><td></td></tr><tr><td><strong>Deep Lake Managed DB</strong></td><td><code>hub://org_id/dataset_name</code></td><td>Specify <code>runtime = {"managed_db": True}</code> when creating the dataset</td></tr><tr><td><strong>AWS S3</strong></td><td><code>s3://bucket_name/dataset_name</code></td><td>Dataset can be connected to Deep Lake via <a href="managed-credentials">Managed Credentials</a></td></tr><tr><td><strong>Google Cloud</strong></td><td><code>gcs://bucket_name/dataset_name</code></td><td>Dataset can be connected to Deep Lake via <a href="managed-credentials">Managed Credentials</a></td></tr></tbody></table>

**If you choose to manage your credentials in Deep Lake, you can access datasets in your own cloud buckets using the Deep Lake path `hub://org_name/dataset_name`, without having to pass credentials in the Python API.**
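As a quick illustration, the same loading API works across backends and only the path prefix changes. The org, bucket, and dataset names below are hypothetical placeholders:

```python
# Hypothetical dataset paths; the prefix selects the storage backend.
paths = {
    "local": "./my_dataset",
    "deep_lake": "hub://my_org/my_dataset",
    "s3": "s3://my_bucket/my_dataset",
    "gcs": "gcs://my_bucket/my_dataset",
}

# Each would be opened with the same call, e.g.:
# import deeplake
# ds = deeplake.load(paths["s3"], creds = {...})
```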

## Authentication for each cloud storage provider

### Activeloop Storage and Managed Datasets

To gain access in Python to datasets stored in Activeloop, or datasets in other clouds that are [managed by Activeloop](https://docs-v3.activeloop.ai/v3.4.0/storage-and-credentials/managed-credentials), users must register on the [Deep Lake App](https://app.activeloop.ai/) or through the CLI, and log in through the CLI using:

```bash
activeloop register

activeloop login
```

#### Authentication using tokens

Authentication can also be performed using tokens, which can be created after registration on the [Deep Lake App](https://app.activeloop.ai/) (Profile -> API tokens). Tokens can be passed to any Deep Lake function that requires authentication:

```python
deeplake.load(path, token = "...")
deeplake.empty(path, token = "...")
...
```
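Tokens can also be supplied through the environment instead of as a function argument. Assuming your Deep Lake version reads the `ACTIVELOOP_TOKEN` environment variable (check your version's documentation), a sketch:

```python
import os

# Placeholder token string; replace with a token created in the Deep Lake App.
os.environ["ACTIVELOOP_TOKEN"] = "my_api_token"

# With the variable set, authenticated calls no longer need token = "...":
# ds = deeplake.load("hub://my_org/my_dataset")
```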

{% hint style="warning" %}
Credentials created using the CLI login `!activeloop login` expire after 1000 hrs. Credentials created using API tokens in the [Deep Lake App](https://app.activeloop.ai/) expire after the time specified for the individual token. Therefore, long-term workflows should be run using API tokens in order to avoid expiration of credentials mid-workflow.
{% endhint %}

### AWS S3

Authentication with AWS S3 has 4 options:

1. Use Deep Lake on a machine in the AWS ecosystem that has access to the relevant S3 bucket via [AWS IAM](https://aws.amazon.com/iam/), in which case there is no need to pass credentials in order to access datasets in that bucket.
2. Configure AWS through the CLI using `aws configure`. This creates a credentials file on your machine that is automatically accessed by Deep Lake during authentication.
3. Save the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and (optionally) `AWS_SESSION_TOKEN` in environment variables of the same name, which are loaded as default credentials if no other credentials are specified.
4. Create a dictionary with the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and (optionally) `AWS_SESSION_TOKEN`, and pass it to Deep Lake using:

   **Note:** the dictionary keys must be lowercase!

```python
deeplake.load('s3://...', creds = {
   'aws_access_key_id': 'abc', 
   'aws_secret_access_key': 'xyz', 
   'aws_session_token': '123', # Optional
})
```
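Options 3 and 4 can also be combined: source the values from the standard AWS environment variables and pass them as the lowercase-key dictionary. The bucket path below is a hypothetical placeholder:

```python
import os

# Build the creds dict (note the lowercase keys) from the AWS env variables.
creds = {
    "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID", ""),
    "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY", ""),
}
session_token = os.environ.get("AWS_SESSION_TOKEN")
if session_token:  # optional, only needed for temporary credentials
    creds["aws_session_token"] = session_token

# ds = deeplake.load("s3://my_bucket/my_dataset", creds = creds)
```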


### Custom Storage with S3 API

In order to connect to other object storages supporting an S3-like API, such as [MinIO](https://github.com/minio/minio), [StorageGrid](https://www.netapp.com/data-storage/storagegrid/), and others, simply add `endpoint_url` to the `creds` dictionary.

```python
deeplake.load('s3://...', creds = {
   'aws_access_key_id': 'abc', 
   'aws_secret_access_key': 'xyz', 
   'aws_session_token': '123', # Optional
   'endpoint_url': 'http://localhost:8888'
})
```

### Google Cloud Storage

Authentication with Google Cloud Storage has 2 options:

1. Create a service account, download the JSON file containing the keys, and pass that file to the `creds` parameter in `deeplake.load('gcs://.....', creds = 'path_to_keys.json')`. It is also possible to manually pass the information from the JSON file into the `creds` parameter using:

   `deeplake.load('gcs://.....', creds = {information from the JSON file})`
2. Authenticate through the browser using `deeplake.load('gcs://.....', creds = 'browser')`. This requires that the project credentials are stored on your machine, which happens after `gcloud` is [initialized](https://cloud.google.com/sdk/gcloud/reference/init) and [logged in](https://cloud.google.com/sdk/gcloud/reference/auth) through the CLI.
   1. After this step, re-authentication through the browser can be skipped using: `deeplake.load('gcs://.....', creds = 'cache')`
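The options above can be sketched with a small helper that returns the appropriate `creds` value. `gcs_creds` is a hypothetical name for illustration, not part of the Deep Lake API:

```python
def gcs_creds(mode, key_path="path_to_keys.json"):
    """Return the creds argument for deeplake.load('gcs://...')."""
    if mode == "service_account":
        return key_path  # path to the downloaded JSON key file
    if mode in ("browser", "cache"):
        return mode      # interactive auth, or reuse cached credentials
    raise ValueError(f"unknown mode: {mode!r}")

# ds = deeplake.load("gcs://my_bucket/my_dataset", creds=gcs_creds("browser"))
```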
