# Storage Options

**Deep Lake datasets can be stored locally or on several cloud storage providers, including Activeloop Storage, AWS S3, and Google Cloud Storage.** Datasets are accessed by choosing the correct prefix for the dataset `path` passed to methods such as `deeplake.load(path)`, `deeplake.dataset(path)`, and `deeplake.empty(path)`. The path prefixes are:

<table data-header-hidden><thead><tr><th width="222.76694359979138">Storage</th><th>Path</th><th>Notes</th></tr></thead><tbody><tr><td><strong>Storage Location</strong></td><td><strong>Path</strong></td><td><strong>Notes</strong></td></tr><tr><td><strong>Local</strong></td><td><code>/local_path</code></td><td></td></tr><tr><td><strong>Deep Lake Storage</strong></td><td><code>hub://org_id/dataset_name</code></td><td></td></tr><tr><td><strong>Deep Lake Managed DB</strong></td><td><code>hub://org_id/dataset_name</code></td><td>Specify <code>runtime = {"managed_db": True}</code> when creating the dataset</td></tr><tr><td><strong>AWS S3</strong></td><td><code>s3://bucket_name/dataset_name</code></td><td>Dataset can be connected to Deep Lake via <a href="managed-credentials">Managed Credentials</a></td></tr><tr><td><strong>Google Cloud</strong></td><td><code>gcs://bucket_name/dataset_name</code></td><td>Dataset can be connected to Deep Lake via <a href="managed-credentials">Managed Credentials</a></td></tr></tbody></table>

**If you choose to manage your credentials in Deep Lake, you can access datasets in your own cloud buckets using the Deep Lake path `hub://org_name/dataset_name`, without having to pass credentials in the Python API.**
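As a quick illustration, the same loading API works across backends and only the path prefix changes. The org, bucket, and dataset names below are hypothetical placeholders:

```python
# Hypothetical dataset paths; the prefix selects the storage backend.
paths = {
    "local": "./my_dataset",
    "deep_lake": "hub://my_org/my_dataset",
    "s3": "s3://my_bucket/my_dataset",
    "gcs": "gcs://my_bucket/my_dataset",
}

# Each would be opened with the same call, e.g.:
# import deeplake
# ds = deeplake.load(paths["s3"], creds = {...})
```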

## Authentication for each cloud storage provider

### Activeloop Storage and Managed Datasets

To gain access in Python to datasets stored in Activeloop, or datasets in other clouds that are [managed by Activeloop](https://docs-v3.activeloop.ai/v3.4.0/storage-and-credentials/managed-credentials), users must register on the [Deep Lake App](https://app.activeloop.ai/) or through the CLI, and log in through the CLI using:

```bash
activeloop register

activeloop login
```

#### Authentication using tokens

Authentication can also be performed using tokens, which can be created after registration on the [Deep Lake App](https://app.activeloop.ai/) (Profile -> API tokens). Tokens can be passed to any Deep Lake function that requires authentication:

```python
deeplake.load(path, token = "...")
deeplake.empty(path, token = "...")
...
```
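Tokens can also be supplied through the environment instead of as a function argument. Assuming your Deep Lake version reads the `ACTIVELOOP_TOKEN` environment variable (check your version's documentation), a sketch:

```python
import os

# Placeholder token string; replace with a token created in the Deep Lake App.
os.environ["ACTIVELOOP_TOKEN"] = "my_api_token"

# With the variable set, authenticated calls no longer need token = "...":
# ds = deeplake.load("hub://my_org/my_dataset")
```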

{% hint style="warning" %}
Credentials created using the CLI login `!activeloop login` expire after 1000 hrs. Credentials created using API tokens in the [Deep Lake App](https://app.activeloop.ai/) expire after the time specified for the individual token. Therefore, long-term workflows should be run using API tokens in order to avoid expiration of credentials mid-workflow.
{% endhint %}

### AWS S3

Authentication with AWS S3 has 4 options:

1. Use Deep Lake on a machine in the AWS ecosystem that has access to the relevant S3 bucket via [AWS IAM](https://aws.amazon.com/iam/), in which case there is no need to pass credentials in order to access datasets in that bucket.
2. Configure AWS through the CLI using `aws configure`. This creates a credentials file on your machine that is automatically accessed by Deep Lake during authentication.
3. Save the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and (optionally) `AWS_SESSION_TOKEN` in environment variables of the same name, which are loaded as default credentials if no other credentials are specified.
4. Create a dictionary with the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and (optionally) `AWS_SESSION_TOKEN`, and pass it to Deep Lake using:

   **Note:** the dictionary keys must be lowercase!

```python
deeplake.load('s3://...', creds = {
   'aws_access_key_id': 'abc', 
   'aws_secret_access_key': 'xyz', 
   'aws_session_token': '123', # Optional
})
```
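Options 3 and 4 can also be combined: source the values from the standard AWS environment variables and pass them as the lowercase-key dictionary. The bucket path below is a hypothetical placeholder:

```python
import os

# Build the creds dict (note the lowercase keys) from the AWS env variables.
creds = {
    "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID", ""),
    "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY", ""),
}
session_token = os.environ.get("AWS_SESSION_TOKEN")
if session_token:  # optional, only needed for temporary credentials
    creds["aws_session_token"] = session_token

# ds = deeplake.load("s3://my_bucket/my_dataset", creds = creds)
```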


### Custom Storage with S3 API

In order to connect to other object storages supporting an S3-like API, such as [MinIO](https://github.com/minio/minio), [StorageGrid](https://www.netapp.com/data-storage/storagegrid/), and others, simply add `endpoint_url` to the `creds` dictionary.

```python
deeplake.load('s3://...', creds = {
   'aws_access_key_id': 'abc', 
   'aws_secret_access_key': 'xyz', 
   'aws_session_token': '123', # Optional
   'endpoint_url': 'http://localhost:8888'
})
```

### Google Cloud Storage

Authentication with Google Cloud Storage has 2 options:

1. Create a service account, download the JSON file containing the keys, and pass that file to the `creds` parameter in `deeplake.load('gcs://.....', creds = 'path_to_keys.json')`. It is also possible to manually pass the information from the JSON file into the `creds` parameter using:

   `deeplake.load('gcs://.....', creds = {information from the JSON file})`
2. Authenticate through the browser using `deeplake.load('gcs://.....', creds = 'browser')`. This requires that the project credentials are stored on your machine, which happens after `gcloud` is [initialized](https://cloud.google.com/sdk/gcloud/reference/init) and [logged in](https://cloud.google.com/sdk/gcloud/reference/auth) through the CLI.
   1. After this step, re-authentication through the browser can be skipped using: `deeplake.load('gcs://.....', creds = 'cache')`
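The options above can be sketched with a small helper that returns the appropriate `creds` value. `gcs_creds` is a hypothetical name for illustration, not part of the Deep Lake API:

```python
def gcs_creds(mode, key_path="path_to_keys.json"):
    """Return the creds argument for deeplake.load('gcs://...')."""
    if mode == "service_account":
        return key_path  # path to the downloaded JSON key file
    if mode in ("browser", "cache"):
        return mode      # interactive auth, or reuse cached credentials
    raise ValueError(f"unknown mode: {mode!r}")

# ds = deeplake.load("gcs://my_bucket/my_dataset", creds=gcs_creds("browser"))
```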
