Secrets Management in Google Colab

Feb 28, 2022; last updated on Mar 07, 2022 • machine-learning google-colab

Managing secrets on Google Colab is difficult and cumbersome. I present a solution that is painless, powerful and secure.

Whenever I create a new Google Colab notebook, I go through many of the same steps: clone a private repository from GitHub, access data from my AWS or GCS buckets, log outputs to Weights & Biases, etc. However, all of these things require that I authenticate with various accounts. The problem is I don’t want to save my secrets in the Colab notebook because I might accidentally commit them. Also, that whole procedure is pretty cumbersome, so I want to automate as much of it as I can.

In the following article, I’ll tell you about the solution I came up with. It is pretty simple, but it’s also very powerful, extensible, and requires only two additional lines at the top of every notebook. And it works not only for Google Colab but also for any other Jupyter notebook, like those hosted on Kaggle, Deepnote, Paperspace, or any other provider.

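To give you the big picture, we will save all our credentials as a single JSON string in a password manager, put the setup code that uses them into a public Gist, and then download and run that Gist at the top of each notebook.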

Step 1: Save credentials

Let’s say we want to access our private resources from GitHub and AWS. To save our credentials, we first need to turn them into a single string that we can store in our password manager (e.g. Bitwarden). We do this by serializing a dict containing all our secrets to JSON:

import json

credentials = {
    "github_user": "<your-github-username>",
    "github_pat": "<your-github-personal-access-token>",
    "aws_access_key_id": "<your-aws-key-id>",
    "aws_secret_access_key": "<your-aws-secret-access-key>",
}
print(json.dumps(credentials))
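Before pasting the string into the password manager, it may be worth a quick sanity check that it parses back and that the keys come in the pairs the setup code in Step 2 reads (a GitHub username needs its token, an AWS key ID needs its secret key). The sketch below is a hypothetical helper; `check_credentials` and `REQUIRED_PAIRS` are names introduced here, not part of the original setup:

```python
import json
from typing import List

# Hypothetical: pairs of keys that should always appear together in the JSON string
REQUIRED_PAIRS = [
    ("github_user", "github_pat"),
    ("aws_access_key_id", "aws_secret_access_key"),
]


def check_credentials(raw: str) -> List[str]:
    """Parse the JSON string and return a list of problems (empty if it looks fine)."""
    try:
        creds = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(creds, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    for first, second in REQUIRED_PAIRS:
        # Flag the case where exactly one key of a pair is present
        if (first in creds) != (second in creds):
            problems.append(f"{first!r} and {second!r} must be given together")
    return problems
```

For the `credentials` dict above, `check_credentials(json.dumps(credentials))` returns an empty list.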

Now we simply copy the resulting string and save it in our password manager as the password of an entry for colab.research.google.com, so that each time we visit Google Colab, it is the first item the password manager suggests.

Step 2: Create Gist

Next, we save the following code as a public Gist, so we can easily and safely download it later. Note that this code contains no secrets, so our Gist can be public. This is important because we want to be able to download the Gist without having to authenticate with GitHub, which would, you know, kind of defeat the whole purpose of this exercise.

import json
import os
import subprocess
from getpass import getpass
from pathlib import Path
from typing import Dict, Optional, Union


def load_secrets(secrets: Optional[Union[str, Dict[str, str]]] = None, overwrite: bool = False) -> None:
    """
    Loads secrets and sets up some env vars and credential files.

    If the `secrets` param is empty, you will be prompted to input a stringified JSON dict containing your secrets. Otherwise, the secrets will be loaded from the given string or dict.

    If `overwrite` is False, an existing ~/.aws/credentials file is left untouched.

    The following types of credentials are supported:

    GitHub Credentials:
        `github_user`: GitHub Username
        `github_pat`: GitHub Personal Access Token

    AWS Credentials:
        `aws_access_key_id`: AWS Access Key ID
        `aws_secret_access_key`: AWS Secret Access Key
    """

    if secrets and isinstance(secrets, str):
        secrets = json.loads(secrets)

    if not secrets:
        # getpass does not echo the input, so the secrets never appear in the notebook output
        raw = getpass("Secrets (JSON string): ")
        secrets = json.loads(raw) if raw else {}

    if "github_user" in secrets:
        os.environ["GH_USER"] = secrets["github_user"]
        os.environ["GH_PAT"] = secrets["github_pat"]
        # provide a custom credential helper to git so that it uses your env vars;
        # passing an argument list avoids an extra layer of shell quoting
        helper = r'!f() { printf "%s\n" "username=$GH_USER" "password=$GH_PAT"; }; f'
        subprocess.run(["git", "config", "--global", "credential.helper", helper], check=True)

    if "aws_access_key_id" in secrets:
        aws_id = secrets["aws_access_key_id"]
        aws_key = secrets["aws_secret_access_key"]
        creds_file = Path.home() / ".aws" / "credentials"
        creds_file.parent.mkdir(parents=True, exist_ok=True)
        if overwrite or not creds_file.exists():
            creds_file.write_text(f"[default]\naws_access_key_id = {aws_id}\naws_secret_access_key = {aws_key}\n")


# %run executes this file with __name__ == "__main__", so the password prompt appears right away
if __name__ == "__main__":
    load_secrets()
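The credential helper line is the least obvious part of the Gist. For a helper string starting with `!`, git runs the rest through the shell, appends an action such as `get`, and reads `key=value` lines from the helper's output. The sketch below runs the same shell function outside git to show what git receives; it assumes a POSIX `sh` is available, and `run_helper` is a hypothetical name:

```python
import os
import subprocess

# The shell function that load_secrets registers, without git's leading "!"
HELPER = r'f() { printf "%s\n" "username=$GH_USER" "password=$GH_PAT"; }; f'


def run_helper(user: str, token: str) -> str:
    """Run the helper roughly the way git does for a 'get' request and return its stdout."""
    env = {**os.environ, "GH_USER": user, "GH_PAT": token}
    # git appends the action ("get", "store" or "erase"); the function simply ignores it
    result = subprocess.run(
        ["sh", "-c", HELPER + " get"], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


print(run_helper("alice", "ghp_example"), end="")
# username=alice
# password=ghp_example
```

Because the helper only reads the environment variables, the token is never written to `~/.gitconfig`; only the function definition is.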

By clicking on “raw” at the top right of your new public Gist, you get the raw file and can copy its URL, which will look something like https://gist.github.com/USERNAME/GIST_ID/raw/COMMIT_ID/FILENAME. To always refer to the latest version, remove the COMMIT_ID/ part, and you’ll end up with a link like: https://gist.github.com/USERNAME/GIST_ID/raw/FILENAME.
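If you prefer to make that edit programmatically, here is a small sketch. `latest_raw_url` is a hypothetical helper that assumes the URL has exactly the `USERNAME/GIST_ID/raw/COMMIT_ID/FILENAME` shape described above and leaves any other URL unchanged:

```python
from urllib.parse import urlsplit, urlunsplit


def latest_raw_url(url: str) -> str:
    """Drop the COMMIT_ID segment so the raw Gist URL always points at the latest revision."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Pinned form: ['', USERNAME, GIST_ID, 'raw', COMMIT_ID, FILENAME]
    if len(segments) == 6 and segments[3] == "raw":
        del segments[4]
    return urlunsplit(parts._replace(path="/".join(segments)))


print(latest_raw_url("https://gist.github.com/USERNAME/GIST_ID/raw/COMMIT_ID/FILENAME"))
# https://gist.github.com/USERNAME/GIST_ID/raw/FILENAME
```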

We can, of course, put all kinds of things we commonly need into this Gist. Here are some examples:

For maximum aesthetics, you can also create a shortlink with a memorable name that points to the raw URL. In my case, this would be bit.ly/aseifert-colab-setup.

Step 3: Profit

Finally, all we need to do is run the following two lines at the top of the notebook. They download the Gist and run it, which brings up a password prompt that our password manager can fill in with the JSON string from Step 1.

!wget -q bit.ly/aseifert-colab-setup
%run aseifert-colab-setup
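On a host where shell escapes like `!wget` are unavailable, or outside IPython, a pure-Python stand-in for these two lines can be sketched with `urllib.request.urlretrieve` and `runpy.run_path`. This is an assumed alternative rather than part of the setup above; `fetch_and_run` and the default file name are hypothetical:

```python
import runpy
import urllib.request


def fetch_and_run(url: str, dest: str = "colab-setup.py") -> dict:
    """Download a setup script and execute it, roughly like `!wget` followed by `%run`."""
    # Save the script to disk; urlopen follows HTTP redirects such as shortlinks
    urllib.request.urlretrieve(url, dest)
    # Run under the name "__main__", as %run does by default,
    # so an `if __name__ == "__main__":` block in the script executes too
    return runpy.run_path(dest, run_name="__main__")
```

In a notebook one might call `ns = fetch_and_run("https://bit.ly/aseifert-colab-setup")` and then pick functions out of the returned dict, e.g. `ns["load_secrets"]`. Unlike `%run`, `run_path` does not copy the script's names into the notebook namespace, which is why they come back as a dict.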

And that’s it. Now go ahead and put all those secret credentials to good use. Best of luck!


Alexander Seifert

Hi, I'm Alex and I write this blog. Here you'll find articles and tutorials mostly about Natural Language Processing and related areas.
