Secrets Management in Google Colab
Feb 28, 2022; last updated on Mar 07, 2022
Managing secrets on Google Colab is difficult and cumbersome. I present a solution that is painless, powerful and secure.
Whenever I create a new Google Colab notebook, I go through many of the same steps: clone a private repository from GitHub, access data from my AWS or GCS buckets, log outputs to Weights & Biases, etc. However, all of these things require that I authenticate with various accounts. The problem is I don’t want to save my secrets in the Colab notebook because I might accidentally commit them. Also, that whole procedure is pretty cumbersome, so I want to automate as much of it as I can.
In this article, I’ll describe the solution I came up with. It is simple, yet powerful and extensible, and it requires only two additional lines at the top of every notebook. It works not only in Google Colab but in any other Jupyter notebook environment, such as those hosted on Kaggle, Deepnote, or Paperspace.
To give you the big picture, we will:
- save our credentials inside a password manager
- create code to authenticate with our accounts and put it in a Gist
- insert and run two lines of code at the top of every notebook
Step 1: Save credentials
Let’s say we want to access our private resources from GitHub and AWS. To save our credentials, we first need to convert them into a form suitable for the password manager (e.g. Bitwarden). We do this by stringifying a dict containing all our secrets:
import json

credentials = {
    "github_user": "<your-github-username>",
    "github_pat": "<your-github-personal-access-token>",
    "aws_access_key_id": "<your-aws-key-id>",
    "aws_secret_access_key": "<your-aws-secret-access-key>",
}
print(json.dumps(credentials))
Now we simply copy the resulting string and save it in our password manager as an entry associated with colab.research.google.com, so that it is the first item offered each time we visit Google Colab.
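Before pasting the string into the password manager, it can be worth checking that it parses back into the dict we expect. The following is a minimal sketch under my own assumptions: the helper name `missing_keys` and the `EXPECTED_KEYS` tuple are made up for illustration and only cover the four keys used in this article.

```python
import json

# Keys used in this article; extend the tuple if you store more secrets.
EXPECTED_KEYS = (
    "github_user",
    "github_pat",
    "aws_access_key_id",
    "aws_secret_access_key",
)


def missing_keys(secrets_json: str, expected=EXPECTED_KEYS) -> list:
    """Parse the stringified secrets and return the expected keys that are absent or empty.

    `missing_keys` is a hypothetical helper, not part of the original setup code.
    """
    secrets = json.loads(secrets_json)  # raises ValueError on malformed JSON
    if not isinstance(secrets, dict):
        raise TypeError("secrets must be a JSON object")
    return [key for key in expected if not secrets.get(key)]
```

Calling `missing_keys(json.dumps(credentials))` returns an empty list when every expected entry is filled in.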
Step 2: Create Gist
Next, we save the following code as a public Gist, so we can easily and safely download it later. Note that this code contains no secrets, so our Gist can be public. This is important because we want to be able to download the Gist without having to authenticate with GitHub, which would, you know, kind of defeat the whole purpose of this exercise.
import json
import os
from getpass import getpass
from pathlib import Path
from typing import Dict, Optional, Union


def load_secrets(secrets: Optional[Union[str, Dict[str, str]]] = None, overwrite=False) -> None:
    """
    Loads secrets and sets up some env vars and credential files.

    If the `secrets` param is empty, you will be prompted to input a stringified json dict
    containing your secrets. Otherwise, the secrets will be loaded from the given string or dict.
    The `overwrite` param is accepted but not used by this version.

    The following types of credentials are supported:

    GitHub Credentials:
        `github_user`: GitHub Username
        `github_pat`: GitHub Personal Access Token

    AWS Credentials:
        `aws_access_key_id`: AWS Key ID
        `aws_secret_access_key`: AWS Secret Access Key
    """
    if secrets and isinstance(secrets, str):
        secrets = json.loads(secrets)

    if not secrets:
        # getpass hides the pasted string, so it never shows up in the cell output
        raw = getpass("Secrets (JSON string): ")
        secrets = json.loads(raw) if raw else {}

    if "github_user" in secrets:
        os.environ["GH_USER"] = secrets["github_user"]
        os.environ["GH_PAT"] = secrets["github_pat"]
        # provide a custom credential helper to git so that it uses your env vars
        os.system("""git config --global credential.helper '!f() { printf "%s\n" "username=$GH_USER" "password=$GH_PAT"; };f'""")

    if "aws_access_key_id" in secrets:
        home = Path.home()
        aws_id = secrets["aws_access_key_id"]
        aws_key = secrets["aws_secret_access_key"]
        # write the default profile to ~/.aws/credentials
        (home / ".aws/").mkdir(parents=True, exist_ok=True)
        with open(home / ".aws/credentials", "w") as fp:
            fp.write(f"[default]\naws_access_key_id = {aws_id}\naws_secret_access_key = {aws_key}\n")
By clicking on “raw” at the top right of your new public Gist, you get the raw file and can copy its URL, which will look something like https://gist.github.com/USERNAME/GIST_ID/raw/COMMIT_ID/FILENAME. To always refer to the latest version, remove the COMMIT_ID/ part, and you’ll end up with a link like: https://gist.github.com/USERNAME/GIST_ID/raw/FILENAME.
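If you would rather not edit the URL by hand, the same transformation can be sketched in a few lines. This assumes the raw URL has exactly the USERNAME/GIST_ID/raw/COMMIT_ID/FILENAME layout shown above; `latest_raw_url` is a hypothetical helper name of my own, and URLs that don't match the layout are returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit


def latest_raw_url(url: str) -> str:
    """Drop the COMMIT_ID segment from a raw Gist URL so it points at the latest version.

    Hypothetical helper; assumes the path layout
    USERNAME/GIST_ID/raw/COMMIT_ID/FILENAME described in the text.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Only touch URLs with the commit-pinned layout (five path segments, third is "raw").
    if len(segments) == 5 and segments[2] == "raw":
        del segments[3]  # remove COMMIT_ID
    return urlunsplit(parts._replace(path="/" + "/".join(segments)))
```

A URL that is already unpinned has only four path segments, so it passes through as-is.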
We can, of course, put all kinds of things we commonly need into this Gist. Here are some examples:
- Any other kind of credentials we need
- Code that sets up a virtual environment, installs dependencies, etc. (we might want to put this in a bash script, however)
- Code that pulls data from a remote source (e.g. from AWS or GCS)
- Code that connects our notebook to Google Drive
- Helper functions that we commonly use in our notebooks
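As an illustration of the first bullet, here is a rough sketch of how another credential type could be added in the same style as the GitHub and AWS branches. I'm assuming a `wandb_api_key` entry in the secrets dict, which is a key name I made up for this example; `WANDB_API_KEY` is the environment variable the Weights & Biases client reads its API key from.

```python
import os
from typing import Dict


def load_wandb_secret(secrets: Dict[str, str]) -> bool:
    """Export the Weights & Biases API key if present; return True if it was set.

    Hypothetical extension: the `wandb_api_key` entry is an assumed key name,
    not part of the credentials dict from Step 1.
    """
    key = secrets.get("wandb_api_key")
    if not key:
        return False
    # The wandb client picks up the key from this environment variable.
    os.environ["WANDB_API_KEY"] = key
    return True
```

The same pattern (look up a key, export it or write a config file, skip silently if absent) should carry over to most other services.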
For maximum aesthetics (your vertical screws), you can create a shortlink with a memorable name. In my case, this would be bit.ly/aseifert-colab-setup.
Step 3: Profit
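Finally, all we need to do is run the following two lines. There will be a password prompt, which will automatically be filled in by our password manager. Note that the prompt only appears if the downloaded script actually calls load_secrets(), for example on its last line; the listing in Step 2 only defines the function.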
!wget -q bit.ly/aseifert-colab-setup
%run aseifert-colab-setup
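To confirm that the setup did what we expect, a small check along the following lines can be run in the next cell. It is only a sketch: `check_setup` is a hypothetical helper of my own, and it looks for exactly the two things the Step 2 code sets up (the GH_USER and GH_PAT environment variables and the ~/.aws/credentials file) without printing any secret values.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def check_setup(home: Optional[Path] = None) -> Dict[str, bool]:
    """Report which credentials appear to be in place, without revealing them.

    Hypothetical helper; `home` defaults to the current user's home directory.
    """
    home = home or Path.home()
    return {
        # Both env vars are needed by the git credential helper.
        "github_env": bool(os.environ.get("GH_USER")) and bool(os.environ.get("GH_PAT")),
        # The AWS branch writes the default profile to this file.
        "aws_credentials_file": (home / ".aws" / "credentials").is_file(),
    }
```

Calling `check_setup()` returns a dict of booleans, so a missing entry is easy to spot at a glance.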
And that’s it. Now go ahead and put all those secret credentials to good use. Best of luck!
Alexander Seifert