Connecting Superset and Amazon Athena

Apache Superset is an open-source business intelligence tool that can connect to many different data sources. Amazon Athena is a serverless query service built on top of Presto that lets you analyze data stored in Amazon S3 using standard SQL.

This post shows how to add Amazon Athena as a data source in Superset.

Install Superset

If you’re running Windows 10, I recommend installing Superset in Ubuntu on Windows.

Start by installing the prerequisites:

sudo apt update
sudo apt install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev python3-venv

Create a virtual environment for the Superset installation:

python3 -m venv superset
source superset/bin/activate

This step may not be needed, but I had to pin the following dependencies to older versions:

pip install "SQLAlchemy<1.4.0"
pip install "itsdangerous<2.0,>=0.24"

Now install Superset:

pip install apache-superset

To make Superset “talk” to Athena, install PyAthena, a Python client for Athena:

pip install "PyAthena[SQLAlchemy]"

Now initialize Superset’s metadata database:

superset db upgrade

and create an admin user:

export FLASK_APP=superset
superset fab create-admin

and finally, set up the default roles and permissions:

superset init

Now you’re ready to fire up Superset:

superset run -p 8088

and go to http://localhost:8088, where you can log in with the admin user you just created.
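If you’d rather script this check than open a browser, here is a minimal Python sketch that polls Superset’s /health endpoint, which answers with OK once the web server is running. It assumes the default host and port from the command above; superset_is_up is a hypothetical helper name, not part of Superset.

```python
import urllib.request


def superset_is_up(base_url: str = "http://localhost:8088", timeout: float = 2.0) -> bool:
    """Return True if Superset's /health endpoint answers with HTTP 200.

    Hypothetical helper: it only assumes the /health route served by
    Superset's web server.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError, refused connections and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("Superset is up" if superset_is_up() else "Superset is not reachable yet")
```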

In the Superset web UI, select Databases and create a new database.

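In the Add Database dialog, name your database and enter a SQLAlchemy URI like the one below, substituting your own access key ID, secret access key, region (eu-west-1 here), Glue database (etldatabase__prod here) and S3 staging directory: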

awsathena+rest://<aws-access-key-id>:<aws-secret-access-key>@athena.eu-west-1.amazonaws.com/etldatabase__prod?s3_staging_dir=<uri-of-staging-directory>
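Secret access keys often contain characters such as / and +, and the staging directory is itself a URI, so PyAthena’s documentation asks for these values to be URL-encoded before they go into the connection string. A minimal Python sketch of building the URI with urllib.parse.quote_plus, assuming the same format as above (athena_sqlalchemy_uri is a hypothetical helper name):

```python
from urllib.parse import quote_plus


def athena_sqlalchemy_uri(
    access_key_id: str,
    secret_access_key: str,
    region: str,
    schema: str,
    s3_staging_dir: str,
) -> str:
    """Build a PyAthena SQLAlchemy URI in the awsathena+rest format.

    Hypothetical helper: credentials and the staging directory are encoded
    with quote_plus so that '/', '+' and ':' cannot break the URI.
    """
    return (
        f"awsathena+rest://{quote_plus(access_key_id)}:{quote_plus(secret_access_key)}"
        f"@athena.{region}.amazonaws.com/{schema}"
        f"?s3_staging_dir={quote_plus(s3_staging_dir)}"
    )


if __name__ == "__main__":
    # Example values only: the key pair is AWS's documented placeholder
    # and the bucket name is made up.
    print(
        athena_sqlalchemy_uri(
            "AKIAIOSFODNN7EXAMPLE",
            "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
            "eu-west-1",
            "etldatabase__prod",
            "s3://my-athena-results/superset/",
        )
    )
```

Paste the printed value into the SQLAlchemy URI field instead of typing the raw secret key.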

Now open SQL Lab -> SQL Editor and you’re ready to query your Glue database in Superset!
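If Superset can’t connect, it can help to test the same URI from the Superset virtual environment, since Superset passes it to SQLAlchemy, which in turn uses PyAthena’s dialect. A minimal sketch, assuming SQLAlchemy and PyAthena are installed as above; check_athena_uri is a hypothetical helper name:

```python
def check_athena_uri(uri: str) -> list:
    """Run a trivial query through SQLAlchemy + PyAthena and return the rows.

    Hypothetical helper. The imports are local so this file can be loaded
    even where SQLAlchemy or PyAthena is missing; they are only needed
    when the check actually runs.
    """
    from sqlalchemy import create_engine, text

    engine = create_engine(uri)
    try:
        with engine.connect() as conn:
            result = conn.execute(text("SELECT 1"))
            return [tuple(row) for row in result]
    finally:
        # Release pooled connections once the check is done.
        engine.dispose()
```

Called with the URI from the Add Database dialog, this should return [(1,)] when the credentials, region and staging directory are correct; otherwise the exception it raises should point at the part that needs fixing.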
