Connecting Superset and Amazon Athena
Apache Superset is an open-source business intelligence tool that can connect to many different data sources. Amazon Athena is a query engine built on top of Presto that can be used to analyze data stored in S3.
This post shows how to use Amazon Athena as a data source.
Install Superset
If you’re running Windows 10, I recommend installing Superset in Ubuntu on Windows (Windows Subsystem for Linux).
Start by installing the prerequisites:
sudo apt update
sudo apt install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev python3-venv
Create a virtual environment for the Superset installation:
python3 -m venv superset
source superset/bin/activate
This step may not be needed, but I had to pin the following dependencies to older versions:
pip install "SQLAlchemy<1.4.0"
pip install "itsdangerous<2.0,>=0.24"
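To judge whether the pins are needed in your environment, it helps to see which versions are currently installed. The following is a minimal sketch using only the Python standard library (Python 3.8 or later); the helper name installed_version is my own and not part of Superset.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this post pins or installs.
for name in ("SQLAlchemy", "itsdangerous", "apache-superset"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it inside the activated virtual environment so that it inspects the same packages Superset will use.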
Now install Superset:
pip install apache-superset
To make Superset “talk” to Athena, install PyAthena, a Python client for Athena (the quotes keep shells such as zsh from interpreting the square brackets):
pip install "PyAthena[SQLAlchemy]"
Now initialize the database:
superset db upgrade
and create an admin user:
export FLASK_APP=superset
superset fab create-admin
and finally, set up roles and permissions:
superset init
Now you’re ready to fire up Superset:
superset run -p 8088
and go to http://localhost:8088
In the Superset web UI, select Databases and create a new database.
In the add database dialog, name your database and add a connection string with appropriate values:
awsathena+rest://<aws-access-key-id>:<aws-secret-access-key>@athena.eu-west-1.amazonaws.com/etldatabase__prod?s3_staging_dir=<uri-of-staging-directory>
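AWS secret keys often contain characters such as / and +, and the staging directory is itself a URI, so these values need to be URL-encoded before they go into the connection string. Below is a minimal sketch that assembles the string in the same shape as above; the function name athena_uri and the sample values are made up for illustration.

```python
from urllib.parse import quote_plus


def athena_uri(access_key, secret_key, region, schema, staging_dir):
    """Assemble an awsathena+rest connection string with URL-encoded parts."""
    return (
        f"awsathena+rest://{quote_plus(access_key)}:{quote_plus(secret_key)}"
        f"@athena.{region}.amazonaws.com/{schema}"
        f"?s3_staging_dir={quote_plus(staging_dir)}"
    )


# Placeholder credentials, not real ones.
print(
    athena_uri(
        "AKIAEXAMPLE",
        "abc/def+ghi",
        "eu-west-1",
        "etldatabase__prod",
        "s3://my-bucket/staging/",
    )
)
```

Paste the printed string into the add database dialog instead of filling in the template by hand.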
Now open SQL Lab -> SQL Editor and you’re ready to query your Glue database in Superset!
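If queries fail in Superset, it can help to try the same connection string outside Superset first. This is a minimal sketch, assuming SQLAlchemy and PyAthena are installed in the same virtual environment; the function name run_smoke_test is my own, and SELECT 1 is only a placeholder query. Note that it runs a real Athena query and writes a result file to the staging directory.

```python
def run_smoke_test(uri):
    """Connect with the given SQLAlchemy URI, run a trivial query, return the rows."""
    # Imported here so the function can be defined even where SQLAlchemy is missing.
    from sqlalchemy import create_engine, text

    engine = create_engine(uri)
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).fetchall()


# Example call (needs valid AWS credentials and network access):
# rows = run_smoke_test("awsathena+rest://...")
# print(rows)
```

An error raised here points to credentials, region, or the staging directory rather than to Superset itself.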