AWS Redshift Athena external table set up

Assume you’ve got S3 bucket and Redshift cluster running in EU (Frankfurt).

Set up IAM role, e.g. s3acces with the following policies
aws2

Make sure you:

  • Enabled IAM role for your Redshift cluster
    aws3
  • Activated Security Token Service Regions
    aws1
    Otherwise you’ll get

    [2019-09-18 17:38:35] [42804][500310] [Amazon](500310) Invalid operation: User arn:aws:redshift:eu-central-1:${account_number}:${user}:${db}/${db_user} is not authorized to assume IAM Role arn:aws:iam::${account}:role/s3acces;

    More detail can be found at http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html

Navigate to Athena query editor https://eu-central-1.console.aws.amazon.com/athena/home?region=eu-central-1#query and create a database that points to your S3 bucket:

CREATE DATABASE IF NOT EXISTS athenadb
  LOCATION 's3://${your_backet_name}';

After that you can create an external schema and tables:

create external schema if not exists athena_test
from
database 'athenadb'
iam_role 'arn:aws:iam::${role_id}:role/s3acces'
;
create external table athena_test.test(
  venueid    integer,
  venuename  varchar(200),
  venuecity  varchar(200),
  venuestate varchar(200),
  venueseats integer
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties (
  'serialization.format' = ',',
  'field.delim' = ',',
  'input.regex' = 'ff'
)
stored as textfile
location 's3://${your_bucket}/data/'
table properties  ('has_encrypted_data'='false')
;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s