Setup an Athena Data Lake in Two Minutes

Available on Linux, Windows, and Mac

1. Setup your data

/* file structure */
page-view/partitiondate=YYYY-MM-DD/userid=USER_GUID
PAGE_VIEW_GUID.JSON

/* sample data */
{
"id": "260b3231-dafb-4c58-8945-7cb90178d870",
"name": "dashboard",
"userId": "3c19c5e6-f1bd-4e5c-92f3-a035e64d0acc",
"createdAt": "2020-08-06T02:34:04.436Z"
}

2. Create the Athena table

CREATE EXTERNAL TABLE `page_view`(
`id` string COMMENT 'from deserializer',
`name` string COMMENT 'from deserializer',
`userid` string COMMENT 'from deserializer',
`createdat` string COMMENT 'from deserializer',
`params` string COMMENT 'from deserializer',
`leadid` string COMMENT 'from deserializer')
PARTITIONED BY (
`partition_date` string,
`user_id` string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'paths'='createdAt,id,leadId,name,params,userId')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://commandeer-dev-analytics/page-view/'
TBLPROPERTIES (
'CrawlerSchemaDeserializerVersion'='1.0',
'CrawlerSchemaSerializerVersion'='1.0',
'UPDATED_BY_CRAWLER'='commandeer-analytics-crawler',
'averageRecordSize'='241',
'classification'='json',
'compressionType'='none',
'objectCount'='157',
'recordCount'='157',
'sizeKey'='42375',
'typeOfData'='file')

3. Index your Data

4. Run your Queries

Conclusion

Available on Linux, Windows, and Mac

--

--

--

Helping Cloud Developers manage their local and cloud environments.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

One Analyst’s Guide for going from Good to Great

Deliver your project on time

Zoho Travel Agency CRM | Tour Operator CRM

Deploying Wordpress Site Using AWS EKS, Fargate, EFS and RDS

Traefik + Docker + OAuth: a free reverse proxy with TSL and Google OAuth2

STATEMENT ON MELI IDO

A cake for your cherry: what should go in the C++ standard library?

Why should any developer solve coding challenges?

An image with a text ‘DO MORE’

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Sergeant Serverless

Sergeant Serverless

Helping Cloud Developers manage their local and cloud environments.

More from Medium

How to setup Airflow Sensor’s mode as Reschedule

Dataset search engines as global data discovery tools

What are Abstract Data Flows and why should you use them ? (Long form)

Data engineering in a nutshell : coding up business rules on the one hand, and creating infrastructure on the other

Auto Scaling Impact on Data Scientists Day-to-Day