Setup an Athena Data Lake in Two Minutes

Available on Linux, Windows, and Mac

1. Setup your data

/* file structure */
page-view/partitiondate=YYYY-MM-DD/userid=USER_GUID
PAGE_VIEW_GUID.JSON

/* sample data */
{
"id": "260b3231-dafb-4c58-8945-7cb90178d870",
"name": "dashboard",
"userId": "3c19c5e6-f1bd-4e5c-92f3-a035e64d0acc",
"createdAt": "2020-08-06T02:34:04.436Z"
}

2. Create the Athena table

CREATE EXTERNAL TABLE `page_view`(
`id` string COMMENT 'from deserializer',
`name` string COMMENT 'from deserializer',
`userid` string COMMENT 'from deserializer',
`createdat` string COMMENT 'from deserializer',
`params` string COMMENT 'from deserializer',
`leadid` string COMMENT 'from deserializer')
PARTITIONED BY (
`partition_date` string,
`user_id` string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'paths'='createdAt,id,leadId,name,params,userId')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://commandeer-dev-analytics/page-view/'
TBLPROPERTIES (
'CrawlerSchemaDeserializerVersion'='1.0',
'CrawlerSchemaSerializerVersion'='1.0',
'UPDATED_BY_CRAWLER'='commandeer-analytics-crawler',
'averageRecordSize'='241',
'classification'='json',
'compressionType'='none',
'objectCount'='157',
'recordCount'='157',
'sizeKey'='42375',
'typeOfData'='file')

3. Index your Data

4. Run your Queries

Conclusion

Available on Linux, Windows, and Mac

--

--

--

Helping Cloud Developers manage their local and cloud environments.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

How to implement read receipts in Sendbird Chat

Kotlin Coroutine Flow : Usage & some anti-patterns

Basics of Pandas for beginners

Sept 27ᵗʰ — Minor Test Release v1.2

The No-Code Generation is Arriving

Gettyimages 1161169055

20 Must-Have Chrome Extensions For Enterprises, SMEs & Startups in 2019

Fireblocks expands institutional access to Terra’s DeFi ecosystem

Evolution to devops

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Sergeant Serverless

Sergeant Serverless

Helping Cloud Developers manage their local and cloud environments.

More from Medium

Open Source Data Lake Table Formats: Evaluating Current Interest and Rate of Adoption

Image copyright: peshkov (123rf.com)

Auth between Amazon MSK + Glue

Apache Airflow migraiton journey from self-hosted to AWS Managed Airflow

A DevOps approach to building, Scalable Cloud-Native Data Engineering Applications