Before we begin, we should know a few terms: S3, EMR, and bucket.
Amazon S3 stands for Amazon Simple Storage Service.
Amazon EMR stands for Amazon Elastic MapReduce.
A bucket is the container in which S3 stores data. We can place files, folders, and so on inside an S3 bucket.
For more detail about these terms, refer to the AWS website.
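To make the bucket idea a little more concrete, here is a minimal sketch of how a location inside a bucket is usually written as an s3:// URI and split into its bucket name and object key. "Folders" inside a bucket are really just prefixes of the key. The bucket name and path below are hypothetical, and split_s3_uri is a helper made up for this illustration, not part of any AWS library.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The part after the bucket name is the object key; its "folders"
    # are simply slash-separated prefixes.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # Assumed example location; replace with your own bucket and path.
    bucket, key = split_s3_uri("s3://my-example-bucket/wordcount/output/part-00000")
    print("bucket:", bucket)
    print("key:", key)
```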
We will create a bucket to store the results, then use MapReduce to count how many times each word occurs. For this example, we will use the sample .py file and data that are already available on the AWS website as the Word Count example.
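Before running anything on EMR, it can help to see the word count idea in plain Python. The sketch below is not the AWS sample file; it is a small local illustration, with made-up function names and sample text, of the two steps involved: a map step that emits a (word, 1) pair for every word, and a reduce step that adds up the pairs for each word.

```python
import re
from collections import defaultdict


def map_words(lines):
    """Map step: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        # Lowercase the line so "The" and "the" are counted together.
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    # Assumed sample input; on EMR the input would be read from S3 instead.
    sample = ["the quick brown fox", "jumps over the lazy dog", "The end"]
    counts = reduce_counts(map_words(sample))
    for word in sorted(counts):
        print(f"{word}\t{counts[word]}")
```

On a real cluster the map and reduce steps run on different machines over many input files, but the logic of each step stays the same as in this single-process sketch.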