This post explains how to get started with Amazon Elastic MapReduce (Amazon EMR). You create a sample Amazon EMR cluster by using the AWS Management Console and then run a Hive script to process data stored in Amazon S3.
Cost
Pricing for Amazon EMR varies by Region and by service. For this example, charges accrue for the Amazon EMR cluster and for Amazon Simple Storage Service (Amazon S3) storage of the log data and the Hive job output. If you are in your first year of using AWS, some or all of your Amazon S3 charges might be waived, provided you stay within the AWS Free Tier usage limits.
For more information about Amazon EMR pricing and the AWS Free Tier, go to Amazon EMR Pricing and AWS Free Tier.
You can use the Amazon Web Services Simple Monthly Calculator to estimate your bill.
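As a rough illustration of how the two charges above combine, the following sketch totals the cluster cost (each instance is billed an Amazon EMR fee on top of its Amazon EC2 price) and the S3 storage cost. It is a back-of-envelope sketch, not AWS billing logic: the function name `estimate_cost` is hypothetical, every rate is an input you look up for your Region, and it ignores billing granularity, EBS volumes, data transfer, and S3 request charges.

```python
def estimate_cost(instance_count, hours, emr_rate_per_hour, ec2_rate_per_hour,
                  storage_gb, s3_rate_per_gb_month, storage_months=1.0):
    """Hypothetical back-of-envelope estimate in USD.

    All rates are placeholders supplied by the caller; no real prices are
    built in. Billing granularity, EBS, data transfer, and S3 request
    charges are deliberately ignored.
    """
    # Each instance pays the EMR fee plus the underlying EC2 price per hour.
    cluster = instance_count * hours * (emr_rate_per_hour + ec2_rate_per_hour)
    # S3 storage for the logs and Hive output kept in the bucket.
    storage = storage_gb * s3_rate_per_gb_month * storage_months
    return cluster + storage
```

For example, a three-node cluster that runs for one hour and keeps 1 GB of logs and output for a month is `estimate_cost(3, 1.0, emr_rate, ec2_rate, 1, s3_rate)`, with the three rates taken from the pricing pages.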
Sample EMR Cluster Prerequisites
The following are the preliminary steps you must perform to complete the example.
- Create an AWS account.
- Create an S3 bucket.
The example in this topic uses an S3 bucket to store log files and output data.
Because of Hadoop constraints, the bucket name must meet these requirements:
  - It must contain only lowercase letters, numbers, periods, and hyphens.
  - It cannot end with a number.
Example: mycompany.username.vernumber-emr-quickstart
- Click the S3 bucket name. The bucket page is displayed.
- Create two folders named logs and output.
Make sure that the output folder is empty. For more information, see Creating a Folder.
- Create an Amazon EC2 key pair.
You need the key pair to connect to the nodes in the cluster.
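Before creating the bucket, you can check a candidate name against the two Hadoop-related rules listed above. The following is a minimal sketch. The function name is hypothetical. It also enforces S3's general 3 to 63 character length limit, but it does not check S3's other naming rules, such as the ban on consecutive periods.

```python
import re

# Only lowercase letters, digits, periods, and hyphens are allowed.
_ALLOWED = re.compile(r"[a-z0-9.-]+")


def is_valid_emr_bucket_name(name: str) -> bool:
    """Hypothetical helper: check a bucket name against the EMR rules above.

    Also applies S3's 3-63 character length limit; other S3 naming rules
    are not checked.
    """
    if not 3 <= len(name) <= 63:
        return False
    if not _ALLOWED.fullmatch(name):
        return False
    # Hadoop constraint: the name cannot end with a number.
    return not name[-1].isdigit()
```

The example name from the list passes, while a name such as `emr-quickstart-2` fails because it ends with a digit.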
Launch the Sample Amazon EMR Cluster
- In your browser, navigate to the AWS Management Console.
- In the Analytics section, choose EMR. The Amazon EMR console dashboard is displayed.
- Click the Create cluster button.
The Create Cluster – Quick Options page is displayed.
For more information, see Using Quick Cluster Configuration Overview.
- Accept the default values except for the following fields:
  - In the Cluster name box, enter a name that is meaningful to you.
  - In the S3 folder box, click the folder icon and select the logs folder that you created.
  - In the EC2 key pair list, choose the key pair that you created.
- Click the Create cluster button.
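The console steps above can also be expressed as a request for the AWS SDK for Python (boto3), whose EMR client provides `run_job_flow`. The following is a sketch under stated assumptions, not what the Quick Options page submits. The helper name `build_cluster_request` is hypothetical. The release label, instance type, instance count, and application list are placeholders (pick a current release from the Amazon EMR Release Guide). The role names are the defaults that `aws emr create-default-roles` creates. The function only builds the arguments, so it runs without credentials.

```python
def build_cluster_request(name, log_bucket, key_pair_name,
                          release_label="emr-6.15.0",   # assumption: placeholder release
                          instance_type="m5.xlarge",    # assumption: placeholder type
                          instance_count=3):            # assumption: 1 primary + 2 core
    """Hypothetical helper returning keyword arguments for run_job_flow()."""
    return {
        "Name": name,
        # Same logs folder you select in the console's S3 folder box.
        "LogUri": f"s3://{log_bucket}/logs/",
        "ReleaseLabel": release_label,
        # Hive is needed later to run the sample script.
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # The EC2 key pair from the prerequisites.
            "Ec2KeyName": key_pair_name,
            # Keep the cluster running so you can submit the Hive step afterwards.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default roles created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


# To actually launch (requires boto3 and AWS credentials; charges apply):
# import boto3
# request = build_cluster_request("my-first-emr-cluster",
#                                 "mycompany.username.vernumber-emr-quickstart",
#                                 "my-key-pair")
# response = boto3.client("emr").run_job_flow(**request)
# print(response["JobFlowId"])
```

Keeping the request in a plain dictionary lets you inspect or adjust it before anything is created, and it mirrors the three fields you change in the console: the cluster name, the logs folder, and the key pair.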