AWS Elastic MapReduce Quick Start – Dashboard

This post provides essential instructions on how to get started with Amazon Elastic MapReduce (Amazon EMR). You will learn how to create a sample Amazon EMR cluster by using the AWS Management Console. You then run a Hive script to process data stored in Amazon S3.

The instructions in this example are not intended for production environments, and they do not cover in-depth configuration options. The example shows how to quickly set up a cluster for evaluation purposes. For questions or issues, you can reach out to the Amazon EMR team by posting on the Discussion Forum.

Cost

The sample cluster that you create runs in a live environment, and you are charged for the resources used. This example should take an hour or less, so the charges should be minimal. After you complete this example, you should reset your environment to avoid incurring further charges. For more information, see Reset EMR Environment.

Pricing for Amazon EMR varies by region and service. For this example, charges accrue for the Amazon EMR cluster and Amazon Simple Storage Service (Amazon S3) storage of the log data and output from the Hive job. If you are within your first year of using AWS, some or all of your charges for Amazon S3 might be waived if you are within your usage limits of the AWS Free Tier.
For more information about Amazon EMR pricing and the AWS Free Tier, see Amazon EMR Pricing and AWS Free Tier.

You can use the Amazon Web Services Simple Monthly Calculator to estimate your bill.

Sample EMR Cluster Prerequisites

The following are the preliminary steps you must perform to complete the example.

  1. Create an AWS account.
  2. Create an S3 bucket.
    The example in this topic uses an S3 bucket to store log files and output data.
    Due to Hadoop constraints, the bucket name should conform to these requirements:

    • It must contain only lowercase letters, numbers, periods, and hyphens.
    • It cannot end with a number.
      Example: mycompany.username.vernumber-emr-quickstart.
  3. Click on the S3 bucket name. The bucket page is displayed.
  4. Create two folders, named logs and output.
    Make sure that the output folder is empty. For more information, see Creating a Folder.
  5. Create an Amazon EC2 Key Pair.
    You need the key pair to connect to the nodes in the cluster.
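
The bucket naming rules from step 2 can be checked before you create anything. The following is a minimal Python sketch, not part of the original walkthrough: the function names is_valid_emr_bucket_name and folder_uris are hypothetical, and the check covers only the two rules listed in this post, not the full set of general Amazon S3 naming rules. The second helper builds the S3 URIs for the logs and output folders from step 4.

```python
import re

# Rule 1 from this post: only lowercase letters, digits, periods, and hyphens.
_ALLOWED_CHARS = re.compile(r"[a-z0-9.-]+")


def is_valid_emr_bucket_name(name: str) -> bool:
    """Return True if name satisfies the two rules listed in this post.

    Hypothetical helper: it does not enforce the other general S3
    naming rules (length limits and so on).
    """
    if not _ALLOWED_CHARS.fullmatch(name):
        return False
    # Rule 2 from this post: the name cannot end with a number.
    return not name[-1].isdigit()


def folder_uris(bucket: str) -> dict:
    """Build the S3 URIs for the logs and output folders in the bucket."""
    return {folder: f"s3://{bucket}/{folder}/" for folder in ("logs", "output")}


if __name__ == "__main__":
    bucket = "mycompany.username.vernumber-emr-quickstart"
    print(is_valid_emr_bucket_name(bucket))
    print(folder_uris(bucket)["logs"])
```

The logs URI is the value you later select in the S3 folder box when you launch the cluster.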

Launch the Sample Amazon EMR Cluster

  1. In your browser, navigate to the AWS Management Console.
  2. In the Analytics section click on EMR. The console dashboard is displayed.
  3. Click the Create cluster button.
    The Create Cluster – Quick Options page is displayed.
    For more information, see Using Quick Cluster Configuration Overview.
  4. Accept the default values except for the following fields:
    • In the Cluster name box, enter any name that has meaning to you.
    • For the S3 folder box, click on the folder icon to select the path to the logs folder that you created.
    • For the EC2 key pair box, from the drop-down list, choose the key pair that you created.
  5. Click the Create cluster button.
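
The same three choices (cluster name, log folder, and key pair) can also be passed to the EMR API instead of the console. The sketch below is an illustration only, not the method this post uses. It assumes the third-party boto3 package, configured AWS credentials, and that the default EMR roles (EMR_DefaultRole and EMR_EC2_DefaultRole) already exist in your account. The release label, instance type, instance count, and region are placeholder assumptions, not the console's Quick Options defaults, and the function names are hypothetical.

```python
def build_cluster_request(name: str, log_uri: str, key_name: str) -> dict:
    """Build keyword arguments for the boto3 EMR client's run_job_flow call."""
    return {
        "Name": name,                      # Cluster name box
        "LogUri": log_uri,                 # S3 folder box (path to the logs folder)
        "ReleaseLabel": "emr-5.36.0",      # assumption: pick a release you want to test
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",   # assumption
            "SlaveInstanceType": "m5.xlarge",    # assumption
            "InstanceCount": 3,                  # assumption
            "Ec2KeyName": key_name,              # EC2 key pair box
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",    # assumes the default roles exist
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def launch_cluster(request: dict, region: str = "us-east-1") -> str:
    """Start the cluster and return its ID. This call incurs charges."""
    import boto3  # imported here so the request builder works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    request = build_cluster_request(
        "emr-quickstart",
        "s3://mycompany.username.vernumber-emr-quickstart/logs/",
        "my-key-pair",  # hypothetical key pair name
    )
    # Only print the request; call launch_cluster(request) when you are
    # ready to create a billable cluster.
    print(request)
```

Keeping the request builder separate from the launch call lets you review the settings without starting a cluster, which matters because the cluster is billed while it runs.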
