
AWS Elastic MapReduce (EMR)

Amazon Elastic MapReduce (Amazon EMR) is a web service for processing vast amounts of data quickly and cost-effectively. It simplifies big data processing by providing a managed Hadoop framework that distributes the work across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark and Presto (a SQL query engine) in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB. For a quick overview, see Introduction to Amazon Elastic MapReduce.

Background

Amazon EMR enables you to quickly provision as much computing capacity as you need, and to add, reduce, or remove it at any time. This is very important when dealing with variable or unpredictable processing requirements, as is often the case with big data processing.
For example, if the bulk of your processing occurs at night, you might need 100 virtual machine instances during the day and 500 instances at night. Or you might need a significant computing peak for a short period of time. With Amazon EMR you can quickly provision hundreds or thousands of instances and release them when the work is completed, saving on the overall cost.
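
To put a rough number on the day/night example, here is a minimal Python sketch that compares instance-hours for an elastic schedule against a cluster sized for the peak around the clock. The 12-hour day/night split is an assumption made for illustration; it is not stated above, and no real pricing is used. The function name `instance_hours` is hypothetical.

```python
def instance_hours(schedule):
    """Sum instance-hours over a list of (instance_count, hours) periods."""
    return sum(count * hours for count, hours in schedule)


# Assumed split (illustrative only): 12 daytime hours at 100 instances,
# 12 night hours at 500 instances.
elastic_schedule = [(100, 12), (500, 12)]

# Fixed-size alternative: keep the peak of 500 instances running all day.
fixed_schedule = [(500, 24)]

elastic_hours = instance_hours(elastic_schedule)
fixed_hours = instance_hours(fixed_schedule)

# Fraction of instance-hours avoided by releasing idle capacity.
savings = 1 - elastic_hours / fixed_hours

print(f"elastic: {elastic_hours} instance-hours per day")
print(f"fixed:   {fixed_hours} instance-hours per day")
print(f"saved:   {savings:.0%}")
```

Under these assumptions the elastic schedule uses 7,200 instance-hours per day against 12,000 for the fixed cluster. The actual saving depends on your own workload profile and on the instance pricing that applies to you.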

Computing Capacity

The following are some possible ways to control computing capacity: