Hadoop is not a single product, but rather a software family. Its common components consist of the following:
- Pig, a scripting language used to quickly write MapReduce code to handle unstructured sources
- Hive, used to facilitate structure for the data
- HCatalog, used to provide inter-operatability between these internal systems
- HBase, which is essentially a database built on top of Hadoop
- HDFS, the actual file system for hadoop.
- Apache Mahout
- Packaging for Hadoop: BigTop
Hadoop structures data using Hive, but can handle unstructured data easily using Pig.
Hadoop and Mongo¶
Amazon EMR includes