BigData - Apache Hadoop
Apache Hadoop
history
Doug Cutting: Lucene -> Nutch
Doug Cutting + Yahoo: -> Hadoop: NDFS + MapReduce
basic
before
nowadays
data processing has become a problem
hadoop
Hadoop
- consists of 3 components: HDFS, MapReduce, and YARN
Hadoop core
HDFS - Storage unit
HDFS
- replication
- fault-tolerant
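To make the link between replication and fault tolerance concrete, here is a toy sketch in Python. It is not HDFS code: the helper names `place_blocks` and `readable_blocks`, the node names, and the round-robin placement are all made up for illustration (real HDFS placement is rack-aware). The replication factor of 3 matches the HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (toy round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes (wrapping around) hold the copies of this block.
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


# Hypothetical 4-node cluster storing a file split into 5 blocks.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(5, nodes)

# Losing one node leaves every block readable from another replica.
survivors = readable_blocks(placement, failed_nodes={"node2"})
```

With 3 replicas per block, a block only becomes unreadable when all 3 nodes holding it fail at the same time, which is why replication gives fault tolerance.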
MapReduce
- Apache Hadoop
- an open source framework for big data.
- It is based on the MapReduce programming model which Google invented and published.
- "Map function"
- runs in parallel with a massive dataset to produce intermediate results.
- "Reduce function"
- builds a final result set based on all those intermediate results.
- The term “Hadoop” is often used informally to encompass Apache Hadoop itself, and related projects such as Apache Spark, Apache Pig, and Apache Hive.
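The Map and Reduce functions described above can be sketched as a word count, the usual introductory example. This is a minimal in-process sketch in plain Python, not the Hadoop API: the names `map_fn`, `shuffle`, `reduce_fn`, and `run_job` are hypothetical, and everything runs on one machine, whereas Hadoop runs the map calls in parallel across the cluster.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit an intermediate (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (the framework does this in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce: combine all intermediate values for one key into a final result."""
    return key, sum(values)


def run_job(lines):
    # Each line could be mapped independently, so this step parallelizes well.
    intermediate = [pair for line in lines for pair in map_fn(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


counts = run_job(["big data big cluster", "data node"])
```

Here `counts` maps each word to the number of times it appears across all input lines. The map step never needs to see more than one line at a time, which is what lets it scale out to a massive dataset.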
YARN
usages
- used by many large companies
- Data warehousing
- recommendation system
- fraud detection
This post is licensed under CC BY 4.0 by the author.