BigData - Apache Hadoop

Posted Nov 11, 2020 Updated Mar 25, 2024

By Grace L

1 min read

Apache Hadoop

Apache Hadoop

history

Doug Cutting: Lucene -> Nutch

Doug Cutting + Yahoo: -> Hadoop: NDFS + MapReduce

basic

before

nowday

data process become a problem

hadoop

Hadoop

comsisted of 3 components

Hadoop 核心

HDFS + MapReduce

HDFS - Storage unit

HDFS

replication
fault-tolerant

MapReduce

Apache Hadoop
- an open source framework for big data.
- It is based on the MapReduce programming model which Google invented and published.
  - "Map function"
    - runs in parallel with a massive dataset to produce intermediate results.
  - "Reduce function"
    - builds a final result set based on all those intermediate results.
- The term “Hadoop” is often used informally to encompass Apache Hadoop itself, and related projects such as Apache Spark, Apache Pig, and Apache Hive.

YARN

usages

use by a lot of big company
- Data warehousing
- recommendation system
- fraud detection

.

BigData Apache Hadoop

This post is licensed under CC BY 4.0 by the author.

Comments powered by Disqus.

Trending Tags

AWS CyberAttack Linux Lab GCP CyberAttackTools ML NetworkSec AI Sysadmin