
What is Hadoop
Hadoop is an open-source framework for storing and processing big data across a cluster of computers (a distributed system). The official site of Hadoop is https://hadoop.apache.org/
Why Hadoop
Hadoop can solve the following problems:
- Store very large volumes of data (terabytes, petabytes, zettabytes, and even yottabytes!)
- Efficiently process complex data in various formats: structured, semi-structured, and unstructured
- Increase the capacity for processing and computing large volumes of complex data
- Reduce the time needed to process and compute large volumes of complex data
Components of Hadoop
HDFS (Hadoop Distributed File System)
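- It is the storage unit of Hadoop
- It follows a master/slave architecture: a master node (the NameNode) manages the file system metadata, and worker nodes (DataNodes) store the actual data
- It can store any kind of data
- Input data is split into fixed-size blocks (128 MB by default since Hadoop 2.x), and these blocks are distributed and replicated (3 copies by default) across the DataNodes
- No schema validation (database schema, XML schema, etc.) is required to store or dump data into HDFS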
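To make the block splitting and the master/slave split more concrete, here is a minimal Python sketch. It assumes the Hadoop 2.x+ defaults of a 128 MB block size and a replication factor of 3. The function names `split_into_blocks` and `place_blocks` are hypothetical and not part of any Hadoop API, and the round-robin placement is a simplification of the rack-aware placement policy real HDFS uses. The dictionary returned by `place_blocks` plays the role of the NameNode's metadata, while the DataNode names stand in for the workers that actually hold the bytes.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed default HDFS block size (Hadoop 2.x and later)
REPLICATION = 3        # assumed default HDFS replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Hypothetical helper: return the sizes of the blocks a file is split into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only takes up as much space as the data it holds.
        sizes.append(remainder)
    return sizes


def place_blocks(num_blocks, datanodes, replication=REPLICATION):
    """Hypothetical toy 'NameNode': map each block id to the DataNodes holding a replica.

    Uses simple round-robin placement (NOT the real rack-aware HDFS policy).
    Replicas of the same block always land on distinct DataNodes.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    block_map = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        block_map[block_id] = [
            datanodes[(start + i) % len(datanodes)] for i in range(replication)
        ]
    return block_map


# Usage: a 300 MB file on a cluster with four DataNodes.
nodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])  # [128, 128, 44]
print(place_blocks(len(blocks), nodes))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

With three replicas, this 300 MB file consumes about 900 MB of raw disk across the cluster, which is the price HDFS pays for tolerating the loss of individual DataNodes.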