Hadoop

What is Hadoop

Hadoop is an open source framework to handle big data in a distributed system. The official site of Hadoop is https://hadoop.apache.org/

Why Hadoop

Hadoop cam solve the below problems

  • Handle the storing data of large volume (terabyte, petabyte, zettabyte, and yottabyte data!)
  • Can Process various format of complex data like structured, semi structured and unstructured data efficiently
  • Increase capacity of the processing and computing data of large volume complex data
  • Decrease the processing and computing time of handling large volume complex data

Components of Hadoop

HDFS (Hadoop Distributed File System)

  • It is a storage units of Hadoop
  • It follows master/slave architecture
  • Store any kind of data
  • Input Data are divided into small nodes then these nodes are stored accordingly
  • Schema (database and XML schema etc.) validation is not required to store and dumping data

MapReduce (Processing technique)

Yarn (Yet Another Resource Negotiator)