Hadoop Online Training | Contact us: +91 9989445901
Hadoop is an Apache open source framework, written in Java, that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
Hadoop is based on work done by Google in the early 2000s, specifically the papers describing the Google File System (GFS), published in 2003, and MapReduce, published in 2004. This work took a radically new approach to distributed computing, one that meets the requirements for reliability and scalability. The core concept is to distribute the data as it is initially stored in the system: individual nodes can work on the data local to them, so no data transfer over the network is required for initial processing.
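To make the data-locality idea concrete, here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API. The input and output paths are supplied as command-line arguments (any HDFS paths will do); map tasks run on the nodes holding the input blocks, and only the small intermediate (word, count) pairs cross the network to the reducers.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on the node where each input block lives,
  // emitting (word, 1) for every token in its local split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: receives all counts for one word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The combiner pre-sums counts on each node, shrinking network traffic.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));     // input path (argument)
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output path (argument)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, the job would typically be launched with something like hadoop jar wordcount.jar WordCount /input /output, where /input and /output are hypothetical HDFS paths.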
When to use Hadoop:
1. For processing really big data
2. For storing a diverse set of data
3. For parallel data processing
When NOT to use Hadoop:
1. For real-time data analysis
2. For a relational database system
3. For a general network file system
4. For non-parallel data processing
Three core components (an HDFS usage sketch follows the list):
1. Hadoop Distributed File System (HDFS)
2. Hadoop MapReduce
3. Hadoop YARN
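As a small illustration of the HDFS component, below is a hedged sketch that writes and then reads a file through the org.apache.hadoop.fs.FileSystem Java API. The path /user/demo/sample.txt is a made-up example, and the code assumes the cluster configuration (fs.defaultFS in core-site.xml) is available on the classpath.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);       // connects to the configured fs.defaultFS

    Path file = new Path("/user/demo/sample.txt");   // hypothetical path, for illustration only

    // Write a small file; HDFS splits files into blocks and
    // replicates each block across DataNodes.
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello hadoop\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back through the same FileSystem abstraction.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }
  }
}

The same FileSystem abstraction also works against the local file system, which makes it convenient to test code like this before deploying it to a real cluster.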
Hadoop tools:
Ambari
Avro
Cassandra
Chukwa
HBase
Hive
Mahout
Pig
Spark
Tez
ZooKeeper