Wednesday, 15 March 2017


What is Hadoop:
Hadoop is an open source framework used to store and process large data sets. It provides data storage, data access, data processing, and security operations. Many organizations use Hadoop for storage because it can store large amounts of data quickly.
Main Components of Hadoop:
  • HDFS
  • MapReduce
  • YARN
What is Spark:
Spark is also an open source framework, used mainly for data analytics. Spark runs faster than Hadoop MapReduce and is often deployed on top of Hadoop. Spark does not have its own file system, so it is integrated with an existing one. Spark also does not require YARN to function: it has its own standalone cluster manager, although it can run on YARN as well.
How do Hadoop and Spark process data?
Spark does not have its own file system for processing data, which is why programmers usually install Spark on top of Hadoop. In that setup Spark uses HDFS to store the data. Spark copies much of the data from the physical servers' disks into memory, which reduces the time spent going back to disk.
Hadoop has its own file system, HDFS. It looks like the file system on your desktop computer, but it distributes data across several machines. HDFS organizes the data into a set of blocks and stores those blocks on the nodes of the cluster. MapReduce processes the data stored in HDFS and writes its results back to disk after each operation.
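The way HDFS splits a file into blocks and spreads them over nodes can be sketched in a few lines of plain Python. This is only an illustration, not the HDFS implementation: the function names, the node names, the 4-byte block size, and the round-robin placement are all assumptions chosen to keep the example small (real HDFS blocks default to 128 MB in Hadoop 2 and later, and placement is decided by the NameNode).

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(num_blocks: int, nodes: List[str]) -> Dict[int, str]:
    """Place block i on a node in round-robin order (illustrative policy only)."""
    if not nodes:
        raise ValueError("need at least one node")
    return {i: nodes[i % len(nodes)] for i in range(num_blocks)}


file_data = b"abcdefghij"  # a 10-byte stand-in for a large file
blocks = split_into_blocks(file_data, block_size=4)
placement = assign_blocks(len(blocks), ["node1", "node2"])

for index, block in enumerate(blocks):
    print(index, block, "->", placement[index])
```

Joining the blocks back together in order gives the original file, which is what a client does when it reads the file from the cluster.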
They do different things:
Hadoop and Spark are both open source frameworks, but they don’t do the same job. Hadoop is essentially a distributed data infrastructure: it distributes data across multiple nodes within a cluster of servers. Spark is a data processing tool that operates on those distributed data collections. It does not provide its own file system.
You can use one without the other:
Hadoop is not just a storage system. It also includes a processing component called MapReduce, so you do not need Spark to process data. Spark does not have its own file storage system, so it must be integrated with one, either HDFS or another storage platform.
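To give a feel for the MapReduce processing model, here is a minimal word count written in plain Python. It does not use the Hadoop API. The three function names and the sample lines are hypothetical, and everything runs in one process, whereas Hadoop would run the map and reduce phases on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["spark and hadoop", "hadoop stores data", "spark processes data"]
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```

Each phase only needs the output of the previous one, which is what lets a real cluster run many map tasks and reduce tasks in parallel.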
Spark is speedier:
Spark is faster than MapReduce at processing data because it does most of its work in memory. Spark's step-by-step process is: read the data from the cluster, perform all the analytic operations, and write the results back to the cluster.
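The difference in disk traffic can be illustrated with a small plain-Python sketch. It uses neither Spark nor Hadoop: FakeDisk, run_step_by_step, and run_in_memory are hypothetical names, and the model assumes a chain of MapReduce-style jobs in which each job reads its input from disk and writes its output back, while the Spark-style run reads once, chains every operation in memory, and writes once.

```python
class FakeDisk:
    """Hypothetical stand-in for cluster storage that counts reads and writes."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self.data)

    def write(self, data):
        self.writes += 1
        self.data = list(data)


def run_step_by_step(disk, steps):
    """MapReduce-style: every step reads from disk and writes its output back."""
    for step in steps:
        disk.write([step(x) for x in disk.read()])
    return disk.data


def run_in_memory(disk, steps):
    """Spark-style: read once, apply all steps in memory, write once."""
    data = disk.read()
    for step in steps:
        data = [step(x) for x in data]
    disk.write(data)
    return disk.data


steps = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
disk_a, disk_b = FakeDisk([1, 2, 3]), FakeDisk([1, 2, 3])
result_a = run_step_by_step(disk_a, steps)
result_b = run_in_memory(disk_b, steps)
print(result_a, disk_a.reads, disk_a.writes)
print(result_b, disk_b.reads, disk_b.writes)
```

Both runs produce the same result, but the step-by-step run touches the disk once per step in each direction, while the in-memory run touches it once in total in each direction.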
Features of Hadoop:
Scalable - it stores large amounts of data and grows by adding nodes to the cluster.
Flexible - it makes it easy to access new data sources and generate new value from them.
Fast - it processes large volumes of data quickly because the work runs in parallel across the cluster.
Resilient to failure - when data is sent to one node, it is also replicated to another node in the cluster, so a copy remains available if a node fails.
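The resilience point can be sketched in plain Python as follows. This is not how HDFS actually chooses nodes: place_replicas and readable_blocks are hypothetical helpers, the block and node names are made up, and the replication factor of 2 matches the "one node plus another node" description above (HDFS itself defaults to 3 copies).

```python
from typing import Dict, List


def place_replicas(block_ids: List[str], nodes: List[str],
                   replication: int = 2) -> Dict[str, List[str]]:
    """Store each block on `replication` distinct nodes (simple rotation)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for i, block_id in enumerate(block_ids):
        placement[block_id] = [nodes[(i + r) % len(nodes)]
                               for r in range(replication)]
    return placement


def readable_blocks(placement: Dict[str, List[str]],
                    failed_node: str) -> List[str]:
    """Return the blocks that still have a copy on some healthy node."""
    return [block_id for block_id, holders in placement.items()
            if any(node != failed_node for node in holders)]


nodes = ["node1", "node2", "node3"]
placement = place_replicas(["blk_0", "blk_1", "blk_2"], nodes, replication=2)
survivors = readable_blocks(placement, failed_node="node2")
print(placement)
print(survivors)
```

With two copies of every block on different nodes, losing any single node still leaves every block readable.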
Features of Spark:
Speed - Spark processes data very quickly because it keeps as much of its reading and writing in memory as it can.
Usability - Spark supports several languages for developing applications, including Scala, Java, Python, and R.
Real-time stream processing - Spark can also process real-time data as it arrives.
Powerful - it coordinates one or more machines to handle different kinds of data.