Introduction

Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, etc. Its core data abstraction is the Resilient Distributed Dataset (RDD), which provides efficient data sharing between computations. Spark automatically distributes the data across the cluster and parallelizes the required operations. It integrates with many storage systems (e.g., HDFS, Cassandra, …).
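
The idea of splitting a dataset into partitions and running an operation on each partition in parallel can be sketched on a single machine. The following is a toy model in plain Python, not Spark's API: the helper names `partition`, `parallel_map`, and `collect_reduce` are hypothetical, and worker threads stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, num_partitions):
    """Split data into num_partitions chunks, assigning elements round-robin."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def parallel_map(func, partitions):
    """Apply func to every element, handling each partition in its own worker thread."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(lambda part: [func(x) for x in part], partitions))


def collect_reduce(func, partitions):
    """Reduce each non-empty partition to a partial result, then combine the partials."""
    partials = [reduce(func, part) for part in partitions if part]
    return reduce(func, partials)


# Illustrative data: the integers 1..10 spread over 3 partitions.
numbers = list(range(1, 11))
parts = partition(numbers, 3)

# Square every element in parallel, then sum the results across partitions.
squared = parallel_map(lambda x: x * x, parts)
total = collect_reduce(lambda a, b: a + b, squared)
print(total)
```

In this sketch the caller never loops over partitions directly; it only supplies the function to apply, which mirrors, in a very simplified way, how a distributed dataset hides data placement from the program.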