A unified analytics engine for incredibly fast, large-scale data processing
Apache Spark is designed for speed, running workloads up to 100 times faster than Hadoop MapReduce. Handling both batch and streaming data, Apache Spark is a high-performance engine that uses a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine to deliver that performance at scale.
It’s also easy to use: you can quickly write applications in Scala, Python, Java, R, and SQL. Apache Spark offers over 80 high-level operators that make it simple to build parallel apps, and it can even be used interactively from the Scala, Python, R, and SQL shells.
This solution combines SQL, streaming, and complex analytics. It powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. These libraries can all be seamlessly combined in the same application, streamlining workflow and analysis.
Apache Spark runs everywhere. It can be run using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. It can also access diverse data sources, including HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of others.
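In practice, switching between those cluster managers is largely a matter of the `--master` URL passed to `spark-submit`. A sketch (the application file `my_app.py` and the host names are hypothetical placeholders):

```shell
# Standalone cluster mode
spark-submit --master spark://host:7077 my_app.py

# Hadoop YARN, running the driver inside the cluster
spark-submit --master yarn --deploy-mode cluster my_app.py

# Kubernetes, pointed at the API server
spark-submit --master k8s://https://host:443 my_app.py
```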
The analytics engine is used by a number of leading brands to process large datasets, including Amazon, Conviva, eBay, Groupon, IBM, and OpenTable.
Apache Spark is built by developers from over 300 companies and has a robust community, with more than 1,200 contributors since 2009.