This workshop provides a hands-on introduction to the Big Data ecosystem, Hadoop, and Apache Spark. Through practical activities in Python, you will learn how to apply Apache Spark to a range of datasets to process and analyse data at scale.
After taking this workshop you will be able to:
SETUP
Download or clone the repository: http://gitlab.cambridgespark.com/pub/bigdata-spark
Follow the instructions in the SETUP.md file: http://gitlab.cambridgespark.com/pub/bigdata-spark/blob/master/SETUP.md