Introduction to Big Data Processing using Spark and Python

Speaker(s) Raoul-Gabriel Urma

This workshop provides a hands-on introduction to the Big Data ecosystem, Hadoop, and Apache Spark. Through practical activities in Python, you will learn how to apply Apache Spark to a range of datasets to process and analyse data at scale.

After taking this workshop you will be able to:

Understand the challenges in the Big Data ecosystem
Describe the fundamentals of the Hadoop ecosystem
Use the core Spark RDD APIs to express data processing queries
Understand how you can leverage cloud technologies such as Amazon EMR to process large datasets

SETUP: Download or clone the repository at http://gitlab.cambridgespark.com/pub/bigdata-spark

Follow the instructions in the SETUP.md file: http://gitlab.cambridgespark.com/pub/bigdata-spark/blob/master/SETUP.md

Scheduled on Monday 23 July at 13:45 and at 15:15.
