EuroPython 2018

Introduction to Big Data Processing using Spark and Python

Speaker: Raoul-Gabriel Urma

This workshop will provide a hands-on introduction to the Big Data ecosystem, Hadoop and Apache Spark in practice. Through practical activities in Python, you will learn how to apply Apache Spark on a range of datasets to process and analyse data at scale.

After taking this workshop you will be able to:

  • Understand the challenges in the Big Data ecosystem
  • Describe the fundamentals of the Hadoop ecosystem
  • Use the core Spark RDD APIs to express data processing queries
  • Understand how you can leverage cloud technologies such as Amazon EMR to process large data sets
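To give a flavour of the kind of query the RDD API expresses, here is a minimal sketch of a word count written in plain Python, so it runs without a Spark installation. It is not taken from the workshop material: the sample `lines` data and all variable names are made up for illustration. Each step mirrors an RDD transformation (`flatMap`, `map`, `reduceByKey`), and the comment at the end shows how the same pipeline would roughly look in PySpark, assuming a `SparkContext` named `sc` is available.

```python
from collections import defaultdict
from functools import reduce
from operator import add

# Hypothetical sample data standing in for a large text dataset.
lines = [
    "spark makes big data simple",
    "big data needs big tools",
]

# flatMap step: split every line into words and flatten into one list.
words = [word for line in lines for word in line.split()]

# map step: pair each word with an initial count of 1.
pairs = [(word, 1) for word in words]

# reduceByKey step: group the values by key, then combine each group with add.
grouped = defaultdict(list)
for word, count in pairs:
    grouped[word].append(count)
counts = {word: reduce(add, values) for word, values in grouped.items()}

# Three most frequent words, ties broken alphabetically.
top_three = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top_three)

# Rough PySpark equivalent (assumes an existing SparkContext called sc):
#   sc.parallelize(lines) \
#     .flatMap(lambda line: line.split()) \
#     .map(lambda word: (word, 1)) \
#     .reduceByKey(add) \
#     .collect()
```

In Spark the same three steps are distributed across the cluster, and nothing is computed until an action such as `collect()` is called.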

Setup: download or clone the repository at http://gitlab.cambridgespark.com/pub/bigdata-spark

Follow the instructions in the SETUP.md file: http://gitlab.cambridgespark.com/pub/bigdata-spark/blob/master/SETUP.md

Scheduled on Monday 23 July at 13:45 and at 15:15.
