EuroPython 2018

Best Practices for a Blazing Fast Machine Learning Pipeline

Speaker: David Liu

Getting quickly from preprocessing through analytics to insights is a central challenge for many data scientists. In this workshop, learn about the best tools, techniques, and frameworks for building a fast machine learning pipeline. Topics covered include vectorization tricks for preprocessing, distributed dataframe handling, and tips for scaling machine learning code out to the cloud or to clusters.
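To illustrate what a preprocessing vectorization trick looks like, here is a minimal sketch (not taken from the workshop materials) that standardizes each column of a feature matrix twice: once with explicit Python loops and once with NumPy broadcasting. The function names are hypothetical, and the sketch assumes NumPy is installed and that no column has zero variance.

```python
import numpy as np


def standardize_loop(X):
    """Column-wise z-score using explicit Python loops (slow reference version)."""
    n_rows, n_cols = X.shape
    out = np.empty_like(X, dtype=float)
    for j in range(n_cols):
        col = [X[i, j] for i in range(n_rows)]
        mean = sum(col) / n_rows
        # Population variance (divide by n), matching NumPy's default ddof=0
        var = sum((v - mean) ** 2 for v in col) / n_rows
        std = var ** 0.5
        for i in range(n_rows):
            out[i, j] = (X[i, j] - mean) / std
    return out


def standardize_vectorized(X):
    """Same z-score, computed with NumPy broadcasting (no Python-level loops)."""
    mean = X.mean(axis=0)  # per-column mean, shape (n_cols,)
    std = X.std(axis=0)    # per-column population std (ddof=0), shape (n_cols,)
    # Broadcasting stretches the (n_cols,) vectors across every row of X
    return (X - mean) / std


# Usage: both versions agree, but the vectorized one runs in compiled code
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
print(np.allclose(standardize_loop(X), standardize_vectorized(X)))
```

The vectorized version replaces per-element Python work with a handful of array operations, which is typically much faster on large inputs; timing both with `timeit` on your own data is the easiest way to see the difference.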

Workshop Overview:


-Tools and Techniques

-Data preprocessing

-Break (15 min)

-Data Visualization

-Machine Learning

-Options for scaling and pipelining

-Break (15 min)

-Hands-on: Advanced tools

-Hands-on: Chaining it together

-Prerequisites are listed in the two GitHub repos used for the training: (slides in repo) and

Scheduled on Monday 23 July at 09:30 and at 11:15.


    The repository URL in the description is wrong.
    — Ignasi Fosch
    I think the correct repo url is
    — Stefan Gangefors

