Getting from preprocessing through analytics to insights quickly has long been a pain point for many data scientists. In this workshop, learn about the best tools, techniques, and frameworks for building a fast machine learning pipeline. Topics covered include vectorization tricks for preprocessing, distributed dataframe handling, and tips for scaling machine learning code out to the cloud or to clusters.
Workshop Overview:
-Introduction
-Tools and Techniques
-Data preprocessing
-Break (15 min)
-Data Visualization
-Machine Learning
-Options for scaling and pipelining
-Break (15 min)
-Hands-on: Advanced tools
-Hands-on: Chaining it together
Prerequisites come from the two GitHub repos used for the training: https://github.com/triskadecaepyon/ep_2018_workshop (slides in repo) and https://github.com/IntelPython/workshop