EuroPython 2018

Best practices for elegant experimentation in data science projects (case study)

Speaker(s): Kamil Kaczmarek

Over the course of a project, data scientists face multiple issues. Difficulty reproducing results, the inability to prepare experiments quickly, and dirty data are just three examples. Data science projects involve a lot of experimentation and quick adoption of new ideas and technologies. Such an environment makes it difficult to keep the code clean and to keep track of the small changes that make a new experiment successful.

Here, we use an instance segmentation challenge, the Mapping Challenge hosted on the crowdAI platform, to show: 1) our best practices for working on data science projects, 2) our competition results. Our best practices involve the steppy library, which provides a minimal interface for building machine learning pipelines. In addition, we organized our work in a transparent and open way, publishing our code, tasks and experiment results.
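
To make the idea of a minimal pipeline interface concrete, here is a small pure-Python sketch. It does not use steppy and does not reproduce its API: the Step class, its arguments and the three toy transforms are hypothetical names invented for this illustration. It only shows the general pattern of named steps that declare their upstream steps and reuse computed outputs.

```python
class Step:
    """A named pipeline node wrapping a transform function.

    Hypothetical illustration only; this is not the steppy API.
    """

    def __init__(self, name, transform, input_steps=None):
        self.name = name
        self.transform = transform
        self.input_steps = input_steps or []
        self._done = False
        self._output = None

    def run(self, data):
        # Compute once, then reuse the stored output on later calls.
        # Note: the stored output is returned even if `data` changes.
        if not self._done:
            upstream = {step.name: step.run(data) for step in self.input_steps}
            self._output = self.transform(data, upstream)
            self._done = True
        return self._output


def normalize(data, upstream):
    # Scale raw pixel values to the range [0, 1].
    peak = max(data["pixels"])
    return [p / peak for p in data["pixels"]]


def threshold(data, upstream):
    # Turn normalized values into a binary foreground mask.
    return [1 if p >= 0.5 else 0 for p in upstream["normalize"]]


def count_foreground(data, upstream):
    # Toy post-processing: count the foreground pixels in the mask.
    return sum(upstream["threshold"])


norm_step = Step("normalize", normalize)
mask_step = Step("threshold", threshold, input_steps=[norm_step])
count_step = Step("count_foreground", count_foreground, input_steps=[mask_step])

result = count_step.run({"pixels": [0, 64, 128, 255]})
print(result)
```

Because each step stores its output, replacing only the last step in an experiment would not require recomputing the earlier ones, which is one way such a structure can help with fast iteration.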

On the poster, we share our results regarding pre- and post-processing routines, network architectures and the training scheme. We also present the technology stack that we use. It is a blend of well-established Python packages (like numpy and sklearn) and our own open source initiatives, namely steppy and steppy-toolkit.

The poster is for Pythonistas looking for: 1) an example solution to the instance segmentation task, 2) ideas on how to organize a data science project.

Friday 27 July at 10:00

Comments

  1. Hi, this is Kamil Kaczmarek (author of this proposal).

    If you have any questions or need further clarification, please do not hesitate to ask here in the comments.
    — Kamil Kaczmarek,
