Learning to use the awesome Pandas toolkit has helped me immensely in lots of ways. Finding novel, efficient solutions to complex day-to-day problems with Pandas not only saves time, but can also be a fun and rewarding experience.
In this talk I’ll present use cases I had to solve, where the “traditional” approach proved tough or otherwise frustrating to implement nicely. Since I was just starting to learn Pandas, I decided to try an alternative solution with it. What I learned changed the way I think about data processing with Python, and it has only gotten better since!
The use cases deal with extracting pen strokes from handwritten SVG samples and recomposing them into reusable letters and numbers. Having to compare each stroke to all the others, often more than once, resulted in inefficient, slow, and hard-to-maintain code. Even a naive Pandas approach with loops reduced the memory footprint and improved performance considerably! By improving the implementation further, vectorizing the inner loops and taking advantage of multi-index operations, I managed to get the same results using less memory and running orders of magnitude faster.
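To give a flavour of the difference between the looped and the vectorized style, here is a minimal sketch, not the actual code from the talk. It assumes each stroke is reduced to a start point and an end point, and that "comparing" two strokes means measuring the gap between the end of one and the start of the other. The `strokes` table, its column names, and both function names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stroke table: one row per pen stroke, with start and end points.
strokes = pd.DataFrame(
    {
        "x0": [0.0, 1.0, 0.0, 5.0],
        "y0": [0.0, 1.0, 0.1, 5.0],
        "x1": [1.0, 2.0, 1.0, 6.0],
        "y1": [1.0, 2.0, 1.1, 6.0],
    },
    index=pd.Index(["s0", "s1", "s2", "s3"], name="stroke"),
)


def pairwise_loop(df):
    """Naive version: nested Python loops over every ordered pair of strokes."""
    gaps = {}
    for a, row_a in df.iterrows():
        for b, row_b in df.iterrows():
            if a != b:
                # Distance from the end of stroke a to the start of stroke b.
                gaps[(a, b)] = np.hypot(
                    row_a["x1"] - row_b["x0"], row_a["y1"] - row_b["y0"]
                )
    out = pd.Series(gaps, name="gap")
    out.index.names = ["from", "to"]
    return out


def pairwise_vectorized(df):
    """Same result, computed with NumPy broadcasting and a MultiIndex."""
    ids = df.index.to_numpy()
    # Broadcasting an (n, 1) column against a (1, n) row yields an (n, n) grid.
    dx = df["x1"].to_numpy()[:, None] - df["x0"].to_numpy()[None, :]
    dy = df["y1"].to_numpy()[:, None] - df["y0"].to_numpy()[None, :]
    index = pd.MultiIndex.from_product([ids, ids], names=["from", "to"])
    out = pd.Series(np.hypot(dx, dy).ravel(), index=index, name="gap")
    # Drop the pairs that compare a stroke with itself.
    keep = index.get_level_values("from") != index.get_level_values("to")
    return out[keep]


loop = pairwise_loop(strokes)
vec = pairwise_vectorized(strokes)

# Multi-index operation: smallest gap to any following stroke, per stroke.
closest = vec.groupby(level="from").min()
```

Both functions return a Series indexed by `(from, to)` pairs, so the follow-up steps (grouping, selecting by level, filtering by a threshold) stay the same whichever version produced it. The vectorized version builds the full n × n grid in memory, which is a trade-off worth keeping in mind for very large stroke sets.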