The data science community is such an interesting, dynamic, and fast-paced place to work. While my journey as a data scientist so far has only been around five years long, it feels as though I’ve already seen a lifetime of tools, technologies, and trends come and go. One consistent effort has been a focus on continuing to make data science easier. Lowering barriers to entry and developing better libraries have made data science more accessible than ever. That there is such a bright, diverse, and dedicated community of software architects and developers working tirelessly to improve data science for everyone has made my experience writing Data Science with Python and Dask an incredibly humbling—and at times intimidating—experience. But, nonetheless, it is a great honor to be able to contribute to this vibrant community by showcasing the truly excellent work that the entire team of Dask maintainers and contributors have produced.
I stumbled across Dask in early 2016 when I encountered my first uncomfortably large dataset at work. After fumbling around for days with Hadoop, Spark, Ambari, ZooKeeper, and the menagerie of Apache “big data” technologies, I, in my exasperation, simply Googled “big data library python.” After tabbing through pages of results, I was left with two options: continue banging my head against PySpark or figure out how to use chunking in Pandas. Just about ready to call my search efforts futile, I spotted a StackOverflow question that mentioned a library called Dask. Once I found my way over to where Dask was hosted on GitHub, I started working my way through the documentation. DataFrames for big datasets? An API that mimics Pandas? It can be installed using pip? It seemed too good to be true. But it wasn’t. I was incensed—why hadn’t I heard of this library before? Why was something this powerful and easy to use flying under the radar at a time when the big data craze was reaching fever pitch?
After having great success using Dask for my work project, I was determined to become an evangelist. I was teaching a Python for Data Science class at the University of Denver at the time, and I immediately began looking for ways to incorporate Dask into the curriculum. I also presented several talks and workshops at my local PyData chapter’s meetups in Denver. Finally, when I was approached by the folks at Manning to write a book on Dask, I agreed without hesitation. As you read this book, I hope you also come to see how awesome and useful Dask is to have in your arsenal of data science tools!