Mastering Large Datasets with Python

Author: John T. Wolohan
The idea for this book came to me in the summer of 2018 after working with some especially talented developers who had managed to go a significant portion of their careers without learning how to write scalable code. I realized then that a lot of the techniques for “big data” work, or what we’ll refer to in this book as “large dataset” problems, are reserved for those who want to tackle these problems exclusively.
Because a lot of these problems occur in enterprise environments, where the mechanisms to produce data at this scale are ripe, books about this topic tend to be written in the same enterprise languages as the tools, such as Java.
This book is a little different. I’ve noticed that large dataset problems are increasingly being tackled in a distributed manner. Not distributed in terms of distributed computing (though certainly that as well) but distributed in terms of who’s doing the work. Individual developers or small development teams, often working in rapid prototyping environments or with rapid development languages (such as Python), are now working with large datasets.
My hope is that this book can bring the techniques for scalable and distributed programming to a broader audience of developers. We’re living in an era where big data is becoming increasingly prevalent. Skills in parallelization and distributed programming are increasingly vital to developers’ day-to-day work. More and more programmers are facing problems resulting from datasets that are too large for the way they’ve been taught to think about them. Hopefully, with this book, developers will have the tools to solve those big data problems and focus on the ones that got them interested in programming in the first place.

