Learning Path
Advanced Python Programming
Build high performance, concurrent, and multi-threaded apps with Python using proven design patterns
Dr. Gabriele Lanaro, Quan Nguyen and Sakis Kasampalis
FOR SALE IN INDIA ONLY
www.packt.com
BIRMINGHAM - MUMBAI
Advanced Python Programming
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First Published: February 2019
Production Reference: 2280219
Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham, B3 2PB, U.K.
ISBN 978-1-83855-121-6
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why Subscribe?
• Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
• Improve your learning with Skill Plans built especially for you
• Get a free eBook or video every month
• Mapt is fully searchable
• Copy and paste, print, and bookmark content
Packt.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the Authors
Dr. Gabriele Lanaro is passionate about good software and is the author of the chemlab and chemview open source packages. His interests span machine learning, numerical computing, visualization, and web technologies. In 2013, he authored the first edition of the book High Performance Python Programming. He has been conducting research to study the formation and growth of crystals using medium and large-scale computer simulations. In 2017, he obtained his PhD in theoretical chemistry.
Quan Nguyen is a Python enthusiast and data scientist. Currently, he works as a data analysis engineer at Micron Technology, Inc. With a strong background in mathematics and statistics, Quan is interested in the fields of scientific computing and machine learning. With data analysis being his focus, Quan also enjoys incorporating technology automation into everyday tasks through programming. Quan's passion for Python programming has led him to be heavily involved in the Python community. He started as a primary contributor for the Python for Scientists and Engineers book and various open source projects on GitHub. Quan is also a writer for the Python Software Foundation and an occasional content contributor for DataScience.com (part of Oracle).
Sakis Kasampalis is a software engineer living in the Netherlands. He is not dogmatic about particular programming languages and tools; his principle is that the right tool should be used for the right job. One of his favorite tools is Python because he finds it very productive. Sakis has also technically reviewed the Mastering Object-oriented Python and Learning Python Design Patterns books, both published by Packt Publishing.
Packt Is Searching for Authors Like You
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Table of Contents
Preface
Chapter 1: Benchmarking and Profiling
    Designing your application
    Writing tests and benchmarks
    Timing your benchmark
    Better tests and benchmarks with pytest-benchmark
    Finding bottlenecks with cProfile
    Profile line by line with line_profiler
    Optimizing our code
    The dis module
    Profiling memory usage with memory_profiler
    Summary
Chapter 2: Pure Python Optimizations
    Useful algorithms and data structures
    Lists and deques
    Dictionaries
    Building an in-memory search index using a hash map
    Sets
    Heaps
    Tries
    Caching and memoization
    Joblib
    Comprehensions and generators
    Summary
Chapter 3: Fast Array Operations with NumPy and Pandas
    Getting started with NumPy
    Creating arrays
    Accessing arrays
    Broadcasting
    Mathematical operations
    Calculating the norm
    Rewriting the particle simulator in NumPy
    Reaching optimal performance with numexpr
    Pandas
    Pandas fundamentals
    Indexing Series and DataFrame objects
    Database-style operations with Pandas
    Mapping
    Grouping, aggregations, and transforms
    Joining
    Summary
Chapter 4: C Performance with Cython
    Compiling Cython extensions
    Adding static types
    Variables
    Functions
    Classes
    Sharing declarations
    Working with arrays
    C arrays and pointers
    NumPy arrays
    Typed memoryviews
    Particle simulator in Cython
    Profiling Cython
    Using Cython with Jupyter
    Summary
Chapter 5: Exploring Compilers
    Numba
    First steps with Numba
    Type specializations
    Object mode versus native mode
    Numba and NumPy
    Universal functions with Numba
    Generalized universal functions
    JIT classes
    Limitations in Numba
    The PyPy project
    Setting up PyPy
    Running a particle simulator in PyPy
    Other interesting projects
    Summary
Chapter 6: Implementing Concurrency
    Asynchronous programming
    Waiting for I/O
    Concurrency
    Callbacks
    Futures
    Event loops
    The asyncio framework
    Coroutines
    Converting blocking code into non-blocking code
    Reactive programming
    Observables
    Useful operators
    Hot and cold observables
    Building a CPU monitor
    Summary
Chapter 7: Parallel Processing
    Introduction to parallel programming
    Graphic processing units
    Using multiple processes
    The Process and Pool classes
    The Executor interface
    Monte Carlo approximation of pi
    Synchronization and locks
    Parallel Cython with OpenMP
    Automatic parallelism
    Getting started with Theano
    Profiling Theano
    Tensorflow
    Running code on a GPU
    Summary
Chapter 8: Advanced Introduction to Concurrent and Parallel Programming
    Technical requirements
    What is concurrency?
    Concurrent versus sequential
    Example 1 – checking whether a non-negative number is prime
    Concurrent versus parallel
    A quick metaphor
    Not everything should be made concurrent
    Embarrassingly parallel
    Inherently sequential
    Example 2 – inherently sequential tasks
    I/O bound
    The history, present, and future of concurrency
    The history of concurrency
    The present
    The future
    A brief overview of mastering concurrency in Python
    Why Python?
    Setting up your Python environment
    General setup
    Summary
    Questions
    Further reading
Chapter 9: Amdahl's Law
    Technical requirements
    Amdahl's Law
    Terminology
    Formula and interpretation
    The formula for Amdahl's Law
    A quick example
    Implications
    Amdahl's Law's relationship to the law of diminishing returns
    How to simulate in Python
    Practical applications of Amdahl's Law
    Summary
    Questions
    Further reading
Chapter 10: Working with Threads in Python
    Technical requirements
    The concept of a thread
    Threads versus processes
    Multithreading
    An example in Python
    An overview of the threading module
    The thread module in Python 2
    The threading module in Python 3
    Creating a new thread in Python
    Starting a thread with the thread module
    Starting a thread with the threading module
    Synchronizing threads
    The concept of thread synchronization
    The threading.Lock class
    An example in Python
    Multithreaded priority queue
    A connection between real-life and programmatic queues
    The queue module
    Queuing in concurrent programming
    Multithreaded priority queue
    Summary
    Questions
    Further reading
Chapter 11: Using the with Statement in Threads
    Technical requirements
    Context management
    Starting from managing files
    The with statement as a context manager
    The syntax of the with statement
    The with statement in concurrent programming
    Example of deadlock handling
    Summary
    Questions
    Further reading
Chapter 12: Concurrent Web Requests
    Technical requirements
    The basics of web requests
    HTML
    HTTP requests
    HTTP status code
    The requests module
    Making a request in Python
    Running a ping test
    Concurrent web requests
    Spawning multiple threads
    Refactoring request logic
    The problem of timeout
    Support from httpstat.us and simulation in Python
    Timeout specifications
    Good practices in making web requests
    Consider the terms of service and data-collecting policies
    Error handling
    Update your program regularly
    Avoid making a large number of requests
    Summary
    Questions
    Further reading
Chapter 13: Working with Processes in Python
    Technical requirements
    The concept of a process
    Processes versus threads
    Multiprocessing
    Introductory example in Python
    An overview of the multiprocessing module
    The process class
    The Pool class
    Determining the current process, waiting, and terminating processes
    Determining the current process
    Waiting for processes
    Terminating processes
    Interprocess communication
    Message passing for a single worker
    Message passing between several workers
    Summary
    Questions
    Further reading
Chapter 14: Reduction Operators in Processes
    Technical requirements
    The concept of reduction operators
    Properties of a reduction operator
    Examples and non-examples
    Example implementation in Python
    Real-life applications of concurrent reduction operators
    Summary
    Questions
    Further reading
Chapter 15: Concurrent Image Processing
    Technical requirements
    Image processing fundamentals
    Python as an image processing tool
    Installing OpenCV and NumPy
    Computer image basics
    RGB values
    Pixels and image files
    Coordinates inside an image
    OpenCV API
    Image processing techniques
    Grayscaling
    Thresholding
    Applying concurrency to image processing
    Good concurrent image processing practices
    Choosing the correct way (out of many)
    Spawning an appropriate number of processes
    Processing input/output concurrently
    Summary
    Questions
    Further reading
Chapter 16: Introduction to Asynchronous Programming
    Technical requirements
    A quick analogy
    Asynchronous versus other programming models
    Asynchronous versus synchronous programming
    Asynchronous versus threading and multiprocessing
    An example in Python
    Summary
    Questions
    Further reading
Chapter 17: Implementing Asynchronous Programming in Python
    Technical requirements
    The asyncio module
    Coroutines, event loops, and futures
    Asyncio API
    The asyncio framework in action
    Asynchronously counting down
    A note about blocking functions
    Asynchronous prime-checking
    Improvements from Python 3.7
    Inherently blocking tasks
    concurrent.futures as a solution for blocking tasks
    Changes in the framework
    Examples in Python
    Summary
    Questions
    Further reading
Chapter 18: Building Communication Channels with asyncio
    Technical requirements
    The ecosystem of communication channels
    Communication protocol layers
    Asynchronous programming for communication channels
    Transports and protocols in asyncio
    The big picture of asyncio's server client
    Python example
    Starting a server
    Installing Telnet
    Simulating a connection channel
    Sending messages back to clients
    Closing the transports
    Client-side communication with aiohttp
    Installing aiohttp and aiofiles
    Fetching a website's HTML code
    Writing files asynchronously
    Summary
    Questions
    Further reading
Chapter 19: Deadlocks
    Technical requirements
    The concept of deadlock
    The Dining Philosophers problem
    Deadlock in a concurrent system
    Python simulation
    Approaches to deadlock situations
    Implementing ranking among resources
    Ignoring locks and sharing resources
    An additional note about locks
    Concluding note on deadlock solutions
    The concept of livelock
    Summary
    Questions
    Further reading
Chapter 20: Starvation
    Technical requirements
    The concept of starvation
    What is starvation?
    Scheduling
    Causes of starvation
    Starvation's relationship to deadlock
    The readers-writers problem
    Problem statement
    The first readers-writers problem
    The second readers-writers problem
    The third readers-writers problem
    Solutions to starvation
    Summary
    Questions
    Further reading
Chapter 21: Race Conditions
    Technical requirements
    The concept of race conditions
    Critical sections
    How race conditions occur
    Simulating race conditions in Python
    Locks as a solution to race conditions
    The effectiveness of locks
    Implementation in Python
    The downside of locks
    Turning a concurrent program sequential
    Locks do not lock anything
    Race conditions in real life
    Security
    Operating systems
    Networking
    Summary
    Questions
    Further reading
Chapter 22: The Global Interpreter Lock
    Technical requirements
    An introduction to the Global Interpreter Lock
    An analysis of memory management in Python
    The problem that the GIL addresses
    Problems raised by the GIL
    The potential removal of the GIL from Python
    How to work with the GIL
    Implementing multiprocessing, rather than multithreading
    Getting around the GIL with native extensions
    Utilizing a different Python interpreter
    Summary
    Questions
    Further reading
Chapter 23: The Factory Pattern
    The factory method
    Real-world examples
    Use cases
    Implementing the factory method
    The abstract factory
    Real-world examples
    Use cases
    Implementing the abstract factory pattern
    Summary
Chapter 24: The Builder Pattern
    Real-world examples
    Use cases
    Implementation
    Summary
Chapter 25: Other Creational Patterns
    The prototype pattern
    Real-world examples
    Use cases
    Implementation
    Singleton
Learning Path
Advanced Python Programming
This Learning Path shows you how to leverage the power of both native and third-party Python libraries for building robust and responsive applications. You will learn about profilers and reactive programming, concurrency and parallelism, as well as tools for making your apps quick and efficient. You will discover how to write code for parallel architectures using TensorFlow and Theano, and use a cluster of computers for large-scale computations using technologies such as Dask and PySpark. With the knowledge of how Python design patterns work, you will be able to clone objects, secure interfaces, dynamically choose algorithms, and accomplish much more in high performance computing. By the end of this Learning Path, you will have the skills and confidence to build engaging models that quickly offer efficient solutions to your problems.
This Learning Path includes content from the following Packt products:
• Python High Performance - Second Edition by Gabriele Lanaro
• Mastering Concurrency in Python by Quan Nguyen
• Mastering Python Design Patterns by Sakis Kasampalis
Things you will learn:
• Use NumPy and pandas to import and manipulate datasets
• Achieve native performance with Cython and Numba
• Write asynchronous code using asyncio and RxPy
• Design highly scalable programs with application scaffolding
• Explore abstract methods to maintain data consistency
• Clone objects using the prototype pattern
• Use the adapter pattern to make incompatible interfaces compatible
• Employ the strategy pattern to dynamically choose an algorithm