This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project
tl;dr: I learned asyncio and rewrote part of dask.distributed with it; this details my experience
asyncio
The asyncio library provides concurrent programming in the style of Go, Clojure’s core.async library, or more traditional libraries like Twisted. Asyncio offers a programming paradigm that lets many moving parts interact without involving separate threads. These separate parts explicitly yield control to each other and to a central authority, and then regain control as others yield it to them. This lets one escape traps like race conditions on shared state, webs of callbacks, lost error reporting, and general confusion.
I’m not going to write too much about asyncio itself. Instead I’m going to briefly describe my problem, link to a solution, and then dive into the good and bad points of using asyncio while they’re fresh in my mind.
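As a toy illustration of that yield-control style, here is a minimal sketch (written in modern async/await syntax, not the generator-based coroutines of the era this post describes): two coroutines hand values to each other over a queue, with no threads involved.

```python
import asyncio

# Two coroutines pass values over a queue; neither uses threads.
# Each yields control to the event loop at every await point.
async def producer(queue):
    for i in range(3):
        await queue.put(i)   # yields while the item is handed off
    await queue.put(None)    # sentinel: no more items

async def consumer(queue):
    results = []
    while True:
        item = await queue.get()
        if item is None:
            break
        results.append(item * 10)
    return results

async def main():
    queue = asyncio.Queue(maxsize=1)  # tiny buffer forces interleaving
    _, results = await asyncio.gather(producer(queue), consumer(queue))
    return results

print(asyncio.run(main()))  # -> [0, 10, 20]
```

The small queue size forces the two coroutines to alternate, which is exactly the kind of explicit hand-off described above.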
Exercise
I won’t actually discuss the application much after this section; you can safely skip this.
I decided to rewrite the dask.distributed Worker using asyncio. This worker has to do the following:
- Store local data in a dictionary (easy)
- Perform computations on that data as requested by a remote connection (act as a server in a client-server relationship)
- Collect data from other workers when we don’t have all of the necessary data for a computation locally (peer-to-peer)
- Serve data to other workers who need our data for their own computations (peer-to-peer)
It’s a sort of distributed RPC mechanism with peer-to-peer value sharing. Metadata for who-has-what data is stored in a central metadata store; this could be something like Redis.
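The fetch-missing-data-then-compute flow behind those responsibilities can be sketched roughly as follows. Every name here is invented for illustration and is not dist’s actual API; the peer lookup is a stand-in for the central metadata store and real network calls.

```python
import asyncio

# Hypothetical sketch of a worker's compute path: look up each
# dependency locally, fetch the rest from peers concurrently,
# then run the requested function.
class Worker:
    def __init__(self, data, peers):
        self.data = data      # local key -> value store
        self.peers = peers    # stand-in for metadata store + peer workers

    async def gather_from_peer(self, key):
        await asyncio.sleep(0)          # placeholder for a network round-trip
        return self.peers[key]

    async def compute(self, func, keys):
        missing = [k for k in keys if k not in self.data]
        values = await asyncio.gather(
            *(self.gather_from_peer(k) for k in missing))
        self.data.update(dict(zip(missing, values)))
        return func(*(self.data[k] for k in keys))

async def main():
    w = Worker(data={'x': 1}, peers={'y': 2})
    return await w.compute(lambda a, b: a + b, ['x', 'y'])

print(asyncio.run(main()))  # -> 3
```

The point of the sketch is that the peer-to-peer gathering happens concurrently (via asyncio.gather) while the control flow still reads top to bottom.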
The current implementation of this is a nest of threads, queues, and callbacks. It’s not bad and performs well, but it tends to be hard for others to develop.
Additionally, I want to separate out the worker code because it’s useful outside of dask.distributed. Other distributed computation solutions that rely on this technology exist in my head.
For the moment the code lives here: https://github.com/mrocklin/dist. I like the design. The module-level docstring of worker.py is short and informative. But again, I’m not going to discuss the application yet; instead, here are some thoughts on learning/developing with asyncio.
General Thoughts
Disclaimer: I am a novice concurrent programmer. I write lots of parallel code but little concurrent code, and I have never used existing frameworks like Twisted.
I liked the experience of using asyncio and recommend the paradigm to anyone building concurrent applications.
The Good:
- I can write complex code that involves multiple asynchronous calls, complex logic, and exception handling all in a single place. Complex application logic is no longer spread across many places.
- Debugging is much easier now that I can throw import pdb; pdb.set_trace() lines into my code and expect them to work (this fails when using threads).
- My code fails more gracefully, further improving the debugging experience. Ctrl-C works.
- The paradigm shared by Go, Clojure’s core.async, and Python’s asyncio felt viscerally good. I was able to reason well about my program as I was building it and made nice diagrams of exactly which sequential processes interacted with which others over which channels. I am much more confident of the correctness of the implementation and the design of my program. However, after having gone through this exercise I suspect that I could now implement just about the same design without asyncio. The design paradigm was perhaps as important as the library itself.
- I have to support Python 2. Fortunately I found the trollius port of asyncio to be very usable. It looks like it was a direct fork-then-modify of tulip.
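For instance, the first point, several asynchronous calls plus their error handling living in one ordinary-looking function, looks roughly like this (a toy sketch, not code from dist):

```python
import asyncio

# Multiple async calls and their exception handling in one place,
# instead of error callbacks scattered across the codebase.
async def flaky_fetch(key, store):
    await asyncio.sleep(0)      # stand-in for real I/O
    if key not in store:
        raise KeyError(key)
    return store[key]

async def fetch_all(keys, store):
    results = {}
    for key in keys:
        try:
            results[key] = await flaky_fetch(key, store)
        except KeyError:
            results[key] = None  # handled right here, in sequence
    return results

store = {'a': 1, 'b': 2}
print(asyncio.run(fetch_all(['a', 'b', 'c'], store)))
# -> {'a': 1, 'b': 2, 'c': None}
```

A thread-and-callback version of the same logic would need the KeyError handler registered somewhere far from the call site; here it sits in the same try/except as the await.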
The Bad:
- There wasn’t a ZeroMQ connectivity layer for Trollius (though aiozmq exists for Python 3), so I ended up having to use threads anyway for inter-node I/O. This, combined with ZeroMQ’s finicky behavior, meant that my program sometimes crashed hard. Because of this I’m considering switching to plain sockets, which are supported natively by Trollius and asyncio.
- While exceptions raise cleanly, I can’t determine where they originate. There are no line numbers or tracebacks. Debugging in a concurrent environment is hard; my experience was definitely better than with threads but could still be improved. I hope that asyncio in Python 3.4 has better debugging support.
- The API documentation is thorough, but Stack Overflow coverage, general best practices, and examples are very sparse. The project is new, so there isn’t much to go on. I found that reading documentation for Go and presentations on Clojure’s core.async were far more helpful in preparing me to use asyncio than any of the asyncio docs/presentations.
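For what it’s worth, the plain-socket route is straightforward with asyncio’s built-in stream API. A minimal echo round-trip (again in modern async/await syntax rather than the trollius-era generator style) might look like:

```python
import asyncio

# A tiny server and client using asyncio's native stream API --
# plain sockets, no ZeroMQ layer involved.
async def handle(reader, writer):
    data = await reader.read()      # read until the client signals EOF
    writer.write(data.upper())      # echo back, uppercased
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]  # OS-assigned free port

    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(b'hello')
    await writer.drain()
    writer.write_eof()              # tell the server we're done sending
    reply = await reader.read()     # read the full response until EOF
    writer.close()
    await writer.wait_closed()

    server.close()
    await server.wait_closed()
    return reply

print(asyncio.run(main()))  # -> b'HELLO'
```

The same coroutine style handles both sides of the connection, which is the appeal of dropping down to sockets here.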
Future
I intend to pursue this into the future, and, if the debugging experience is better in Python 3, I am considering rewriting the dask.distributed Scheduler in Python 3 with asyncio proper. This is possible because the Scheduler doesn’t have to be compatible with user code.
I found these videos to be useful: