Working notes by Matthew Rocklin

Dask Scaling Limits

June 25, 2018, 5:00 pm

This work is supported by Anaconda Inc. History: For the first year of Dask’s life it focused exclusively on single node parallelism. We felt then that efficiently supporting 100+GB datasets on personal...

View Article

Dask Development Log

July 7, 2018, 5:00 pm

This work is supported by Anaconda Inc. To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This...

View Article

Who uses Dask?

July 15, 2018, 5:00 pm

This work is supported by Anaconda Inc. People often ask general questions like “Who uses Dask?” or more specific questions like the following: For what applications do people use Dask dataframe? How many...

View Article

Dask Development Log, Scipy 2018

July 16, 2018, 5:00 pm

View Article

Pickle isn't slow, it's a protocol

July 22, 2018, 5:00 pm

This work is supported by Anaconda Inc. tl;dr: Pickle isn’t slow, it’s a protocol. Protocols are important for ecosystems. A recent Dask issue showed that using Dask with PyTorch was slow because sending...

View Article

Dask Development Log

August 1, 2018, 5:00 pm

View Article

Building SAGA optimization for Dask arrays

August 6, 2018, 5:00 pm

This work is supported by ETH Zurich, Anaconda Inc, and the Berkeley Institute for Data Science. At a recent Scikit-learn/Scikit-image/Dask sprint at BIDS, Fabian Pedregosa (a machine learning researcher...

View Article

Cloud Lock-in and Open Standards

August 18, 2018, 5:00 pm

This post is from conversations with Peter Wang, Yuvi Panda, and several others. Yuvi expresses his own views on this topic on his blog. Summary: When moving to the cloud we should be mindful to avoid...

View Article

High level performance of Pandas, Dask, Spark, and Arrow

August 27, 2018, 5:00 pm

This work is supported by Anaconda Inc. Question: How does Dask dataframe performance compare to Pandas? Also, what about Spark dataframes and what about Arrow? How do they compare? I get this question...

View Article

Dask Release 0.19.0

September 4, 2018, 5:00 pm

This work is supported by Anaconda Inc. I’m pleased to announce the release of Dask version 0.19.0. This is a major release with bug fixes and new features. The last release was 0.18.2 on July 23rd....

View Article

Public Institutions and Open Source Software

August 20, 2018, 5:00 pm

As general purpose open source software displaces domain-specific all-in-one solutions, many institutions are re-assessing how they build and maintain software to support their users. This is true...

View Article

Dask Development Log

September 16, 2018, 5:00 pm

View Article

So you want to contribute to open source

October 11, 2018, 5:00 pm

Welcome, new open source contributor! I appreciated receiving the e-mail where you said you were excited about getting into open source and were particularly interested in working on a project that I...

View Article

Anatomy of an OSS Institutional Visit

November 26, 2018, 4:00 pm

I recently visited the UK Meteorology Office, a moderately large organization that serves the weather and climate forecasting needs of the UK (and several other nations). I was there with other open...

View Article

Support Python 2 with Cython

November 27, 2018, 4:00 pm

Summary: Many popular Python packages are dropping support for Python 2 next month. This will be painful for several large institutions. Cython can provide a temporary fix by letting us compile a Python...

View Article

First Impressions of GPUs and PyData

December 16, 2018, 4:00 pm

I recently moved from Anaconda to NVIDIA within the RAPIDS team, which is building a PyData-friendly GPU-enabled data science stack. For my first week I explored some of the current challenges of...

View Article

GPU Dask Arrays, first steps

January 2, 2019, 4:00 pm

The following code creates and manipulates 2 TB of randomly generated...

View Article

The Role of a Maintainer

May 17, 2019, 5:00 pm

What are the expectations and best practices for maintainers of open source software libraries? How can we do this better? This post frames the discussion and then follows with best practices based on...

View Article

Write Short Blogposts

June 24, 2019, 5:00 pm

I encourage my colleagues to write blogposts more frequently. This is for a few reasons: It informs your broader community what you’re up to, and allows that community to communicate back to you...

View Article

HTML outputs in Jupyter

July 3, 2019, 5:00 pm

Summary: User interaction in data science projects can be improved by adding a small amount of visual design. To motivate effort around visual design we show several simple-yet-useful examples. The code...

View Article