Dask Scaling Limits
This work is supported by Anaconda Inc. History: For the first year of Dask’s life it focused exclusively on single node parallelism. We felt then that efficiently supporting 100+GB datasets on personal...
Dask Development Log
This work is supported by Anaconda Inc. To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This...
Who uses Dask?
This work is supported by Anaconda Inc. People often ask general questions like “Who uses Dask?” or more specific questions like the following: For what applications do people use Dask dataframe? How many...
Dask Development Log, Scipy 2018
This work is supported by Anaconda Inc. To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This...
Pickle isn't slow, it's a protocol
This work is supported by Anaconda Inc. tl;dr: Pickle isn’t slow, it’s a protocol. Protocols are important for ecosystems. A recent Dask issue showed that using Dask with PyTorch was slow because sending...
Dask Development Log
This work is supported by Anaconda Inc. To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This...
Building SAGA optimization for Dask arrays
This work is supported by ETH Zurich, Anaconda Inc, and the Berkeley Institute for Data Science. At a recent Scikit-learn/Scikit-image/Dask sprint at BIDS, Fabian Pedregosa (a machine learning researcher...
Cloud Lock-in and Open Standards
This post is from conversations with Peter Wang, Yuvi Panda, and several others. Yuvi expresses his own views on this topic on his blog. Summary: When moving to the cloud we should be mindful to avoid...
High level performance of Pandas, Dask, Spark, and Arrow
This work is supported by Anaconda Inc. Question: How does Dask dataframe performance compare to Pandas? Also, what about Spark dataframes and what about Arrow? How do they compare? I get this question...
Dask Release 0.19.0
This work is supported by Anaconda Inc. I’m pleased to announce the release of Dask version 0.19.0. This is a major release with bug fixes and new features. The last release was 0.18.2 on July 23rd....
Public Institutions and Open Source Software
As general purpose open source software displaces domain-specific all-in-one solutions, many institutions are re-assessing how they build and maintain software to support their users. This is true...
Dask Development Log
This work is supported by Anaconda Inc. To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This...
So you want to contribute to open source
Welcome new open source contributor! I appreciated receiving the e-mail where you said you were excited about getting into open source and were particularly interested in working on a project that I...
Anatomy of an OSS Institutional Visit
I recently visited the UK Meteorology Office, a moderately large organization that serves the weather and climate forecasting needs of the UK (and several other nations). I was there with other open...
Support Python 2 with Cython
Summary: Many popular Python packages are dropping support for Python 2 next month. This will be painful for several large institutions. Cython can provide a temporary fix by letting us compile a Python...
First Impressions of GPUs and PyData
I recently moved from Anaconda to NVIDIA within the RAPIDS team, which is building a PyData-friendly GPU-enabled data science stack. For my first week I explored some of the current challenges of...
GPU Dask Arrays, first steps
The following code creates and manipulates 2 TB of randomly generated...
The Role of a Maintainer
What are the expectations and best practices for maintainers of open source software libraries? How can we do this better? This post frames the discussion and then follows with best practices based on...
Write Short Blogposts
I encourage my colleagues to write blogposts more frequently. This is for a few reasons: It informs your broader community what you’re up to, and allows that community to communicate back to you...
HTML outputs in Jupyter
Summary: User interaction in data science projects can be improved by adding a small amount of visual design. To motivate effort around visual design we show several simple-yet-useful examples. The code...