Channel: Working notes by Matthew Rocklin - SciPy
Browsing all 100 articles
Browse latest View live

Dask Scaling Limits

This work is supported by Anaconda Inc. History: For the first year of Dask’s life it focused exclusively on single-node parallelism. We felt then that efficiently supporting 100+GB datasets on personal...

View Article



Dask Development Log

This work is supported by Anaconda Inc. To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This...

View Article


Who uses Dask?

This work is supported by Anaconda Inc. People often ask general questions like “Who uses Dask?” or more specific questions like the following: For what applications do people use Dask dataframe? How many...

View Article

Dask Development Log, Scipy 2018

This work is supported by Anaconda Inc. To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This...

View Article


Pickle isn't slow, it's a protocol

This work is supported by Anaconda Inc. tl;dr: Pickle isn’t slow, it’s a protocol. Protocols are important for ecosystems. A recent Dask issue showed that using Dask with PyTorch was slow because sending...

View Article



Dask Development Log

This work is supported by Anaconda Inc. To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This...

View Article


Building SAGA optimization for Dask arrays

This work is supported by ETH Zurich, Anaconda Inc, and the Berkeley Institute for Data Science. At a recent Scikit-learn/Scikit-image/Dask sprint at BIDS, Fabian Pedregosa (a machine learning researcher...

View Article

Cloud Lock-in and Open Standards

This post is from conversations with Peter Wang, Yuvi Panda, and several others. Yuvi expresses his own views on this topic on his blog. Summary: When moving to the cloud we should be mindful to avoid...

View Article


High level performance of Pandas, Dask, Spark, and Arrow

This work is supported by Anaconda Inc. Question: How does Dask dataframe performance compare to Pandas? Also, what about Spark dataframes and what about Arrow? How do they compare? I get this question...

View Article



Dask Release 0.19.0

This work is supported by Anaconda Inc. I’m pleased to announce the release of Dask version 0.19.0. This is a major release with bug fixes and new features. The last release was 0.18.2 on July 23rd....

View Article

Public Institutions and Open Source Software

As general purpose open source software displaces domain-specific all-in-one solutions, many institutions are re-assessing how they build and maintain software to support their users. This is true...

View Article


Dask Development Log

This work is supported by Anaconda Inc. To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This...

View Article


So you want to contribute to open source

Welcome new open source contributor! I appreciated receiving the e-mail where you said you were excited about getting into open source and were particularly interested in working on a project that I...

View Article



Anatomy of an OSS Institutional Visit

I recently visited the UK Meteorology Office, a moderately large organization that serves the weather and climate forecasting needs of the UK (and several other nations). I was there with other open...

View Article

Support Python 2 with Cython

Summary: Many popular Python packages are dropping support for Python 2 next month. This will be painful for several large institutions. Cython can provide a temporary fix by letting us compile a Python...

View Article


First Impressions of GPUs and PyData

I recently moved from Anaconda to NVIDIA within the RAPIDS team, which is building a PyData-friendly GPU-enabled data science stack. For my first week I explored some of the current challenges of...

View Article


GPU Dask Arrays, first steps

The following code creates and manipulates 2 TB of randomly generated...

View Article


The Role of a Maintainer

What are the expectations and best practices for maintainers of open source software libraries? How can we do this better? This post frames the discussion and then follows with best practices based on...

View Article

Write Short Blogposts

I encourage my colleagues to write blogposts more frequently. This is for a few reasons: It informs your broader community what you’re up to, and allows that community to communicate back to you...

View Article


HTML outputs in Jupyter

Summary: User interaction in data science projects can be improved by adding a small amount of visual design. To motivate effort around visual design we show several simple-yet-useful examples. The code...

View Article