Fast Message Serialization
This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. Very high performance isn’t about doing one thing well, it’s about doing nothing poorly. This week I...
Ad Hoc Distributed Random Forests
This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. A screencast version of this post is available here:...
Data Bandwidth
This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. tl;dr: We list and combine common bandwidths relevant in data science. Understanding data bandwidths helps...
Disk Bandwidth
This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. tl;dr: Disk read and write bandwidths depend strongly on block size. Disk read/write bandwidths on...
Introducing Dask distributed
This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. tl;dr: We analyze JSON data on a cluster using pure Python projects. Dask, a Python library for parallel...
Pandas on HDFS with Dask Dataframes
This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. In this post we use Pandas in parallel across an HDFS cluster to read CSV data. We coordinate these...
Distributed Dask Arrays
This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. In this post we analyze weather data across a cluster using NumPy in parallel with dask.array. We focus...
Dask for Institutions
This work is supported by Continuum Analytics. Introduction: Institutions use software differently than individuals. Over the last few months I’ve had dozens of conversations about using Dask within larger...
Supporting Users in Open Source
What are the social expectations of open source developers to help users understand their projects? What are the social expectations of users when asking for help? As part of developing Dask, an open...
Dask Distributed Release 1.13.0
I’m pleased to announce a release of Dask’s distributed scheduler, dask.distributed, version 1.13.0. Install with conda install dask distributed -c conda-forge or pip install dask distributed --upgrade. The last few...
Where to Write Prose?
Code is only as good as its prose. Like many programmers I spend more time writing prose than code. This is great; writing clean prose focuses my thoughts during design and disseminates understanding so...
Dask and Celery
This post compares two Python distributed task processing systems, Dask.distributed and Celery. Disclaimer: technical comparisons are hard to do well. I am biased towards Dask and ignorant of correct...
Dask Cluster Deployments
This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. All code in this post is experimental. It should not be relied upon. For people looking to deploy...
Dask Development Log
This work is supported by Continuum Analytics, the XDATA Program, and the Data Driven Discovery Initiative from the Moore Foundation. Dask has been active lately due to a combination of increased adoption...
Dask Development Log
This work is supported by Continuum Analytics, the XDATA Program, and the Data Driven Discovery Initiative from the Moore Foundation. To increase transparency I’m blogging weekly about the work done on...
Dask Development Log
This work is supported by Continuum Analytics, the XDATA Program, and the Data Driven Discovery Initiative from the Moore Foundation. To increase transparency I’m blogging weekly about the work done on...
Dask Development Log
This work is supported by Continuum Analytics, the XDATA Program, and the Data Driven Discovery Initiative from the Moore Foundation. To increase transparency I’m blogging weekly about the work done on...
Dask Release 0.13.0
This work is supported by Continuum Analytics, the XDATA Program, and the Data Driven Discovery Initiative from the Moore Foundation. Summary: Dask just grew to version 0.13.0. This is a significant release...