Ray: A System for Distributed Applications
Ray (ray.io) is a framework for scaling Python applications from single machines to large clusters. It is used in several ML/AI systems and production deployments.
I'll explain common problems in scalable, distributed computing, particularly for high-performance ML/AI applications, that motivated the creation of Ray. We'll see how Ray solves them for Python-based systems (and possibly other languages in the future).
In particular, Ray supports rapid distribution, scheduling, and execution of fine-grained “tasks”, which for many problems are a more natural unit of work than coarse-grained decomposition. Sequencing of dependent tasks across the cluster is also transparent and intuitive.
Ray also manages distributed state using the popular Actor model, which is essential for the next generation of “serverless” computing, where services are stateful.
Whether or not you are a Python or ML/AI developer, the general lessons discussed are broadly applicable.