TL;DR: This site exists to provide you with news, tooling, and tutorials around Machine Learning on Kubernetes.
In 2017, two things happened: Kubernetes established itself as the de-facto standard for running containers in a cluster, and Machine Learning, according to Gartner, just passed the peak of inflated expectations. Combining the two seems like a natural fit. So, I decided to set up an advocacy site aiming to help you with Machine Learning on Kubernetes by providing you with updates, tooling, and tutorials around this very topic.
This immediately prompts two questions: why combine these two areas, and why me?
Glad you asked; here we go …
Machine Learning + Kubernetes = ?
Machine Learning (ML) is an increasingly popular field with many exciting application areas, from image analysis to speech recognition to recommender systems.
In addition to the managed offerings of the big three public cloud providers—Machine Learning on AWS, Azure Machine Learning Studio, and Google Cloud AI—a range of open source solutions is available, including but not limited to:
- R—the data scientist's go-to workhorse.
- Python-based systems like scikit-learn, PyTorch, NumPy, pandas, and Jupyter Notebook.
- Java-based ML systems such as Apache Spark's MLlib and FlinkML from Apache Flink or the venerable Apache Mahout.
- The up-and-coming star TensorFlow, open-sourced by Google in a surprise move in 2015.
Now, while you can and often will do ML on your laptop, there are good reasons to move to a distributed processing setup: carrying out the ML task at hand in parallel and potentially benefiting from specialized hardware such as GPGPUs or FPGAs. Sounds cool? It is, until you realize that distributed systems are hard, and that using ML in such a setup, as a developer or data scientist, can be challenging. There are many moving parts to learn, and depending on a cloud-provider-specific environment often means locking yourself in.
Kubernetes makes it easy to run apps written in virtually any programming language in a standardized way, on a cluster of servers. By standardized I mean that the packaging, deployment, and operation of the app are well-defined and portable across different environments, no matter if you run the app on your laptop or in a public cloud setting. Also, Kubernetes' architecture is modular and extensible:
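To make the "standardized packaging and deployment" point concrete, here is a minimal sketch of what deploying a containerized ML training app to Kubernetes could look like. The image name, labels, and GPU request are assumptions for illustration, not an actual project:

```yaml
# A minimal sketch of a Kubernetes Deployment for a hypothetical
# containerized ML trainer; names and numbers are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-trainer
spec:
  replicas: 3                         # run three workers in parallel
  selector:
    matchLabels:
      app: ml-trainer
  template:
    metadata:
      labels:
        app: ml-trainer
    spec:
      containers:
      - name: trainer
        image: example/ml-trainer:0.1 # hypothetical container image
        resources:
          limits:
            nvidia.com/gpu: 1         # request a GPU, if the cluster offers one
```

The same manifest works unchanged whether you apply it to a local cluster such as Minikube or to one of the hosted offerings mentioned below; that portability is the whole point.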
Kubernetes architecture by Lucas Käldström
Last but not least, the big three cloud providers all support Kubernetes now: Google Kubernetes Engine, Azure Container Service, and Amazon Elastic Container Service for Kubernetes (preview). Further, should you wish to run it yourself, there are plenty of Kubernetes distributions to choose from, such as OpenShift.
What does Michael bring to the table?
My background is in large-scale data processing on the symbolic level. That is, I spent more than a decade doing research on knowledge representation for audiovisual media, data on the Web, and ways to semantically connect datasets. In the process I learned a lot about datasets, especially how to acquire and clean them. What I didn't know back then was how useful this skill set would later turn out to be.
Anyone who's into ML can confirm that north of 70% of the hard work is actually cleaning up the data.
When I moved from research to industry (as a data engineer), I had an excellent mentor at MapR in Ted Dunning. Ted had gathered a lot of experience with ML, and I certainly benefited from it. Given that ML is sub-symbolic, that is, no explicit model or representation of the algorithm of interest exists, re-focusing and learning was tough for me, but it did pay off.
The combined experience from research, from working as a data engineer, and nowadays as a container and cloud-native practitioner should come in handy in making this site helpful and interesting for you. At the end of the day, one thing is for sure: Kubernetes-based Machine Learning rocks!
Welcome to KML.rocks; I hope you enjoy the content and find it useful.
Header photo by Samuel Zeller/Unsplash.