The primary use for the Kubernetes kube-state-metrics add-on service is to provide cluster state metrics to monitoring systems like Prometheus. But, it may sometimes be useful to get the same metrics in a script or directly from your command line. That’s where kubestate comes in. It’s a command line utility that calls the kube-state-metrics API, then shows interesting views of the metrics. You can also use it to get the raw data values in various formats that can be used by scripts or other utilities.
Part 1 and Part 2 covered pods and services, which are two foundational building blocks for running an app in Kubernetes. This post will describe a third piece called Deployment that helps you run the pod with more reliability.
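For illustration, here’s a minimal sketch of a Deployment manifest; the name, labels, image and replica count are hypothetical placeholders, not values from the post:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # hypothetical name for illustration
spec:
  replicas: 3                  # the Deployment replaces failed pods to keep 3 running
  selector:
    matchLabels:
      app: my-app              # must match the pod template's labels
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.0      # placeholder image
```

Applying a manifest like this with kubectl apply declares the desired state, and the Deployment controller then works continuously to keep that many healthy replicas running.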
As you saw in Part 1, pods are not accessible outside the cluster on their own. Running the kubectl proxy command is an easy way to test in development. But when you go to production, a better option is required. That’s where the Service component helps.
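As a sketch, a Service exposing a set of pods might look like the following; the names, ports and the LoadBalancer type are assumptions for illustration:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                 # hypothetical name for illustration
spec:
  type: LoadBalancer           # asks the cloud provider for an external IP
  selector:
    app: my-app                # routes traffic to pods carrying this label
  ports:
  - port: 80                   # port clients connect to
    targetPort: 8080           # port the container listens on (assumed)
```

The key idea is the selector: the Service load-balances across whatever pods currently carry that label, so pods can come and go without clients noticing.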
There are a number of reasons why pods fail to reach a running state. Missing required resources was covered in Part 1. Another scenario happens after the pod is successfully scheduled on a node. When the pod tries to start on its node and either crashes or exits unexpectedly, the restartPolicy field of the pod’s PodSpec determines what happens next. If set to Never, it may just fail and stop. Or if set to Always, which is the default, the pod could go into a never-ending loop of exiting, trying again and exiting again… In this case its status will show as CrashLoopBackOff.
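To make this concrete, here’s a deliberately failing pod, a hypothetical sketch rather than an example from the post; because restartPolicy is Always, its status cycles into CrashLoopBackOff:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: crash-demo             # hypothetical name for illustration
spec:
  restartPolicy: Always        # the default; Never and OnFailure are the other options
  containers:
  - name: crash
    image: busybox
    command: ["sh", "-c", "exit 1"]   # exits immediately, forcing restart after restart
```

Watching a pod like this with kubectl get pods shows the RESTARTS count climbing while the status settles on CrashLoopBackOff, with Kubernetes backing off longer between each retry.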
There are a number of reasons why pods stall and never reach a running state. One common cause happens when the cluster lacks the resources required to run the pod.
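For example, a pod that requests more resources than any node can offer will sit in Pending indefinitely; the values below are exaggerated on purpose and purely illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oversized-demo         # hypothetical name for illustration
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "64"              # more CPUs than a typical node has
        memory: 256Gi          # more memory than a typical node has
```

Running kubectl describe on a pod like this surfaces a FailedScheduling event explaining that no node has sufficient cpu or memory.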
Similar to learning a new coding framework, becoming productive with Kubernetes is mostly about understanding its components and how they’re used. There’s a long list of components and the list is growing daily. However, the 3 foundational pieces that arguably almost everything else builds upon are Pod, Service and Deployment. For someone just learning Kubernetes, I recommend starting with those.
Kubernetes is an open source platform designed to schedule and manage containerized workloads across many hosts. The project was initially launched by Google in 2014, based on their own internal Borg system, and is now maintained by the Cloud Native Computing Foundation (CNCF). Since it’s based on Borg, which schedules over 2 billion containers per week, the architecture was built to scale and should be able to run your whole data center.
Happy New Year! I don’t normally make predictions for the new year. But, this year I’ve had a lightbulb moment. It took me a while to make a connection between these two seemingly independent trends. I’ve been following and working with them for a long time. The foundations of both DevOps and Machine Learning (ML) go back decades. The hype around them lately is deafening. Yet, I haven’t heard much about how well they go together. That’s about to change…
So you’ve trained your TensorFlow ML model, now what? (Part 3 - Using the model in other apps)
The first two posts in the series showed how to run a TensorFlow model, then how to deploy it to Kubernetes. Now that our image classification service is managed by a scheduler and scaled dynamically, the next step is to start using it. In this post, we’ll walk through an example Node.js single page app (SPA) that calls the service and displays the classification results.
So you’ve trained your TensorFlow ML model, now what? (Part 2 - Deploying to Kubernetes)
Part 1 showed how to run the open source pre-trained model Inception V3 as an image classification service. That’s all you need to do for a single classification instance, whether you run it on a server, your laptop or even a smart IoT device. But, what if you need to handle a high volume of concurrent requests or you want multiple instances for resiliency? For that, you’ll want to scale horizontally with many instances. Then, a cluster managed by a resource scheduler like Kubernetes is a great way to go. Not only does it help with scalability, but it also enables deployment automation, improves manageability and will most likely result in better infrastructure utilization.
So you’ve trained your TensorFlow ML model, now what? (Part 1 - Running the model)
To show how to run it, I’ll use an open source pre-trained model, specifically Inception V3. Inception was developed at Google and won the ILSVRC 2014 image classification competition. Plus, it’s a fun model to play around with.