Kubernetes
About
Kubernetes is a container orchestration engine for automating deployment, scaling and management of containerized applications.
Components
A Kubernetes cluster contains a set of nodes (machines).
Master nodes (Control Plane) manage the worker nodes and the cluster.
The Control Plane makes decisions about the cluster (for example: scheduling Pods) and detects and responds to cluster events (for example: starting a new Pod when the replicas field of a Deployment is unsatisfied).
Control Plane components can run on any machine in the cluster (a master node can be a worker node at the same time), but it’s recommended to run Control Plane components on separate nodes (without user containers and with high availability).
For example, in AWS (Amazon Web Services) EKS (Elastic Kubernetes Service) the Control Plane runs in an account managed by AWS itself, on its own set of Amazon EC2 instances in multiple availability zones.
Control Plane components
kube-apiserver: Exposes the Kubernetes API (kubectl uses this API).
etcd: Key-value storage, keeps all cluster data.
kube-scheduler: Watches for newly created Pods with no assigned node, and selects a node for them to run on. It checks factors like resource requirements, affinity, anti-affinity, etc.
kube-controller-manager: Runs controller processes. For example: Node controller - responsible for noticing and responding when nodes go down.
cloud-controller-manager: Cloud-specific control logic (for cloud providers).
Node components
kubelet: Agent that runs on each node. It makes sure that containers are running. It doesn’t manage containers which were not created by Kubernetes.
kube-proxy: Network proxy that runs on each node and maintains network rules on nodes. Uses the operating system packet filtering layer (iptables or IPVS) if it is available, otherwise kube-proxy forwards the traffic itself.
Container runtime: Software that is responsible for running containers, for example: containerd, CRI-O or any other implementation of the Kubernetes CRI (Container Runtime Interface).
Other components
CoreDNS: Deployment (Pods) responsible for DNS name resolution for all Pods in the cluster.
aws-node (in Amazon EKS): DaemonSet (Pods) responsible for IP address management at the node level.
Workloads
A workload is an application running on Kubernetes.
Pods
A Pod is the smallest deployable unit. It represents a set of running containers with shared storage and network resources.
Pod phases: Pending (accepted, waiting to be scheduled or downloading images), Running (bound to a node, at least one container is running), Succeeded (all containers terminated successfully), Failed (all containers terminated and at least one terminated with a non-zero exit code), Unknown (the state can’t be obtained, usually because of node connection issues).
Pod statuses: Running (phase Running), Pending (phase Pending), Completed (phase Succeeded), ImagePullBackOff (can’t pull the image, will keep trying), CrashLoopBackOff (the container starts but crashes, Kubernetes will keep trying to restart it).
Container states: Waiting (waiting to start up, for example: pulling the image), Running (executing without issues), Terminated (finished or failed, check the reason and exit code).
Container restartPolicy (set in the Pod spec, applies to the containers of the Pod): Always (the container will be restarted even if it exited successfully - zero exit code), OnFailure (the container will be restarted only if it exited with a non-zero exit code), Never (the container will not be restarted at all).
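The restart policies above can be tried with a small manifest. This is a minimal sketch, not taken from the original notes: the Pod name, the busybox image tag and the failing command are assumptions chosen only for illustration.

```yaml
# Hypothetical Pod used to observe restartPolicy behaviour.
apiVersion: v1
kind: Pod
metadata:
  name: restart-policy-demo        # assumed name
spec:
  # Set once per Pod; when omitted, the default is Always.
  restartPolicy: OnFailure
  containers:
    - name: fail-once
      image: busybox:1.36          # assumed image tag
      # Exits with a non-zero code, so under OnFailure the kubelet restarts it.
      command: ["/bin/sh", "-c", "echo 'simulated failure'; exit 1"]
```

Because the command always fails, the Pod should eventually report the CrashLoopBackOff status described above. Changing the policy to Never should instead leave the Pod in the Failed phase after the first exit.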
Workload Resources and Controllers
Workload resources manage a set of Pods on your behalf. These resources configure controllers: a controller tracks Kubernetes resources and tries to bring the current state closer to the desired state.
Deployment: Provides declarative updates for Pods and ReplicaSets. Creates and controls a ReplicaSet for Pods. A Deployment is a good fit for managing stateless applications, where any Pod is interchangeable.
ReplicaSet: Maintains a stable set of replica Pods running at any given time.
StatefulSet: Manages Pods similarly to a Deployment, but provides guarantees about the ordering and uniqueness of these Pods. Provides: stable persistent storage, ordered graceful deployment and scaling, ordered automated rolling updates (here “stable” is synonymous with persistence across Pod (re)scheduling). If an application doesn’t require any stable identifiers or ordered deployment, deletion or scaling, it should be deployed using a Deployment.
DaemonSet: Ensures that all (or some: nodeSelector, affinity) nodes run a copy of a Pod. Useful for node-local operations, for example: collecting node metrics.
Job: Creates one or more Pods (several at once if parallelism is greater than 1) and retries execution (a specified number of times) if a Pod fails. When the specified number of successful completions is reached (completions, which can be greater than 1), the Job is successfully completed. Deleting a Job cleans up the Pods it created. Suspending a Job (suspend: true) deletes its active Pods until the Job is resumed again.
CronJob: Performs regular scheduled actions (scheduling with CRON syntax). Creates Jobs according to a schedule.
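To make the Job and CronJob descriptions more concrete, here is a hedged sketch of both resources. The names, the busybox image tag, the commands and the schedule are assumptions for illustration and do not come from the original notes.

```yaml
# Hypothetical Job: runs 4 successful Pods in total, at most 2 at a time.
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job-demo          # assumed name
spec:
  completions: 4                   # successful Pods required to finish the Job
  parallelism: 2                   # Pods allowed to run at the same time
  backoffLimit: 3                  # retries before the Job is marked as failed
  template:
    spec:
      restartPolicy: Never         # Jobs accept only Never or OnFailure
      containers:
        - name: worker
          image: busybox:1.36      # assumed image tag
          command: ["/bin/sh", "-c", "echo processing; sleep 5"]
---
# Hypothetical CronJob: creates a new Job every 5 minutes.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cron-job-demo              # assumed name
spec:
  schedule: "*/5 * * * *"          # CRON syntax
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: task
              image: busybox:1.36  # assumed image tag
              command: ["/bin/sh", "-c", "date"]
```

After applying the manifest with kubectl apply -f, kubectl get jobs should show the completion count of the Job growing until it reaches 4.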
Liveness, Readiness and Startup Probes
livenessProbe is used to know when to restart a container. For example: the application is running but doesn’t work properly (deadlock or some other issue). Restarting a container in such a state can help to make the application more available despite bugs.
readinessProbe is used to know when a container is ready to accept traffic. A Pod is ready when all of its containers are ready. When a Pod is not ready, it is removed from the endpoints of the matching Services and receives no traffic through them.
startupProbe is used to know when a container application has started. If configured, it disables livenessProbe and readinessProbe checks until it succeeds, making sure those probes don’t interfere with the application startup. Useful for slow starting containers.
Probes configuration:
initialDelaySeconds: Number of seconds to wait after the container has started before the first probe. Defaults to 0 seconds, minimum is 0.
periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds, minimum is 1.
timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second, minimum is 1.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1, minimum is 1.
failureThreshold: After the probe fails the specified number of times in a row, the overall check has failed: the container is not ready / healthy / live. Defaults to 3, minimum is 1. In the case of a startup or liveness probe the container will be restarted.
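The probe settings above can be combined in a single container spec. This is a minimal sketch under stated assumptions: the Pod name, labels, the nginx image tag, and the use of path / on port 80 as a health endpoint are illustrative choices, not part of the original notes.

```yaml
# Hypothetical Pod with all three probe types configured.
apiVersion: v1
kind: Pod
metadata:
  name: probes-demo                # assumed name
  labels:
    app: probes-demo
spec:
  containers:
    - name: web
      image: nginx:1.25            # assumed image tag
      ports:
        - containerPort: 80
      # Liveness and readiness checks are held back until this probe succeeds.
      startupProbe:
        httpGet:
          path: /
          port: 80
        periodSeconds: 5
        failureThreshold: 12       # allows up to 12 * 5 = 60 seconds to start
      # Three failures in a row make the kubelet restart the container.
      livenessProbe:
        httpGet:
          path: /
          port: 80
        periodSeconds: 10
        timeoutSeconds: 1
        failureThreshold: 3
      # While failing, the Pod is taken out of Service endpoints, not restarted.
      readinessProbe:
        tcpSocket:
          port: 80
        initialDelaySeconds: 0
        periodSeconds: 5
        successThreshold: 1
```

The startup budget is the product failureThreshold * periodSeconds, so a slow application can be given more time by raising either value without loosening the liveness probe.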
Service
A Service is an abstract way to expose an application running on a set of Pods (chosen by selector, which matches Pod labels) as a network service (load balanced as round robin / random, depending on the kube-proxy mode).
Service types:
ClusterIP: Default type, cluster-internal IP address. Only reachable within the cluster network.
NodePort: Exposes the service on each node at a static port.
LoadBalancer: Cloud provider will create a load balancer (integrates NodePort with cloud-based load balancers).
ExternalName: Maps to a DNS name, will return a CNAME record with the specified value.
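As a hedged illustration of the first two types, the sketch below defines a ClusterIP and a NodePort Service for the same set of Pods. The Service names, the app: web label and the port numbers are assumptions made for this example.

```yaml
# Hypothetical ClusterIP Service: reachable only inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: web-internal               # assumed name
spec:
  type: ClusterIP                  # default, may be omitted
  selector:
    app: web                       # assumed Pod label
  ports:
    - port: 80                     # port exposed by the Service
      targetPort: 8080             # assumed port the container listens on
---
# Hypothetical NodePort Service: additionally opens a static port on every node.
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport               # assumed name
spec:
  type: NodePort
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080              # must fall in the node port range (30000-32767 by default)
```

With the second Service in place, the application should be reachable at port 30080 on the IP address of any node, provided the network allows it.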
If a selector is not defined, the corresponding EndpointSlice (legacy Endpoints) objects are not created automatically and should be created manually. An EndpointSlice contains references to a set of network endpoints.
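A Service without a selector and its manually created EndpointSlice might look like the sketch below. The names, the port and the 192.0.2.10 address (taken from a range reserved for documentation) are assumptions for illustration only.

```yaml
# Hypothetical Service without a selector, for example in front of an external database.
apiVersion: v1
kind: Service
metadata:
  name: external-db                # assumed name
spec:
  ports:
    - name: postgres
      protocol: TCP
      port: 5432
      targetPort: 5432
---
# Hypothetical EndpointSlice created by hand and tied to the Service by label.
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: external-db-1              # assumed name
  labels:
    # Links this slice to the Service named external-db.
    kubernetes.io/service-name: external-db
addressType: IPv4
ports:
  - name: postgres                 # should match the Service port name
    protocol: TCP
    port: 5432
endpoints:
  - addresses:
      - "192.0.2.10"               # assumed backend address
```

Traffic sent to external-db inside the cluster should then be forwarded to the listed address, even though no Pod labels are involved.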
A headless Service is used when you don’t need load balancing across Pods and a single IP for the Service. It will return DNS A / AAAA records for each Pod IP, or a DNS CNAME record for type: ExternalName. It is created by explicitly specifying None for the clusterIP field.
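A minimal sketch of a headless Service follows. The Service name, the app: web label and the port are assumptions made for this example.

```yaml
# Hypothetical headless Service: no cluster IP is allocated and no load balancing is done.
apiVersion: v1
kind: Service
metadata:
  name: web-headless               # assumed name
spec:
  clusterIP: None                  # this value is what makes the Service headless
  selector:
    app: web                       # assumed Pod label
  ports:
    - port: 80
```

A DNS lookup of the Service name from inside the cluster should then return one A / AAAA record per ready Pod instead of a single Service IP, which leaves the choice of backend to the client.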
Examples
Environment variables
# This example shows how to add environment variables to a container from a ConfigMap, a Secret and
# the downward API (node: name and IP; pod: namespace, name and IP; resources: requests and limits).
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-environment-all
data:
  CONFIG_VARIABLE1: config-value1
  CONFIG_VARIABLE2: config-value2
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-environment-custom
data:
  config_custom_variable1: config-custom-value1
  config_custom_variable2: config-custom-value2
---
# Secret (values are base64 encoded)
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: secret-environment-all
data:
  SECRET_VARIABLE1: c2VjcmV0LXZhbHVlMQ==
  SECRET_VARIABLE2: c2VjcmV0LXZhbHVlMg==
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: secret-environment-custom
data:
  secret_custom_variable1: c2VjcmV0LWN1c3RvbS12YWx1ZTE=
  secret_custom_variable2: c2VjcmV0LWN1c3RvbS12YWx1ZTI=
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: environment-variables
spec:
  selector:
    matchLabels:
      app: environment-variables
  template:
    metadata:
      labels:
        app: environment-variables
    spec:
      # Don't wait (SIGTERM) just kill (SIGKILL)
      terminationGracePeriodSeconds: 0
      containers:
        - name: alpine
          image: alpine:latest
          command: ["/bin/sh", "-c", "--"]
          args:
            # Every command ends with ";" so the script stays valid even if the
            # folded scalar (>) joins the lines into a single one.
            - >
              while true; do
                now=$(TZ=UTC date +"%Y-%m-%d %H:%M:%S UTC");
                echo "--";
                echo "Time: ${now}";
                echo "NODE_NAME: $NODE_NAME, NODE_IP: $NODE_IP";
                echo "POD_NAMESPACE: $POD_NAMESPACE, POD_NAME: $POD_NAME, POD_IP: $POD_IP";
                echo "SERVICE_ACCOUNT: $SERVICE_ACCOUNT";
                echo "REQUESTS_CPU: $REQUESTS_CPU, REQUESTS_MEMORY: $REQUESTS_MEMORY";
                echo "LIMITS_CPU: $LIMITS_CPU, LIMITS_MEMORY: $LIMITS_MEMORY";
                echo "CONFIG_VARIABLE1: $CONFIG_VARIABLE1, CONFIG_VARIABLE2: $CONFIG_VARIABLE2";
                echo "CONFIG_CUSTOM_VARIABLE1: $CONFIG_CUSTOM_VARIABLE1";
                echo "CONFIG_CUSTOM_VARIABLE2: $CONFIG_CUSTOM_VARIABLE2";
                echo "SECRET_VARIABLE1: $SECRET_VARIABLE1, SECRET_VARIABLE2: $SECRET_VARIABLE2";
                echo "SECRET_CUSTOM_VARIABLE1: $SECRET_CUSTOM_VARIABLE1";
                echo "SECRET_CUSTOM_VARIABLE2: $SECRET_CUSTOM_VARIABLE2";
                sleep 5;
              done
          env:
            # Downward API (see: https://kubernetes.io/docs/concepts/workloads/pods/downward-api/)
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: SERVICE_ACCOUNT
              valueFrom:
                fieldRef:
                  fieldPath: spec.serviceAccountName
            - name: REQUESTS_CPU
              valueFrom:
                resourceFieldRef:
                  resource: requests.cpu
            - name: REQUESTS_MEMORY
              valueFrom:
                resourceFieldRef:
                  resource: requests.memory
            - name: LIMITS_CPU
              valueFrom:
                resourceFieldRef:
                  resource: limits.cpu
            - name: LIMITS_MEMORY
              valueFrom:
                resourceFieldRef:
                  resource: limits.memory
            # Config (custom)
            - name: CONFIG_CUSTOM_VARIABLE1
              valueFrom:
                configMapKeyRef:
                  name: config-environment-custom
                  key: config_custom_variable1
            - name: CONFIG_CUSTOM_VARIABLE2
              valueFrom:
                configMapKeyRef:
                  name: config-environment-custom
                  key: config_custom_variable2
            # Secret (custom)
            - name: SECRET_CUSTOM_VARIABLE1
              valueFrom:
                secretKeyRef:
                  name: secret-environment-custom
                  key: secret_custom_variable1
            - name: SECRET_CUSTOM_VARIABLE2
              valueFrom:
                secretKeyRef:
                  name: secret-environment-custom
                  key: secret_custom_variable2
          envFrom:
            # Config (all)
            - configMapRef:
                name: config-environment-all
            # Secret (all)
            - secretRef:
                name: secret-environment-all
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              cpu: 100m
              memory: 256Mi