Kubernetes
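A container orchestration platform that automates container deployment, scaling, rolling updates, and self-healing. Commonly abbreviated as K8s.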
Architecture overview
                    ┌─────────────────────────────────┐
                    │          Control Plane          │
                    │ ┌──────────┐  ┌───────────────┐ │
kubectl ───────────►│ │   API    │  │     etcd      │ │
                    │ │  Server  │  │(cluster state)│ │
                    │ └──────────┘  └───────────────┘ │
                    │ ┌──────────┐  ┌───────────────┐ │
                    │ │Scheduler │  │  Controller   │ │
                    │ │          │  │    Manager    │ │
                    │ └──────────┘  └───────────────┘ │
                    └────────────────┬────────────────┘
                                     │
                     ┌───────────────┼───────────────┐
                     ▼               ▼               ▼
                ┌──────────┐    ┌──────────┐    ┌──────────┐
                │  Node 1  │    │  Node 2  │    │  Node 3  │
                │ kubelet  │    │ kubelet  │    │ kubelet  │
                │kube-proxy│    │kube-proxy│    │kube-proxy│
                │  Pod...  │    │  Pod...  │    │  Pod...  │
                └──────────┘    └──────────┘    └──────────┘
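| Component | Description |
|---|---|
| API Server | Single entry point for all operations; exposes the REST API |
| etcd | Distributed key-value store that holds the entire cluster state |
| Scheduler | Assigns Pods to suitable Nodes |
| Controller Manager | Maintains the desired state (replica counts, node status, etc.) |
| kubelet | Agent on each Node; manages the Pod lifecycle |
| kube-proxy | Maintains network rules on each Node to implement Service routing |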
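The "maintains the desired state" row can be pictured as a loop that compares what was requested with what is running and issues the difference. The sketch below is purely conceptual: `reconcile` is a hypothetical helper written for illustration, not Kubernetes code, and real controllers watch the API Server rather than take two numbers as arguments.

```bash
#!/bin/sh
# Conceptual sketch of one reconciliation step.
# reconcile DESIRED ACTUAL prints the corrective action a controller would take.
# (Hypothetical helper for illustration only.)
reconcile() {
  desired=$1
  actual=$2
  if [ "$actual" -lt "$desired" ]; then
    echo "create $((desired - actual))"
  elif [ "$actual" -gt "$desired" ]; then
    echo "delete $((actual - desired))"
  else
    echo "noop"
  fi
}

reconcile 3 1   # prints: create 2
reconcile 3 5   # prints: delete 2
reconcile 3 3   # prints: noop
```

Because the loop only looks at the gap between desired and actual state, it reaches the same result whether a Pod crashed, a node disappeared, or someone changed the replica count.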
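Core resources
| Resource | Description |
|---|---|
| Pod | Smallest schedulable unit; contains one or more containers |
| Deployment | Manages stateless applications; supports rolling updates and rollbacks |
| StatefulSet | Manages stateful applications; Pod names and storage are stable and ordered |
| DaemonSet | Runs one replica on every Node (log collection, monitoring, etc.) |
| Job | Runs a one-off task and exits once it completes successfully |
| CronJob | Runs Jobs on a schedule defined by a cron expression |
| Service | Provides a stable network endpoint for a set of Pods |
| Ingress | HTTP/HTTPS routing that exposes services outside the cluster |
| ConfigMap | Stores non-sensitive configuration |
| Secret | Stores sensitive data (passwords, tokens) |
| Namespace | Logical isolation, for example to separate environments |
| PV / PVC | Persistent storage volume and the claim that requests it |
| HPA | Horizontal autoscaling based on metrics |
| ServiceAccount | Identity a Pod uses to access the API Server |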
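For the rolling updates mentioned in the Deployment row, the number of Pods during an update stays between replicas - maxUnavailable and replicas + maxSurge. When either setting is given as a percentage, maxSurge is rounded up and maxUnavailable is rounded down. The following is a minimal sketch of that arithmetic; `rollout_bounds` is a hypothetical helper, not a kubectl command.

```bash
#!/bin/sh
# rollout_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE
# MAX_SURGE and MAX_UNAVAILABLE are absolute numbers ("1") or percentages ("25%").
# (Hypothetical helper for illustration only.)
rollout_bounds() {
  replicas=$1
  surge=$2
  unavail=$3
  case $surge in
    *%) p=${surge%?}; surge=$(( (replicas * p + 99) / 100 )) ;;   # percentage: round up
  esac
  case $unavail in
    *%) p=${unavail%?}; unavail=$(( replicas * p / 100 )) ;;      # percentage: round down
  esac
  echo "max_total=$((replicas + surge)) min_available=$((replicas - unavail))"
}

rollout_bounds 3 1 0        # prints: max_total=4 min_available=3
rollout_bounds 10 25% 25%   # prints: max_total=13 min_available=8
```

With maxSurge 1 and maxUnavailable 0 on three replicas, the update never drops below three available Pods and needs room for one extra Pod.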
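Common kubectl commands
Cluster and contexts
kubectl config get-contexts  # list all contexts
kubectl config current-context  # show the current context
kubectl config use-context <name>  # switch context (cluster, user, and default namespace)
kubectl config set-context --current --namespace=dev  # set the default namespace
kubectl cluster-info  # basic cluster information
kubectl get nodes  # list all nodes
kubectl get nodes -o wide  # include IPs, OS version, etc.
kubectl describe node <node-name>  # node details
kubectl top node  # node resource usage (requires metrics-server)
Viewing resources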
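# Pods
kubectl get pods -n <namespace>
kubectl get pods -A  # all namespaces
kubectl get pods -o wide  # include node information
kubectl get pods -l app=myapp  # filter by label
kubectl get pods --watch  # watch status changes in real time
kubectl describe pod <pod-name> -n <ns>  # details (including Events)
# Other resources
kubectl get deploy,svc,ing -n dev  # query several resource types at once
kubectl get all -n dev  # common resources in the namespace (not every resource type)
kubectl get events -n dev --sort-by=.lastTimestamp  # events sorted by time
Logs and debugging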
kubectl logs <pod-name> -f  # stream logs
kubectl logs <pod-name> -c <container>  # pick a container in a multi-container Pod
kubectl logs <pod-name> --previous  # logs from the previous (crashed) container
kubectl logs <pod-name> --tail=200  # last 200 lines
kubectl exec -it <pod-name> -- bash  # open a shell in the container
kubectl exec -it <pod-name> -c <c> -- sh  # pick a container in a multi-container Pod
kubectl port-forward pod/<pod-name> 8080:8080  # port forwarding (for debugging)
kubectl port-forward svc/<svc-name> 8080:80  # forward to a Service
kubectl cp <pod-name>:/path/to/file ./file  # copy a file out of the container
# Ephemeral debug container (stable since K8s 1.25)
kubectl debug -it <pod-name> --image=busybox --target=<container>
Deployment operations
kubectl apply -f deployment.yaml  # create or update
kubectl apply -f ./k8s/  # apply every YAML file in the directory
kubectl delete -f deployment.yaml
kubectl delete pod <pod-name> --force  # force delete (for Pods stuck terminating)
kubectl scale deployment myapp --replicas=3
kubectl rollout status deployment myapp  # watch rolling update progress
kubectl rollout history deployment myapp
kubectl rollout undo deployment myapp  # roll back to the previous revision
kubectl rollout undo deployment myapp --to-revision=2
kubectl rollout restart deployment myapp  # trigger a rolling restart (re-pulls the image only if imagePullPolicy allows it)
kubectl set image deployment/myapp myapp=myapp:2.0  # update the image
kubectl label pod <pod-name> env=debug  # add a label
kubectl annotate deployment myapp description="my app"
kubectl taint nodes <node> key=val:NoSchedule  # add a taint
Resource monitoring
kubectl top pods -n dev  # Pod CPU/memory (requires metrics-server)
kubectl top pods --containers  # break down by container
kubectl top node
kubectl get hpa -n dev  # HPA status
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: dev
  labels:
    app: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 Pod above the desired count
      maxUnavailable: 0  # no Pod may be unavailable during the update
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          # Readiness probe: the Pod receives Service traffic only after it passes
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 20
            periodSeconds: 10
            failureThreshold: 3
          # Liveness probe: the container is restarted when it fails
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
            failureThreshold: 3
          # Startup probe: liveness and readiness checks are held off until it succeeds (for slow-starting apps)
          startupProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: prod
          envFrom:
            - configMapRef:
                name: app-config
            - secretRef:
                name: app-secret
          volumeMounts:
            - name: logs
              mountPath: /app/logs
      volumes:
        - name: logs
          emptyDir: {}
      # Affinity: prefer spreading Pods across different nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: myapp
                topologyKey: kubernetes.io/hostname
Service
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
  namespace: dev
spec:
  selector:
    app: myapp
  ports:
    - name: http
      port: 80
      targetPort: 8080
  type: ClusterIP  # ClusterIP / NodePort / LoadBalancer
Service types
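| Type | Description | Use case |
|---|---|---|
| ClusterIP | Reachable only inside the cluster (default) | Service-to-service communication |
| NodePort | Opens a port on every node (default range 30000-32767) | External access in test environments |
| LoadBalancer | External load balancer provisioned by the cloud provider | Exposing production services externally |
| ExternalName | DNS CNAME alias | Reaching services outside the cluster |
| Headless | A ClusterIP Service with clusterIP: None; DNS returns the Pod IPs directly | Stable per-Pod addressing for StatefulSets |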
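Inside the cluster, a Service is normally reached through its DNS name, which follows the pattern service.namespace.svc.cluster-domain. The sketch below builds that name and checks a port against the default NodePort range from the table. Both helpers are hypothetical, the names myapp-svc and dev are example values, and cluster.local is assumed as the cluster domain (the usual default, but configurable).

```bash
#!/bin/sh
# service_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Builds the in-cluster DNS name of a Service. (Hypothetical helper.)
service_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# in_default_nodeport_range PORT
# Succeeds when PORT lies in the default NodePort range 30000-32767.
in_default_nodeport_range() {
  [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
}

service_fqdn myapp-svc dev   # prints: myapp-svc.dev.svc.cluster.local

if in_default_nodeport_range 30080; then
  echo "30080 can be used as a nodePort"
fi
```

From a Pod in the same namespace the short name (myapp-svc) also resolves; the full name is what other namespaces need.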
Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: dev
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: tls-secret
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: myapp-svc
                port:
                  number: 80
          - path: /admin
            pathType: Prefix
            backend:
              service:
                name: admin-svc
                port:
                  number: 80
ConfigMap and Secret
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: dev
data:
  APP_ENV: production
  LOG_LEVEL: info
  # An entire config file stored as a single value
  application.yml: |
    server:
      port: 8080
    logging:
      level:
        root: INFO
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
  namespace: dev
type: Opaque
data:
  DB_PASSWORD: MTIzNDU2  # base64-encoded (not encrypted): echo -n '123456' | base64
  DB_URL: amRiYzpteXNxbDovL2xvY2FsaG9zdDozMzA2L215ZGI=
Mounting as a file (suited to multi-line configuration):
# Under the container spec
volumeMounts:
  - name: config-vol
    mountPath: /app/config
# Under the Pod spec
volumes:
  - name: config-vol
    configMap:
      name: app-config
      items:
        - key: application.yml
          path: application.yml
StatefulSet
Suited to stateful applications such as MySQL, Redis, and Elasticsearch. Each Pod gets a stable network identity (pod-0, pod-1) and its own storage.
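The stable identities follow a fixed pattern: Pods are named name-ordinal starting at 0, and each one gets a DNS record under the governing headless Service. The sketch below lists those names, assuming a StatefulSet called mysql with 3 replicas, a headless Service called mysql-headless, the namespace dev, and the default cluster domain cluster.local. `statefulset_pod_dns` is a hypothetical helper for illustration.

```bash
#!/bin/sh
# statefulset_pod_dns NAME REPLICAS SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Prints the per-Pod DNS name for every ordinal of a StatefulSet.
# (Hypothetical helper; all names passed below are example values.)
statefulset_pod_dns() {
  name=$1
  replicas=$2
  service=$3
  namespace=$4
  domain=${5:-cluster.local}
  i=0
  while [ "$i" -lt "$replicas" ]; do
    printf '%s-%d.%s.%s.svc.%s\n' "$name" "$i" "$service" "$namespace" "$domain"
    i=$((i + 1))
  done
}

statefulset_pod_dns mysql 3 mysql-headless dev
# prints:
# mysql-0.mysql-headless.dev.svc.cluster.local
# mysql-1.mysql-headless.dev.svc.cluster.local
# mysql-2.mysql-headless.dev.svc.cluster.local
```

A replacement Pod keeps the ordinal of the one it replaces, so clients can keep addressing mysql-0 as, for example, the primary.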
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: dev
spec:
  serviceName: mysql-headless  # must reference a headless Service
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  # A separate PVC is created automatically for each Pod
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
---
# Headless Service: makes each Pod reachable as mysql-0.mysql-headless
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: dev
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
    - port: 3306
DaemonSet
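Runs one Pod on every Node and deploys it automatically to nodes that join the cluster later. Commonly used for log collection (Filebeat), monitoring agents (node-exporter), and network plugins.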
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule  # allow scheduling onto control-plane nodes
      hostNetwork: true  # use the host network
      hostPID: true
      containers:
        - name: node-exporter
          image: prom/node-exporter:latest
          ports:
            - containerPort: 9100
              hostPort: 9100
          securityContext:
            privileged: true
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
Job and CronJob
# One-off task
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  completions: 1  # number of successful completions required
  parallelism: 1  # number of Pods running in parallel
  backoffLimit: 3  # retries before the Job is marked as failed
  ttlSecondsAfterFinished: 300  # clean up automatically 5 minutes after finishing
  template:
    spec:
      restartPolicy: OnFailure  # a Job must use OnFailure or Never
      containers:
        - name: migration
          image: myapp:1.0
          command: ["java", "-jar", "app.jar", "--migrate"]
---
# Scheduled task
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-job
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  concurrencyPolicy: Forbid  # no concurrent runs (Allow / Forbid / Replace)
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: myapp:1.0
              command: ["java", "-jar", "app.jar", "--cleanup"]
HPA autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: dev
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out when average CPU usage exceeds 70% of the requested CPU
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 400Mi
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute stabilization window before scaling in
      policies:
        - type: Percent
          value: 20
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
Prerequisite: metrics-server must be installed in the cluster.
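The replica count the HPA aims for follows the documented formula desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to minReplicas and maxReplicas. The sketch below reproduces only that core arithmetic with integer inputs; `hpa_desired` is a hypothetical helper, and it deliberately ignores the tolerance band, stabilization windows, and scaling policies that the real controller also applies.

```bash
#!/bin/sh
# hpa_desired CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC MIN MAX
# Core HPA arithmetic only: ceil(current * metric / target), clamped to [MIN, MAX].
# (Hypothetical helper; metrics are whole numbers, e.g. utilization percentages.)
hpa_desired() {
  current=$1
  metric=$2
  target=$3
  min=$4
  max=$5
  desired=$(( (current * metric + target - 1) / target ))   # integer ceiling
  if [ "$desired" -lt "$min" ]; then
    desired=$min
  fi
  if [ "$desired" -gt "$max" ]; then
    desired=$max
  fi
  echo "$desired"
}

hpa_desired 3 90 70 2 10    # prints: 4   (3 * 90 / 70 = 3.86, rounded up)
hpa_desired 4 200 70 2 10   # prints: 10  (12 clamped to maxReplicas)
hpa_desired 3 10 70 2 10    # prints: 2   (1 clamped to minReplicas)
```

When several metrics are configured, as with CPU and memory here, the controller evaluates each one and uses the largest resulting replica count.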
PersistentVolume and PVC
# PV: created manually by an administrator (or provisioned dynamically through a StorageClass)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany  # RWO (read-write, single node) / ROX (read-only, many nodes) / RWX (read-write, many nodes)
  persistentVolumeReclaimPolicy: Retain  # Retain / Recycle (deprecated) / Delete
  storageClassName: nfs
  nfs:
    path: /data/k8s
    server: 192.168.1.100
---
# PVC: the application's storage request
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-pvc
  namespace: dev
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard  # "" binds only to PVs without a class; omitting the field uses the default StorageClass, if any
  resources:
    requests:
      storage: 10Gi
Using the claim in a Pod:
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-pvc
containers:
  - volumeMounts:
      - name: data
        mountPath: /data
RBAC access control
# Role: namespace-scoped permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch"]
---
# RoleBinding: binds the Role to users or ServiceAccounts
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: dev
subjects:
  - kind: ServiceAccount
    name: ci-bot
    namespace: dev
  - kind: User
    name: alice
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
---
# ClusterRole: cluster-scoped permissions (apply across namespaces)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
Init containers and sidecars
spec:
  # Init containers: run in order; the main containers start only after all of them succeed
  initContainers:
    - name: wait-for-db
      image: busybox
      command: ['sh', '-c', 'until nc -z mysql 3306; do sleep 2; done']
    - name: db-migrate
      image: myapp:1.0
      command: ["java", "-jar", "app.jar", "--migrate-only"]
  containers:
    - name: myapp
      image: myapp:1.0
    # Sidecar: runs alongside the main container for its whole lifetime (log shipping, etc.)
    - name: log-shipper
      image: fluent/fluent-bit:latest
      volumeMounts:
        - name: logs
          mountPath: /app/logs
          readOnly: true
Namespace management
kubectl create namespace dev
kubectl get namespaces
kubectl delete namespace dev  # deletes the namespace and every resource in it
# Set resource quotas (ResourceQuota)
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    pods: "20"
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "10"
EOF
# Set default resource limits (LimitRange)
kubectl apply -f - <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-limit-range
  namespace: dev
spec:
  limits:
    - type: Container
      default:
        cpu: "200m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
EOF
Helm package management
Helm is the package manager for K8s. It bundles a set of related K8s resources into a Chart.
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Repository management
helm repo add stable https://charts.helm.sh/stable  # archived repository, kept for legacy charts
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm repo list
# Search and inspect
helm search repo mysql
helm show values bitnami/mysql  # show the default values
# Install
helm install my-mysql bitnami/mysql \
  --namespace dev \
  --create-namespace \
  --set auth.rootPassword=secret \
  --set primary.persistence.size=20Gi
# Install with a values file
helm install my-app ./mychart -f values-prod.yaml -n prod
# Manage releases
helm list -n dev  # list installed releases
helm status my-mysql -n dev
helm upgrade my-mysql bitnami/mysql --set auth.rootPassword=newpwd -n dev
helm rollback my-mysql 1 -n dev  # roll back to revision 1
helm uninstall my-mysql -n dev
# Create a custom Chart
helm create mychart
helm lint mychart  # validate the Chart
helm template mychart  # render the templates locally (no deployment; useful for debugging)
helm package mychart  # package as a .tgz archive
Quick install from scratch (single-node k3s)
k3s is a lightweight K8s distribution suited to development and test environments. It installs with a single command.
# Install k3s (server node)
curl -sfL https://get.k3s.io | sh -
# Faster download from mainland China (uses the Rancher mirror)
curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | \
  INSTALL_K3S_MIRROR=cn sh -
# Check status
sudo systemctl status k3s
sudo kubectl get nodes
# Configure kubectl (non-root user)
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
# Add a worker node
# On the server node, read the join token:
NODE_TOKEN=$(sudo cat /var/lib/rancher/k3s/server/node-token)
# On the worker node, run (substituting the server IP and the token value):
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 \
  K3S_TOKEN=<NODE_TOKEN> sh -
# Uninstall
sudo /usr/local/bin/k3s-uninstall.sh
Production install from scratch (multi-node kubeadm)
# Run on all nodes (Ubuntu 22.04)
# 1. Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
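# 2. Kernel modules and parameters
# The net.bridge.* settings only exist once br_netfilter is loaded
printf 'overlay\nbr_netfilter\n' | sudo tee /etc/modules-load.d/k8s.conf
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system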
# 3. Install containerd
sudo apt-get update && sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# Switch to the systemd cgroup driver (SystemdCgroup = true)
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
# 4. Install kubeadm / kubelet / kubectl
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | \
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /' | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
# ---- Control plane node only ----
# 5. Initialize the cluster
sudo kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=<control-plane-ip>
# 6. Configure kubectl
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# 7. Install a network plugin (Flannel)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# ---- Worker nodes ----
# 8. Join the cluster (use the command printed by kubeadm init)
sudo kubeadm join <control-plane-ip>:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
Common troubleshooting
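# Pod stuck in Pending
kubectl describe pod <pod> -n <ns>  # check Events; typical causes are insufficient resources, unmatched node selectors or taints, or an unbound PVC
# Pod in CrashLoopBackOff
kubectl logs <pod> --previous  # logs from the last crash
# Image pull failure (ImagePullBackOff)
kubectl describe pod <pod> | grep -A5 Events
# → check the image name/tag and the private registry Secret
# Node NotReady
kubectl describe node <node>  # check Conditions and Events
kubectl get pods -n kube-system  # check the CNI plugin Pods
# Service unreachable
kubectl get endpoints <svc> -n <ns>  # check that the Endpoints list has IPs (empty when the selector matches no Pods)
# Find the Pods using the most resources
kubectl top pods -A --sort-by=cpu
kubectl top pods -A --sort-by=memory
Related documentation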
- Docker: building container images
- Nginx: Nginx Ingress Controller
- Jenkins: CI/CD deployment to K8s
- GitHub Actions: automated deployment
- Prometheus: K8s cluster monitoring