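Symptoms
After the worker node (k8s-node1) joined the cluster, it could not pull container images. As a result:
Pod status shows ImagePullBackOff
The command crictl pull nginx:alpine fails
Error message: connection refused or i/o timeout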
Troubleshooting process
Layer 1: Check the container runtime status
Run:
systemctl status containerd
crictl info
Findings:
The containerd service is running
But crictl info reports an error: unknown service runtime.v1.ImageService
Root cause: containerd's CRI plugin was not loaded correctly
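The two error families seen so far point at different layers, so it can help to tell them apart mechanically. The following is a minimal sketch; the function name classify_pull_error and the labels it prints are made up for this note and are not part of crictl.

```shell
# Hypothetical helper: map an error message from crictl to the layer that
# most likely needs attention. The labels are illustrative only.
classify_pull_error() {
  case "$1" in
    *"unknown service runtime.v1"*)
      # The socket answers, but the CRI gRPC services are not registered.
      echo "cri-plugin" ;;
    *"connection refused"*|*"i/o timeout"*)
      # The runtime responds, but a registry or mirror is likely unreachable.
      echo "network" ;;
    *)
      echo "unknown" ;;
  esac
}

# Usage (on the node):
#   classify_pull_error "$(crictl info 2>&1)"
```

A "cri-plugin" result means the runtime has to be repaired before any network tuning can take effect, which is why the CRI plugin is fixed first in the solution.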
Layer 2: Check the containerd CRI plugin
Run:
ctr plugin ls | grep cri
Output:

Conclusion: The CRI plugin status is error, so Kubernetes cannot pull images through containerd
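For scripted checks, the status column can be extracted instead of read by eye. This is a small sketch, assuming the row layout "TYPE ID PLATFORMS STATUS" with the status in the last column; cri_plugin_status is a hypothetical helper name.

```shell
# Print the STATUS column of the CRI plugin row from `ctr plugin ls` output
# supplied on stdin. Prints "missing" if no such row exists.
cri_plugin_status() {
  awk '$1 == "io.containerd.grpc.v1" && $2 == "cri" { print $NF; found = 1 }
       END { if (!found) print "missing" }'
}

# Usage (on the node):
#   ctr plugin ls | cri_plugin_status
```

Anything other than ok (including missing) indicates that the kubelet has no working CRI endpoint to talk to.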
Layer 3: Analyze network connectivity
Run:
curl -v https://registry-1.docker.io/v2/
nslookup registry-1.docker.io
ip route show | grep default
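Findings:
DNS resolution returns an answer: registry-1.docker.io → 108.160.165.9
But the TCP connection fails: connect to 157.240.7.5 port 443 failed: Connection refused
The default route is in place: default via 192.168.102.2 dev ens33
Conclusion: The worker node cannot reach Docker Hub directly (blocked at the network level). Note that curl connected to a different address than the one nslookup returned, so the DNS answers themselves may not be reliable.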
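The curl exit code already says which layer failed, so the check can be scripted. A minimal sketch follows; explain_curl_exit and its labels are hypothetical, while the numeric codes are curl's documented exit codes.

```shell
# Hypothetical helper: translate a curl exit code into the failing layer.
explain_curl_exit() {
  case "$1" in
    0)     echo "ok" ;;           # an HTTP response was received
    6)     echo "dns" ;;          # could not resolve host
    7)     echo "tcp-connect" ;;  # failed to connect (e.g. connection refused)
    28)    echo "timeout" ;;      # operation timed out
    35|60) echo "tls" ;;          # TLS handshake or certificate problem
    *)     echo "other" ;;
  esac
}

# Usage (on the node):
#   curl -sS -o /dev/null --connect-timeout 5 https://registry-1.docker.io/v2/
#   explain_curl_exit "$?"
```

Because -f is not passed, an HTTP 401 from /v2/ still yields exit code 0, which is the desired outcome here: it shows that the registry is reachable. The failure recorded above (Connection refused) corresponds to exit code 7.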
Solution
Step 1: Repair the containerd CRI plugin (must be resolved first)
Problem: containerd's CRI plugin is not loaded
Steps:
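# 1. Remove containerd completely
# WARNING: this also deletes every image and container stored on the node
systemctl stop containerd kubelet
apt remove -y containerd
apt autoremove -y
rm -rf /etc/containerd /var/lib/containerd /run/containerd
# 2. Reinstall containerd
apt update
apt install -y containerd
# 3. Generate the default configuration
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
# 4. Set the required parameters (no registry mirrors yet)
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# Match whatever default pause tag is present, since it varies by containerd version
sed -i 's|sandbox_image = ".*"|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"|' /etc/containerd/config.toml
# 5. Restart containerd so it loads the new configuration, then bring kubelet back up
# (restart rather than start, because the package install may already have started the service)
systemctl restart containerd
systemctl start kubelet
# 6. Verify the CRI plugin status
ctr plugin ls | grep cri
# Expected output: io.containerd.grpc.v1 cri linux/amd64 ok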
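Because sed exits successfully even when its pattern matches nothing, it is worth confirming that both settings actually landed in the file. The sketch below assumes the default key names SystemdCgroup and sandbox_image; check_containerd_config is a hypothetical helper.

```shell
# Check that a containerd config file contains the two settings changed above.
# Prints one line per setting and returns non-zero if either is missing.
check_containerd_config() {
  cfg="${1:-/etc/containerd/config.toml}"
  rc=0
  if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg"; then
    echo "SystemdCgroup: ok"
  else
    echo "SystemdCgroup: not true"
    rc=1
  fi
  if grep -Eq '^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"registry\.aliyuncs\.com/google_containers/pause:' "$cfg"; then
    echo "sandbox_image: ok"
  else
    echo "sandbox_image: not rewritten"
    rc=1
  fi
  return "$rc"
}

# Usage (on the node):
#   check_containerd_config /etc/containerd/config.toml
```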

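Step 2: Configure registry mirrors (works around the blocked network path)
Problem: The worker node cannot reach Docker Hub directly
Steps:
# 1. Edit the containerd configuration file
vi /etc/containerd/config.toml
Find the [plugins."io.containerd.grpc.v1.cri".registry] section and change it to: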
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://docker.m.daocloud.io", "https://hub-mirror.c.163.com"]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"]
endpoint = ["https://registry.aliyuncs.com/google_containers"]
# 2. Restart containerd
systemctl restart containerd
# 3. Test an image pull
crictl pull nginx:alpine
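If the pull still fails, it may be one of the configured mirrors rather than containerd itself. The following sketch lists the endpoints from the config file so that each can be probed; list_mirror_endpoints is a hypothetical helper that relies on a plain text match, not a real TOML parser.

```shell
# Print every http(s) URL found on "endpoint = [...]" lines, one per line.
# Assumes each endpoint list sits on a single line, as in the config above.
list_mirror_endpoints() {
  grep -E '^[[:space:]]*endpoint[[:space:]]*=' "${1:-/etc/containerd/config.toml}" \
    | grep -Eo 'https?://[^"]+'
}

# Usage (on the node): print the HTTP status returned by each mirror.
#   for url in $(list_mirror_endpoints); do
#     curl -sS -o /dev/null --connect-timeout 5 -w "%{http_code} $url\n" "$url/v2/" \
#       || echo "unreachable $url"
#   done
```

Any HTTP status at all shows that the mirror is reachable from the node; a curl error shows that it is not.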
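Step 3: Configure fallback DNS servers
If the mirror hostnames cannot be resolved, add public DNS servers:
# Temporary (may be overwritten when /etc/resolv.conf is regenerated)
echo "nameserver 8.8.8.8" >> /etc/resolv.conf
echo "nameserver 114.114.114.114" >> /etc/resolv.conf
# Permanent (Debian 12, when systemd-resolved is in use)
vi /etc/systemd/resolved.conf
# Under the [Resolve] section, add:
DNS=8.8.8.8 114.114.114.114
systemctl restart systemd-resolved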
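Running the echo lines more than once appends duplicate entries. A small idempotent variant is sketched below; add_nameserver is a hypothetical helper, and the file argument exists mainly so it can be tried on a copy first.

```shell
# Append "nameserver <ip>" to a resolv.conf-style file only if that exact
# line is not already present.
add_nameserver() {
  ip="$1"
  file="${2:-/etc/resolv.conf}"
  # -x matches the whole line, -F treats the text literally (dots are not wildcards)
  if ! grep -qxF "nameserver $ip" "$file" 2>/dev/null; then
    echo "nameserver $ip" >> "$file"
  fi
}

# Usage (on the node, as root):
#   add_nameserver 8.8.8.8
#   add_nameserver 114.114.114.114
```

Keep in mind that the glibc resolver reads at most three nameserver lines and tries them in order, so appended servers act only as fallbacks.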
Verification
1. Verify the CRI plugin status
ctr plugin ls | grep cri
# Must show "ok", not "error"
2. Verify image pulls
# Run on the master
kubectl run test-nginx --image=nginx:alpine --restart=Never
kubectl get pods -w
# Expected: the Pod status goes from ContainerCreating → Running
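Watching the Pod by hand works, but the same check can be scripted. A minimal sketch, assuming the test Pod is named test-nginx as above; is_pull_failure is a hypothetical helper that recognizes the two image pull failure states reported in the STATUS column.

```shell
# Return success if the given Pod STATUS value indicates an image pull failure.
is_pull_failure() {
  case "$1" in
    ImagePullBackOff|ErrImagePull) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (on the master):
#   kubectl wait --for=condition=Ready pod/test-nginx --timeout=120s
#   status=$(kubectl get pod test-nginx --no-headers | awk '{print $3}')
#   is_pull_failure "$status" && kubectl describe pod test-nginx | tail -n 20
#   kubectl delete pod test-nginx   # clean up the test Pod afterwards
```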