Kubernetes Common Fault Troubleshooting and Handling
Troubleshooting commands and methods
1. kubectl get pods
2. kubectl describe pods my-pod
3. kubectl logs my-pod
4. kubectl exec -it my-pod -- /bin/bash, then troubleshoot from inside the container
5. Check the log files on the host: /var/log/pods/* and /var/log/containers/*

1 Pod troubleshooting and handling

1. How to check: mainly use the following command
    kubectl get pods -n namespace

[Screenshot: output of kubectl get pods -n namespace, listing system pods such as calico-kube-controllers, calico-node, coredns, kube-proxy, kube-scheduler and kubernetes-dashboard, with READY, STATUS, RESTARTS and IP columns]
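As a minimal sketch, the inspection commands listed above can be chained into one first-pass helper. The function name triage_pod, the KUBECTL override variable and the --tail=50 limit are assumptions made for illustration; they are not part of the original procedure.

```shell
#!/bin/sh
# First-pass triage sketch (hypothetical helper, not from the original guide).
# KUBECTL is an assumed override so the commands can be dry-run, e.g. KUBECTL=echo.
KUBECTL="${KUBECTL:-kubectl}"

# triage_pod POD [NAMESPACE]: list pods, describe the pod, show recent logs.
triage_pod() {
  pod="$1"
  ns="${2:-default}"
  if [ -z "$pod" ]; then
    echo "usage: triage_pod POD [NAMESPACE]" >&2
    return 1
  fi
  # Overview of every pod in the namespace: READY, STATUS, RESTARTS, IP.
  "$KUBECTL" get pods -n "$ns" -o wide
  # State, Conditions and Events for the pod under investigation.
  "$KUBECTL" describe pod "$pod" -n "$ns"
  # Last 50 log lines (the limit is an assumed default).
  "$KUBECTL" logs "$pod" -n "$ns" --tail=50
}
```

For example, triage_pod my-pod kube-system runs the three commands against a live cluster, while setting KUBECTL=echo first only prints the arguments that would be passed to kubectl.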
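In the STATUS column of the screenshot above, we can see the state of each pod's containers.

2. Check the STATUS column
The possible statuses are: Running, Succeeded, Waiting, ContainerCreating, Failed, Pending, Terminating, Unknown, CrashLoopBackOff, ErrImagePull, ImagePullBackOff

Status definitions:
Running: the pod is running (a pod may also show Running while a process inside a container is restarting)
Succeeded: the pod exited successfully and will not be started again automatically
Waiting: waiting
ContainerCreating: the container is being created. Common causes of a pod staying in this state:
    the image cannot be pulled from an overseas registry, or the image is so large that the pull times out
    a CNI network error: the pod network cannot be configured and no IP address can be assigned
Failed: at least one container in the pod did not terminate normally
Pending: the pod is waiting because of network or other reasons, for example the image is still being pulled
Unknown: the pod state cannot be obtained, possibly because the connection to the node is unhealthy
Terminating: the pod is being removed; the source attributes a pod stuck here to its command not executing normally, in which case delete the pod and recreate it
CrashLoopBackOff: Kubernetes keeps trying to start the pod, but one or more containers have crashed or are being deleted
ErrImagePull: image error, pulling the image failed
ImagePullBackOff: the image name or the image pull secret is configured incorrectly

If a pod shows an abnormal status, inspect its details and logs, and check the State field:
    kubectl describe pod <pod-name> -n namespace

3. Check the Conditions
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True

True means success, False means failure.
Initialized: the pod's init containers have completed
Ready: the pod can serve requests normally
ContainersReady: the containers can serve requests normally
PodScheduled: the pod has been scheduled; once a suitable node is found, the pod is bound to it and the binding is written to etcd
Unschedulable: the pod cannot be scheduled because no suitable node was found

If any condition shows False, check the Events section:
Events:
  Type     Reason     Age
  Warning  Unhealthy  (x302812 over 35d)  [remaining columns illegible in the source]

Here Reason shows Unhealthy. Read the error message that follows it carefully and fix the specific cause.

4. Event error messages are summarized below:

(1) Failed to pull image xx: Error: image xxx not found
Cause: the image pull failed because the image could not be found.
Solution:
    find a reachable image address and the correct tag, then update the pod spec
    if the node has not logged in to the image registry, log in
    if K8s has no permission to pull the image, grant the permission and pull again

(2) Warning FailedSync Error syncing pod, skipping: failed to ... with RunContainerError: GenerateRunContainerOptions: XXX not found
Cause: the name XXX referenced by this pod cannot be found in the namespace.
Solution: restart the pod:
    kubectl replace --force -f pod.yaml

(3) Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "XXX" with RunContainerError: GenerateRunContainerOptions: configmaps "XXX" not found
Cause: no ConfigMap named XXX exists in the namespace.
Solution: recreate the ConfigMap:
    kubectl create -f configmap.yaml

(4) Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/secret" (spec.Name: "XXXsecret") pod with: secrets "XXXsecret" not found
Cause: the Secret is missing.
Solution: create the Secret:
    kubectl create secret docker-registry <secret-name> --docker-server=<registry-url> --docker-username= --docker-password= -n namespace

For the items below, after modifying the YAML file, run kubectl apply -f pod.yaml to restart the pod so that the change takes effect.

(5) Normal Killing Killing container with docker id XXX: pod XXX container XXX is unhealthy, it will be killed and re-created.
The container's liveness probe failed, so Kubernetes is killing the problem container.
Cause: the probe is configured incorrectly, the health check URL is wrong, or the application is not responding.
Solution: increase values such as periodSeconds for the health check in the YAML file:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 3; rm -rf /tmp/healthy; sleep 6
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 1
      periodSeconds: 5

(6) Warning FailedCreate Error creating: pods "XXXX" is forbidden: maximum memory usage per Pod is XXX, but request is XXX, maximum memory usage per Container is XXX, but request is XXX.
Cause: the K8s memory limit quota is smaller than the memory the pod requests.
Solution: raise the K8s memory quota, or reduce the pod's memory size:
    containers:
    - name: constraints-mem
      image: nginx
      resources:
        limits:
        requests:
          memory: 500Mi

(7) pod (XXX) failed to fit in any node fit failure on node (XXX): Insufficient cpu
Events:
  FirstSeen  LastSeen  Count  From                 SubObjectPath  Type     Reason
  3m         3m        1      {default-scheduler}                 Warning  FailedScheduling
  pod (XXX) failed to fit in any node
  fit failure on node (XXX): Insufficient cpu
Cause: the node does not have enough CPU available.
Solution: reduce the amount of CPU used by the pod by editing the YAML:
    spec:
      limits:
      - max:
          cpu: "2"
          memory: 1Gi

(8) FailedMount Unable to mount volumes for pod "XXX": timeout expired waiting for volumes to attach/mount for pod "XXX"/"fail". list of unattached/unmounted volumes=XXX
FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "XXX"/"fail". list of unattached/unmounted volumes=XX
Cause: pod XXX failed to mount a volume.
Solution: check whether the volume has been created and whether the mountPath directory under volumeMounts is correct. Create the volume and mount it with a YAML file:
    spec:
      containers:
      - image: nginx:1.12
        name: test-container
        volumeMounts:
        - mountPath: /data
          name: test-volume
      volumes:
      - name: test-volume
        hostPath:
          path: /etc/default
          type: Directory

(9) FailedMount Failed to attach volume "XXX" on node "XXX" with: GCE persistent disk not found: diskName=XX diskzone=
Solution: check whether the persistent disk was created correctly. A PersistentVolume is created from a YAML file as follows:
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: task-pv-volume
      labels:
        type: local
    spec:

(10) error: error validating XXX.yaml: error validating data: found invalid field r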