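Track Troubleshooting

I. Getting started

The approach to locating the fault is as follows. First observe whether the Track entry state is stable.
- If the Track entry state switches frequently, check the network quality, the CPU usage, and the probe packet sending frequency/timeout settings.
- If the Track entry is stable in the Negative state, check the state of the interconnecting network, the BFD/NQA configuration, and whether probe packets are being sent and received.
- If the Track entry is stable in any other state, check the linkage configuration.

Note (Administrator, 2015-12-13 10:02:12): When a Track entry is linked with an application module, the effect of each state depends on the module. With VRRP, when the Track entry is Invalid, the priority of the associated VRID stays unchanged. With policy-based routing, when the Track entry is Positive or Invalid, the associated policy [...]. With static routing [...].

1. Check whether the Track state is stable

    <H3C> display track 1
    Track ID: 1
      Status: Positive
      Duration: 0 days 0 hours 1 minutes 22 seconds
      Notification delay: Positive 0, Negative 0 (in seconds)
      Reference object:
        BFD session:
        Packet type: Echo
        Interface  : Vlan-interface1
        Remote IP  : 1.1.1.253
        Local IP   : 1.1.1.252

2. Check network connectivity

(1) Ping between the source and destination addresses used by the probe packets and check whether any replies time out.
(2) Check whether the error packet counters on the interconnecting ports are increasing.
(3) Check whether the link is flapping and whether any port goes Up/Down frequently.
(4) On optical media, check whether the optical attenuation is within the normal range.

3. Check CPU usage

In general, CPU usage above 90% may cause probe packets delivered to the CPU to be dropped. In that case, find out why the CPU usage is high.

Command: display cpu-usage

Example: view the CPU usage of the device.

    display cpu-usage
    Slot 1 CPU usage:
          92% in last 5 seconds
          90% in last 1 minute
          81% in last 5 minutes

4. Check the probe packet sending frequency/timeout

Configure the probe packet sending interval and the failure time appropriately, based on factors such as traffic volume, device performance, and link quality.

(1) Modify the BFD probe packet sending interval and the maximum number of packets allowed to fail.

Commands:
    bfd min-echo-receive-interval value
    bfd detect-multiplier value

Example: set the BFD echo packet interval to 1000 ms and allow at most 6 packets to fail.

    display bfd session verbose
    Total session number: 1    Up session number: 1    Init mode: Active
    IPv4 session working under Echo mode:
        Local Discr: 35
        Source IP: 1.1.1.252          Destination IP: 1.1.1.253
        Session State: Up             Interface: Vlan-interface1
        Min Recv Inter: 1000ms        Act Trans Inter: 1000ms
        Act Detect Inter: 6000ms      Running Up for: 00:10:47
        Connect Type: Direct          Protocol: Track
        Diag Info: No Diagnostic      Board Num: 2

(2) Modify the NQA probe packet sending interval and the number of consecutive probe failures allowed.

Commands:
    frequency interval
    reaction item-number checked-element probe-fail threshold-type consecutive consecutive-occurrences action-type trigger-only

Example: set the NQA probe interval to 1500 ms and allow at most 5 consecutive packets to fail.

    [H3C-nqa-admin-test-icmp-echo] display this
    #
     type icmp-echo
      frequency 1500
      reaction 1 checked-element probe-fail threshold-type consecutive 5 action-type trigger-only

5. Check whether the Track state is Negative

Use the command below to view the Track entry state and determine whether it is stable in the Negative state, or stable in the Positive or Invalid state.

Command: display track track-entry-number

Example: view the current state of Track 1.

    <H3C> display track 1
    Track ID: 1
      Status: Negative
      Duration: 0 days 0 hours 0 minutes 2 seconds
      Notification delay: Positive 0, Negative 0 (in seconds)
      Reference object:
        NQA entry: admin test
        Reaction: 1

6. Check reachability of the monitored destination IP

If the monitored destination IP address cannot be pinged, check mainly the following:
(1) Whether the interface is Up and whether the link is faulty.
(2) Whether the Layer 2 interface allows the probe packets to pass.
(3) Whether security features such as authentication or ACL filtering are enabled on the port.

7. Check the BFD/NQA configuration

(1) When Track is linked with BFD, the virtual IP address of a VRRP group cannot be used as the Local IP or Remote IP of the BFD session.
(2) In BFD Echo mode, the source IP address of BFD echo packets must be configured, and this address must not belong to the network segment of any interface on the device.
(3) For NQA, check whether a start time and duration have been configured for the test group, and whether the system time falls between the start time and the start time plus the duration.
(4) When Track is linked with BFD, check whether the BFD session state is normal.

Command: display bfd session verbose

Example: view the current state of the BFD session.

    display bfd session verbose
    Total Session Num: 1    Init Mode: Active
    IPv4 Session Working Under Echo Mode:
        Local Discr: 36
        Source IP: 1.1.1.252          Destination IP: 1.1.1.253
        Session State:                Interface: Vlan-interface1
        Min Recv Inter: 400ms         Act Trans Inter: 400ms
        Act Detect Inter: 2000ms      Running Up for: 00:00:09
        Connect Type: Direct          Protocol: Track
        Diag Info: No Diagnostic      Board Num: 1

(5) When Track is linked with NQA, check the monitoring results of the NQA test group.

Command: display nqa result admin-name operation-tag

Example: run the command several times. If the highlighted values (shown in red in the original document) are all 0, the NQA probe is failing.

    <H3C> display nqa result admin test
    NQA entry (admin admin, tag test) test results:
      Destination IP address: 1.1.1.253
        Send operation times: 1            Receive response times: 1
        Min/Max/Average round trip time: 9/9/9
        Square-Sum of round trip time: 81
        Last succeeded probe time: 2012-11-08 17:20:11.8

8. Check whether probe packets are sent and received

For BFD probe packets, match UDP packets whose source address is the BFD echo source IP and whose source and destination ports are 3785. For NQA probe packets, match on the configured source and destination IP addresses and the packet type.

(1) Command: display qos policy interface interface-type interface-number

Example: match the NQA probe packets by source IP, destination IP, and protocol type ICMP, collect traffic statistics for them on the interface, and check whether they are being sent and received normally.

    <H3C> display current-configuration
    acl number 3000
     rule 0 permit icmp source 1.1.1.252 0 destination 1.1.1.253 0
    #
    <H3C> display qos policy interface GigabitEthernet 1/0/1
      Interface: GigabitEthernet1/0/1
      Direction: Inbound
      Policy: acc
       Classifier: acc
         Operator: AND
         Rule(s): If-match acl 3000
         Behavior: acc
          Accounting Enable:
            26152 (Packets)

Note: traffic statistics collected on the outgoing interface of the device that encapsulates the probe packets do not count those packets. This is normal.

(2) Command: display nqa statistics

Example: view the NQA packet statistics and check whether probe packets have been sent and received.

    <H3C> display nqa statistics
    NQA entry (admin admin, tag test) test statistics:
      NO.: 1
        Destination IP address: 1.1.1.253
        Start time: 2012-11-08 13:05:02.2
        Life time: 1351 seconds
        Send operation times: 7045         Receive response times: 6620
        Min/Max/Average round trip time: 1/51/5
        Square-Sum of round trip time: 274288
      Extended results:
        Packet loss in test: 6%
        Failures due to timeout: 425
        Failures due to disconnect: 0
      Reaction statistics:
        Index  Checked Element  Threshold Type  Checked Num  Over-threshold Num
        2      probe-fail       consecutive     7044         425

9. Check the referenced NQA configuration

If a Track entry is configured to reference the wrong NQA test group, the state of that Track entry is Invalid.

Example: the Track entry wrongly references reaction 1 of NQA test group admin test, so its state is Invalid.

    <H3C> display track 1
    Track ID: 1
      Status: Invalid
      Duration: 0 days 0 hours 1 minutes 9 seconds
      Notification delay: Positive 0, Negative 0 (in seconds)
      Reference object:
        NQA entry: admin test
        Reaction: 1
    <H3C> display nqa reaction counters admin test
    NQA entry (admin admin, tag test) reaction counters:
      Index  Checked Element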
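
The decision flow at the start of this guide and the arithmetic behind two of the outputs can be sketched in a few lines of Python. This is only an illustrative sketch, not a vendor tool: the function names are hypothetical, the regular expression assumes the `Status:` field layout shown in the `display track` output, and the BFD detection time is assumed to be the echo interval multiplied by the detect multiplier, which is consistent with the 1000 ms / 6 / 6000 ms example.

```python
import re

CPU_ALARM_PERCENT = 90  # threshold quoted in the CPU usage step of this guide


def track_status(display_track_output):
    """Extract the Status field from the text of a `display track` output."""
    match = re.search(r"Status:\s*(\w+)", display_track_output)
    if match is None:
        raise ValueError("no Status field found")
    return match.group(1)


def triage(statuses):
    """Map a series of sampled Track states to the checks named in this guide."""
    if not statuses:
        raise ValueError("need at least one sample")
    if len(set(statuses)) > 1:
        # The state changed between samples, so the entry is flapping.
        return "flapping: check network quality, CPU usage, probe interval/timeout"
    if statuses[0] == "Negative":
        return "stable Negative: check interconnect, BFD/NQA config, probe send/receive"
    return "stable %s: check the linkage configuration" % statuses[0]


def bfd_detect_time_ms(interval_ms, multiplier):
    """Assumed relation: detection time = echo interval x detect multiplier."""
    return interval_ms * multiplier


def nqa_loss(sent, received):
    """Return (failed probes, loss percentage) from the NQA send/receive counters."""
    failures = sent - received
    return failures, 100.0 * failures / sent


if __name__ == "__main__":
    sample = "Track ID: 1\n  Status: Negative\n"
    print(triage([track_status(sample)] * 3))
    print(bfd_detect_time_ms(1000, 6))   # 6000, matches Act Detect Inter: 6000ms
    print(nqa_loss(7045, 6620))          # 425 failures, about 6% loss
```

Applied to the statistics in step 8, 7045 probes sent and 6620 responses received give 425 failures and roughly 6% loss, which agrees with the "Failures due to timeout" and "Packet loss in test" fields. Under the same assumption, the second BFD output (400 ms interval, 2000 ms detection time) would correspond to a multiplier of 5.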