Author Topic: Systems lose heart beats - IP trunks fail  (Read 2167 times)

Offline ralph

  • Mitel Forums Admin
  • Hero Member
  • *****
  • Posts: 5741
  • Country: us
  • Karma: +468/-0
  • Published Author: http://amzn.to/2dcYSY5
    • View Profile
Systems lose heart beats - IP trunks fail
« on: February 23, 2018, 08:30:23 AM »
I have an ISS server and a 3300 MCD.
They're both on the same subnet.
They're both connected to the same dataswitch.

The IP trunks fail between the two of them frequently - a few times a day usually.
Since I have more systems in the cluster I can route around the problem.

While trying to figure out the cause, I started a continuous ping from the SSH shell of the ISS server to the 3300.
Since I did that, the trunks haven't failed.

Does anyone have any guesses as to why a ping would make a difference?

Ralph


Offline v2win

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 628
  • Country: us
  • Karma: +11/-0
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #1 on: February 23, 2018, 09:28:14 AM »
Any interface errors on the switch?

Move to different ports on the switch or a different switch altogether?

Offline ralph

  • Mitel Forums Admin
  • Hero Member
  • *****
  • Posts: 5741
  • Country: us
  • Karma: +468/-0
  • Published Author: http://amzn.to/2dcYSY5
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #2 on: February 23, 2018, 09:36:50 AM »
The customer reports there are no errors on either port.

Ralph

Offline v2win

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 628
  • Country: us
  • Karma: +11/-0
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #3 on: February 23, 2018, 09:45:51 AM »
It kind of sounds like the ARP table or forwarding table in the switch is getting cleared but your ping is keeping it active.

Offline x-man

  • Hero Member
  • *****
  • Posts: 1129
  • Country: gb
  • Karma: +25/-0
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #4 on: February 23, 2018, 09:59:53 AM »
Keepalive being overridden by ping thus maintaining the link?

Offline ralph

  • Mitel Forums Admin
  • Hero Member
  • *****
  • Posts: 5741
  • Country: us
  • Karma: +468/-0
  • Published Author: http://amzn.to/2dcYSY5
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #5 on: February 23, 2018, 11:29:28 AM »
Quote from: v2win on February 23, 2018, 09:45:51 AM
It kind of sounds like the ARP table or forwarding table in the switch is getting cleared but your ping is keeping it active.

That was one of my first guesses. I haven't ruled out an ARP table issue yet.
I'm not sure how to even troubleshoot that.

Ralph
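
One way to look at the ARP side from the server end is to snapshot the Linux ARP cache while a trunk is down and see whether the peer's entry is complete, incomplete, or missing. The sketch below parses the text of /proc/net/arp. It assumes the MSL shell exposes that file and has Python available; the addresses and MACs in SAMPLE are made up for illustration.

```python
from typing import Dict

# Flag bit the Linux kernel sets in /proc/net/arp when the MAC is resolved.
ATF_COM = 0x02


def parse_proc_net_arp(text: str) -> Dict[str, dict]:
    """Parse the contents of /proc/net/arp into {ip: {mac, device, complete}}."""
    entries: Dict[str, dict] = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        entries[ip] = {
            "mac": mac,
            "device": device,
            "complete": bool(int(flags, 16) & ATF_COM),
        }
    return entries


def arp_state(text: str, peer_ip: str) -> str:
    """Return 'complete', 'incomplete', or 'missing' for one peer."""
    entry = parse_proc_net_arp(text).get(peer_ip)
    if entry is None:
        return "missing"
    return "complete" if entry["complete"] else "incomplete"


# Hypothetical snapshot; on a real box, read the file instead:
#   with open("/proc/net/arp") as f: text = f.read()
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.0.2.10       0x1         0x2         00:00:5e:00:53:01     *        eth0
192.0.2.20       0x1         0x0         00:00:00:00:00:00     *        eth0
"""

if __name__ == "__main__":
    for ip in ("192.0.2.10", "192.0.2.20", "192.0.2.99"):
        print(ip, arp_state(SAMPLE, ip))
```

Running this once a minute and logging the result next to the trunk alarms would show whether the peer's entry is actually going incomplete around the time of a failure.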

Offline v2win

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 628
  • Country: us
  • Karma: +11/-0
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #6 on: February 23, 2018, 12:19:01 PM »
What type of switch?

Offline ralph

  • Mitel Forums Admin
  • Hero Member
  • *****
  • Posts: 5741
  • Country: us
  • Karma: +468/-0
  • Published Author: http://amzn.to/2dcYSY5
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #7 on: February 23, 2018, 01:48:19 PM »
Quote from: v2win on February 23, 2018, 12:19:01 PM
What type of switch?

Cisco Nexus 7000 C7009

Offline august

  • Sr. Member
  • ****
  • Posts: 258
  • Country: ca
  • Karma: +9/-0
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #8 on: February 23, 2018, 04:25:05 PM »
I have the same problem with two MXe III controllers.

Offline BlackSunshine

  • Full Member
  • ***
  • Posts: 190
  • Country: us
  • Karma: +1/-0
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #9 on: February 23, 2018, 07:22:02 PM »
Sounds like a NIC issue or port issue. I would wait until after hours and hit both ends with a 1000-byte ping. If you get any dropped packets, you have a hardware issue somewhere. Trust me, it works.
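
A large-payload ping test like this is easier to judge if the loss figure is pulled out of the summary line rather than eyeballed. The sketch below assumes the ping is run from a Linux shell with iputils ping (for example `ping -s 1000 -c 1000 <address>`) and that its output has been saved as text; the sample output is invented for illustration.

```python
import re
from typing import Optional, Tuple

# Matches the iputils summary line, e.g.
# "1000 packets transmitted, 998 received, 0.2% packet loss, time 999000ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def ping_loss(output: str) -> Optional[Tuple[int, int, float]]:
    """Return (sent, received, loss_percent), or None if no summary line is found."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    return sent, received, loss_pct


# Hypothetical captured output.
SAMPLE_OUTPUT = """\
--- 192.0.2.10 ping statistics ---
1000 packets transmitted, 998 received, 0.2% packet loss, time 999000ms
"""

if __name__ == "__main__":
    print(ping_loss(SAMPLE_OUTPUT))
```

The loss is recomputed from the sent and received counts instead of trusting the printed percentage, so the same function works if the percentage format differs between ping versions.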

Offline ralph

  • Mitel Forums Admin
  • Hero Member
  • *****
  • Posts: 5741
  • Country: us
  • Karma: +468/-0
  • Published Author: http://amzn.to/2dcYSY5
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #10 on: February 24, 2018, 08:51:57 AM »
Quote from: BlackSunshine on February 23, 2018, 07:22:02 PM
Sounds like a NIC issue or port issue. I would wait until after hours and hit both ends with a 1000-byte ping. If you get any dropped packets, you have a hardware issue somewhere. Trust me, it works.

I tried pinging with 2024-byte packets. No errors. No drops.
I checked the interface for errors. None.

We've also discovered that if we put a continuous ping against the 3300, it stops dropping the IP trunk.
I don't understand why that would be unless there is some type of ARP issue. I just don't know how to troubleshoot that further.

Ralph

Offline x-man

  • Hero Member
  • *****
  • Posts: 1129
  • Country: gb
  • Karma: +25/-0
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #11 on: February 24, 2018, 11:25:31 AM »
Sounds like a SIP peer problem to me. Keepalive, as already mentioned? But not necessarily a timer issue. What type of registration does it use, IP or FQDN? You could Wireshark it and see if something is sending a disconnect to the peer or the 3300 is getting a disconnect. Is there a definite length of time involved? I.e., if no calls are being made across the link, does it disconnect after X minutes?
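
The "does it disconnect after X minutes of no calls" question can be checked from logs: for each failure, take the time of the last call across the link and the time the trunk dropped, then see whether the idle periods cluster around one value. This is only a sketch; the timestamps are invented, and the 10% tolerance is an arbitrary assumption.

```python
from datetime import datetime
from statistics import mean, pstdev
from typing import List, Tuple

FMT = "%Y-%m-%d %H:%M:%S"


def idle_minutes(events: List[Tuple[str, str]]) -> List[float]:
    """For each (last_call, trunk_down) pair, return the idle time in minutes."""
    result = []
    for last_call, trunk_down in events:
        start = datetime.strptime(last_call, FMT)
        end = datetime.strptime(trunk_down, FMT)
        result.append((end - start).total_seconds() / 60.0)
    return result


def looks_like_fixed_timer(idles: List[float], tolerance: float = 0.1) -> bool:
    """True if the idle times vary by less than `tolerance` of their mean."""
    if len(idles) < 2:
        return False
    avg = mean(idles)
    return avg > 0 and pstdev(idles) / avg < tolerance


# Hypothetical log extracts: (time of last call, time the IP trunk failed).
EVENTS = [
    ("2018-02-22 09:00:00", "2018-02-22 09:20:00"),
    ("2018-02-22 13:10:00", "2018-02-22 13:30:00"),
    ("2018-02-23 02:05:00", "2018-02-23 02:26:00"),
]

if __name__ == "__main__":
    idles = idle_minutes(EVENTS)
    print(idles, looks_like_fixed_timer(idles))
```

If the idle times are tightly grouped, an idle timeout somewhere in the path becomes a more likely suspect; if they are scattered, the failures probably are not tied to inactivity.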

Offline johnp

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 2183
  • Country: us
  • Karma: +66/-0
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #12 on: February 24, 2018, 12:35:23 PM »
I had an issue where the MCD within an MSL would not reconnect to remote CX controllers although the MSL could ping them. It ended up being an ICMP redirect issue on the head-end switch.

Offline VinceWhirlwind

  • Hero Member
  • *****
  • Posts: 899
  • Country: au
  • Karma: +31/-0
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #13 on: February 25, 2018, 05:45:00 PM »
1/ ARP - the two devices are in the same subnet, therefore the switch is not using ARP for this communication, just passing frames at Layer2 between them.
--> check the ARP table on the server itself during an event
--> initiate a ping during an event
2/ MAC-address table - if a switch receives a frame with a destination MAC address that is no longer in its MAC address table, it sends the frame out all ports except the one it received it on, so the frame will be successfully transmitted whether it is in the table or not.
--> Double-check the Nexus switch doesn't have some weird feature that disables Layer2 flooding.
--> put a static MAC-address in the switch MAC-address table to rule this out
-->  Are there any aggregated links between the two devices?
 
Sounds more like one of those NICs that goes into power saving mode, although the second ping you describe would seem to rule out a problem on the ISS server. What software version is on the 3300?
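
The point about the MAC-address table in 2/ can be illustrated with a toy model of a learning switch: once an entry ages out, a frame to that MAC is flooded rather than dropped, so it still reaches the destination port. This is a simplified sketch, not how the Nexus is implemented; the port numbers, MAC addresses, and 300-second aging time are arbitrary assumptions.

```python
from typing import Dict, List, Tuple


class LearningSwitch:
    """Toy Layer 2 switch: learns source MACs, ages them out, floods unknowns."""

    def __init__(self, ports: List[int], aging_seconds: float = 300.0):
        self.ports = ports
        self.aging_seconds = aging_seconds  # illustrative value only
        self.table: Dict[str, Tuple[int, float]] = {}  # mac -> (port, last_seen)

    def receive(self, now: float, in_port: int, src: str, dst: str) -> List[int]:
        """Process one frame and return the ports it is sent out of."""
        # Learn (or refresh) where the source MAC lives.
        self.table[src] = (in_port, now)
        # Drop entries that have not been refreshed within the aging time.
        self.table = {
            mac: (port, seen)
            for mac, (port, seen) in self.table.items()
            if now - seen <= self.aging_seconds
        }
        entry = self.table.get(dst)
        if entry is None:
            # Unknown destination: flood out every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        out_port = entry[0]
        # Known destination on the ingress port itself: nothing to forward.
        return [] if out_port == in_port else [out_port]


ISS_MAC = "00:00:5e:00:53:01"  # hypothetical, on port 1
MXE_MAC = "00:00:5e:00:53:02"  # hypothetical, on port 2

if __name__ == "__main__":
    sw = LearningSwitch(ports=[1, 2, 3])
    print(sw.receive(0, 1, ISS_MAC, MXE_MAC))    # unknown destination, flooded
    print(sw.receive(1, 2, MXE_MAC, ISS_MAC))    # reply, ISS already learned
    print(sw.receive(2, 1, ISS_MAC, MXE_MAC))    # MXe now learned, unicast
    print(sw.receive(400, 1, ISS_MAC, MXE_MAC))  # MXe entry aged out, flooded again
```

In every case the frame from the ISS server reaches port 2, which is why an aged-out MAC entry alone should not break the trunk unless flooding is suppressed somewhere.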

Offline ralph

  • Mitel Forums Admin
  • Hero Member
  • *****
  • Posts: 5741
  • Country: us
  • Karma: +468/-0
  • Published Author: http://amzn.to/2dcYSY5
    • View Profile
Re: Systems lose heart beats - IP trunks fail
« Reply #14 on: February 26, 2018, 09:13:04 AM »
Quote from: VinceWhirlwind on February 25, 2018, 05:45:00 PM
1/ ARP - the two devices are in the same subnet, therefore the switch is not using ARP for this communication, just passing frames at Layer2 between them.
--> check the ARP table on the server itself during an event
--> initiate a ping during an event
2/ MAC-address table - if a switch receives a frame with a destination MAC address that is no longer in its MAC address table, it sends the frame out all ports except the one it received it on, so the frame will be successfully transmitted whether it is in the table or not.
--> Double-check the Nexus switch doesn't have some weird feature that disables Layer2 flooding.
--> put a static MAC-address in the switch MAC-address table to rule this out
-->  Are there any aggregated links between the two devices?
 
Sounds more like one of those NICs that goes into power saving mode, although the second ping you describe would seem to rule out a problem on the ISS server. What software version is on the 3300?

Thanks for your reply, Vince.
What do you mean by "aggregated links"?
The primary server is an ISS server running 7.1 PR2.
The controller we're losing connection to is an MXe running 7.2.
We have 3 other MCD controllers and one ISS server in the cluster. We only lose connection between these two controllers. These two servers are the busiest in the cluster. The ISS server has ~200 users and the MCD controller has all the trunks, both SIP and T1.

Saturday we turned off the ping that was running against the ISS server from a PC, but we left the pings from the ISS server MSL to the MCD controller going. It lost the connection to the MCD controller later that morning, but it's been good since.



 
