Mitel Forums - The Unofficial Source

Mitel Forums => Mitel MiVoice Business/MCD/3300 => Topic started by: ralph on February 23, 2018, 08:30:23 AM

Title: Systems lose heart beats - IP trunks fail
Post by: ralph on February 23, 2018, 08:30:23 AM: I have an ISS server and a 3300 MCD.
They're both on the same subnet.
They're both connected to the same dataswitch.

The IP trunks fail between the two of them frequently - a few times a day usually.
Since I have more systems in the cluster I can route around the problem.

When trying to figure out the cause, I put a continuous ping from the SSH shell of the ISS server to the 3300.
Since I've done that the trunks haven't failed.

Does anyone have any guesses as to why a ping would make a difference?

Ralph
Title: Re: Systems lose heart beats - IP trunks fail
Post by: v2win on February 23, 2018, 09:28:14 AM: Any interface errors on the switch?

Have you tried moving to different ports on the switch, or to a different switch altogether?
Title: Re: Systems lose heart beats - IP trunks fail
Post by: ralph on February 23, 2018, 09:36:50 AM: The customer reports there are no errors on either port.

Ralph
Title: Re: Systems lose heart beats - IP trunks fail
Post by: v2win on February 23, 2018, 09:45:51 AM: It kind of sounds like the ARP table or forwarding table in the switch is getting cleared but your ping is keeping it active.
Title: Re: Systems lose heart beats - IP trunks fail
Post by: x-man on February 23, 2018, 09:59:53 AM: Keepalive being overridden by ping thus maintaining the link?
Title: Re: Systems lose heart beats - IP trunks fail
Post by: ralph on February 23, 2018, 11:29:28 AM: Quote from: v2win on February 23, 2018, 09:45:51 AM
It kind of sounds like the ARP table or forwarding table in the switch is getting cleared but your ping is keeping it active.

That was one of my first guesses. I haven't ruled out an ARP table issue yet.
I'm not sure how to even troubleshoot that.

Ralph
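
One way to troubleshoot this from the server side is to log the server's own ARP entry for the far end and see whether it changes or goes incomplete around the time a trunk drops. The sketch below is only an illustration and rests on several assumptions: that the server exposes the standard Linux /proc/net/arp file (MSL is Linux-based), that Python is available on it or on a similar Linux host in the same subnet, and that the function names are made up for this example. Run it with the continuous ping stopped, since the ping itself would keep the entry fresh.

```python
import time


def parse_arp_table(text):
    """Parse the contents of /proc/net/arp into {ip: (mac, flags, device)}.

    In the Flags column, 0x2 means the entry is complete and 0x0 means
    the kernel has not resolved the address.
    """
    entries = {}
    lines = text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip anything that is not a full six-column row
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        entries[ip] = (mac, flags, device)
    return entries


def watch_arp(target_ip, interval_s=5.0, path="/proc/net/arp"):
    """Print a timestamped line whenever the ARP entry for target_ip changes.

    Runs until interrupted (Ctrl-C). The 5-second poll interval is an
    arbitrary choice for illustration.
    """
    last = object()  # sentinel so the first reading is always printed
    while True:
        with open(path) as f:
            current = parse_arp_table(f.read()).get(target_ip)
        if current != last:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), target_ip, current)
            last = current
        time.sleep(interval_s)
```

Calling watch_arp() with the 3300's IP address and comparing the printed timestamps against the trunk failure times would show whether the entry disappears (printed as None) or changes MAC address when the trunks drop.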
Title: Re: Systems lose heart beats - IP trunks fail
Post by: v2win on February 23, 2018, 12:19:01 PM: What type of switch?
Title: Re: Systems lose heart beats - IP trunks fail
Post by: ralph on February 23, 2018, 01:48:19 PM: Quote from: v2win on February 23, 2018, 12:19:01 PM
What type of switch?

Cisco Nexus 7000 (C7009)
Title: Re: Systems lose heart beats - IP trunks fail
Post by: august on February 23, 2018, 04:25:05 PM: I have the same problem with two MXe III controllers.
Title: Re: Systems lose heart beats - IP trunks fail
Post by: BlackSunshine on February 23, 2018, 07:22:02 PM: Sounds like a NIC issue or port issue. I would wait until after hours and hit both ends with a 1000-byte ping. If you get any dropped packets, you have a hardware issue somewhere. Trust me, it works.
Title: Re: Systems lose heart beats - IP trunks fail
Post by: ralph on February 24, 2018, 08:51:57 AM: Quote from: BlackSunshine on February 23, 2018, 07:22:02 PM
Sounds like a NIC issue or port issue. I would wait until after hours and hit both ends with a 1000-byte ping. If you get any dropped packets, you have a hardware issue somewhere. Trust me, it works.

I tried pinging with 2024 byte packets. No errors. No drops.
I checked the interface for errors. None.

We've also discovered that if we put a continuous ping against the 3300, it stops dropping the IP trunk.
I don't understand why that would be unless there is some type of ARP issue. I just don't know how to troubleshoot that further.

Ralph
Title: Re: Systems lose heart beats - IP trunks fail
Post by: x-man on February 24, 2018, 11:25:31 AM: Sounds like a SIP peer problem to me. Keepalive, as already mentioned? But not necessarily a timer issue. What type of registration does it use, IP or FQDN? You could Wireshark it and see if something is sending a disconnect to the peer, or if the 3300 is getting a disconnect. Is there a definite length of time involved? I.e., if no calls are being made across the link, does it disconnect after X minutes?
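
A rough way to answer the "definite length of time" question is to collect the times of the trunk failures and look at the gaps between them. The sketch below is a minimal example that assumes you have already pulled the failure timestamps out of the system logs by hand. The function names, the timestamp format and the 10% tolerance are all assumptions made for illustration, not anything defined by the 3300.

```python
from datetime import datetime
from statistics import mean, pstdev


def failure_gaps(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps, in seconds, between consecutive failure times."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def looks_periodic(gaps, tolerance=0.1):
    """Return True if the gaps are nearly equal.

    "Nearly equal" here means the population standard deviation is within
    `tolerance` (as a fraction) of the mean gap. At least two gaps are
    needed before anything can be said.
    """
    if len(gaps) < 2:
        return False
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg <= tolerance


# Placeholder timestamps for illustration only, not taken from a real system.
example = [
    "2018-02-23 08:00:00",
    "2018-02-23 10:00:00",
    "2018-02-23 12:00:00",
]
gaps = failure_gaps(example)
print(gaps, looks_periodic(gaps))
```

Evenly spaced failures would point toward a timer or table ageing out. Irregular gaps would point more toward load or hardware.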
Title: Re: Systems lose heart beats - IP trunks fail
Post by: johnp on February 24, 2018, 12:35:23 PM: I had an issue where the MCD within an MSL would not reconnect to remote CX controllers although the MSL could ping them. It ended up being an ICMP redirect issue on the head-end switch.
Title: Re: Systems lose heart beats - IP trunks fail
Post by: VinceWhirlwind on February 25, 2018, 05:45:00 PM: 1/ ARP - the two devices are in the same subnet, therefore the switch is not using ARP for this communication, just passing frames at Layer2 between them.
--> check the ARP table on the server itself during an event
--> initiate a ping during an event
2/ MAC-address table - if a switch receives a frame with a destination MAC address that is no longer in its MAC address table, it sends the frame out all ports except the one it received it on, so the frame will be successfully transmitted whether it is in the table or not.
--> Double-check the Nexus switch doesn't have some weird feature that disables Layer2 flooding.
--> put a static MAC-address in the switch MAC-address table to rule this out
--> Are there any aggregated links between the two devices?
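
The flooding behaviour in point 2/ can be sketched as a toy model of a learning switch. This is a simplified illustration of generic Layer 2 learning and flooding, not of how the Nexus actually implements it. The class name, the port numbers, the MAC labels and the 300-second ageing time are all assumptions chosen for the example.

```python
class LearningSwitch:
    """Toy model of a Layer 2 switch with MAC learning, ageing and flooding."""

    def __init__(self, ports, aging_s=300.0):
        self.ports = set(ports)
        self.aging_s = aging_s  # illustrative ageing time, not a Nexus default
        self.table = {}  # MAC address -> (port, time last seen)

    def receive(self, in_port, src_mac, dst_mac, now):
        """Process one frame and return the set of ports it is sent out of."""
        # Learn (or refresh) the source MAC on the ingress port.
        self.table[src_mac] = (in_port, now)
        # Drop entries that have not been refreshed within the ageing time.
        self.table = {
            mac: (port, seen)
            for mac, (port, seen) in self.table.items()
            if now - seen <= self.aging_s
        }
        entry = self.table.get(dst_mac)
        if entry is not None:
            out_port = entry[0]
            # Known destination: forward on that port only, and never
            # back out of the port the frame arrived on.
            return set() if out_port == in_port else {out_port}
        # Unknown destination: flood out every port except the ingress port.
        return self.ports - {in_port}


sw = LearningSwitch(ports=[1, 2, 3])
print(sw.receive(1, "iss", "mxe", now=0))     # unknown destination, flooded
print(sw.receive(2, "mxe", "iss", now=1))     # reply teaches the switch where "mxe" is
print(sw.receive(1, "iss", "mxe", now=2))     # known destination, one port only
print(sw.receive(1, "iss", "mxe", now=1000))  # entry aged out, flooded again
```

In this model the last frame still reaches port 2 because it is flooded, which is why an aged-out MAC entry on its own should not stop traffic unless flooding is disabled or restricted.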

Sounds more like one of those NICs that goes into power saving mode, although the second ping you describe would seem to rule out a problem on the ISS server. What software version is on the 3300?
Title: Re: Systems lose heart beats - IP trunks fail
Post by: ralph on February 26, 2018, 09:13:04 AM: Quote from: VinceWhirlwind on February 25, 2018, 05:45:00 PM
1/ ARP - the two devices are in the same subnet, therefore the switch is not using ARP for this communication, just passing frames at Layer2 between them.
--> check the ARP table on the server itself during an event
--> initiate a ping during an event
2/ MAC-address table - if a switch receives a frame with a destination MAC address that is no longer in its MAC address table, it sends the frame out all ports except the one it received it on, so the frame will be successfully transmitted whether it is in the table or not.
--> Double-check the Nexus switch doesn't have some weird feature that disables Layer2 flooding.
--> put a static MAC-address in the switch MAC-address table to rule this out
--> Are there any aggregated links between the two devices?

Sounds more like one of those NICs that goes into power saving mode, although the second ping you describe would seem to rule out a problem on the ISS server. What software version is on the 3300?

Thanks for your reply, Vince.
What do you mean by "aggregated links"?
The primary server is an ISS server running 7.1 PR2.
The controller we're losing connection to is an MXe running 7.2.
We have 3 other MCD controllers and one ISS server in the cluster. We only lose connection between these two controllers. These two servers are the busiest in the cluster. The ISS server has ~200 users and the MCD controller has all the trunks - both SIP and T1.

On Saturday we turned off the ping that was running against the ISS server from a PC, but we left the pings from the ISS server MSL to the MCD controller running. It lost the connection to the MCD controller later that morning, but it's been good since.
Title: Re: Systems lose heart beats - IP trunks fail
Post by: VinceWhirlwind on February 26, 2018, 07:14:27 PM: Quote from: ralph on February 26, 2018, 09:13:04 AM
What do you mean by "aggregated links"?
That would be where a "link" between two devices is formed using 2 or more physical links, configured to act as a single virtual link using LACP usually. ("Trunks", "Etherchannel", "802.3ad", various names for it depending on vendor).
I have (rarely) had comms affected by a bug in link aggregation whereby, after some random amount of time, the aggregated link stops passing *some* traffic. I've had this on Nortel as well as VMware.
To rule out such a bug, disable all physical links in the aggregated link except for one and monitor for a period.

Quote from: ralph on February 26, 2018, 09:13:04 AM
We only lose connection between these two controllers. These two servers are the busiest in the cluster. The ISS server has ~200 users and the MCD controller has all the trunks - both SIP and T1.

Doesn't sound exceptionally busy, but that's a matter of perspective from the switch's point of view. The switch could have some kind of per-protocol rate limiting configured on its in-path interfaces, blocking traffic once it reaches the configured limit. Bit of a stretch really, but you never know. Those Nexus switches can be a bit odd, and I haven't used them for years.
Title: Re: Systems lose heart beats - IP trunks fail
Post by: ralph on February 27, 2018, 07:41:11 AM: Thanks for the replies.
I took another look at the MCD controller yesterday. I can see that it lost connections to all the other controllers on Saturday. No work was being done on the network.
They just happen to be scheduled for an upgrade in just a little over 2 weeks.
The hardware will be replaced at that time.
We're going to leave a ping against that controller until then.

Ralph