Author Topic: Phones reboot daily - HeartbeatRecvTimeoutHandler.cpp;273 (Read 4162 times)

Madikus · « **on:** January 15, 2013, 08:06:47 PM »

Hi,

We are in the middle of a new implementation of a 3300. Everything is up and working (Mitel did the entire install) with one exception. We haven't gone live yet because every phone at any of our locations connected to our data center over MPLS experiences a heartbeat timeout at random, at least once a day (this does not happen to devices connected inside the data center). This causes the phone to reboot, which drops the network port and disconnects the connected PC. The phone reboots immediately and we are good for the rest of the day. Mitel has been working on this for 3 months without a resolution. They continue to blame our SonicWall, our network configs and our virtual hardware. The SonicWall is our central gateway and has all the subinterfaces built for our VLANs. SonicWall VoIP engineers have looked at our configs from top to bottom and insist it isn't the SonicWall causing the issue.

The phone reboots happen at random times from random locations (we have 15 remote sites, all connected via Netsolutions MPLS). The only thing consistent is that it happens to every location at least once per day. Sometimes more. Other than this random reboot, the phones work perfectly. All locations have roughly 15 phones. They are connected using Mitel-supplied HP ProCurve PoE switches. The power consumption is nominal. The handsets are 5330s. DHCP is handed out via each location's Adtran MPLS router. We've disabled ICMP redirects on a few of the locations with no success.

Here's an example description:

Code:

ICP has lost contact with (10.0.122.4 (08-00-0F-6D-38-16)), eAtlas cluster pool free: 6144 and low water mark: 5972

We have tried giving a couple of locations priority on the SonicWall for all traffic bound for the controller at the data center, without success. We've increased TCP and UDP timeouts for all traffic to the controllers, without success. We've isolated all controller, MPLS and gateway NICs to the same physical switch, with no success.
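For anyone wanting to check whether the reboots really are evenly spread across sites, a minimal Python sketch like the one below could tally the "lost contact" entries. It assumes every such entry follows the layout of the one line quoted above, and it assumes (hypothetically, this is not stated in the thread) that each remote site sits on its own /24, so the first three octets of the phone's IP identify the site. The names `parse_lost_contact` and `count_by_subnet` are made up for illustration.

```python
import re
from collections import Counter

# Matches the "lost contact" entry as quoted in this thread.
# Assumption: other entries on the controller use the same layout.
LOST_CONTACT = re.compile(
    r"ICP has lost contact with \((?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"\((?P<mac>(?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})\)\)"
)


def parse_lost_contact(line):
    """Return (ip, mac) for a lost-contact line, or None if it does not match."""
    match = LOST_CONTACT.search(line)
    if match is None:
        return None
    return match.group("ip"), match.group("mac")


def count_by_subnet(lines):
    """Count lost-contact events per /24 (first three octets of the phone IP).

    Assumption: one /24 per remote site, which the thread does not confirm.
    """
    counts = Counter()
    for line in lines:
        parsed = parse_lost_contact(line)
        if parsed is not None:
            ip, _mac = parsed
            counts[ip.rsplit(".", 1)[0]] += 1
    return counts
```

Feeding it the exported maintenance log line by line would show whether some sites or handsets drop more often than others, which is hard to judge from user reports alone.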

Any suggestions? I'm out of patience waiting for them to figure it out...

marcolive · « **Reply #1 on:** January 15, 2013, 10:32:47 PM »

Wow! Strange thing!

We have some customers running SonicWall routers/firewalls with no problems. Did you try disabling the security services (antivirus, anti-spam, intrusion detection, etc.)?

Are those phones connected to a vMCD? Mitel sent us a bulletin (#12-5191-00292) in December about Large Receive Offload (LRO) which is impacting TCP connections (and MiNET protocol uses TCP).

I don't know if you have root access to the MSL server running vMCD, but you could try this command:

[root@vmcd-253 ~]# ethtool -S eth0 | grep LRO

The output should look like this:

LRO pkts rx: 0
LRO byte rx: 0

If you have "large" numbers on those fields, your problem could be related to that. I know that Mitel released a new OVA to address that issue and it's available on a tech support FTP site (and not Mitel Online). Also, MCD 6 released today should be OK.
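If the check has to be repeated on several hosts, the same test can be scripted. The Python sketch below only parses the text that `ethtool -S` prints and reports any counter with "LRO" in its name. It assumes the counters look like the two lines shown above; counter names differ between NIC drivers, so treat the matching rule as an assumption. The function names are hypothetical.

```python
def lro_counters(ethtool_output):
    """Extract counters whose name contains 'LRO' from `ethtool -S` text.

    Assumption: each statistic is printed as "<name>: <integer>",
    as in the sample output quoted in this thread.
    """
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep or "LRO" not in name.upper():
            continue
        try:
            counters[name.strip()] = int(value.strip())
        except ValueError:
            # Skip lines whose value is not a plain integer.
            continue
    return counters


def lro_active(ethtool_output):
    """Return True if any LRO counter is above zero."""
    return any(value > 0 for value in lro_counters(ethtool_output).values())
```

On the server itself the text could come from `subprocess.run(["ethtool", "-S", "eth0"], capture_output=True, text=True).stdout`, with `eth0` swapped for whichever interface the vMCD uses.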

What is your DHCP lease time?

Are there any error messages displayed on the phones before the reboot?

Maybe already done, but a great way to see what happens is to activate port mirroring on a phone with a laptop connected to the PC port. A Wireshark trace could help.

Madikus · « **Reply #2 on:** January 15, 2013, 11:26:03 PM »

Thanks for the reply!

We did have the LRO adjusted per the bulletin. The DHCP lease time is set to 7 days. I don't know that the phones display an error. All reports have been that the phone just goes black, then turns back on and boots normally. Our Mitel tech did port mirroring and submitted his findings to "level 2 support" - that was a week ago and supposedly he still hasn't heard back.

One thing we do see is occasional TCP and UDP drops from a phone to the vMCD. It's a pretty small amount, I'm talking 1 drop per maybe 8000+ forwarded. There doesn't seem to be any pattern. The drops usually coincide with the phone rebooting, but not always. The odd thing is that the packet capture on the SonicWall tags the drops as being caused by a firewall rule, but SonicWall says there aren't any rules that would be affecting it. We have a LAN>LAN rule for all MiNet traffic set as highest priority, but that has had no effect (and we see all the traffic on the rule stats, so we know it's working). Plus, if the phones are good for 23 hours and 59 minutes a day, it doesn't make sense that a rule would randomly stop working for a few seconds to kill these phones, and it doesn't make sense that it would happen every day, at random hours, to every phone across all 15 of our sites. We have maybe 100 phones total coming in over MPLS, each site on a 6 meg circuit and coming in to a DS3 at our data center.
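To put a number on "usually coincide, but not always", the drop timestamps from the firewall capture could be lined up against the reboot timestamps from the controller log. The Python sketch below is one rough way to do that. The function name and the 30-second window are assumptions chosen for illustration; the actual heartbeat timeout is not given in this thread, so the window would need adjusting.

```python
from datetime import datetime, timedelta


def reboots_with_nearby_drop(reboot_times, drop_times, window_seconds=30):
    """Split reboots into those with and without a drop close by in time.

    reboot_times, drop_times: lists of datetime objects.
    window_seconds: assumed tolerance, not a documented Mitel value.
    Returns (matched, unmatched) lists of reboot timestamps.
    """
    window = timedelta(seconds=window_seconds)
    matched, unmatched = [], []
    for reboot in reboot_times:
        if any(abs(reboot - drop) <= window for drop in drop_times):
            matched.append(reboot)
        else:
            unmatched.append(reboot)
    return matched, unmatched
```

If most reboots land in the unmatched list, the logged drops are probably a side effect rather than the cause; if nearly all match, the firewall capture deserves a closer look.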

I have not touched the IPS settings, and I did ask SonicWall about it, but they insisted I leave it as-is. I've now had 2 different SonicWall engineers look at all our settings and both confirmed everything was correct.

We haven't disabled

Mitel Forums - The Unofficial Source
