Author Topic: MiCollab/MBG - how to read tug.log data (Read 5685 times)

lundah · « **on:** April 20, 2022, 05:20:19 PM »

Anyone have a guide on how to read the info in the tug.log file on an MBG/MiCollab server? Trying to chase down some packet loss and audio issues, KMS tells me all the info I'm looking for is in the tug.log file, but I can't find any info on how to actually extract the information I'm looking for.

dilkie · « **Reply #1 on:** April 20, 2022, 05:46:59 PM »

seriously? There's a KMS article telling you to look in the tug logs? wow.. brave.

there's no details because the logs are meant for R&D developers and they are constantly changing.

for voice quality issues, there are two different pieces in the tug.log to help you.
one is the stats dump at the end of the call and that same dump occurs periodically throughout the call (every minute or so).
It looks like this:

[DLG 1s] RTP/AVP (audio): 192.168.5.61:28000 Payload(IS/SS): 0/0 FrmSz(IS/SS): 30/20
[DLG 1s] Started(SS/IS): 0/0 Enabled Dur(SS/IS): 17/17 Stream Dur(SS/IS): 17/17
[DLG 1s] SS Tx:560 Rx:834 BwRx:90.2K BwTx:81.7K Lost:0(0%) OOS:0(0%) ClkSync:+0.0013
[DLG 1s] Jtr Max:20 Avg:0 BinSize:10ms Histo:831,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
[DLG 1s] Loss Max:0 BinSize:1 Histo:0,0,0,0,0,0,0,0,0,0
[DLG 1s] SS JitterQ, Len: 0 LWM: 0 HWM: 2 AvgQ: 1.990 Udr: 2 Ovr: 0 Shrinks: 1
[DLG 1s] IS Tx:835 Rx:561 BwRx:79.1K BwTx:86.4K Lost:0(0%) OOS:0(0%) ClkSync:+0.0036
[DLG 1s] Jtr Max:30 Avg:0 BinSize:10ms Histo:556,2,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
[DLG 1s] Loss Max:0 BinSize:1 Histo:0,0,0,0,0,0,0,0,0,0
[DLG 1s] IS JitterQ, Len: 1 LWM: 1 HWM: 2 AvgQ: 2.056 Udr: 0 Ovr: 0 Shrinks: 2
[DLG 1s] RTCP SS Rtt:0.0ms CNAME:10009@192.168.5.211 RxJtr:0.0ms RxLoss:0 TxJtr:0.0ms TxLoss:0
[DLG 1s] RTCP IS Rtt:0.0ms CNAME:08:00:0F:8C:69:94_IS@mbg-61.dilkie.com RxJtr:0.0ms RxLoss:0 TxJtr:0.0ms TxLoss:0
[DLG 1s] SRTP Cipher SS:AES_CM_128_HMAC_SHA1_80 IS:NULL
[DLG 1s] Set->SS(192.168.51.2:44236->29822)
[DLG 1s] SS->Set(29822->192.168.51.2:44236)
[DLG 1s] Icp->IS(192.168.5.61:28000->20130)
[DLG 1s] IS->Icp(20130->192.168.5.61:28000)

disected it looks like:
[DLG 1s] RTP/AVP (audio): 192.168.5.61:28000 Payload(IS/SS): 0/0 FrmSz(IS/SS): 30/20
DLG is the sip dialog ID (not the call ID), media type, rtp payload value (0 is uLaw) and the packet framsize (IS - on the LAN side of mbg, SS is the WAN side facing the client)

[DLG 1s] Started(SS/IS): 0/0 Enabled Dur(SS/IS): 17/17 Stream Dur(SS/IS): 17/17
Started is 0-(off) 1-(on) Duration is in seconds, stream duration is the duration of the actual stream (rather than signaling)

[DLG 1s] SS Tx:560 Rx:834 BwRx:90.2K BwTx:81.7K Lost:0(0%) OOS:0(0%) ClkSync:+0.0013
SS (SetSide/WAN) Tx Rx packets, meansured bandwidth, running loss packet count, running Out Of Sequence count, and the drift between the senders clock and ours.

[DLG 1s] Jtr Max:20 Avg:0 BinSize:10ms Histo:831,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Jitter stats, max observed, Average jitter observed and a histogram of count values showing the distribution of jitter (needed to observe transients)

[DLG 1s] Loss Max:0 BinSize:1 Histo:0,0,0,0,0,0,0,0,0,0
Packet loss stats, max consecutive loss, histogram of consecutive loss

[DLG 1s] SS JitterQ, Len: 0 LWM: 0 HWM: 2 AvgQ: 1.990 Udr: 2 Ovr: 0 Shrinks: 1
Internal Adaptive Jitter buffer Queue stats, current length of the Q, Low Water Mark of the que in the previous second, Hight Water Mark of same interval, Queue underruns and Queue Overruns (very bad), shrinks is how often we've had to forward a packet early to reduce delay caused by the queue.

[DLG 1s] IS Tx:835 Rx:561 BwRx:79.1K BwTx:86.4K Lost:0(0%) OOS:0(0%) ClkSync:+0.0036
[DLG 1s] Jtr Max:30 Avg:0 BinSize:10ms Histo:556,2,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
[DLG 1s] Loss Max:0 BinSize:1 Histo:0,0,0,0,0,0,0,0,0,0
[DLG 1s] IS JitterQ, Len: 1 LWM: 1 HWM: 2 AvgQ: 2.056 Udr: 0 Ovr: 0 Shrinks: 2
These are all the same but the stats facing the LAN (Icp Side).

[DLG 1s] RTCP SS Rtt:0.0ms CNAME:10009@192.168.5.211 RxJtr:0.0ms RxLoss:0 TxJtr:0.0ms TxLoss:0
[DLG 1s] RTCP IS Rtt:0.0ms CNAME:08:00:0F:8C:69:94_IS@mbg-61.dilkie.com RxJtr:0.0ms RxLoss:0 TxJtr:0.0ms TxLoss:0
the last RTCP stats sent/

[DLG 1s] SRTP Cipher SS:AES_CM_128_HMAC_SHA1_80 IS:NULL
Encryption (SRTP) parameters for SS(facing wan) and IS(facing LAN) (in this case the media leg over the internet is SRTP which the local LAN leg is not encrypted)

[DLG 1s] Set->SS(192.168.51.2:44236->29822)
[DLG 1s] SS->Set(29822->192.168.51.2:44236)
[DLG 1s] Icp->IS(192.168.5.61:28000->20130)
[DLG 1s] IS->Icp(20130->192.168.5.61:28000)
These are the details of the traffic ip addresses on all sides.

the other stats you'll see are RTCP send and received and they show up every 5 seconds or so during a call.

[DLG 1.0] SS: RxRTCP(l=192.168.51.60:29823,r=192.168.51.2:34921,s=84,e=7/10) Count=356/56960 RTT=9ms Jtr=0ms CulmLoss=0 IntervalLoss=0.0%
[DLG 1.0] IS: RxRTCP(l=192.168.5.60:20131,r=192.168.5.61:28001,s=104,e=0/0) Count=164/39360 RTT=0ms Jtr=2ms CulmLoss=0 IntervalLoss=0.0%

IS/SS as before, the media leg facing the Icp Side (lan) and Set Side (WAN).

RxRTCP is what we *received* from the far end, so these reflect the quality of the RTP we are sending.

We also send RTCP, ie:

[DLG 1.0] SS: TxRTCP(l=192.168.51.60:29823,r=192.168.51.2:34921,s=92,e=7/10) Count=162/40500 Jtr=1ms CulmLoss=0 IntervalLoss=0.0%

Which reflects the quality of the rtp we are receiving.

this little bit in the rtp logs:

l=192.168.51.60:29823,r=192.168.51.2:34921,s=92,e=7/10

l - local address
r - remote address
s - size of rtp packet
e - encryption algorithm/HMAC length ( 7 is AES128, 10 is HMAC80 ... 10 bytes )

hope that helps.

Dogbreath · « **Reply #2 on:** May 16, 2022, 06:30:30 AM »

I find the following useful when trying to work out how widespread a packet loss issue is - find non-zero packet loss reports:

grep 'Lost:[1-9]\d*' -A 11 /var/log/tug/tug.log

dilkie · « **Reply #3 on:** May 16, 2022, 07:58:46 AM »

FWIW, I have a few scripts i threw together to aid in finding those needles in the haystack, especially on busy sites.

the find_bad_tx/rx rtcp scripts will dump out either tx (what we "see") or rx ("what the remote sees from us") rtcp stats that exceed a certain percentage interval loss (in units of 10%). this dumps the far end ip addresses.

find_user_from_ip looks in the logs to try and find the sip username for an ip address

find_all_bad.. uses the above to dump the user names of folks having rtp(network) issues (again, above a specified percentage).

they all work on a local 'tug.log', which I usually make from all the tug logs concatenated together.

I had a site that was reporting audio problems but only periodically and these scripts helped determine that it was affecting a large number of clients over the internet. Usually, that means some sort of local (to mbg) network issue. If a number of users from the same ip (like an office) are affected, it generally points to an issue with the local network on the remote side.

in this particular nasty case, it turned out that mpls had been configured with a fixed b/w and rtp wasn't being tagged correctly, with the result that periodic call recording dumps over mpls were bumping rtp out... a long running tna helped isolated that as it graphed a nice periodic loss bump. Mind you, tracking down the cause of the load wasn't trivial, but it wasn't my problem either

i'm not the greatest script writer, so forgive...

{find_bad_rxrtcp_ip.sh}
#!/bin/bash
if [ "$1" == "" ] ; then
   echo supply percentage, ie 10%, 20%, etc
   exit -1
fi
per=$(( $1 / 10 ))
cat tug.log | grep "SS: RxRTCP" | grep -E "IntervalLoss=[$per-9][0-9]." | sed -e 's/^.*f=//;s/:.*)/ /' | sort | sed -e 's/ .*//' | uniq -c | sort -n

{find_bad_txrtcp_ip.sh}
#!/bin/bash
if [ "$1" == "" ] ; then
   echo supply percentage, ie 10%, 20%, etc
   exit -1
fi
per=$(( $1 / 10 ))
cat tug.log | grep "SS: TxRTCP" | grep -E "IntervalLoss=[$per-9][0-9]." | sed -e 's/^.*t=//;s/:.*)/ /' | sort | sed -e 's/ .*//' | uniq -c | sort -n

{find_user_from_ip.sh}
#!/bin/bash
if [ "$1" == "" ] ; then
echo supply ip address
exit -1
fi
cat tug.log | grep -B 1 -m 1 "Addr:$1"

{find_all_bad_txrtcp_users.sh}
#!/bin/bash
if [ "$1" == "" ] ; then
echo supply percentage loss, ie 10, 20, 30
exit -1
fi
for i in `./find_bad_txrtcp_ip.sh $1 | awk '{print $2;}'` ; do ./find_user_from_ip.sh $i; echo ""; done

sarond · « **Reply #4 on:** May 17, 2022, 07:41:56 PM »

Quote from: dilkie on May 16, 2022, 07:58:46 AM

FWIW, I have a few scripts i threw together to aid in finding those needles in the haystack, especially on busy sites.

Definitely worth it.

Thanks.

Mitel Forums - The Unofficial Source

News:

Author Topic: MiCollab/MBG - how to read tug.log data (Read 5685 times)

lundah

MiCollab/MBG - how to read tug.log data

dilkie

Re: MiCollab/MBG - how to read tug.log data

Dogbreath

Re: MiCollab/MBG - how to read tug.log data

dilkie

Re: MiCollab/MBG - how to read tug.log data

sarond

Re: MiCollab/MBG - how to read tug.log data