Upgrading Cisco IOS to 16.12.06 made customer unreachable from external

by enricojost

Heyas,

A customer of ours mentioned that their CUBE rebooted at random, which made a branch office unreachable from outside for a short time. The customer asked us to investigate this behavior.

Logging into the CUBE and checking the logs and the flash, I saw a lot of crash dumps related to lowmem.

The CUBE was running 16.12.04 at a time when Cisco recommended 16.12.05 and 16.12.06 was the latest available version. I thought about upgrading to the new major release 17.x, but as I have very little experience with it, I decided to stay within the 16.x release.

It was the first time I decided not to follow the Cisco recommendation for a stable release and to jump to the „latest & greatest“ release instead.

A huge mistake as I saw a bit later 🙂

I upgraded the CUBE to 16.12.06, and the customer reported that incoming calls no longer worked. I checked the log files, and indeed the CUBE rejected incoming INVITEs with a 503 Service Unavailable, because from the CUBE’s perspective the dial-peers to CUCM were down.

I checked the CUCM side:
– all SIP trunks up and fine.

I checked the CUBE side:
– all the keepalive voice classes towards CUCM went down.

cube#sh voice class sip-options-keepalive 102
Voice class sip-options-keepalive: 102           AdminStat: Up
  Description:
  Transport: tcp          Sip Profiles: 0
  Interval(seconds) Up: 5                 Down: 10
  Retry: 3

   Peer Tag      Server Group    OOD SessID      OOD Stat IfIndex
   --------      ------------    ----------      -------- -------
   102           1                               Busy     17

   Server Group: 1                OOD Stat: Busy
    OOD SessID   OOD Stat
    ----------   --------
    5            Busy
    6            Busy

  OOD SessID: 5                   OOD Stat: Busy
   Target: ipv4:10.11.5.21
   Transport: tcp                 Sip Profiles: 0

  OOD SessID: 6                   OOD Stat: Busy
   Target: ipv4:10.11.5.22
   Transport: tcp                 Sip Profiles: 0
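
The running configuration is not part of this post, but a setup that produces output like the above would look roughly like the sketch below. The tags, targets, transport and timer values are taken from the show output. The server-group layout and the session commands on the dial-peer are assumptions.

```
! assumed server group holding the two CUCM targets from the output
voice class server-group 1
 ipv4 10.11.5.21
 ipv4 10.11.5.22
!
! keepalive profile with the values shown in the output
voice class sip-options-keepalive 102
 transport tcp
 up-interval 5
 down-interval 10
 retry 3
!
! dial-peer towards CUCM with the profile attached (session commands assumed)
dial-peer voice 102 voip
 session protocol sipv2
 session transport tcp
 session server-group 1
 voice-class sip options-keepalive profile 102
```

With a profile like this attached, the dial-peer is busied out as soon as the CUBE considers all targets of the server group unreachable, which is the state shown above.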

Checking the logs, I found the OPTIONS pings sent from the CUBE towards CUCM.
I also found the 200 OK responses from the target CUCM servers, so the CUBE was receiving answers but still marked the targets as Busy.

Sent: 
OPTIONS sip:10.11.5.21:5060 SIP/2.0
Via: SIP/2.0/TCP 10.34.5.10:5060;branch=z9hG4bKFF6A435
From: <sip:10.34.5.10>;tag=3F0B5CA-BF8
To: <sip:10.11.5.21>
Date: Wed, 03 Nov 2021 12:22:09 GMT
Call-ID: [email protected]
User-Agent: Cisco-SIPGateway/IOS-16.12.6
Max-Forwards: 70
CSeq: 101 OPTIONS
Contact: <sip:10.34.5.10:5060;transport=tcp>
Content-Length: 0

[...]

Nov  3 13:22:09.832: //66500/000000000000/SIP/Msg/ccsipDisplayMsg:
Received: 
SIP/2.0 200 OK
Via: SIP/2.0/TCP 10.34.5.10:5060;branch=z9hG4bKFF6A435
From: <sip:10.34.5.10>;tag=3F0B5CA-BF8
To: <sip:10.11.5.21>;tag=1542737194
Date: Wed, 03 Nov 2021 12:22:09 GMT
Call-ID: [email protected]
Server: Cisco-CUCM10.5
CSeq: 101 OPTIONS
Allow: INVITE, OPTIONS, INFO, BYE, CANCEL, ACK, PRACK, UPDATE, REFER, SUBSCRIBE, NOTIFY
Content-Length: 0

As I didn’t want to downgrade the CUBE during the customer’s business hours and we needed a quick solution, we simply removed the keepalive profiles from all dial-peers towards CUCM.
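
For reference, the workaround boils down to something like the following sketch. It assumes dial-peer tag 102 and keepalive profile 102 as seen in the show output above. The other dial-peers towards CUCM are not listed in this post, so the tags have to be adjusted accordingly.

```
configure terminal
 dial-peer voice 102 voip
  ! detach the OPTIONS keepalive profile so the dial-peer is no longer busied out
  no voice-class sip options-keepalive profile 102
 end
!
! verify that the dial-peer is operationally up again
show dial-peer voice summary
```

Note that without the profile the CUBE no longer monitors the CUCM nodes, so a dial-peer stays up even if its target is really unreachable.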

And lo and behold:
the dial-peers were up and inbound calling worked again…
In a few days, I am going to downgrade this CUBE to the recommended version 16.12.05. Let’s see if I can add the OPTIONS pings back to my dial-peers then.

In the meantime, I opened a TAC case to verify this bug in the latest IOS version. Their first question was why I didn’t use the recommended version.
I was thinking about a response like „because I didn’t expect an upgrade to break the OPTIONS mechanism“ 🙂

Fun fact about this one:

  • OPTIONS pings towards the PSTN provider were answered with a 200 Alive message. These were processed fine, and those dial-peers still have their keepalive profiles without any issues.

Cheers,
