NetScaler SD-WAN 9.3.2 Bug?

Earlier this January, we were on site at a customer together with Citrix Germany to set up a NetScaler SD-WAN PoC. All went well and worked as expected, until we found some … well … “strange behaviour” with a CB410-SE appliance.

The configuration was nothing special so far: existing MPLS connections with QoS applied, extended and supported by new ADSL lines. The data center CB5100-SE appliance uses virtual inline mode (BGP enabled) and the CB410-SE branch office appliance is configured in inline mode using fail-to-wire I/F groups. We configured WAN link templates for some typical MPLS lines used at the customer, assigned them to the corresponding WAN links and configured several VLANs available at the specific branch site. We activated Proxy ARP because we wanted to help the infrastructure when the MPLS routers become unresponsive. In detail, we configured the branch VLAN IDs 1, 2, 3 … up to 8, deployed the configuration and were then able to test communication to and from the data center successfully. All good so far.

Finally, we stumbled upon another VLAN (ID 251) which is used for network management traffic, e.g. switch and router configuration. No problem: we configured this VLAN too, activated Proxy ARP, deployed the configuration and tried to establish communication within the new VLAN 251. BAM! WTF?! At the exact moment the first VLAN 251 packet flowed, the fail-to-wire I/Fs went down with a “click!” (the front panel LED turned red immediately) and no traffic was able to reach the data center. As soon as VLAN 251 communication stopped, the I/F group came back up (another “click!” / green LED) and everything was okay again. Whenever we tried to establish another communication originating from VLAN 251 (the destination networks and protocols used didn’t matter at all), the same procedure occurred again and again: I/F group down (“click!”) and no traffic at all. Even already established ICA sessions and VoIP calls stopped working on the spot. A few VLAN 251 ICMP packets were enough to pull the whole branch down. Hm … “ping of death”, indeed.

Long story short, my Citrix mate finally “fixed” the issue by removing the reference to the defined WAN link template from the CB410-SE and configuring the MPLS WAN link from scratch. We deployed the configuration to the environment and were able to test ALL VLAN traffic successfully, including VLAN 251. BTW, we weren’t able to get to the core of the problem. I think it’s the combination of VLAN 251 and an MPLS WAN link template assigned to the CB410-SE’s links.

Well, we have since taken this behaviour to Anaheim, as several SD-WAN developers and product managers are available at Summit 2018 these days. I’m very curious about the findings related to this “unexpected behaviour” … Maybe the solution is to upgrade to SD-WAN v10, which is expected soon? To tell the truth, I don’t know …

-jochen

Update on Jan 31: please see part 2.
