
SD WAN Problems with Visibility


We have blogged about the SD WAN industry a few times. In one post, we listed the SD WAN vendors and highlighted the companies that export NetFlow and IPFIX. We even built reports for the proprietary exports from both Cisco IWAN and Citrix CloudBridge, but we still see major problems with the visibility available into SD WAN performance.

In the SD WAN flow exports that we built support for, we didn’t see any elements describing important events, such as a flow being rerouted. Our customers investing in SD WAN architectures would like to be able to report on:

  1. the flows on a specific interface that were impacted
  2. the interface that the impacted flows were moved to
  3. the exact times the flows were rerouted
  4. any changes that were made to the flows
  5. the event ID that occurred, which explains what caused individual flows to get redirected
  6. a list of all the different event IDs and their meaning (a metadata template)

In order to address the above, we need an additional flow template with the following elements:

Typical Flow Template  | Rerouted Flow Template
-----------------------|------------------------
Ingress Interface      | Post Ingress Interface
Egress Interface       | Post Egress Interface
Source IP Address      | Source IP Address
Destination IP Address | Destination IP Address
Protocol               | Protocol
Source Port            | Source Port
Destination Port       | Destination Port
Start Time             | Post Start Time
End Time               | Post End Time
Nexthop                | Post Nexthop
DSCP                   | Post DSCP
OctetDeltaCount        | Post OctetDeltaCount
Etc.                   |
                       | “Reason Rerouted” ID*

*“Reason Rerouted” ID points to another template with a text description of the event ID.
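
To make the relationship between the two templates more concrete, here is a minimal Python sketch of how a collector might model a rerouted flow record and resolve its “Reason Rerouted” ID against the metadata template. Everything in it is an assumption for illustration: the field names, the event IDs, and their descriptions are hypothetical, since no vendor export we have seen carries these elements.

```python
from dataclasses import dataclass

# Hypothetical metadata template: event ID -> text description.
# These IDs and descriptions are invented for illustration only.
REROUTE_REASONS = {
    1: "Link down",
    2: "Latency threshold exceeded",
    3: "Packet loss threshold exceeded",
}


@dataclass(frozen=True)
class ReroutedFlow:
    """One record of the proposed Rerouted Flow Template (field names assumed)."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int
    post_ingress_if: int
    post_egress_if: int
    post_start_ms: int  # epoch milliseconds
    post_end_ms: int    # epoch milliseconds
    post_nexthop: str
    post_dscp: int
    post_octets: int
    reason_id: int

    def reason(self) -> str:
        """Resolve the "Reason Rerouted" ID against the metadata template."""
        return REROUTE_REASONS.get(self.reason_id, f"Unknown event {self.reason_id}")


# Example record using documentation/private address ranges.
record = ReroutedFlow(
    src_ip="10.1.1.20", dst_ip="192.0.2.10", protocol=6,
    src_port=51514, dst_port=443,
    post_ingress_if=1, post_egress_if=3,
    post_start_ms=1_700_000_000_250, post_end_ms=1_700_000_030_000,
    post_nexthop="198.51.100.1", post_dscp=46, post_octets=48_000,
    reason_id=2,
)
print(record.reason())
```

Keeping the descriptions in a separate lookup mirrors the idea of a metadata template: the per-flow record stays small, and the text for each event ID is exported once.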

The value of the Rerouted Flow Template in SD WAN environments:

  • The Post Ingress/Egress Interface is needed because the customer wants to know which interface on the router the flow was moved to.
  • The Post Start/End Times are needed to understand when the flows were rerouted. The delta between the End Time and the Post Start Time could provide a metric for how long the reroute took for each flow.
  • The Post Nexthop, DSCP, OctetDeltaCount, and other values show whether the next hop, the QoS marking, or the traffic volume changed after the flow was moved.
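
As a rough sketch of the reporting this would enable, the Python below pairs each rerouted record with its original flow record by the shared 5-tuple, then reports the interface the flow moved to, the gap between End Time and Post Start Time, and whether DSCP or next hop changed. The dictionary keys, the sample values, and the `reroute_report` helper are all hypothetical; they stand in for whatever a real export would carry.

```python
from collections import defaultdict


def flow_key(rec):
    """The 5-tuple that is identical in both templates."""
    return (rec["src_ip"], rec["dst_ip"], rec["protocol"],
            rec["src_port"], rec["dst_port"])


def reroute_report(typical, rerouted):
    """Group rerouted flows by the egress interface they were moved away from."""
    originals = {flow_key(r): r for r in typical}
    report = defaultdict(list)
    for post in rerouted:
        before = originals.get(flow_key(post))
        if before is None:
            continue  # no matching original record; nothing to compare against
        report[before["egress_if"]].append({
            "flow": flow_key(post),
            "moved_to_if": post["post_egress_if"],
            "rerouted_at_ms": post["post_start_ms"],
            # Delta between End Time and Post Start Time
            "gap_ms": post["post_start_ms"] - before["end_ms"],
            "dscp_changed": post["post_dscp"] != before["dscp"],
            "nexthop_changed": post["post_nexthop"] != before["nexthop"],
        })
    return dict(report)


# Hypothetical sample records (times in epoch milliseconds).
five_tuple = {"src_ip": "10.1.1.20", "dst_ip": "192.0.2.10", "protocol": 6,
              "src_port": 51514, "dst_port": 443}
typical = [{**five_tuple, "egress_if": 2, "end_ms": 1_700_000_000_000,
            "dscp": 46, "nexthop": "203.0.113.1"}]
rerouted = [{**five_tuple, "post_egress_if": 3,
             "post_start_ms": 1_700_000_000_250,
             "post_dscp": 46, "post_nexthop": "198.51.100.1"}]

report = reroute_report(typical, rerouted)
for interface, flows in report.items():
    for f in flows:
        print(interface, f["moved_to_if"], f["gap_ms"], f["nexthop_changed"])
```

Matching on the 5-tuple works here only because those fields are the same in both templates; a real collector would also need to handle flows that are rerouted more than once.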

Without the above, network administrators can’t confirm that the SD WAN architecture is working when an event occurs. “When something happens, I don’t know,” one customer told me. “I can’t confirm which applications were impacted or what users. I can’t tell you how much traffic was impacted, when or even why.” In short, you are completely blind.

Without these details in the flow export, network admins have no insight into changes in the SD WAN and must trust that the vendors’ hardware is optimizing traffic and recovering from congestion and outages. Customers need to be able to confirm that the SD WAN architecture reacts correctly to events, and they should be able to determine what traffic was impacted. Contact our team if you would like to learn more.