How to Prevent NetFlow Export Storms

adam

When measuring NetFlow volume, we typically speak in thousands of flows per second. That data is exported over UDP from the network infrastructure to a NetFlow collector. The result is huge streams of data whose volume is proportional to the amount of unique traffic observed by the NetFlow-monitoring devices. Collecting all this data without missing packets can be a real challenge, but with some basic tuning and a high-performance tool like Scrutinizer, perfect flow collection is possible.

When is NetFlow Data Exported?

Flow data is held in a flow cache while the networking device handling the traffic still considers the conversation active. You may often find yourself entering lines like ‘cache timeout active 60’ or ‘cache timeout inactive 15’ into a flow configuration. These commands define when a flow is considered complete and ready for export to Scrutinizer. If a connection is constantly active, the ‘active timeout’ value ends the flow after the specified amount of time and starts a new record. The ‘inactive timeout’ value sets how long the device waits after the last packet of a flow before exporting it. This type of cache is good for NetFlow collection, because individual flows expire at different times and the export is naturally spread out.
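The two timeouts can be sketched as a simple check against each cached flow. This is a minimal Python illustration, not how any particular device implements its cache: the `FlowEntry` type and `export_reason` function are hypothetical names, and the default values mirror the ‘cache timeout active 60’ and ‘cache timeout inactive 15’ examples above.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    first_seen: float  # time the first packet of the flow was observed (seconds)
    last_seen: float   # time the most recent packet was observed (seconds)


def export_reason(flow, now, active_timeout=60.0, inactive_timeout=15.0):
    """Return why a cached flow would be exported at time `now`, or None.

    Assumption: the inactive timer is checked first; real devices may
    evaluate the two timers in a different order.
    """
    if now - flow.last_seen >= inactive_timeout:
        return "inactive"  # no packets seen for inactive_timeout seconds
    if now - flow.first_seen >= active_timeout:
        return "active"    # long-lived flow: end this record, start a new one
    return None            # flow stays in the cache
```

A flow that started at second 0 and was last seen at second 58 is exported at second 60 for the "active" reason, while a flow that went quiet at second 5 is exported at second 20 for the "inactive" reason. Because each flow has its own start and last-packet times, these exports land at different moments.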

Performance monitoring for flow data can require that you configure a synchronized cache. In this configuration, all flow data is exported at regular intervals regardless of the current duration of any individual flow. While requisite for collecting certain values in performance monitoring, this type of flow cache can be dangerous, as it will dump its full flow cache at once. This can lead to a huge flood or storm of IPFIX data heading through your network toward your NetFlow collector. This behavior can be controlled by configuring a ‘spread’ interval. The spread interval governs the length of time over which the flow cache is exported, reducing the burst of the default behavior. In one instance in the field, we observed an environment with hundreds of identically configured devices all exporting in sync with no spread interval configured. Huge volumes of flow data were not collected, because the export packets couldn’t make it through the network. Keep in mind that NetFlow/IPFIX is exported over UDP, so when a packet is dropped, the source does not know to retransmit it.
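Because nothing is retransmitted, losses only show up on the receiving side. One way to spot them is to look for gaps in the sequence number carried in each export packet header. The following is a minimal Python sketch for NetFlow v9 only, where the 20-byte header defined in RFC 3954 carries a sequence counter that increments once per export packet. The function name `count_missed_packets` is hypothetical, and the sketch assumes a single exporter and source ID with packets arriving in order. IPFIX counts data records rather than packets in this field, so it would need different handling.

```python
import struct

# NetFlow v9 export packet header (RFC 3954): version, count, sysUpTime,
# UNIX seconds, sequence number, source ID. 20 bytes, network byte order.
V9_HEADER = struct.Struct("!HHIIII")


def count_missed_packets(datagrams):
    """Count export packets missing from a stream, using header sequence gaps.

    Assumes all datagrams come from one exporter and one source ID and
    arrive in order; reordering would be miscounted as loss.
    """
    missed = 0
    expected = None
    for data in datagrams:
        if len(data) < V9_HEADER.size:
            continue  # too short to hold a v9 header; skipped in this sketch
        version, _count, _uptime, _secs, sequence, _source_id = (
            V9_HEADER.unpack_from(data)
        )
        if version != 9:
            continue  # not NetFlow v9; skipped in this sketch
        if expected is not None:
            # Modulo arithmetic handles the 32-bit counter wrapping to zero.
            missed += (sequence - expected) % 2**32
        expected = (sequence + 1) % 2**32
    return missed
```

Fed the raw UDP payloads from one exporter, a stream with sequence numbers 1, 2, 5 would report two missed packets.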

How can I prevent NetFlow Export Storms?

You can enable export spreading by adding the following lines to your performance monitor configuration, prior to applying the monitor to an interface, either directly or via a service policy.

     conf t
     flow monitor type performance-monitor [name of monitor]
     cache type synchronized
     cache timeout synchronized [seconds until synchronized export] export-spread [amount of time in seconds to spread flows over]
     end

In a basic configuration, it would be safe to use 60 seconds as the synchronized export interval and 15 seconds as the export-spread. This still terminates flows every minute and keeps Scrutinizer’s atomic flow measurement at one-minute intervals, while transmitting the flow data over a 15-second window to prevent bursting. For more information, you can reference this Cisco document on the topic.
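To put rough numbers on what a 15-second export-spread buys you, here is a back-of-the-envelope Python sketch. The function name `peak_records_per_second` and the device and record counts in the usage note are hypothetical. The model assumes every device starts exporting at the same instant and paces its records evenly across the spread window, which real exporters will only approximate.

```python
def peak_records_per_second(devices, records_per_device, spread_seconds):
    """Estimate the worst one-second export load arriving at the collector.

    Assumptions: all devices begin exporting at the same moment and pace
    their records evenly over the spread window. A spread of 0 is modeled
    as the whole cache leaving within a single second.
    """
    window = max(1, spread_seconds)
    return devices * records_per_device / window
```

For a hypothetical 200 devices each holding 3,000 cached records, `peak_records_per_second(200, 3000, 0)` gives 600,000 records in the first second, while `peak_records_per_second(200, 3000, 15)` gives 40,000 per second. The total exported per minute is the same in both cases; only the peak changes.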

Scrutinizer can consume millions of flows per second when configured in a large clustered environment, and strategies like export spreading are critical to maintaining the network performance required to collect these volumes of data. If you have questions about MFSN values in your Scrutinizer instance or are working through a NetFlow deployment, reach out to us, the Plixer Support Team. We work with teams all over the world to help them leverage NetFlow data in high-volume environments.