Avoid NetFlow Sampling: Here's how

Sampling traffic to represent the overall traffic pattern is, in theory, a sound idea. In fact, for the most part, I agree that statistically, sampling can be as accurate as rolling a die 1,296 times and expecting exactly 216 occurrences of each of the 6 possible outcomes. In other words, I doubt it. When we are counting massive quantities of data and accuracy matters, we need an alternative to sampling. But what is it?
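The dice analogy can be made concrete with a little arithmetic. The Python sketch below is only an illustration of the statistics, not anything a flow exporter does: it computes the chance that 1,296 fair rolls split into exactly 216 of each face, and how far a single face's count typically strays from 216.

```python
import math

ROLLS = 1296
FACES = 6
expected_per_face = ROLLS // FACES  # 216

# Probability that every face appears exactly 216 times (multinomial),
# computed in log space so that 1296! does not overflow.
log_p = (
    math.lgamma(ROLLS + 1)
    - FACES * math.lgamma(expected_per_face + 1)
    - ROLLS * math.log(FACES)
)
p_exact_split = math.exp(log_p)

# Standard deviation of the count for a single face (binomial).
p_face = 1 / FACES
std_dev = math.sqrt(ROLLS * p_face * (1 - p_face))

print(f"expected per face: {expected_per_face}")
print(f"P(exactly {expected_per_face} of each face): {p_exact_split:.2e}")
print(f"typical deviation per face: about {std_dev:.1f} rolls")
```

The exact split turns out to be far less likely than one in a million, and each face's count typically lands about 13 rolls away from 216. The average is right, but any single measurement is not.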

A word on sFlow Vs. NetFlow

First, let’s digress a bit on sFlow. Its misleading name craftily implies that it is somehow similar to NetFlow when, in fact, sFlow is not a flow technology at all. It is a packet sampling technology and a counter export utility that compares more closely to SNMP. It is intended to aid in network monitoring routines; however, most admins use it for packet sampling. NetFlow and IPFIX are the only true flow technologies. AppFlow, CascadeFlow, J-Flow, NetStream, etc. are all rebrands of NetFlow or IPFIX. The popularity of true flow technologies is growing for several reasons:

NetFlow and IPFIX metering is being done in hardware (e.g. Enterasys and Cisco) or software
Both support packet and flow sampling: Cisco, Juniper, Alcatel/Lucent, Nortel, etc. have implemented them for sampling.
Application awareness through DPI, which packet samples can’t provide
Application performance monitoring with metrics on jitter, latency, packet loss, etc.
The ability to export machine messages like syslogs
Threat detection by analyzing flow behaviors

Unfortunately, when people try to use sFlow to accurately represent the volume of traffic from a particular IP address, they are met with frustration because sFlow usually understates what they are looking for. Advocates of sFlow also claim that it scales better than NetFlow and IPFIX; like the name ‘sFlow’ itself, this claim is misleading. There is a dated but still accurate post on Networkworld.com titled “Closer look: sFlow better than NetFlow?” that is worth reading.
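One way to picture why a particular IP address can be understated is to simulate packet sampling for a low-volume host. The Python sketch below assumes a simplified model in which each packet is picked independently with probability 1-in-N and the sampled count is multiplied back up by N. This is an assumption made for illustration, not sFlow's actual algorithm, and the sampling rate and packet count are made-up values.

```python
import random

def sampled_estimate(true_packets, sampling_rate, rng):
    """Estimate a host's packet count from random 1-in-N packet sampling."""
    sampled = sum(
        1 for _ in range(true_packets) if rng.random() < 1 / sampling_rate
    )
    return sampled * sampling_rate  # scale the samples back up

rng = random.Random(42)
SAMPLING_RATE = 1000  # hypothetical 1-in-1000 sampling
TRUE_PACKETS = 50     # hypothetical low-volume host
TRIALS = 2000

estimates = [
    sampled_estimate(TRUE_PACKETS, SAMPLING_RATE, rng) for _ in range(TRIALS)
]
missed = sum(1 for e in estimates if e == 0)

print(f"host invisible in {missed / TRIALS:.0%} of trials")
print(f"largest estimate: {max(estimates)} packets (true value {TRUE_PACKETS})")
```

Under these assumptions the host is reported as zero traffic in the large majority of trials, and when it is seen at all the estimate jumps to 1,000 packets or more. Either way, the number for that one IP address is wrong.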

That said, if you understand its limitations, sFlow is a great option when NetFlow and IPFIX are not available. I like to compare NetFlow Vs. sFlow to eating ice cream. My favorite ice cream is strawberry (NetFlow), but when the only other flavor available is pistachio (sFlow), I cringe and eat it because I like ice cream.

Vendors continue to migrate to NetFlow and IPFIX

Today, NetFlow and its proposed standard IPFIX include all of the functionality of sFlow and then some. Yes, even packet sampling has been done with NetFlow and IPFIX. Most customers I talk with just don’t like the idea of sampling but agree that at some point it seems inevitable. Good news: in this post I will demonstrate a flow export trick that allows administrators to gain 100% accuracy by compromising on only a couple of elements (e.g. source and destination ports) that often go unused in flow reporting, especially with the introduction of technologies like NBAR.

“The ability to capture 100% of the packets for traffic analysis is certainly going to give more insight however, it just doesn’t scale and although sampling has its benefits, in many cases it misses the data that customers want to see. By defining a Flexible NetFlow tuple that matches on fewer fields, customers can often avoid sampling.” Thomas Pore, Dir. Of Field Engineering – Plixer

NetFlow Matching Fields

The ‘flexible’ aggregation capabilities of true flow technologies allow the user to specify the key fields they want to receive. In lieu of sampling packets, hardware can create flows based on specified key fields. Below is what many consider a traditional flow matching tuple:

Source interface
Source port
Source IP address
Destination port
Destination IP address
Protocol
ToS / DSCP

Below is an example of the number of flows exported using all of the above match fields.

Notice above that the pagination at the bottom of the screenshot says 1422. If each page has 10 flows, that is 14,220 entries. How can we reduce the flows exported and still represent 100% of the data?

With flow technology, we can reduce the volume of flows exported if we simplify the matching key fields to:

Source interface
Source IP address
Destination IP address
Protocol

Below is an example of the number of flows exported for exactly the same traffic with the above smaller tuple:

Notice above that the pagination at the bottom dropped from 1422 to 81, which is roughly a 94% reduction in flow volume while still representing 100% of the data. If you want to reduce it further for NetFlow billing purposes, the key fields can be cut down to:

Source IP address
ToS / DSCP
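To see why a smaller tuple cuts flow volume without losing any bytes, here is a minimal Python sketch of key-field aggregation. It is not how any particular exporter is implemented. The field names (in_if, src_ip, and so on), the addresses, and the synthetic traffic are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical field names for the traditional and reduced match tuples.
FULL_KEY = ("in_if", "src_ip", "src_port", "dst_ip", "dst_port", "proto", "dscp")
REDUCED_KEY = ("in_if", "src_ip", "dst_ip", "proto")

def aggregate(packets, key_fields):
    """Group packets into flows by the given key fields, summing bytes."""
    flows = defaultdict(int)
    for pkt in packets:
        flows[tuple(pkt[f] for f in key_fields)] += pkt["bytes"]
    return flows

def synthetic_packets(count, rng):
    """Made-up traffic: a few hosts talking over many ephemeral ports."""
    hosts = [f"10.0.0.{i}" for i in range(1, 6)]
    servers = [f"192.0.2.{i}" for i in range(1, 6)]
    return [
        {
            "in_if": 1,
            "src_ip": rng.choice(hosts),
            "src_port": rng.randint(1024, 65535),
            "dst_ip": rng.choice(servers),
            "dst_port": rng.choice([53, 80, 443]),
            "proto": rng.choice([6, 17]),
            "dscp": 0,
            "bytes": rng.randint(64, 1500),
        }
        for _ in range(count)
    ]

rng = random.Random(7)
packets = synthetic_packets(5000, rng)
full_flows = aggregate(packets, FULL_KEY)
reduced_flows = aggregate(packets, REDUCED_KEY)

print(f"full key:    {len(full_flows)} flows, {sum(full_flows.values())} bytes")
print(f"reduced key: {len(reduced_flows)} flows, {sum(reduced_flows.values())} bytes")
```

Every packet is still counted in both cases, so the byte totals match. Only the number of flow records changes, because packets that differ only in port or DSCP now fall into the same flow.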

Gartner Group on NetFlow

With this flexibility in specifying key matching fields, we can avoid sampling data almost entirely. Hardware vendors recognize this and are investing in NetFlow and IPFIX. True flow abstraction is the technology customers want. Gartner recently stated that flow analysis should be done 80% of the time and that packet capture with probes should be done 20% of the time (Source). The experts agree: NetFlow and IPFIX make the best ice cream.
