Large Scale NetFlow Collection

When an enterprise is ready to move beyond basic flow collection to large scale NetFlow collection, several features deserve consideration. This post outlines the ones that separate the vendors who specialize in this industry from the vendors of add-on NetFlow modules.

Collection vs Reporting

It’s one thing to claim the ability to collect 100K, 200K or 1M+ flows per second and quite another to report on those flows in a timely fashion. The collection of flow data is the easy part. Ensuring that nothing was dropped, and retrieving specific data quickly from the piles of flows received, requires a well-designed architecture. Missed Flow Sequence Numbers (MFSNs) need to be counted for every device sending flows to confirm that flows aren’t missing, and indicators need to be triggered when something is awry.
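As a rough illustration of the MFSN check described above, the Python sketch below keeps one counter per exporter. The names (SequenceTracker, observe) are hypothetical. It assumes a 32-bit sequence counter that wraps around, and it assumes the caller knows how far the counter advances per export packet, since that depends on the export protocol and version.

```python
from collections import defaultdict

SEQ_MODULO = 2 ** 32  # assumption: sequence counters are 32-bit and wrap around


class SequenceTracker:
    """Counts missed flow sequence numbers per exporter (hypothetical helper)."""

    def __init__(self):
        self.expected = {}              # exporter -> next expected sequence number
        self.missed = defaultdict(int)  # exporter -> running count of missed units

    def observe(self, exporter, sequence, increment):
        """Record one export packet and return the gap detected before it.

        `increment` is how far the counter advances for this packet. The
        caller supplies it because it differs between export protocols.
        """
        expected = self.expected.get(exporter)
        gap = 0
        if expected is not None:
            gap = (sequence - expected) % SEQ_MODULO
            if gap >= SEQ_MODULO // 2:
                # Sequence is behind what we expected: treat it as a late or
                # reordered packet rather than as billions of lost flows.
                return 0
            self.missed[exporter] += gap
        self.expected[exporter] = (sequence + increment) % SEQ_MODULO
        return gap
```

A non-zero return value, or a growing total in `missed`, is the sort of condition that could drive the indicator mentioned above.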

Another important feature is the data store: it must be open and able to store both variable length strings and counters. Collecting NetFlow, IPFIX, sFlow and similar formats is an industry involving terabytes, not gigabytes, and that can mean big data stores. As a result, how the data is stored, sharded and indexed makes all the difference when reports are run.

Federated vs Cluster

Both have their benefits and both should be supported by the vendor. Federated means flows are collected at servers in geographically dispersed locations. Cluster means flows are collected at one location, where the saving of data is spread across a farm of servers. Each has unique advantages and the best architectures incorporate both. Reporting involves sending queries across the entire distributed NetFlow collection architecture to find the data requested. Deduplication and stitching should take place so that once the user narrows in on the source of a problem, each flow is counted only once and the return flow is located even if it passed through a different router and was sent to an entirely different flow collector. Read more about Deduplication and Stitching NetFlow.
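To make deduplication and stitching a little more concrete, here is a minimal Python sketch. The Flow record and the function names are hypothetical, and the sketch deliberately ignores time windows, which a real collector would need in order to avoid merging unrelated flows that reuse the same ports. It keeps one record per 5-tuple no matter how many exporters reported it, then pairs each flow with the flow travelling in the opposite direction.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    exporter: str  # router or switch that exported the record
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    octets: int


def flow_key(f):
    return (f.src, f.dst, f.sport, f.dport, f.proto)


def reverse_key(f):
    return (f.dst, f.src, f.dport, f.sport, f.proto)


def deduplicate(flows):
    """Keep one record per 5-tuple (the one with the largest byte count)."""
    best = {}
    for f in flows:
        k = flow_key(f)
        if k not in best or f.octets > best[k].octets:
            best[k] = f
    return best


def stitch(unique):
    """Pair each flow with its return flow, whichever exporter reported it."""
    pairs, seen = [], set()
    for k, f in unique.items():
        if k in seen:
            continue
        seen.add(k)
        rk = reverse_key(f)
        reply = unique.get(rk) if rk != k else None
        if reply is not None:
            seen.add(rk)
        pairs.append((f, reply))  # reply is None when no return flow was seen
    return pairs
```

In this sketch the same conversation reported by two routers counts once, and a reply exported by a third router is still matched to its request.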

Centralized Reporting

Clearly users can’t go searching for desired data on each individual server. One location for searching and reporting on all flows across all servers in a federated or clustered collection environment is paramount. At the same time, a listing of which collectors held the data, and which routers and switches exported it, should be only a click or two away.
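One way to picture a single search point is a query that fans out to every collector and merges the answers. The Python sketch below is only an illustration: the collectors are plain in-memory lists standing in for remote servers, and federated_search and query_collector are hypothetical names rather than any product's API.

```python
from concurrent.futures import ThreadPoolExecutor


def query_collector(name, flows, ip):
    """Stand-in for a remote query: which exporters on this collector saw `ip`?"""
    exporters = {f["exporter"] for f in flows if ip in (f["src"], f["dst"])}
    return name, exporters


def federated_search(collectors, ip):
    """Ask every collector in parallel and keep only those that had the data.

    `collectors` maps a collector name to its list of flow records (dicts
    with "exporter", "src" and "dst" keys, an assumption of this sketch).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(collectors))) as pool:
        futures = [
            pool.submit(query_collector, name, flows, ip)
            for name, flows in collectors.items()
        ]
        results = [future.result() for future in futures]
    return {name: exporters for name, exporters in results if exporters}
```

The result maps each collector that held matching flows to the routers and switches that exported them, which is the listing a user would want a click or two away.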

Host Indexes

Was that IP address ever on the network? In the world of packet capture, we can only go back a few days unless the organization has very deep pockets, and even then only a week or two. When collecting NetFlow or IPFIX coupled with a good index, we can go back years and confirm whether an IP address was ever on the network. Drilling in to view the flows themselves, however, can still mean searching through a mountain of data. Indexes allow for instant searches and stay optimized for speed even when there are billions of rows in a database. Speed to the desired data is a significant separator between vendors.
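The idea behind a host index can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not how any particular product stores its index: HostIndex and its methods are hypothetical names, and a real index would live in the data store alongside the flows. The point is that the index holds one small entry per host instead of one row per flow.

```python
import ipaddress


class HostIndex:
    """Maps each IP address to when and where it was seen (hypothetical)."""

    def __init__(self):
        self._hosts = {}

    def add(self, ip, timestamp, collector):
        """Update the entry for `ip` from one flow record."""
        addr = ipaddress.ip_address(ip)  # normalizes equivalent spellings
        entry = self._hosts.get(addr)
        if entry is None:
            self._hosts[addr] = {
                "first_seen": timestamp,
                "last_seen": timestamp,
                "collectors": {collector},
            }
        else:
            entry["first_seen"] = min(entry["first_seen"], timestamp)
            entry["last_seen"] = max(entry["last_seen"], timestamp)
            entry["collectors"].add(collector)

    def lookup(self, ip):
        """Return the entry for `ip`, or None if it was never on the network."""
        return self._hosts.get(ipaddress.ip_address(ip))
```

Answering "was this address ever here?" becomes a single lookup, and the entry also says which collectors to query for the underlying flows.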

Threat Detection

With total visibility across the entire enterprise, it often makes sense to look for anomalies in all corners of the network, especially where the traffic is not Internet bound. For example, detecting low and slow data leakage requires that the system look for data transfers that can take hours or even days. This requires maintaining state over time, and deduplication and stitching are critical here.
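To show what maintaining state over time can look like, here is a minimal Python sketch of a sliding-window byte counter per host pair. The class name, the one-day window and the threshold are assumptions chosen for illustration, and the sketch expects flows that have already been deduplicated and that arrive in time order.

```python
from collections import defaultdict, deque


class SlowLeakDetector:
    """Flags host pairs whose total bytes over a long window exceed a limit."""

    def __init__(self, window_seconds=86400, threshold_bytes=500_000_000):
        self.window = window_seconds
        self.threshold = threshold_bytes
        self.events = defaultdict(deque)  # (src, dst) -> (timestamp, octets)
        self.totals = defaultdict(int)    # (src, dst) -> bytes inside window

    def observe(self, src, dst, timestamp, octets):
        """Add one flow; return True if the pair is now over the threshold.

        Assumes timestamps are in seconds and never go backwards.
        """
        pair = (src, dst)
        queue = self.events[pair]
        queue.append((timestamp, octets))
        self.totals[pair] += octets
        # Expire flows that have fallen out of the window.
        while queue and queue[0][0] <= timestamp - self.window:
            _, old_octets = queue.popleft()
            self.totals[pair] -= old_octets
        return self.totals[pair] >= self.threshold
```

No single flow needs to look unusual. It is the running total across hours or days that trips the check, which is why counting a flow twice would skew the result.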

The Verizon “2017 Data Breach Investigations Report” stated that “81% of hacking-related breaches leveraged either stolen and/or weak passwords”. This means that algorithms which baseline and monitor usernames collected from Microsoft Active Directory, Cisco ISE, Forescout CounterACT or other authentication systems can help reveal when malware, or an attacker using stolen credentials, is moving laterally within the internal network.
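A very simple form of username baselining is sketched below in Python. It assumes the authentication system has already been reduced to (username, host) pairs; how those pairs are pulled from Active Directory, ISE or CounterACT is outside this sketch, and UserBaseline is a hypothetical name. During a learning period the sketch records which hosts each user normally appears on, and afterwards it flags anything new.

```python
from collections import defaultdict


class UserBaseline:
    """Remembers which hosts each username is normally seen on."""

    def __init__(self):
        self.known = defaultdict(set)  # username -> hosts seen while learning

    def learn(self, username, host):
        """Record a (username, host) pair observed during the learning period."""
        self.known[username.lower()].add(host)

    def is_unusual(self, username, host):
        """Return True for an unknown username or a host new to that user."""
        hosts = self.known.get(username.lower())
        return hosts is None or host not in hosts
```

A username that suddenly shows up on a server it has never touched is not proof of compromise, but it is a reasonable trigger for a closer look at the flows involved.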


When an organization is ready to step up from a NetFlow plugin tool to a world-class scalable flow collection system, doing some homework will pay off. Feature sets must be listed and weighted for importance. Competing systems should be lined up, then tested and compared for responsiveness and feature sets. Your next investment in a large scale NetFlow collection system needs to be chosen carefully. Contact us if you need help.