The big boss had a conference call over the weekend. She sent a message to your boss’s boss, which trickled downhill, and eventually made its way to you in the form of a text message.
Somehow you already knew… A text at this hour? Can’t be good…
Slowly you reach into your pocket, click the phone once to wake it up, and staring you right in the face are the four words you hate the most.
“The network is slow!”
…it’s going to be one of those days.
I work with a lot of people who are in situations like this. A lack of visibility causes issues like these to spiral and linger. Flow collection is a great way to improve visibility and help investigate network issues quickly. Whether people are approaching flow collection for the first time or optimizing their current deployment, the most common question I get is:
“Where should I turn flow data on?”
To answer this question, it’s best to take a step back and think about how data moves through the network. Looking at a network diagram will help me elaborate on this a bit.
In this network, flow is enabled only on the core switch. Let’s look at what visibility that gives us from the point of view of three network components.
Although you could make assumptions about firewall traffic by collecting flows off the interface that connects the core switch to the firewall, a key portion of the data is missing: what is the firewall blocking? This is easiest to determine by exporting flows from the firewall itself, especially if it is capable of sending denied events, like a Palo Alto or Cisco ASA.
Limited visibility of lateral movement between similar network segments may be the biggest gap in terms of security and network segmentation. After a host has been compromised, the next step is typically lateral movement of some sort. If visibility is limited to traffic that traverses the core, communication within a network segment can’t be seen.
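To make that gap concrete, here is a minimal sketch in Python. The topology, host names, and device names are all hypothetical, and it assumes that traffic between two hosts on the same access switch never reaches the core. A conversation counts as visible only if at least one device on its path is exporting flows.

```python
# Hypothetical topology: each host hangs off one access switch,
# and every access switch uplinks to a single core switch.
HOST_TO_ACCESS = {
    "pc1": "access-1",
    "pc2": "access-1",
    "srv1": "access-2",
}
CORE = "core"


def path(src: str, dst: str) -> list[str]:
    """Return the devices a conversation crosses (assumed, simplified model)."""
    a, b = HOST_TO_ACCESS[src], HOST_TO_ACCESS[dst]
    if a == b:
        # Same segment: traffic stays on the access switch.
        return [a]
    return [a, CORE, b]


def is_visible(src: str, dst: str, exporters: set[str]) -> bool:
    """A conversation is visible if any device on its path exports flows."""
    return any(device in exporters for device in path(src, dst))


if __name__ == "__main__":
    core_only = {"core"}
    print(is_visible("pc1", "srv1", core_only))  # crosses the core
    print(is_visible("pc1", "pc2", core_only))   # stays on access-1
```

With flow enabled only on the core, the pc1-to-pc2 conversation goes unseen in this model; adding "access-1" to the exporter set is what makes it visible.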
Common questions from people who are designing metadata collection are:
“Do I need to collect flow data from my remote sites?”
“How will collecting data from my remote sites impact bandwidth?”
Generally, I’m all for getting visibility into at least the edge of a remote site, especially if internet traffic isn’t backhauled to corporate. Even if all traffic is backhauled to the corporate network, it’s a lot easier to troubleshoot problems or prepare capacity planning reports if you can look right at the remote site’s routers.
As far as the impact on bandwidth, it’s usually negligible (1 – 2% of overall bandwidth). But if this is a concern, simply enable a couple of sites and use your flow collector to see what percentage of traffic the flow data is consuming.
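The check itself is simple arithmetic. Here is a rough sketch in Python, assuming you can pull two byte counts per site from your collector for the same time window: the bytes of flow export traffic and the total bytes on the link. The site names and numbers below are made up for illustration.

```python
def flow_overhead_percent(flow_export_bytes: int, total_bytes: int) -> float:
    """Percentage of link traffic consumed by flow export over one window."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * flow_export_bytes / total_bytes


# Hypothetical pilot sites: (flow export bytes, total bytes) for the same window.
PILOT_SITES = {
    "branch-a": (1_500_000, 100_000_000),
    "branch-b": (600_000, 60_000_000),
}

if __name__ == "__main__":
    for site, (export_bytes, total) in PILOT_SITES.items():
        pct = flow_overhead_percent(export_bytes, total)
        print(f"{site}: {pct:.2f}% of traffic is flow export")
```

If the pilot sites land in the range you are comfortable with, you can roll out to the rest with some confidence.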
The virtual environment can be a tricky area of the network to get visibility into. Although collecting traffic from the core gives visibility into what is coming in and out of each ESX host, we cannot see ESX-to-ESX or VM-to-VM communication. If you are using virtual distributed switches (VDS), gathering flow from them would provide this visibility.
Network map revisited
If we take the same network and turn on flow data from the devices discussed, then our visibility increases dramatically.
Where should I turn flow data on?
As you can see, the simple answer is “everywhere you can!” but that’s not really the point of this blog. Sure, in a perfect world getting data from everywhere is best, but that may not be practical for a variety of reasons. The best way to approach designing flow collection is to understand that it is all about observation points. Knowing which devices can export flow data helps in making these decisions, but understanding what is gained and lost at each network segment matters most.
If you have any questions on how to configure a device for flow data, or would like help designing your metadata collection strategy, please feel free to reach out to me directly.