A while back, we spent some time building host baseline logic for a future Flow Analytics release. If you’re not familiar with the term “host baseline”, it is a record of an IP node’s historical traffic behavior. The items that go into this communication baseline are behaviors observed over a specified time frame.
NetFlow Host Baseline : Generation
Here’s how it generally works: you specify the hosts or subnets you want to baseline, and these are sometimes placed into zones or groups of IP addresses. Once this is defined, a baseline job runs periodically to determine the following for each individual IP:
- the number of other IP nodes a host communicates with
- the number of unique connections made
- the number of protocols used and the amount of each
- the number of applications used and the amount of each
- the total number of bytes
- the total number of packets
- the times of day the host is active
- network behavior during business hours, after hours, holidays, and weekends
- etc.
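To make the list above a little more concrete, here is a minimal Python sketch of one baseline pass over a window of flow records. The `Flow` and `HostBaseline` types, their field names, and the `build_baselines` function are hypothetical, and they cover only a subset of the metrics listed (peers, unique connections, bytes per protocol and per application, totals, and active hours). It is an illustration, not how Flow Analytics stores baselines.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Flow:
    # Hypothetical, simplified flow record; real NetFlow/IPFIX records carry many more fields.
    src: str
    dst: str
    dst_port: int
    protocol: str
    application: str
    octets: int
    packets: int
    hour: int  # hour of day (0-23) in which the flow started


@dataclass
class HostBaseline:
    peers: set = field(default_factory=set)                  # other IP nodes contacted
    connections: set = field(default_factory=set)            # unique (dst, port, protocol) tuples
    protocols: Counter = field(default_factory=Counter)      # bytes per protocol
    applications: Counter = field(default_factory=Counter)   # bytes per application
    total_octets: int = 0
    total_packets: int = 0
    active_hours: set = field(default_factory=set)


def build_baselines(flows, monitored_subnets):
    """Aggregate one baseline window of flows into a HostBaseline per monitored source IP."""
    networks = [ip_network(s) for s in monitored_subnets]
    baselines = defaultdict(HostBaseline)
    for f in flows:
        if not any(ip_address(f.src) in net for net in networks):
            continue  # only hosts inside the configured subnets are baselined
        b = baselines[f.src]
        b.peers.add(f.dst)
        b.connections.add((f.dst, f.dst_port, f.protocol))
        b.protocols[f.protocol] += f.octets
        b.applications[f.application] += f.octets
        b.total_octets += f.octets
        b.total_packets += f.packets
        b.active_hours.add(f.hour)
    return dict(baselines)
```

Items such as business hours versus weekends would be derived the same way, by bucketing each flow's timestamp before aggregating.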
Once you have the baselines, the collector compares the incoming flows (i.e., current traffic behavior) of each specified IP address against that host’s baseline. Hosts that violate an item in their individual baseline see an increase in their ‘unique index’(TM) or ‘source index’(TM). The destination address involved in the odd traffic or violation may also see its ‘destination index’(TM) or ‘target index’(TM) increase. When the overall source or destination index reaches a threshold, alarms can be triggered.
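The scoring loop described above can be sketched as a toy Python class. Everything here is an assumption for illustration: the `IndexTracker` name, a baseline reduced to known peers and active hours, one point per violation, and a single shared threshold for both indexes. Real index weighting is considerably more involved.

```python
from collections import defaultdict


class IndexTracker:
    """Toy source/destination index scoring against per-host baselines.

    Assumed baseline shape: {"peers": set_of_ips, "hours": set_of_hours}.
    """

    def __init__(self, baselines, threshold):
        self.baselines = baselines
        self.threshold = threshold
        self.source_index = defaultdict(int)
        self.destination_index = defaultdict(int)

    def check(self, src, dst, hour):
        """Score one flow; return (index_kind, ip) pairs that are at or over the threshold."""
        baseline = self.baselines.get(src)
        if baseline is None:
            return []  # this host is not being baselined
        violations = 0
        if dst not in baseline["peers"]:
            violations += 1  # talked to a host never seen during the baseline window
        if hour not in baseline["hours"]:
            violations += 1  # active at an hour never seen during the baseline window
        if violations:
            self.source_index[src] += violations
            self.destination_index[dst] += violations
        alarms = []
        if self.source_index[src] >= self.threshold:
            alarms.append(("source", src))
        if self.destination_index[dst] >= self.threshold:
            alarms.append(("destination", dst))
        return alarms
```

Note that the indexes only ever grow in this sketch. A practical version would also decay or reset them over time.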
I have simplified the above a great deal. Individual baselines can be tweaked, and new baselines can be configured to have a rolling percentage impact on the existing baseline. There is also the matter of zone or IP group baselines and something called baseline precedence. Other variables include the number of days to include in the baseline and how long to retain baselines. Believe me, the list of considerations can be daunting.
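One plausible reading of a "rolling percent impact" is a weighted moving average, where each new baseline shifts the existing one by a configured fraction. The sketch below assumes that interpretation. The `blend_baseline` function and metric names are hypothetical, and this is not necessarily how the feature was implemented.

```python
def blend_baseline(existing, new, weight):
    """Blend a newly computed baseline into the existing one.

    `weight` is the fraction (0.0-1.0) of influence the new baseline gets. This
    moving-average rule is an assumed interpretation of "rolling percent impact";
    metrics missing from one side are treated as 0.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    metrics = existing.keys() | new.keys()
    return {
        m: (1.0 - weight) * existing.get(m, 0.0) + weight * new.get(m, 0.0)
        for m in metrics
    }
```

With a weight of 0.25, a host whose daily traffic doubles moves only a quarter of the way toward the new value, so a one-off spike fades over later runs instead of redefining "normal".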
We also investigated multipliers based on the type of violation, since these affect how much each violation adds to the overall source or destination index. We implemented an “Order of Violation Check” to specify which violations are tested first and whether to continue testing the same flow for other violations. Time went by and we continued to learn. It wasn’t perfect.
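Multipliers and an ordered violation check fit together naturally: run the checks in a fixed order, add each violated check's multiplier, and stop early when a check says so. This is a minimal Python sketch of that idea. The `ViolationCheck` type, the three example checks, and their multipliers and stop flags are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ViolationCheck:
    name: str
    test: Callable[[dict, dict], bool]  # (flow, baseline) -> True if violated
    multiplier: float                   # amount added to the index on violation
    stop_on_match: bool                 # skip the remaining checks for this flow if violated


def score_flow(flow, baseline, checks):
    """Run checks in order; return (index increment, names of violated checks)."""
    score = 0.0
    violated = []
    for check in checks:
        if check.test(flow, baseline):
            score += check.multiplier
            violated.append(check.name)
            if check.stop_on_match:
                break
    return score, violated


# Hypothetical ordering: an unknown peer is weighted most heavily and ends evaluation.
CHECKS = [
    ViolationCheck("new_application", lambda f, b: f["app"] not in b["apps"], 1.5, False),
    ViolationCheck("unknown_peer", lambda f, b: f["dst"] not in b["peers"], 3.0, True),
    ViolationCheck("off_hours", lambda f, b: f["hour"] not in b["hours"], 1.0, False),
]
```

Putting a stop-on-match check early saves work under heavy NetFlow volume, at the cost of never recording the violations it skips.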
Imagine trying to do all of the checking above while receiving enormous volumes of NetFlow. Well, we tested it and saw what probably every other network behavior analysis vendor sees: some good threat detection along with a healthy number of false positives. Ultimately, we decided to focus on Flow Analytics(TM).
I think everyone understands that false positives are unavoidable with network behavior analysis and NetFlow baselines. Here’s why: the way we use our computers changes.
NetFlow Host Baselines : No Additional Value
Your computer may have been offline during the baseline, or maybe you get into work early once or twice a month and this doesn’t make it into the baseline. A few days later you decide to VPN in after hours, and because this isn’t in your baseline, your index rises. Sure, it only increases the index, but with enough false positives it can eventually trigger an event and ultimately an alarm. And guess what: loading new applications on your computer can cause issues as well. We found it just wasn’t worth it; however, we haven’t given up either. That said, I also believe a talented engineer who is willing to put in the hours can make most systems work. How much time do you have during the day? Is this time well spent?
As Michael W. Lucas, the author of Network Flow Analysis, once said: “I work at an ISP. Anything is normal.” He also said: “You have to know what kind of traffic is your usual activity to understand when things are really going wrong and what is wrong.”
NetFlow Host Baseline : False Positives
Our Flow Analytics solution catches odd traffic patterns and unwanted traffic on your network. False positives tend to be minimal and the setup is intuitive. More importantly, maintenance is simple and the ongoing effort to maintain the system is manageable. Even so, we are determined to improve our threat detection strategies, so we looked for alternatives to host baselines. What we learned is that a new trend in the network security industry takes a smarter approach to improving IT security. The focus is on “host reputation”, and we have supported this strategy longer than any other NetFlow vendor.
IP Host Reputation is Everything
Many ISPs now pay attention to where a host’s traffic is going to and coming from, based on host reputation. Flow Analytics has done this for years. Here is how it works: regularly updated databases of known bad hosts are maintained and referenced in real time. An IDS leverages the database to ensure that local end systems aren’t communicating with known bad hosts. Organizations such as the OISF (Open Information Security Foundation) are building next-generation intrusion detection and prevention engines that include reputation lookups as part of their deliverables. One example of this type of system is Suricata, which is funded by the Department of Homeland Security’s Directorate for Science and Technology.
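At its core, a reputation check is a lookup of each flow's source and destination against a list of known bad addresses. Here is a minimal Python sketch using the standard `ipaddress` module. The `BAD_NETWORKS` feed and `reputation_hits` function are hypothetical, and the entries are documentation address ranges standing in for real feed data. This is neither Suricata's reputation mechanism nor Flow Analytics' implementation.

```python
from ipaddress import ip_address, ip_network

# Hypothetical reputation feed (documentation ranges used as placeholders). A real feed
# is refreshed regularly and would be loaded into a faster structure, such as a radix
# tree, rather than scanned linearly like this list.
BAD_NETWORKS = [ip_network("203.0.113.0/24"), ip_network("198.51.100.17/32")]


def reputation_hits(flows, bad_networks):
    """Return (src, dst, matched_ip) for every flow whose source or destination is listed."""
    hits = []
    for src, dst in flows:
        for addr in (src, dst):
            if any(ip_address(addr) in net for net in bad_networks):
                hits.append((src, dst, addr))
                break  # one hit is enough to flag this flow
    return hits
```

Checking both directions matters: an inbound connection from a listed host is as interesting as an internal machine reaching out to one.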
If you look at the problems people are solving in our NetFlow case studies, you will learn about the types of issues being resolved with good NetFlow reporting and analysis tools like Flow Analytics. Flow Analytics deduplicates flows from your routers and checks for dozens of anomalous network traffic behavior patterns. Host reputation is only one of the dozens of checks performed in near real time.
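Deduplication matters because one conversation crossing several routers is exported once per router, which inflates byte counts. A simple approach keeps the first record for a 5-tuple and drops near-simultaneous copies from other exporters. The Python sketch below assumes that rule and a one-second matching window. `FlowRecord` and `deduplicate` are hypothetical names, and this is not necessarily how Flow Analytics deduplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    exporter: str  # router that exported the record
    src: str
    dst: str
    src_port: int
    dst_port: int
    protocol: int
    start: float   # flow start time, seconds since epoch
    octets: int


def deduplicate(records, window=1.0):
    """Keep one record per conversation when several exporters report it.

    Assumed rule: a record is a duplicate if it has the same 5-tuple as the last
    record kept, comes from a different exporter, and started within `window` seconds.
    """
    kept = []
    last_kept = {}  # 5-tuple -> (start, exporter) of the record kept for it
    for r in sorted(records, key=lambda rec: rec.start):
        key = (r.src, r.dst, r.src_port, r.dst_port, r.protocol)
        prev = last_kept.get(key)
        if prev is not None and r.exporter != prev[1] and r.start - prev[0] <= window:
            continue  # the same conversation as seen by another router
        last_kept[key] = (r.start, r.exporter)
        kept.append(r)
    return kept
```

Requiring a different exporter keeps repeated records from a single router (for example, periodic exports of a long-lived flow) from being discarded as duplicates.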
Uncovering and resolving these issues takes good logic and appropriate action. Always test and compare.
‘Flow Analytics’, ‘unique index’, ‘source index’, ‘destination index’, ‘target index’ are trademarks and the property of Plixer International, Inc.