Sometimes, opportunity comes from necessity. This past week I was working on a larger deployment that had multiple compliance concerns. One of the specific rules required that we provide a fault-tolerant solution. As you might have guessed, this required me to document how Scrutinizer leverages our Flow Replicator to provide the required fault tolerance. So when it came time to write my blog, it seemed logical to take the information that I gathered and share it with our blog community!

What is Fault Tolerance?

The first question that came up was, “What is fault tolerance?” A quick web search gives us the Wikipedia definition: “Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components.” I don’t know about you, but that’s a bit clinical. When I think about fault tolerance, I tend to think about the word “confidence.”

It’s all about the confidence you need when it’s 2 AM and you have just received an alert of a major network attack. At the same time, you find out that one of your flow collectors has gone down for whatever reason. In these situations, I tend to ask myself things like, “Do I have confidence in my granular data?” and “Do I have confidence in my solution?”

With Scrutinizer, fault tolerance is handled in two ways. First, in a simple deployment, you can have a primary and secondary collector. This allows you to have two data sets stored in parallel. If one of the data sets goes down, the other can be used for reporting.
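To make the primary/secondary idea concrete, here is a minimal Python sketch of how a reporting layer might decide which data set to read from. The `Collector` class, its `healthy` flag, and `pick_reporter` are hypothetical names used only for illustration; they are not part of Scrutinizer's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Collector:
    """Hypothetical stand-in for a flow collector (not a Scrutinizer API)."""
    name: str
    healthy: bool = True


def pick_reporter(primary: Collector, secondary: Collector) -> Optional[Collector]:
    """Report from the primary while it is up, otherwise fall back to the secondary."""
    if primary.healthy:
        return primary
    if secondary.healthy:
        return secondary
    return None  # both data sets are unavailable
```

Because both collectors store the same flows in parallel, falling back to the secondary should not cost you any history.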

The second aspect of this deployment is the Flow Replicator. The Flow Replicator allows you to forward the UDP streams from multiple sources to one or more collectors. Multiple Replicators can be used to provide fault tolerance.

How does this work in the real world?

As a flow monitoring tool, Scrutinizer provides traffic analysis and reporting, which gives organizations the deepest visibility, accountability, and measurability of network utilization by user, device, and application. This visibility enables companies to easily build reports that help deal with a compliance audit. In a fault-tolerant deployment, this is accomplished in two ways.

Detailed network conversation retention

We start at the data feed. In this situation, we would use the Replicator as the one destination for all flows. Basically, multiple UDP sources send to one management IP, and the Replicator then sends all of that flow data on to Scrutinizer. In the Replicator, you can set roles. In this case, you have a primary collector role and a secondary collector role. The primary and secondary collectors will be collecting the same data in parallel. The primary device will be the reporter until the need arises for the secondary to be called upon.
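The fan-out described above can be sketched in a few lines of Python. This is only an illustration of the idea and not how the Flow Replicator is implemented: the function names, the listen address, and the collector addresses (including port 2055) are all assumptions. Note also that this sketch does not preserve the exporter's source address, so each collector would see the replicator's IP as the sender.

```python
import socket
from typing import Iterable, List, Tuple

Address = Tuple[str, int]


def replicate(datagram: bytes, collectors: Iterable[Address], sock: socket.socket) -> int:
    """Send one received flow datagram to every collector; return the send count."""
    sent = 0
    for address in collectors:
        sock.sendto(datagram, address)
        sent += 1
    return sent


def run_replicator(listen: Address, collectors: List[Address]) -> None:
    """Receive on one address and fan every datagram out to all collectors."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(listen)
        while True:
            datagram, _exporter = sock.recvfrom(65535)
            replicate(datagram, collectors, sock)
```

Calling something like `run_replicator(("0.0.0.0", 2055), [("10.0.0.1", 2055), ("10.0.0.2", 2055)])` would deliver every incoming datagram to both the primary and the secondary collector, which is what keeps the two data sets in parallel.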

The next question is, how do you provide fault tolerance for the data feed itself? How does fault tolerance work with the Replicator? Simple: a secondary Flow Replicator can be set up to provide a fault-tolerant environment in case the primary Flow Replicator goes offline.

“The secondary Flow Replicator (SFR) actively monitors the state of the primary Flow Replicator (PFR) and frequently synchronizes its database with the settings from the primary. When a Flow Replicator is in secondary mode, it will no longer maintain its current configuration and any current configuration is lost. At any time, the SFR will not allow any configuration changes. With the exception of the role and show commands, all profile and global configuration must be changed on the PFR.”
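The monitor-and-synchronize behavior in that quote can be modeled with a small state machine. The sketch below is a hypothetical model, not Plixer's implementation: the class name, the `max_missed` threshold, and the choice to step back down once the primary returns are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryReplicator:
    """Hypothetical model of an SFR; names and threshold are illustrative."""
    max_missed: int = 3   # assumed number of failed checks before takeover
    missed: int = 0
    active: bool = False  # True once the secondary has taken over
    config: dict = field(default_factory=dict)

    def poll(self, primary_alive: bool, primary_config: dict) -> bool:
        """Run one monitoring cycle and return whether the secondary is active."""
        if primary_alive:
            # Primary is up: copy its settings and stay passive.
            self.config = dict(primary_config)
            self.missed = 0
            self.active = False
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # Take over using the last configuration synced from the primary.
                self.active = True
        return self.active
```

The point of the frequent sync is visible here: when the primary disappears, the secondary already holds the primary's most recent settings and can carry on without anyone touching its configuration.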

Our fault-tolerant environment will function with either a virtual or hardware appliance. Our online Replicator manual digs a little deeper on the two methods available for Fault Tolerance Environments, but if you have any questions let me know.

Do your requirements demand better visibility into your network traffic along with the ability to provide a high availability solution, but you don’t know where to start? Why not evaluate our Replicator on your network?

James Dougherty

I have worn many hats in my professional life. Support engineer, developer, network admin and manager are all points on my resume, but the one common thread with all of these jobs is that I enjoy working with people; that is what I do here at Plixer. I make sure that everyone understands our product and can get the most out of it. It's just simple 'no bull' support!

Let me know if you have any questions; I would be happy to help.

- Jimmy D