The network incident response system is a subset of the overall network management effort. Specifically, it addresses the word ‘reactive’, which has plagued the network management space almost since its inception.
The word ‘incident’ by itself generally implies taking action or, in the world of network management, ‘reaction’. But being reactive doesn’t have to mean being unprepared. Network incidents will occur on all networks, and being prepared for them helps management understand:
- The scope of the problem
- Who is impacted
- Who needs to get involved
- How long the incident could take to resolve
To ready ourselves for the barrage of events that can occur on any given day, we need some documentation. There has always been documentation on things like network connectivity, but the urgency to keep it up to date has been motivated largely by the malware incident response industry. Although a network incident doesn’t have to involve malware, the reactive process for dealing with the issue (e.g. slow application performance) does require some of the same resources. For a helpdesk individual to assist with any type of issue involving the network, they need to be prepared, and being ready means documentation. The malware incident response industry encourages two very important areas of preparedness:
- Documentation
- Fire drills
Documentation: If you don’t have a clearly laid out playbook for dealing with a particular problem, your team could end up with individuals going off and working on different areas with no unified effort. People need to know what their responsibilities are. Documentation that outlines how to respond to specific incidents is a good proactive measure.
Fire Drills: These should be performed to test the action plans that have been put in place. During the drill, monitor how well people communicate. Are people clear on what their role is, and do they stick to their responsibilities? Are the handoffs taking place at appropriate times?
Mean Time To Know and Mean Time To Repair
To react well and in a timely manner, the team needs resources. The Mean Time To Know (MTTK) and the Mean Time To Repair (MTTR) are two indicators that we want to keep low. To do this, an incident response solution must provide the details needed to gain insight into the problem. Because the problem could enter anywhere on the network and reach across the entire infrastructure, support engineers need enterprise-wide visibility. The only feasible technologies available today that can provide this total insight are NetFlow and IPFIX.
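As a rough illustration of how the two indicators might be tracked, the sketch below averages them over a list of past incidents. The `Incident` record and its field names are hypothetical, and it assumes MTTK runs from detection to diagnosis and MTTR from detection to resolution; teams that define these intervals differently would adjust the subtraction accordingly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record layout; the field names are illustrative only.
    detected: datetime   # when the incident was first noticed
    diagnosed: datetime  # when the cause was understood
    resolved: datetime   # when normal service was restored


def mean_delta(deltas):
    """Average a collection of timedelta values."""
    deltas = list(deltas)
    if not deltas:
        raise ValueError("no incidents to average")
    return sum(deltas, timedelta()) / len(deltas)


def mttk(incidents):
    # Assumption: time to know = detection until diagnosis.
    return mean_delta(i.diagnosed - i.detected for i in incidents)


def mttr(incidents):
    # Assumption: time to repair = detection until resolution.
    return mean_delta(i.resolved - i.detected for i in incidents)


# Made-up sample data for demonstration.
incidents = [
    Incident(datetime(2024, 1, 8, 9, 0),
             datetime(2024, 1, 8, 9, 30),
             datetime(2024, 1, 8, 11, 0)),
    Incident(datetime(2024, 1, 9, 14, 0),
             datetime(2024, 1, 9, 14, 10),
             datetime(2024, 1, 9, 15, 0)),
]

print("MTTK:", mttk(incidents))  # MTTK: 0:20:00
print("MTTR:", mttr(incidents))  # MTTR: 1:30:00
```

Tracking both numbers over time, for example per quarter or per fire drill, gives a simple way to see whether documentation and visibility improvements are paying off.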
Documentation should include the list of devices that can export flows, and a collection system should be put in place that can scale to the needs of the security and network teams. A good flow reporting and filtering system is an important part of the network incident response system, as it can help minimize the impact of an infection and improve overall awareness of the event.
To get started with establishing your own incident response system, you can:
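To make the idea of flow filtering concrete, here is a minimal sketch that works on already-collected flow records. The `FlowRecord` layout, the device names, and both helper functions are hypothetical and do not reflect any particular collector's API. It shows two checks: which peers a suspect host has talked to (who is impacted), and which documented exporters have sent no flows (a gap in visibility).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical, simplified subset of the fields a flow record carries.
    exporter: str    # device that exported the flow
    src: str         # source IP address
    dst: str         # destination IP address
    dst_port: int
    byte_count: int


def peers_of(flows, host):
    """Total bytes exchanged between `host` and each of its peers.

    Note: the same conversation seen by two exporters is counted twice;
    de-duplication is out of scope for this sketch.
    """
    totals = defaultdict(int)
    for f in flows:
        if f.src == host:
            totals[f.dst] += f.byte_count
        elif f.dst == host:
            totals[f.src] += f.byte_count
    return dict(totals)


def silent_exporters(inventory, flows):
    """Documented exporters from which no flows have been received."""
    seen = {f.exporter for f in flows}
    return sorted(set(inventory) - seen)


# Made-up inventory and flows using documentation address ranges.
inventory = ["core-rtr-1", "edge-fw-1", "branch-sw-3"]
flows = [
    FlowRecord("core-rtr-1", "192.0.2.10", "198.51.100.5", 443, 1200),
    FlowRecord("core-rtr-1", "192.0.2.11", "192.0.2.10", 445, 800),
    FlowRecord("edge-fw-1", "192.0.2.10", "198.51.100.5", 443, 300),
    FlowRecord("edge-fw-1", "192.0.2.12", "203.0.113.9", 53, 90),
]

print(peers_of(flows, "192.0.2.10"))
# {'198.51.100.5': 1500, '192.0.2.11': 800}
print(silent_exporters(inventory, flows))
# ['branch-sw-3']
```

Comparing the documented inventory against the exporters actually seen is a cheap way to notice when the documentation and the network have drifted apart.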
- Prepare by creating an incident response plan
- Form an incident response team
Remember, your network incident response system can often service both the network and security teams.