Nagios is a scalable, flexible, and powerful network monitoring solution that pairs well with graphing tools such as Cacti or MRTG. In this post I’ll share templates and configuration files to get you started with monitoring Cisco routers, switches, and security devices.
Once you have a few devices configured using the templates in this post, you’ll be able to quickly scale out your deployment using Python, shell scripts, or worst case scenario – a text editor.
Introduction
This post doesn’t include instructions on how to install Nagios, but instead assumes that you followed the Nagios Quick Start Guide and have a working installation complete with plugins and a functioning web interface.
The templates defined in the steps below will allow you to monitor the following:
- Cisco IOS Routers and Switches
  - System UpTime
  - 5 Minute CPU Average
  - BGP Peer Sessions
  - Interface Operational Status (Port-channel, Vlan, Physical)
  - SSH Availability
  - IP SLA ICMP Echo Round Trip Time (RTT)
  - IP SLA ICMP Echo Failures
  - Packet Loss and RTT to a Layer 3 Interface
- Cisco Nexus Switches
  - System Uptime
  - 5 Minute CPU Average
  - Interface Operational Status (Port-channel, Vlan, Physical)
  - SSH Availability
  - Packet Loss and RTT to Management Interface
- Cisco ASA Appliances and Firewall Service Modules (FWSM)
  - System UpTime
  - 5 Minute CPU Average
  - Interface Operational Status (Physical)
  - SSH Availability
  - Packet Loss and RTT to a Layer 3 Interface
  - Total Current Sessions
Overview
We will be following these 6 steps to get Nagios monitoring your network:
- Download the check_bgp.pl plugin (Optional)
- Add Command Definitions
- Create hostgroups.cfg (Optional)
- Create hosts.cfg
- Define Services to Monitor
- Define Interfaces to Monitor
I also included an extra section showing how you can use awk to help generate your service definitions for those who don’t know how to script.
Deployment Example
My Nagios deployment is monitoring over 900 “services” (interfaces, ports, services, sessions, etc) on 175 network devices in one data center.
Here are a few screenshots of how my deployment looks in Nagios:
- Host Group Overview
- Host Details for a Core Router
- Service Group Overview
Since a single instance of Nagios is monitoring all 175 network devices in this data center I am using Host Groups and Service Groups (both optional) to help organize things.
Instructions
1. (Optional) Download the check_bgp.pl plugin
If you’re running BGP and want to monitor your peer sessions, I recommend using the check_bgp.pl plugin from the Nagios Exchange. Download it to your plugins directory (mine is /usr/lib/nagios/plugins) with wget and make it executable.
root@nag001:/usr/lib/nagios/plugins# wget -O check_bgp.pl "http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=1555&cf_id=30"
--2013-01-30 23:21:26-- http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=1555&cf_id=30
Resolving exchange.nagios.org... 66.228.58.94
Connecting to exchange.nagios.org|66.228.58.94|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10219 (10.0K) [application/octet-stream]
Saving to: `check_bgp.pl'
100%[=======================================================================================>] 10,219 --.-K/s in 0.07s
2013-01-30 23:21:27 (153 KB/s) - `check_bgp.pl' saved [10219/10219]
root@nag001:/usr/lib/nagios/plugins# chmod +x check_bgp.pl
2. Add Command Definitions
All of our monitoring requirements are handled by three plugins: check_snmp, check_bgp.pl, and check_tcp. In order to use them for our service checks we must first create custom Command Definitions in the commands.cfg configuration file.
To make these commands as flexible as possible we will include variables that allow arguments to be passed from our hosts.cfg and services.cfg files.
Let’s take the check_snmp plugin as an example. This plugin accepts over 20 different options as explained on its definition page: http://nagiosplugins.org/man/check_snmp
Usage: check_snmp -H <ip_address> -o <OID> [-w warn_range] [-c crit_range]
    [-C community] [-s string] [-r regex] [-R regexi] [-t timeout] [-e retries]
    [-l label] [-u units] [-p port-number] [-d delimiter] [-D output-delimiter]
    [-m miblist] [-P snmp version] [-L seclevel] [-U secname] [-a authproto]
    [-A authpasswd] [-x privproto] [-X privpasswd]
When we use check_snmp to monitor interfaces we will use the -r and -l options. When we use it to monitor IP SLAs and NAT Translations we will use the -w and -c options. We will also use different OIDs. In order to accommodate all of this with just one command definition we will use three argument macros called $ARG1$, $ARG2$, and $ARG3$:
check_snmp -H $HOSTADDRESS$ -C public -o $ARG1$ $ARG2$ $ARG3$
I’ll show you how it works later in this post. For now, just define the custom commands in your commands.cfg file as shown below.
Note: Change “-C public” to match your SNMP community name. Also use the path to your own plugin directory, which may be different from mine.
## Poll a device using the OID specified as $ARG1$ and apply options specified in $ARG2$ and $ARG3$
define command {
    command_name  check_snmp_router
    command_line  /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -C public -o $ARG1$ $ARG2$ $ARG3$
}

## Call the check_bgp.pl perl script and send the IP Address of the BGP Peer specified in $ARG1$ by using the -p option
define command {
    command_name  check_cisco_bgp
    command_line  /usr/lib/nagios/plugins/check_bgp.pl -H $HOSTADDRESS$ -C public -p $ARG1$
}

## Telnet to port 22 for each host, expect (-e) to see "SSH" somewhere in the output, then quit (-q) by sending the string "exit"
define command {
    command_name  check_cisco_ssh
    command_line  /usr/lib/nagios/plugins/check_tcp -H $HOSTADDRESS$ -p 22 -e SSH -q exit
}
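To make the argument passing concrete, here is a rough Python sketch of how a check_command value is split on "!" and substituted into the command_line. It is only an illustration of the idea, not how Nagios is implemented: the function name expand_command is made up for this example, and details such as escaped "\!" characters are ignored.

```python
def expand_command(command_line, host_address, check_command):
    """Fill $HOSTADDRESS$ and $ARG1$..$ARG3$ from a check_command string."""
    # A check_command looks like "command_name!arg1!arg2!arg3".
    name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    # Arguments that were not supplied expand to an empty string.
    for n in range(1, 4):
        value = args[n - 1] if n <= len(args) else ""
        expanded = expanded.replace(f"$ARG{n}$", value)
    return name, expanded.strip()


COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -C public "
    "-o $ARG1$ $ARG2$ $ARG3$"
)

name, line = expand_command(
    COMMAND_LINE,
    "10.0.0.1",
    "check_snmp_router!.1.3.6.1.2.1.2.2.1.8.1!-r 1!-l ifOperStatus",
)
print(line)
```

In this sketch the OID lands in $ARG1$, "-r 1" in $ARG2$, and "-l ifOperStatus" in $ARG3$, which is why a single command definition can serve both regex-based and threshold-based checks.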
3. (Optional) Create hostgroups.cfg
Decide how you want to logically group your devices and then add the definitions to a file named hostgroups.cfg that you create in your /etc/nagios3/conf.d directory.
Nagios will display your host groups in alphabetical order so if you want to influence how things are displayed you can just include numerals in their names.
define hostgroup {
    hostgroup_name  Routers
}

define hostgroup {
    hostgroup_name  Switches
}

define hostgroup {
    hostgroup_name  Firewalls
}

define hostgroup {
    hostgroup_name  VPN
}
4. Create hosts.cfg
Create a file called hosts.cfg in /etc/nagios3/conf.d, define your hosts in it, and optionally assign them to the host groups you created in Step 3.
In this example I am using the default generic-host template; you’ll want to develop a standard template of your own once you are more comfortable with Nagios.
define host {
    host_name   core1
    alias       core1.domain.com
    address     10.0.0.1
    use         generic-host
    hostgroups  Routers
}

define host {
    host_name   fw1
    alias       fw1.domain.com
    address     10.0.0.254
    use         generic-host
    hostgroups  Firewalls
}
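Once you have more than a handful of devices, the host definitions can be generated instead of typed. The following is a minimal Python sketch under a few assumptions: the device list, the render_hosts function, and the HOST_TEMPLATE name are all hypothetical, and the alias is simply the host name plus a domain.

```python
HOST_TEMPLATE = """define host {{
    host_name   {name}
    alias       {name}.{domain}
    address     {address}
    use         generic-host
    hostgroups  {hostgroup}
}}
"""


def render_hosts(devices, domain="domain.com"):
    """Return host definitions for (name, address, hostgroup) tuples."""
    return "\n".join(
        HOST_TEMPLATE.format(name=name, domain=domain,
                             address=address, hostgroup=hostgroup)
        for name, address, hostgroup in devices
    )


# Example inventory; replace with your own devices.
devices = [
    ("core1", "10.0.0.1", "Routers"),
    ("fw1", "10.0.0.254", "Firewalls"),
]

print(render_hosts(devices))
```

Redirecting the output to a file in /etc/nagios3/conf.d would give you a hosts.cfg in the same shape as the hand-written one.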
5. Define Services to Monitor
Here are the service definitions to place in your services.cfg file, complete with full OIDs. Just copy, paste, and modify (if necessary).
I will be using the generic-service template for each of these. Replace any OIDs that end with an “X” with the proper unique SNMP identifier from your device. These identifiers can be found with snmpwalk commands.
Note: My code snippets are line wrapping – be sure to include everything on one line in your configuration file.
For Use with All Cisco Devices
; Report the System Uptime
define service {
    use                  generic-service
    hosts                *
    service_description  System UpTime
    check_interval       5 ; This overrides the interval inherited from the generic-service template
    check_command        check_snmp_router!.1.3.6.1.2.1.1.3.0
}

; Check latency and packet loss - specify the warning and critical levels for each
define service {
    use                  generic-service
    hosts                *
    service_description  PING
    check_interval       5
    check_command        check_ping!200.0,20%!400.0,40% ; warning and critical levels for latency, packet loss%
}

; Verify that an SSH connection can be established
define service {
    use                  generic-service
    hosts                *
    service_description  SSH
    check_interval       5
    check_command        check_cisco_ssh
}
For Use with Cisco Nexus Devices Only
; Cisco Nexus CPU Avg
define service {
    use                  generic-service
    hostgroup            Routers
    service_description  5 Min CPU Average
    check_interval       5
    check_command        check_snmp_router!.1.3.6.1.4.1.9.9.109.1.1.1.1.5.1!-l \"5 Minute CPU \% \" -w 50 -c 80 ; -w is warning level, -c is critical
}
For Use with Cisco Routers and Switches
; Cisco IOS CPU Avg
define service {
    use                  generic-service
    hostgroup            Routers,Switches,Firewalls,VPN
    service_description  5 Min CPU Average
    check_interval       5
    check_command        check_snmp_router!.1.3.6.1.4.1.9.9.109.1.1.1.1.5.1!-l \"5 Minute CPU \% \" -w 50 -c 80
    servicegroups        Memory_and_CPU
}

; Monitor the BGP peer session to an ISP
define service {
    use                  generic-service
    hosts                core1
    service_description  BGP Session: ISP 1
    check_interval       5
    check_command        check_cisco_bgp!x.x.x.x ; Insert your BGP peer address here
}

; Monitor the IP SLA ICMP Echo Round Trip Time
define service {
    use                  generic-service
    hosts                core1
    service_description  IP SLA RTT for ISP 1
    check_interval       1
    check_command        check_snmp_router!.1.3.6.1.4.1.9.9.42.1.2.10.1.1.X!-l "Last RTT (ms)" -w 1000 -c 2000 ; where X is your IP SLA operation number
}

; Verify our IP SLA ICMP Echo command was successful
define service {
    use                  generic-service
    hosts                core1
    service_description  IP SLA PING Success for ISP 1
    check_interval       1
    check_command        check_snmp_router!.1.3.6.1.4.1.9.9.42.1.2.10.1.2.X!-r 1!-l "IP SLA Ping Success" ; where X is your IP SLA operation number
}
For Use with Cisco ASAs and FWSMs
; Total Sessions
define service {
    use                  generic-service
    hostgroup            Firewalls,VPN
    service_description  Total_Sessions
    check_interval       5
    check_command        check_snmp_router!.1.3.6.1.4.1.9.9.147.1.2.2.2.1.5.40.6!-l \"Total Current Sessions\" -w 20000 -c 30000
    servicegroups        FW_and_VPN
}
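Several of the checks in this step rely on the -w and -c options. As a rough mental model, here is a small Python sketch of how a plain numeric threshold is evaluated under the standard Nagios plugin range convention, where a bare number N means "alert if the value is below 0 or above N". The function name snmp_state is made up for illustration; the real logic lives inside the check_snmp plugin.

```python
def snmp_state(value, warn, crit):
    """Classify a polled value against plain -w/-c thresholds."""
    # A bare threshold N is the range 0..N; values outside it raise an alert.
    if value < 0 or value > crit:
        return "CRITICAL"
    if value > warn:
        return "WARNING"
    return "OK"


# CPU example using the thresholds from the service definitions: -w 50 -c 80
for cpu in (35, 50, 65, 95):
    print(cpu, snmp_state(cpu, warn=50, crit=80))
```

Note that a value exactly equal to the threshold is still inside the range, so a CPU reading of 50 stays OK with -w 50.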
6. Defining Interfaces to Monitor
First, identify which interfaces you want to monitor, then run an snmpwalk of the mib-2.interfaces OID to find out how they are identified by SNMP. Here’s an example:
root@nag001:/etc/nagios3/conf.d# snmpwalk -v2c -c public 10.0.0.1 mib-2.interfaces
.
IF-MIB::ifDescr.1 = STRING: GigabitEthernet0/0
IF-MIB::ifDescr.2 = STRING: GigabitEthernet0/1
.
IF-MIB::ifDescr.5 = STRING: GigabitEthernet0/0.2
IF-MIB::ifDescr.7 = STRING: GigabitEthernet0/1.10
IF-MIB::ifDescr.8 = STRING: GigabitEthernet0/1.11
IF-MIB::ifDescr.9 = STRING: GigabitEthernet0/1.12
The numeral after “IF-MIB::ifDescr.” is what identifies each of your interfaces. So “1” is G0/0, “2” is G0/1, and so on. Now you can use this information to monitor your interfaces by using the following service template:
define service {
    use                  generic-service
    hosts                your-host-name
    service_description  your-interface-description
    check_command        check_snmp_router!.1.3.6.1.2.1.2.2.1.8.X!-r 1!-l ifOperStatus
}
So using the example above, we would have the following directives for monitoring whether our interfaces are UP or DOWN (note that the X in each OID has been replaced):
define service {
    use                  generic-service
    hosts                core1
    service_description  GigabitEthernet0/0
    check_command        check_snmp_router!.1.3.6.1.2.1.2.2.1.8.1!-r 1!-l ifOperStatus
}

define service {
    use                  generic-service
    hosts                core1
    service_description  GigabitEthernet0/1
    check_command        check_snmp_router!.1.3.6.1.2.1.2.2.1.8.2!-r 1!-l ifOperStatus
}

define service {
    use                  generic-service
    hosts                core1
    service_description  GigabitEthernet0/0.2
    check_command        check_snmp_router!.1.3.6.1.2.1.2.2.1.8.5!-r 1!-l ifOperStatus
}

define service {
    use                  generic-service
    hosts                core1
    service_description  GigabitEthernet0/1.10
    check_command        check_snmp_router!.1.3.6.1.2.1.2.2.1.8.7!-r 1!-l ifOperStatus
}

define service {
    use                  generic-service
    hosts                core1
    service_description  GigabitEthernet0/1.11
    check_command        check_snmp_router!.1.3.6.1.2.1.2.2.1.8.8!-r 1!-l ifOperStatus
}

define service {
    use                  generic-service
    hosts                core1
    service_description  GigabitEthernet0/1.12
    check_command        check_snmp_router!.1.3.6.1.2.1.2.2.1.8.9!-r 1!-l ifOperStatus
}
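The "-r 1" option works because ifOperStatus is an integer where 1 means up. The Python sketch below illustrates that idea: the status names come from the IF-MIB definition of ifOperStatus, while the function interface_check is a made-up stand-in for what check_snmp does with the regular expression, so treat it as an approximation.

```python
import re

# ifOperStatus values as defined in IF-MIB
IF_OPER_STATUS = {
    1: "up",
    2: "down",
    3: "testing",
    4: "unknown",
    5: "dormant",
    6: "notPresent",
    7: "lowerLayerDown",
}


def interface_check(value, pattern="1"):
    """Return (state, status name): OK only when the regex matches the value."""
    state = "OK" if re.search(pattern, str(value)) else "CRITICAL"
    return state, IF_OPER_STATUS.get(value, "unrecognised")


for polled in (1, 2, 7):
    print(polled, interface_check(polled))
```

Because the defined values only run from 1 to 7, the single-character pattern "1" cannot accidentally match any other status.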
Those are all the pieces you need for monitoring your network. Now it’s just a matter of adding hosts, their associated services, and restarting Nagios!
Miscellaneous: Using Awk to Create Service Definitions
Configuring Nagios can be a daunting task if you have hundreds of interfaces on numerous devices to monitor. To help make the process easier, you can (and should) use either a Python or shell script to add your interfaces.
If you don’t have any scripting experience, here is a simple procedure using awk to help make the process easier.
First, save the details from your snmpwalk to a local text file called interfaces.txt:
root@nag001:/etc/nagios3/conf.d# snmpwalk -v2c -c public 10.0.0.1 mib-2.interfaces >> interfaces.txt
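Next, delete every line that is not an ifDescr entry (the walk also returns ifIndex, ifType, and the other interface columns), then replace all instances of “IF-MIB::ifDescr” with “.1.3.6.1.2.1.2.2.1.8” using your text editor of choice.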
Your file should now look something like this:
.1.3.6.1.2.1.2.2.1.8.67 = STRING: GigabitEthernet9/11
.1.3.6.1.2.1.2.2.1.8.68 = STRING: GigabitEthernet9/12
.1.3.6.1.2.1.2.2.1.8.252 = STRING: TenGigabitEthernet7/1
.1.3.6.1.2.1.2.2.1.8.253 = STRING: TenGigabitEthernet7/2
.1.3.6.1.2.1.2.2.1.8.91 = STRING: Port-channel1
.1.3.6.1.2.1.2.2.1.8.92 = STRING: Port-channel2
.1.3.6.1.2.1.2.2.1.8.94 = STRING: Port-channel3
.1.3.6.1.2.1.2.2.1.8.95 = STRING: Port-channel4
.1.3.6.1.2.1.2.2.1.8.96 = STRING: Port-channel5
.1.3.6.1.2.1.2.2.1.8.97 = STRING: Port-channel6
.1.3.6.1.2.1.2.2.1.8.98 = STRING: Port-channel7
.1.3.6.1.2.1.2.2.1.8.99 = STRING: Port-channel8
.1.3.6.1.2.1.2.2.1.8.100 = STRING: Port-channel9
Now you can run the following awk command to create and format the service definitions for your host and add them to the services.cfg file.
awk ' {print "define service \
{ \n \t use \t \t \t generic-service \n \
\t hosts \t \t \t core1 \n \
\t service_description \t "$4" \n \
\t check_command \t \t check_snmp_router!"$1"!-r 1!-l ifOperStatus \n \
\t } \n"}' interfaces.txt >> /etc/nagios3/conf.d/services.cfg
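If you would rather use Python for this step, the sketch below does roughly the same job as the awk procedure. It reads raw snmpwalk output, keeps only the ifDescr rows, and builds the ifOperStatus OID itself, so the manual search-and-replace is not needed. The names service_definitions and SERVICE_TEMPLATE are made up for this example, and the sample text stands in for the contents of your own interfaces.txt.

```python
import re

# Matches rows such as: IF-MIB::ifDescr.5 = STRING: GigabitEthernet0/0.2
IFDESCR = re.compile(r"^IF-MIB::ifDescr\.(\d+) = STRING: (.+)$")
IF_OPER_STATUS_OID = ".1.3.6.1.2.1.2.2.1.8"

SERVICE_TEMPLATE = """define service {{
    use                  generic-service
    hosts                {host}
    service_description  {description}
    check_command        check_snmp_router!{oid}.{index}!-r 1!-l ifOperStatus
}}
"""


def service_definitions(walk_output, host):
    """Build one interface service definition per ifDescr row."""
    blocks = []
    for line in walk_output.splitlines():
        match = IFDESCR.match(line.strip())
        if match is None:
            continue  # skip ifIndex, ifType, and every other non-ifDescr row
        index, description = match.groups()
        blocks.append(SERVICE_TEMPLATE.format(
            host=host, description=description,
            oid=IF_OPER_STATUS_OID, index=index))
    return "\n".join(blocks)


# Sample input; in practice read this from interfaces.txt
sample = """IF-MIB::ifIndex.1 = INTEGER: 1
IF-MIB::ifDescr.1 = STRING: GigabitEthernet0/0
IF-MIB::ifDescr.5 = STRING: GigabitEthernet0/0.2
"""

print(service_definitions(sample, "core1"))
```

Review the generated definitions before appending them to services.cfg, since every interface in the walk gets a service whether you want to monitor it or not.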
Thanks for the nice post. Can we filter only up interfaces and put them in monitoring?
bgp session with cisco nexus doesn’t work! only with the 7k models…
Great article however I am seeing this and when I tried to use it
I got the following:
# Report the System Uptime
define service {
use generic-service
hosts *
service_description System UpTime
check_interval 5 ; This overrides what is specified in the
check_command check_snmp_args!1.3.6.1.2.1.1.3.0 <- This is not defined anywhere
}
thanks