Monday, May 10, 2010

Exercising an ICMP Poller



Abstract:
Polling devices for availability on a global basis requires an understanding of the underlying topology and of the impact that latency and reliability have on the results. An excellent tool for polling on a wide scale is "fping".

Methodology:
The "fping" poller takes a list of nodes and probes each of them. A simple way to start is to draw the node list from the "hosts" table. The basic UNIX "time" command can be used to measure performance, and the "nawk" command can be used to parse node lists and command output into usable information.

Implementation:
Parsing the "hosts" table is simple enough to do via "nawk", leveraging a portion of the device name, IP address, or even comments! The "fping" poller is able to use Name, IP, or even the regular "/etc/hosts" table format for input.

For example, selecting all "192" entries in the hosts table can be done as follows:

sunt2000/root# nawk '/192./ { print $1 }' /etc/hosts # parse node ip addresses
sunt2000/root# nawk '/192./ { print $2 }' /etc/hosts # parse node names
sunt2000/root# nawk '/192./' /etc/hosts # print complete matching entries
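
One caveat with the pattern above: in a regular expression the "." matches any character, and the match may occur anywhere on the line. The following is a minimal sketch, assuming a made-up sample host table (the addresses and names are hypothetical), that compares the loose pattern with one anchored to the first field. It falls back to plain "awk" where "nawk" is not installed.

```shell
# Use nawk where it exists (Solaris), otherwise fall back to awk.
AWK=$(command -v nawk || command -v awk)
SAMPLE=${TMPDIR:-/tmp}/hosts.sample.$$

# Hypothetical host table, for illustration only.
cat > "$SAMPLE" <<'EOF'
127.0.0.1     localhost
192.168.1.10  router-a
10.1.192.7    switch-b
10.0.0.192    host-c
192.168.2.20  router-d
EOF

# Loose pattern: "192" followed by any character, anywhere on the line.
loose=$("$AWK" '/192./ { print $1 }' "$SAMPLE")

# Anchored pattern: the first field must begin with the literal text "192."
strict=$("$AWK" '$1 ~ /^192\./ { print $1 }' "$SAMPLE")

rm -f "$SAMPLE"
echo "loose:"  ; echo "$loose"
echo "strict:" ; echo "$strict"
```

With this sample, the loose pattern selects four addresses (including 10.1.192.7 and 10.0.0.192), while the anchored pattern selects only the two that start with "192.".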

When performing large numbers of pings across the globe, name lookups can unexpectedly add a great deal of time to the run of the command:

sunt2000/root# time nawk '/192./ { print $1 }' /etc/hosts | fping |
nawk '/alive/ { Count+=1 } END { print "Count:", Count }'


Count: 2885
real 3m12.70s
user 0m0.45s
sys 0m0.68s

sunt2000/root# time nawk '/192./ { print $2 }' /etc/hosts | fping |
nawk '/alive/ { Count+=1 } END { print "Count:", Count }'


Count: 2884
real 8m47.74s
user 0m0.49s
sys 0m0.70s

The cost of resolving names to IP addresses is largely avoided on the second run with names, because operating systems like Solaris cache the lookups in the name service cache daemon ("nscd").

sunt2000/root# time nawk '/192./ { print $2 }' /etc/hosts | fping |
nawk '/alive/ { Count+=1 } END { print "Count:", Count }'


Count: 2883
real 3m10.87s
user 0m0.44s
sys 0m0.67s
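
As a rough check on what name resolution cost, the "real" times from the three runs above can be compared. This is only a sketch of the arithmetic: dividing by the alive count of 2884 is an assumption that overstates the per-name cost somewhat, since unreachable hosts had to be resolved as well.

```shell
# Use nawk where it exists (Solaris), otherwise fall back to awk.
AWK=$(command -v nawk || command -v awk)

# Wall-clock ("real") times and alive count copied from the runs above.
overhead=$("$AWK" 'BEGIN {
    by_ip   = 3*60 + 12.70    # run 1: IP addresses as input
    by_name = 8*60 + 47.74    # run 2: names as input, cold cache
    cached  = 3*60 + 10.87    # run 3: names as input, warm cache
    hosts   = 2884            # alive count reported by run 2
    printf "extra=%.2fs per_host=%.3fs cached_delta=%.2fs\n",
           by_name - by_ip, (by_name - by_ip) / hosts, cached - by_ip
}')
echo "$overhead"
```

The cold-cache run spent about 335 seconds more than the run by IP address, roughly 0.116 seconds per alive host, while the warm-cache run was within two seconds of the run by IP address.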

Avoiding Name Resolution Cache Miss:
Sometimes an application cannot afford even an occasional name resolution cache miss. One way of managing this is to poll by IP address and map the addresses back to names in the parsing script on the back end of the "fping" command, using an enhanced "nawk" one-liner.

sunt2000/root# time nawk '/HDFC/ { print $1 }' /etc/hosts | fping | nawk '
BEGIN { File="/etc/hosts" ; while ( ( getline < File ) > 0 ) {
Ip=$1; Name=$2 ; IpArray[Ip]=Ip ; NameArray[Ip]=Name }
close(File) }
/alive/ { Count+=1 ; print NameArray[$1] "\t" $0 }
END { print "Count:", Count }'

Count: 2882
real 3m9.04s
user 0m0.73s
sys 0m0.76s
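
The lookup logic can be tried without touching the network. The sketch below assumes a small made-up host table and feeds canned lines in the "is alive" / "is unreachable" form that fping prints, in place of a real fping run; the file name, addresses, and host names are hypothetical.

```shell
# Use nawk where it exists (Solaris), otherwise fall back to awk.
AWK=$(command -v nawk || command -v awk)
HOSTS_FILE=${TMPDIR:-/tmp}/hosts.join.$$

# Hypothetical host table, for illustration only.
cat > "$HOSTS_FILE" <<'EOF'
192.168.1.10  router-a
192.168.1.11  router-b
192.168.1.12  router-c
EOF

# Canned stand-in for the output of fping.
named=$(printf '%s\n' \
    '192.168.1.10 is alive' \
    '192.168.1.11 is unreachable' \
    '192.168.1.12 is alive' |
"$AWK" -v File="$HOSTS_FILE" '
BEGIN   { # Load the address-to-name map once, before any input is read.
          while ( ( getline Line < File ) > 0 ) {
              split(Line, F)
              NameArray[F[1]] = F[2] }
          close(File) }
/alive/ { Count += 1 ; print NameArray[$1] "\t" $0 }
END     { print "Count:", Count }')

rm -f "$HOSTS_FILE"
echo "$named"
```

Because the names come from the array loaded in the BEGIN block, no resolver call is made for any host while the poll is running.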

Tuning Solaris 10:
When managing a large number of devices, ICMP input overflows may occur. This can be checked via the "netstat" command, parsed by a nawk one-liner. Note the high ratio of overflows to messages in the output (roughly 36% here).

sunt2000/root# netstat -s -P icmp | nawk '{ gsub("=","") }
/icmpInMsgs/ { icmpInMsgs=$3 }
/icmpInOverflows/ { icmpInOverflows=$2 }
END { print "Msgs=" icmpInMsgs "\tOverflows=" icmpInOverflows }'

Msgs=381247797 Overflows=138767274

To check the current value of the tunable:
sunt2000/root# ndd -get /dev/icmp icmp_max_buf
262144

The above value is the default for this version of Solaris. It can (and should) be increased dramatically as device counts grow into the thousands.
sunt2000/root# ndd -set /dev/icmp icmp_max_buf 2097152

Note that the above setting does not persist across a reboot.

Validating Tuning:
Before and after an fping run, the ICMP message and overflow counters can be compared.
Prior FPing: icmpInMsgs=381267922 icmpInOverflows=138775834
Post FPing: icmpInMsgs=381270809 icmpInOverflows=138778159
Difference: icmpInMsgs=2887 icmpInOverflows=2325

Applying the Tunables Temporarily:
sunt2000/root# ndd -set /dev/icmp icmp_max_buf 2097152
sunt2000/root# ndd -get /dev/icmp icmp_max_buf
2097152

Validate the Ratio:
Prior FPing: icmpInMsgs=381279465 icmpInOverflows=138778662
Post FPing: icmpInMsgs=381282224 icmpInOverflows=138778806
Difference: icmpInMsgs=2759 icmpInOverflows=144


Secondary Validation of the Ratio:
Prior FPing: icmpInMsgs=381296575 icmpInOverflows=138784125
Post FPing: icmpInMsgs=381300943 icmpInOverflows=138784125
Difference: icmpInMsgs=4368 icmpInOverflows=0
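
The three before/after comparisons above can be reduced to an overflow ratio per run. This is a minimal sketch of that arithmetic using the counter values quoted above; the run labels ("default", "tuned-1", "tuned-2") are names introduced here for illustration.

```shell
# Use nawk where it exists (Solaris), otherwise fall back to awk.
AWK=$(command -v nawk || command -v awk)

# Columns: label, prior msgs, prior overflows, post msgs, post overflows.
report=$(printf '%s\n' \
    'default 381267922 138775834 381270809 138778159' \
    'tuned-1 381279465 138778662 381282224 138778806' \
    'tuned-2 381296575 138784125 381300943 138784125' |
"$AWK" '{
    dMsgs = $4 - $2      # messages received during the run
    dOver = $5 - $3      # overflows recorded during the run
    printf "%s: msgs=%d overflows=%d ratio=%.1f%%\n",
           $1, dMsgs, dOver, (dMsgs > 0 ? 100 * dOver / dMsgs : 0)
}')
echo "$report"
```

With the default buffer, about 80.5% of the messages counted during the run overflowed; after the change the ratio dropped to about 5.2% on the first run and to 0.0% on the second.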


Making the Tuning Persistent:
A start/stop script can be created to make the tunables persistent.
t2000/root# vi /etc/init.d/ndd_rmm.sh

#!/bin/ksh
# script: ndd_rmm.sh
# author: david halko
# purpose: make a start/stop script to make persistent tunables
#
case ${1} in
start)   # raise the buffer only if it is not already at the target value
         /usr/sbin/ndd -get /dev/icmp icmp_max_buf | nawk '
         !/2097152/ {
             Cmd="/usr/sbin/ndd -set /dev/icmp icmp_max_buf 2097152"
             system(Cmd) }'
         ;;
status)  ls -al /etc/init.d/ndd_rmm.sh /etc/rc2.d/S89_ndd_rmm.sh
         /usr/sbin/ndd -get /dev/icmp icmp_max_buf
         ;;
install) ln -s /etc/init.d/ndd_rmm.sh /etc/rc2.d/S89_ndd_rmm.sh
         chmod 755 /etc/init.d/ndd_rmm.sh /etc/rc2.d/S89_ndd_rmm.sh
         chown -h ivadmin /etc/init.d/ndd_rmm.sh /etc/rc2.d/S89_ndd_rmm.sh
         ;;
*)       echo "ndd_rmm.sh [start|status|install]\n"
esac
:w
:q

t2000/root# ksh /etc/init.d/ndd_rmm.sh install

t2000/root# ksh /etc/init.d/ndd_rmm.sh status
-rwxr-xr-x 1 ivadmin root 647 May 10 21:18 /etc/init.d/ndd_rmm.sh
lrwxrwxrwx 1 ivadmin root 22 May 10 21:20 /etc/rc2.d/S89_ndd_rmm.sh -> /etc/init.d/ndd_rmm.sh
262144

t2000/root# /etc/init.d/ndd_rmm.sh start

t2000/root# /etc/init.d/ndd_rmm.sh status
-rwxr-xr-x 1 root root 647 May 10 21:18 /etc/init.d/ndd_rmm.sh
lrwxrwxrwx 1 root root 22 May 10 21:20 /etc/rc2.d/S89_ndd_rmm.sh -> /etc/init.d/ndd_rmm.sh
2097152

Monitoring Overflows:
You can monitor overflows in near-real-time through a simple script such as:
#
# script: icmpOverflowMon.sh
# author: David Halko
# purpose: simple repetitive script to monitor icmp overflows
#
for i in 0 1 2 3 4 5 6 7 8 9 ; do
for j in 0 1 2 3 4 5 6 7 8 9 ; do
for k in 0 1 2 3 4 5 6 7 8 9 ; do
echo "`date` - $i$j$k - \c"
netstat -s -P icmp | nawk '{ gsub("=","") }
/icmpInMsgs/ { icmpInMsgs=$3 ; print $0 }
/icmpInOverflows/ { icmpInOverflows=$2 ; print $0 }
END { print "InMsgs=" icmpInMsgs "\tOverflows=" icmpInOverflows }'
sleep 10
done
done
done
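
A variation on the monitoring loop is to print the change in overflows per interval instead of the running totals, which makes a burst easier to spot. The sketch below is only an illustration: "sample_overflows" is a hypothetical stand-in that replays three of the counter values quoted earlier, and on a real system it would wrap the netstat and nawk pipeline shown above. COUNT and INTERVAL are assumed names whose defaults are kept small so the sketch finishes immediately.

```shell
# Hypothetical stand-in for the netstat | nawk pipeline; it replays
# canned icmpInOverflows values so the loop can be tried anywhere.
sample_overflows() {
    case $1 in
        0) echo 138775834 ;;
        1) echo 138778159 ;;
        *) echo 138778806 ;;
    esac
}

COUNT=${COUNT:-3}         # the original script runs 1000 iterations
INTERVAL=${INTERVAL:-0}   # the original script sleeps 10 seconds

i=0
prev=""
log=""
while [ "$i" -lt "$COUNT" ] ; do
    cur=$(sample_overflows "$i")
    # The first sample has nothing to compare against.
    if [ -n "$prev" ] ; then delta=$((cur - prev)) ; else delta="-" ; fi
    line=$(printf '%03d overflows=%s delta=%s' "$i" "$cur" "$delta")
    echo "$line"
    log="$log$line;"
    prev=$cur
    i=$((i + 1))
    sleep "$INTERVAL"
done
```

A single counter formatted with "%03d" gives the same three-digit sequence number as the three nested loops in the script above.
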

Conclusion:
Exercising ICMP pollers in a Network Management environment is easy to do. When a large number of devices must be polled, it may be important to tune the OS. Tuning the OS is a reasonable process, since simple metrics can show both the behavior and the improvement.


----------------------------------------------
UPDATE --- every time I edit this posting in blogger, it removes my pipe symbols. In general, you may need to add a pipe between some commands like fping, netstat, nawk if you get a syntax error when you copy-paste a line of text.
