I was a network engineer in a previous life. Though I’ve moved into DevOps, my love for network automation has not subsided.
Recently, an engineer posted a message on one of the Slack Workgroups I frequent. He had been asked to test and document network connectivity for a large number of hosts… by close of business the next day. Given the tight timeline, I offered to write a script for him which would automate the process. He graciously accepted.
Design
ping and traceroute tests can take anywhere from a few seconds to tens of seconds each. Given that the engineer needed to test a large number of hosts, running these commands serially would take a very long time. So what can we do about it? Use multithreading to run the tests in parallel.
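To see why running the tests in parallel pays off, here is a minimal sketch that compares serial and threaded execution. It uses a hypothetical `fake_check()` that simply sleeps, standing in for a ping or traceroute that spends most of its time waiting on the network; the `DELAY` value and host list are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # hypothetical per-host test time, in seconds


def fake_check(host: str) -> str:
    # Stand-in for ping/traceroute: the thread just waits, as it would on the network
    time.sleep(DELAY)
    return f'{host}: done'


def run_serially(hosts):
    # One host after another: total time grows with the number of hosts
    return [fake_check(h) for h in hosts]


def run_in_parallel(hosts, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() returns results in the same order as the input hosts
        return list(executor.map(fake_check, hosts))


if __name__ == '__main__':
    hosts = [f'10.0.0.{i}' for i in range(1, 11)]
    for runner in (run_serially, run_in_parallel):
        start = time.perf_counter()
        runner(hosts)
        print(f'{runner.__name__}: {time.perf_counter() - start:.2f}s')
```

With ten hosts the serial run takes about ten times `DELAY`, while the threaded run takes roughly one `DELAY`. Threads help here despite Python's GIL because a thread that is sleeping, or waiting on a socket or child process, doesn't hold the GIL.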
If this design sounds familiar, it might be because it was covered in a previous post - Multithreading with Python and Netmiko. However, this time we’re going to try something a little different. In the aforementioned post, we used the threading module.
This time we’ll be using the concurrent.futures module. The technical explanation is that it provides a higher-level interface than threading. This means that it abstracts away many of the details required to implement multithreading, such as creating, starting and joining threads and passing their results back.
To put it another way, concurrent.futures takes very little code to use, as we’ll see in a moment. Comparatively speaking, threading requires quite a lot of code to utilise.
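As a rough illustration of that difference, the sketch below runs the same hypothetical `check()` function (a stand-in for a real health check) both ways. The threading version has to create and join threads, cap concurrency by hand and guard shared state; ThreadPoolExecutor handles all of that for us. This is one possible way to write the threading version, not the code from the previous post.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def check(host: str) -> str:
    # Hypothetical stand-in for a health check; returns a result instead of writing a file
    return f'{host}: ok'


def with_threading(hosts, max_threads=4):
    results = {}
    lock = threading.Lock()
    semaphore = threading.Semaphore(max_threads)  # limit concurrent checks by hand

    def worker(host):
        with semaphore:
            result = check(host)
        with lock:  # guard the shared results dict
            results[host] = result

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return [results[h] for h in hosts]


def with_futures(hosts, max_threads=4):
    # The executor manages the worker threads, the queue of work and the results
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        return list(executor.map(check, hosts))
```

Both functions return the same list, but the concurrent.futures version is three lines long.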
Code
The finished script can be found on GitHub. Let’s now dissect the script to ensure we understand how it works.
First we create the OUTPUT_DIR (if it doesn’t already exist). We then read the file which contains the hostnames that need to be checked, one per line:
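from pathlib import Path

def main():
    print('Running health checks...')
    # Create the output directory (and any missing parents); no error if it already exists
    Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
    # Read the hosts to check, one per line
    with open(INPUT_FILENAME, 'r') as f:
        ips = f.read().splitlines()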
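One thing to watch: splitlines() keeps blank lines, so a stray empty line in the input file becomes an empty "host" that gets passed to ping and traceroute. A minimal sketch of a more forgiving reader, using a hypothetical read_hosts() helper that skips blank lines and `#` comments (comment support is an assumption, not something the original input format defines):

```python
from pathlib import Path


def read_hosts(path):
    """Hypothetical helper: one host per line, ignoring blank lines and # comments."""
    hosts = []
    for line in Path(path).read_text().splitlines():
        line = line.split('#', 1)[0].strip()  # drop trailing comments and whitespace
        if line:
            hosts.append(line)
    return hosts
```

In main(), `ips = read_hosts(INPUT_FILENAME)` could then replace the `with open(...)` block.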
Next we run health_checks using ThreadPoolExecutor (which comes from the concurrent.futures module).
# Leaving the with block waits for every submitted health check to finish
with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
    [executor.submit(health_checks, ip) for ip in ips]
And just like that, we’ve implemented multithreading. That’s right! All it took was just two lines of code. How amazing is that?
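One caveat with the list comprehension: it throws away the Future objects that submit() returns. If health_checks raises an exception (say, the output file can’t be written), the exception is stored on its future and only re-raised when you call result(), which never happens, so the failure is silent. A minimal sketch of one way to surface those errors with as_completed(); the failing health_checks below is a hypothetical stand-in, and MAX_THREADS becomes a parameter:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed


def health_checks(ip):
    # Hypothetical stand-in that fails for one host, to show what happens to errors
    if ip == 'bad-host':
        raise RuntimeError(f'could not check {ip}')
    return ip


def run_all(ips, max_threads=10):
    failed = []
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        # Map each future back to the IP it was submitted for
        futures = {executor.submit(health_checks, ip): ip for ip in ips}
        for future in as_completed(futures):
            ip = futures[future]
            try:
                future.result()  # re-raises any exception from the worker thread
            except Exception:
                logging.exception('Health check failed for %s', ip)
                failed.append(ip)
    return failed
```

It costs a few more lines, but every failing host is logged with its traceback and returned, so nothing disappears quietly.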
Let’s now move our focus to the final piece of the puzzle - the health_checks function:
import logging
import os

def health_checks(ip):
    # Build the ping and traceroute command lines for this host
    ping_cmd = f'{PING_COMMAND} {ip}'
    trace_cmd = f'{TRACEROUTE_COMMAND} {ip}'

    # Run both tests; run_command returns each command's output as text
    ping_status = run_command(ping_cmd)
    trace_status = run_command(trace_cmd)

    # Write both results to <OUTPUT_DIR>/<ip>.txt
    filename = f'{OUTPUT_DIR}{os.sep}{ip}.txt'
    with open(filename, 'w') as f:
        f.write(title('Ping Results'))
        f.write(ping_status)
        f.write(title('Trace Results'))
        f.write(trace_status)

    logging.info(f'Wrote outputs to: {filename}')
    logging.debug(ping_status)
    logging.debug(trace_status)
Though this function is the heart of our script, there’s nothing particularly exciting happening here. We’re simply running the ping and traceroute commands against a single host and writing the results to a file. The real beauty comes from the fact that this function is being called by ThreadPoolExecutor, so up to MAX_THREADS copies of it run at the same time.
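In doing so, we’re able to test connectivity to as many as MAX_THREADS hosts in roughly the same amount of time that it would take us to test one host. With more hosts than threads, the pool works through them MAX_THREADS at a time. For example, with 200 hosts, MAX_THREADS set to 50 and about 20 seconds per host, that’s roughly 4 × 20 = 80 seconds, rather than more than an hour if we ran the checks one after another.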
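health_checks relies on two helpers, run_command and title, which aren’t shown in this excerpt. Below is a minimal sketch of what they might look like; it is not the script’s actual code. The PING_COMMAND and TRACEROUTE_COMMAND values are assumptions (`ping -c 3 -q` would produce a header plus summary like the ping results in the Output section), and the banner width of 100 is a guess. Passing an argument list to subprocess.run, rather than using `shell=True`, keeps hostnames read from a file from being interpreted by a shell.

```python
import shlex
import subprocess

# Assumed values; the real script defines its own
PING_COMMAND = 'ping -c 3 -q'
TRACEROUTE_COMMAND = 'traceroute'


def title(text, width=100):
    # Banner used to separate the sections of each output file
    border = '*' * width
    return f'{border}\n{text}\n{border}\n'


def run_command(cmd, timeout=120):
    """Run cmd without a shell and return its combined stdout and stderr as text."""
    try:
        result = subprocess.run(
            shlex.split(cmd),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        return f'Command not found: {cmd}\n'
    except subprocess.TimeoutExpired:
        return f'Command timed out after {timeout}s: {cmd}\n'
    return result.stdout + result.stderr
```

Returning an error message instead of raising means a missing binary or a hung traceroute still produces an output file for that host.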
Output
Below is an example of what the health check output looks like:
****************************************************************************************************
Ping Results
****************************************************************************************************
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
--- 1.1.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 10.165/11.043/12.695/1.168 ms
****************************************************************************************************
Trace Results
****************************************************************************************************
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
1 _gateway (192.168.20.1) 1.398 ms 4.752 ms 4.696 ms
2 1.2.3.4 (1.2.3.4) 10.265 ms 11.023 ms 11.229 ms
3 5.6.7.8 (5.6.7.8) 12.595 ms 17.532 ms 17.479 ms
4 10.1.31.141 (10.1.31.141) 12.415 ms 12.358 ms 17.286 ms
5 10.1.31.142 (10.1.31.142) 17.243 ms 17.187 ms 17.138 ms
6 10.1.31.195 (10.1.31.195) 16.721 ms 12.358 ms 17.138 ms
7 as13335.melbourne.megaport.com (103.26.71.38) 11.290 ms 11.685 ms 12.149 ms
8 one.one.one.one (1.1.1.1) 11.860 ms 12.673 ms 12.618 ms
As always, if you have any questions or have a topic that you would like me to discuss, please feel free to post a comment at the bottom of this blog entry, e-mail me at will@oznetnerd.com, or drop me a message on Reddit (OzNetNerd).
Note: The opinions expressed in this blog are my own and not those of my employer.