The case of the Zombie Connections

Recently I came across a weird incident in which a very network-demanding application started to show strange behavior and poor performance. This application consumes lots of sockets, both on the client and on the server. Of course, this shouldn't be an issue in itself, since plenty of solutions out there have similar requirements.

In this case, both the client and the server kept orphaned/zombie TCP connections in the CLOSING and FIN_WAIT_2 states. In other words, no live process owned those connections and, as a result, there was no way to close them and free network resources for the application service.

Although the cause of this issue is still under investigation, everything seems to point to some sort of handle leak. Interesting as it is, I won't go into the internals of that diagnosis here. Instead, I'll show a way to characterize the impact of this issue using PowerShell 2.

Our first problem was how to extract meaningful information from more than 25,000 connections on both sides of the communication. If we succeeded, we would be ready to answer more questions: Could this issue be happening everywhere? If so, how quickly could we evaluate the other servers in the farm? Could we estimate how quickly the service was degrading?

The solution

Details matter…

Several details here deserve further comment. The first is the use of the foreach keyword instead of the ForEach-Object cmdlet. ForEach-Object streams objects through the pipeline one at a time, which keeps memory usage low but adds overhead to every object it handles. The foreach keyword, conversely, evaluates the whole collection first and then iterates over it in memory with far less overhead per item. This subtle difference is key whenever you have to handle a significant number of objects that fit in memory. In our case, more than 25,000 elements.

Even though we have taken care to choose between the foreach keyword and the ForEach-Object cmdlet, you will notice that handling a large set of objects takes time. In these cases, if you don't give yourself some sort of progress indication, you will soon get the feeling that something is going wrong, even if that's not the case. That is why we use the $i variable to give us feedback on the progress of the heavy processing operations.
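A minimal sketch of such a loop, assuming the raw netstat output is already available as an array of strings. The sample lines, the regular expression and the names $lines, $pattern and $connections are illustrative assumptions, not the original script:

```powershell
# Hypothetical sample data; in a live session this could be: $lines = netstat -ano
$lines = @(
    'Active Connections',
    '  TCP    10.0.0.5:49152    10.0.0.9:8080    FIN_WAIT_2     0',
    '  TCP    10.0.0.5:49153    10.0.0.9:8080    CLOSING        0',
    '  TCP    10.0.0.5:49154    10.0.0.9:8080    ESTABLISHED    4312'
)

# Assumed row layout: protocol, local endpoint, remote endpoint, state, PID
$pattern = '^\s*TCP\s+(\S+):(\d+)\s+(\S+):(\d+)\s+(\S+)\s+(\d+)\s*$'

$i = 0
$connections = @(foreach ($line in $lines) {
    $i++
    # Report progress every 1000 lines so a long run does not look hung
    if ($i % 1000 -eq 0) {
        Write-Progress -Activity 'Parsing netstat output' -Status "$i of $($lines.Count) lines"
    }
    # Lines that do not match (headers, blank lines) are silently skipped
    if ($line -match $pattern) {
        New-Object PSObject -Property @{
            LocalAddress  = $matches[1]
            LocalPort     = [int]$matches[2]
            RemoteAddress = $matches[3]
            RemotePort    = [int]$matches[4]
            State         = $matches[5]
            OwningPid     = [int]$matches[6]
        }
    }
})
```

New-Object PSObject is used here instead of newer syntax so that the sketch stays within what PowerShell 2 offers.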

Making things easier…

When you are in the middle of solving a problem, you won't bother writing your code in a canonical fashion. You are simply focused on getting things done. In these situations you will feel more comfortable writing your data analysis statements with PowerShell aliases, tab completion, parameter shortcuts and other quick-and-dirty techniques. Let's see an example:
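As a hedged illustration (the sample objects and property names are assumptions, not the original listing), here is the same query written first in canonical form and then the way you might type it at the prompt, using the built-in aliases ?, group and select plus an abbreviated parameter name:

```powershell
# Hypothetical parsed connections
$connections = @(
    New-Object PSObject -Property @{ State = 'FIN_WAIT_2';  RemoteAddress = '10.0.0.9' }
    New-Object PSObject -Property @{ State = 'FIN_WAIT_2';  RemoteAddress = '10.0.0.9' }
    New-Object PSObject -Property @{ State = 'CLOSING';     RemoteAddress = '10.0.0.7' }
    New-Object PSObject -Property @{ State = 'ESTABLISHED'; RemoteAddress = '10.0.0.7' }
)

# Canonical form: full cmdlet and parameter names
$long = $connections |
    Where-Object -FilterScript { $_.State -ne 'ESTABLISHED' } |
    Group-Object -Property State -NoElement |
    Select-Object -Property Count, Name

# Interactive form: aliases, positional parameters, -NoEl short for -NoElement
$short = $connections | ? { $_.State -ne 'ESTABLISHED' } | group State -NoEl | select Count, Name
```

Both statements produce the same result. The terse form is quicker to type during an incident, while the canonical form is the one to keep in a script that other people will read.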

The pattern

Fortunately, many problems respond to this pattern:

  • Data Acquisition: from a PowerShell cmdlet, a CLI program, an input file, etc.
  • Data Parsing/Processing: the -match operator is a useful way of doing it, as long as you have some practice with .NET regular expressions. Of course, this is not the only way of performing this stage of the process. Use whatever it takes for your case.
  • Data Analysis: You will mostly use the cmdlets that we have used in this example: Select-Object, Group-Object, Sort-Object, Compare-Object, Measure-Object, ForEach-Object, Where-Object. Of course, if you want finer control over the output, you can add more cmdlets to this list: Format-Table, Format-List, Out-File, Export-Csv, etc.
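The three stages above can be laid out as a small skeleton. This is only a sketch under stated assumptions: the simplified input lines, the regular expression and the variable names $raw, $parsed and $summary are invented for illustration.

```powershell
# Stage 1 - Data acquisition (hypothetical sample; a saved capture read with
# Get-Content or the output of a CLI program would take its place)
$raw = @(
    'TCP 10.0.0.5:49152 10.0.0.9:8080 FIN_WAIT_2',
    'TCP 10.0.0.5:49153 10.0.0.9:8080 FIN_WAIT_2',
    'TCP 10.0.0.5:49154 10.0.0.7:8080 CLOSING',
    'TCP 10.0.0.5:49155 10.0.0.7:8080 ESTABLISHED'
)

# Stage 2 - Data parsing: turn every matching line into an object
$parsed = @(foreach ($line in $raw) {
    if ($line -match '^TCP\s+\S+\s+(\S+):\d+\s+(\S+)$') {
        New-Object PSObject -Property @{ Remote = $matches[1]; State = $matches[2] }
    }
})

# Stage 3 - Data analysis: connections per remote host and state, largest first
$summary = $parsed |
    Group-Object -Property Remote, State |
    Sort-Object -Property Count -Descending |
    Select-Object -Property Count, Name

# Optional output control
$summary | Format-Table -AutoSize
```

Keeping each stage in its own variable makes it easy to swap the acquisition step for another data source without touching the analysis.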

By taking advantage of this pattern, you can standardize your code and/or your code skeletons and become even more proficient in your complex analysis projects.

Conclusion

Here we have seen how an endless list of connections from a regular netstat command turned into meaningful and useful information that led us to a clear idea of what the scope and impact of our incident really were. This helps to:

  • narrow the potential universe of causes and speed up the diagnosis process.
  • answer questions that your management staff might ask you.
  • improve the quality of the available information, allowing better decisions even when urgency and anxiety are the “business drivers” during the incident.

You might have read that PowerShell is a powerful tool in many problem domains. Obviously, this is not new. But there is nothing like looking at a real example and feeling the value in the field. If you have not seen it for yourself, just give it a try. The return on your learning investment will be huge.

The case of the Zombie Connections by Carlos Veira Lorenzo is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Published in Automation Troubleshooting