Recently I came across a weird incident in which a very network-demanding application started to show strange behavior and poor performance. This application consumes lots of sockets, both on the client and on the server. Of course, this shouldn't be an issue in itself, since there are plenty of solutions out there with similar requirements.
The case here was that both the client and the server kept orphaned/zombie TCP connections in the CLOSING and FIN_WAIT_2 states on each side. In other words, no live process owned those connections and, as a result, there was no way to close them and free network resources for the application service.
Although the cause of this issue is still under investigation, everything seems to point to some sort of handle leak. Interesting as it is, I won't bother you with the internals of the diagnosis. Instead, I'll show you a way to characterize the impact of this issue using PowerShell 2.
Our first problem was how to extract meaningful information from more than 25,000 connections on both sides of the communication. If we succeeded, we would be ready to answer more questions: Could this issue be happening everywhere? If so, how quickly could we evaluate the other servers in the farm? Could we estimate how quickly the service was degrading?
The solution
    #############################################################################################
    # Data Acquisition
    $connections = netstat -ano

    #############################################################################################
    # Data Parsing/Processing
    [PSObject[]] $ConnectionsList = @()
    $i = 0
    $pattern = '\s+(?<Protocol>\w+)\s+' + `
               '(?<LocalAddress>[\d\.]+:\d+)\s+' + `
               '(?<ForeignAddress>[\d\.]+:\d+)\s+' + `
               '(?<State>\w+)\s+' + `
               '(?<PID>\d+)'

    foreach ($connection in $connections) {
        $i++
        Write-Host -foregroundcolor Green " + Processing Element: $i"
        if ( $connection -match $pattern ) {
            $ConnectionsList += New-Object PSObject -Property @{
                Protocol       = $Matches.Protocol
                LocalAddress   = $Matches.LocalAddress
                ForeignAddress = $Matches.ForeignAddress
                State          = $Matches.State
                PID            = $Matches.PID
            }
        }
        # if ( $i -gt 10 ) { break }
    }

    #############################################################################################
    # Data Analysis

    # How many connections really exist
    $ConnectionsList | Measure-Object

    # How many unique processes have connections alive
    $ConnectionsList | Select-Object PID -unique | Measure-Object

    # Top 20: Which Processes have the most connections
    $ConnectionsList | Select-Object LocalAddress, PID | Group-Object PID |
        Sort-Object Count -Descending | Select-Object -first 20

    # Remote Services (Ports) which we connect to
    $ConnectionsList | Select-Object ForeignAddress -unique |
        ForEach-Object { $_.ForeignAddress.Split(":")[1] }

    # Top 20: Which Remote Services (Ports) we connect the most to
    $ConnectionsList | ForEach-Object { $_.ForeignAddress.Split(":")[1] } | Group-Object |
        Sort-Object Count -Descending | Select-Object -first 20

    # Top 20: Which Remote Machines we connect the most to
    $ConnectionsList | ForEach-Object { $_.ForeignAddress.Split(":")[0] } | Group-Object |
        Sort-Object Count -Descending | Select-Object -first 20

    # Which Connection States we have on the system
    $ConnectionsList | Select-Object State -unique

    # Most frequent Connection States
    $ConnectionsList | ForEach-Object { $_.State.ToString() } | Group-Object |
        Sort-Object Count -Descending

    # Handle Leak: Which Processes do NOT EXIST and HAVE an active connection on the system
    Compare-Object $(Get-Process | ForEach-Object { $_.Id.ToString() } | Select-Object -unique ) `
                   $($ConnectionsList | ForEach-Object { $_.PID } | Select-Object -unique ) -IncludeEqual |
        Where-Object { $_.SideIndicator -eq "=>" }

    # Which Processes do EXIST and HAVE an active connection on the system
    Compare-Object $(Get-Process | ForEach-Object { $_.Id.ToString() } | Select-Object -unique ) `
                   $($ConnectionsList | ForEach-Object { $_.PID } | Select-Object -unique ) -IncludeEqual |
        Where-Object { $_.SideIndicator -eq "==" }

    # Which Processes do EXIST and DON'T HAVE an active connection on the system
    Compare-Object $(Get-Process | ForEach-Object { $_.Id.ToString() } | Select-Object -unique ) `
                   $($ConnectionsList | ForEach-Object { $_.PID } | Select-Object -unique ) -IncludeEqual |
        Where-Object { $_.SideIndicator -eq "<=" }

    # Connection State frequency by Process
    $ConnectionsList |
        Select-Object @{ Name="State"; Expression={$_.State.ToString()} },
                      @{ Name="PID"; Expression={$_.PID.ToString()} } |
        Group-Object State, PID | Sort-Object Count -Descending
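The three Compare-Object statements differ only in the SideIndicator they filter on. Here is a minimal sketch of how that indicator behaves, using made-up PID lists (the variable names and values are purely illustrative, not taken from the incident):

```powershell
# Hypothetical sample data: PIDs of running processes and PIDs that own connections.
$runningPids    = @('4', '712', '1840')
$connectionPids = @('712', '1840', '5528')

$comparison = Compare-Object $runningPids $connectionPids -IncludeEqual

# '=>' : only in the second list -> a connection whose owner process no longer exists
# '==' : in both lists           -> a live process that has connections
# '<=' : only in the first list  -> a live process without connections
$orphaned = @($comparison |
    Where-Object { $_.SideIndicator -eq '=>' } |
    ForEach-Object { $_.InputObject })

$orphaned
```

With this sample data the only orphaned PID is 5528, because it appears in the connection list but not among the running processes.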
Details matter…
There are several important details here that are worth commenting on. The first one is the use of the foreach keyword instead of the ForEach-Object cmdlet. The foreach keyword iterates over a collection that is already fully loaded in memory, which is exactly what netstat has left in $connections, and it avoids the per-object pipeline overhead of ForEach-Object. The cmdlet, conversely, processes objects one at a time as they arrive through the pipeline, which saves memory when the data is streamed but is usually slower on a collection you already hold. This subtle difference is key whenever you have to handle a significant number of objects; in our case, more than 25,000 elements.
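The two constructs can be put side by side in a small sketch. The input here is a stand-in range rather than real netstat output, and the variable names are illustrative. Both forms give the same result; to time them on your own data, Measure-Command can wrap either one.

```powershell
# Hypothetical input: 1..5 stands in for the netstat output lines.
$items = 1..5

# foreach statement: walks a collection that is already held in a variable.
# Assigning the loop's output also avoids growing an array with += on each pass.
$squaresStatement = foreach ($item in $items) { $item * $item }

# ForEach-Object cmdlet: receives the objects one by one from the pipeline.
$squaresCmdlet = $items | ForEach-Object { $_ * $_ }

$squaresStatement -join ','
$squaresCmdlet -join ','
```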
Even though we have taken care to choose the foreach keyword over the ForEach-Object cmdlet, you will notice that handling a large set of objects takes time. In these cases, if you don't give yourself some sort of progress indication, you will soon get the feeling that something is going wrong, even when that's not the case. That is why we use the $i variable to give us feedback on the progress of the heavy processing operations.
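An alternative to printing one line per element is the built-in Write-Progress cmdlet. The sketch below is not what the script above does; it assumes a stand-in input and an arbitrary refresh interval of 100 elements, chosen only for illustration.

```powershell
# Hypothetical input standing in for the netstat output lines.
$lines = 1..1000
$total = $lines.Count
$i = 0
$processed = 0

foreach ($line in $lines) {
    $i++
    # Refresh the progress bar every 100 elements only; updating it on
    # every element would itself slow the loop down.
    if ($i % 100 -eq 0) {
        $percent = [int](100 * $i / $total)
        Write-Progress -Activity 'Parsing connections' -Status "Element $i of $total" -PercentComplete $percent
    }
    $processed++   # placeholder for the real per-line work
}

# Remove the progress bar once the loop has finished.
Write-Progress -Activity 'Parsing connections' -Completed
```

Throttling the updates matters because the feedback itself has a cost, whether it comes from Write-Host or from Write-Progress.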
Making things easier…
When you are in the middle of solving a problem, you won't bother writing your code in a canonical fashion. You are simply focused on getting things done. In these situations you will feel more comfortable writing your Data Analysis statements with PowerShell aliases, tab completion, parameter shortcuts and other quick and dirty techniques. Let's see an example:
    #############################################################################################
    # Data Analysis

    # How many connections really exist
    $ConnectionsList | Measure

    # How many unique processes have connections alive
    $ConnectionsList | Select PID -unique | Measure

    # Top 20: Which Processes have the most connections
    $ConnectionsList | Select LocalAddress, PID | Group PID | Sort Count -Descending | Select -first 20

    # Remote Services (Ports) which we connect to
    $ConnectionsList | Select ForeignAddress -unique | % { $_.ForeignAddress.Split(":")[1] }

    # Top 20: Which Remote Services (Ports) we connect the most to
    $ConnectionsList | % { $_.ForeignAddress.Split(":")[1] } | Group | Sort Count -Descending | Select -first 20

    # Top 20: Which Remote Machines we connect the most to
    $ConnectionsList | % { $_.ForeignAddress.Split(":")[0] } | Group | Sort Count -Descending | Select -first 20

    # Which Connection States we have on the system
    $ConnectionsList | Select State -unique

    # Most frequent Connection States
    $ConnectionsList | % { $_.State.ToString() } | Group | Sort Count -Descending

    # Handle Leak: Which Processes do NOT EXIST and HAVE an active connection on the system
    Compare $(gps | % { $_.Id.ToString() } | Select -unique ) `
            $($ConnectionsList | % { $_.PID } | Select -unique ) -IncludeEqual |
        ? { $_.SideIndicator -eq "=>" }

    # Which Processes do EXIST and HAVE an active connection on the system
    Compare $(gps | % { $_.Id.ToString() } | Select -unique ) `
            $($ConnectionsList | % { $_.PID } | Select -unique ) -IncludeEqual |
        ? { $_.SideIndicator -eq "==" }

    # Which Processes do EXIST and DON'T HAVE an active connection on the system
    Compare $(gps | % { $_.Id.ToString() } | Select -unique ) `
            $($ConnectionsList | % { $_.PID } | Select -unique ) -IncludeEqual |
        ? { $_.SideIndicator -eq "<=" }

    # Connection State frequency by Process
    $ConnectionsList | Select @{ n="State"; e={$_.State.ToString()} }, @{ n="PID"; e={$_.PID.ToString()} } |
        Group State, PID | Sort Count -Descending
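If you later need to turn such shorthand back into canonical code, Get-Alias tells you which cmdlet each alias stands for. This is a small sketch, assuming the default aliases of a standard session are present; the list of names is just a sample of those used above.

```powershell
# A sample of the aliases used above (assumed to exist, as in a default session).
$names = 'select', 'group', 'measure', 'gps', '%'

# Map each alias to the cmdlet it resolves to.
$aliasMap = @{}
foreach ($name in $names) {
    $aliasMap[$name] = (Get-Alias -Name $name).Definition
}

$aliasMap
```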
The pattern
Fortunately, many problems respond to this pattern:
- Data Acquisition: from a PowerShell cmdlet, a CLI program, an input file, etc.
- Data Parsing/Processing: taking advantage of the -match operator is a useful way of doing it, as long as you have some practice with .NET regular expressions. Of course, this is not the only way of performing this stage of the process. Use whatever it takes for your case.
- Data Analysis: you will mostly use the cmdlets that we have used in this example: Select-Object, Group-Object, Sort-Object, Compare-Object, Measure-Object, ForEach-Object, Where-Object. Of course, if you want richer control over the output, you can add more cmdlets to this list: Format-Table, Format-List, Out-File, Export-Csv, etc.
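To make the parsing stage concrete, here is a minimal sketch of the regular expression from the solution applied to two made-up netstat lines (the addresses and PIDs are invented). Note that the pattern expects a state and a numeric foreign address, so a UDP line such as the second one is simply skipped.

```powershell
# Hypothetical netstat -ano lines; addresses and PIDs are made up.
$sample = @(
    '  TCP    10.0.0.5:49231    10.0.0.9:1433    FIN_WAIT_2    0',
    '  UDP    0.0.0.0:500       *:*                            892'
)

$pattern = '\s+(?<Protocol>\w+)\s+(?<LocalAddress>[\d\.]+:\d+)\s+' +
           '(?<ForeignAddress>[\d\.]+:\d+)\s+(?<State>\w+)\s+(?<PID>\d+)'

$parsed = foreach ($line in $sample) {
    # On a successful match, the named groups are available through $Matches.
    if ($line -match $pattern) {
        New-Object PSObject -Property @{
            Protocol = $Matches.Protocol
            State    = $Matches.State
            PID      = $Matches.PID
        }
    }
}

$parsed
```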
Taking advantage of this pattern, you can standardize your code and/or your code skeletons and become even more proficient in your complex analysis projects.
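One possible shape for such a skeleton is sketched below. The function names (Get-RawData, ConvertTo-Record) and the key=value sample lines are hypothetical, chosen only to show the three stages; replace each stage with whatever your case requires.

```powershell
# Stage 1, Data Acquisition: hypothetical source returning made-up lines.
# Replace the body with a cmdlet, a CLI call or Get-Content.
function Get-RawData {
    'alpha=1', 'beta=2', 'alpha=3'
}

# Stage 2, Data Parsing/Processing: turn each text line into an object.
function ConvertTo-Record {
    param([string[]] $Lines)
    foreach ($line in $Lines) {
        if ($line -match '^(?<Key>\w+)=(?<Value>\d+)$') {
            New-Object PSObject -Property @{
                Key   = $Matches.Key
                Value = [int] $Matches.Value
            }
        }
    }
}

# Stage 3, Data Analysis: the usual Group-Object / Sort-Object combination.
$records = ConvertTo-Record -Lines (Get-RawData)
$summary = @($records | Group-Object Key | Sort-Object Count -Descending)

$summary | Select-Object Name, Count
```

Keeping the stages in separate functions lets you swap the data source or the parser without touching the analysis statements.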
Conclusion
Here we have seen how an endless list of connections from a regular netstat command turned into meaningful and useful information that led us to a clear idea of what the scope and impact of our incident really were. It does help to:
- narrow the potential universe of causes and speed up the diagnosis process.
- answer questions that your management staff might ask you.
- improve the quality of the available information, allowing you to make better decisions even though urgency and anxiety are the “business drivers” in the course of the incident.
You might have read that PowerShell is a powerful tool in many problem domains. Obviously, this is not new. But there is nothing like looking at a real example and feeling its value in the field. If you have not seen it for yourself, just give it a try. The return on your learning investment will be huge.