A Look at Filtering Performance in PowerShell

While working on a script for work, I ran into a situation where I needed to do some filtering on a bunch of data, and it got me wondering what the quickest approach to filtering data actually is. We know that Where-Object is the official filtering cmdlet in PowerShell and that it gets the job done without much issue. With PowerShell V4, we also got the .Where() method, which was built for Desired State Configuration but has uses outside of that. It can outperform Where-Object, but at the expense of having all of the data stored in memory before the filter runs, whereas Where-Object takes input from the pipeline and tests each item against the filter as it arrives.
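To make the difference concrete, here is a minimal sketch of the two approaches side by side. The variable names are just placeholders for this example; both calls should return the same odd numbers, and only the way the data reaches the filter differs.

```powershell
# Sample data: the numbers 1 through 10
$Numbers = 1..10

# Where-Object: items stream through the pipeline one at a time
$FromCmdlet = $Numbers | Where-Object { $_ % 2 }

# .Where(): called on a collection that already exists in memory (PowerShell V4+)
$FromMethod = $Numbers.Where({ $_ % 2 })

"Where-Object: $($FromCmdlet -join ',')"
".Where():     $($FromMethod -join ',')"
```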

Edit: I should know better than to write late at night, but thanks to Dave for catching that I didn’t have my TestFunction configured to match our Filter. As mentioned, while the pipeline is an amazing piece of PowerShell, it can be expensive in terms of performance.

Where Do We Begin?

I wanted to do a simple filter just to see how the performance would be. So with that, I will use $_%2, which evaluates each number ($_) to $True for odd numbers and $False for even numbers. This lets me filter for only the odd numbers.

 
1..4|ForEach {
    If ($_%2){
        "{0}: Odd" -f $_
    }Else{
        "{0}: Even" -f $_
    }
}


As I said, it's very simple with no complexity at all, but it is all I need for testing.

Now that we have that taken care of, the next step is to look at as many ways to filter data as I can think of. This isn't an exhaustive list of filtering possibilities, but it has enough to show where performance is great and where it starts to lack as we deal with more data.

Next up is looking at those possible filtering techniques which will range from using a cmdlet to a method to some other techniques that you may not have seen before.

The eleven filtering methods that I will be testing are as follows:

  • ForEach() {}
  • | ForEach {}
  • | Where-Object {}
  • .Where({})
  • PowerShell Filter (see below for source code)
  • PowerShell Filter with Parameter (see below for source code)
  • .{Process {}}
  • Using [Predicate[Object]] (see below for source code)
  • .ForEach({})
  • PowerShell Function using parameter (see below for source code)
  • PowerShell Function with Pipeline (see below for source code)

Source Code for Custom Methods

PowerShell Filter

Dating back to V1, this was the original way to send data via the pipeline to a custom command. It is still very much useful with V5 to provide a quick way to filter out data.

 
Filter TestFilter {   
    If ($_%2){$_}
}
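For anyone who hasn't used a Filter before: it behaves like a Function whose entire body is a Process block. The sketch below puts the two forms next to each other. TestFilter_AsFunction is a made-up name for this illustration and is not part of the tests.

```powershell
# The Filter from above
Filter TestFilter {
    If ($_ % 2) { $_ }
}

# Roughly the same thing written as a Function with only a Process block
# (TestFilter_AsFunction is a hypothetical name used only in this sketch)
Function TestFilter_AsFunction {
    Process {
        If ($_ % 2) { $_ }
    }
}

$FilterResult   = 1..10 | TestFilter
$FunctionResult = 1..10 | TestFilter_AsFunction

"Filter:   $($FilterResult -join ',')"
"Function: $($FunctionResult -join ',')"
```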

PowerShell Filter with Parameter

I also wanted a version that takes the predicate as a parameter rather than hard-coding it.

 
Filter TestFilter_Predicate {
    Param ($Predicate)   
    If (&$Predicate){$_}
}
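As a quick usage sketch, the same Filter can be reused with different predicates. The second predicate here ($_ -gt 7) is only an example and is not part of the tests.

```powershell
Filter TestFilter_Predicate {
    Param ($Predicate)
    # Run the supplied script block against the current pipeline item
    If (&$Predicate) { $_ }
}

# Same predicate as the hard-coded Filter: keep the odd numbers
$Odd = 1..10 | TestFilter_Predicate -Predicate { $_ % 2 }

# A different predicate (example only): keep numbers greater than 7
$Big = 1..10 | TestFilter_Predicate -Predicate { $_ -gt 7 }

"Odd: $($Odd -join ',')"
"Big: $($Big -join ',')"
```

The extra script block invocation for every item is the price paid for that flexibility.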

Predicate Object

I use this with some of my UIs to quickly filter data; figured it would make for a good filter method here as well.

 
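# CollectionViewSource lives in the WPF assemblies, so load them first
Add-Type -AssemblyName PresentationFramework
# $List is the collection of numbers being filtered
$t = [System.Windows.Data.CollectionViewSource]::GetDefaultView($List)
# The view only shows items for which the predicate evaluates to $True
$t.Filter = [Predicate[Object]]{
    Try {
        $args[0] % 2
    } Catch {$True}
}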
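If you want to see how a [Predicate[Object]] behaves without loading WPF, here is a small sketch that swaps in a different technique: the FindAll() method of a generic List instead of the CollectionViewSource used in the tests. The variable names are placeholders, and this is not what the timing script measures.

```powershell
# Stand-in collection for this sketch: a generic list holding 1 through 10
$Numbers = [System.Collections.Generic.List[Object]](1..10)

# The script block is converted to a Predicate[Object] delegate;
# the item being tested arrives as the first argument
$IsOdd = [Predicate[Object]]{
    Param ($Number)
    [bool]($Number % 2)
}

# FindAll() returns a new list with only the items the predicate accepts
$Odd = $Numbers.FindAll($IsOdd)

"Odd numbers: $($Odd -join ',')"
```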

PowerShell Function

These are the PowerShell functions we know and love, which support the pipeline. I’ll test both the pipeline approach and passing the data through the -InputObject parameter.

Function TestFunction {
    [cmdletbinding()]
    Param(
        [parameter(ValueFromPipeline)]
        $InputObject
    )
    Process {
        If ($_%2){$_}
    }
}
Function TestFunction_param {
    [cmdletbinding()]
    Param(
        $InputObject
    )
    ForEach ($item in $InputObject){
        If ($item%2){$item}
    }
}
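A short sketch of how the two functions are called. Both should return the same odd numbers; the difference is whether the data arrives item by item through the pipeline or all at once as an argument.

```powershell
Function TestFunction {
    [cmdletbinding()]
    Param(
        [parameter(ValueFromPipeline)]
        $InputObject
    )
    Process {
        If ($_ % 2) { $_ }
    }
}
Function TestFunction_param {
    [cmdletbinding()]
    Param(
        $InputObject
    )
    ForEach ($item in $InputObject) {
        If ($item % 2) { $item }
    }
}

# Pipeline input: the Process block runs once per item
$ViaPipeline = 1..10 | TestFunction

# Parameter input: the whole collection is handed over in one call
$ViaParameter = TestFunction_param -InputObject (1..10)

"Pipeline:  $($ViaPipeline -join ',')"
"Parameter: $($ViaParameter -join ',')"
```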

Of course, the timings returned will vary based on how many resources are being consumed on your computer.
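One way to smooth out that variation is to run each test several times and average the results. The sketch below is not what my test script does. Measure-Average is a hypothetical helper, and the run count of 5 is an arbitrary choice.

```powershell
# Measure-Average is a hypothetical helper for this sketch:
# it runs a script block several times and returns the mean time in milliseconds
Function Measure-Average {
    Param (
        [scriptblock]$ScriptBlock,
        [int]$Runs = 5
    )
    $Times = ForEach ($Run in 1..$Runs) {
        (Measure-Command $ScriptBlock).TotalMilliseconds
    }
    ($Times | Measure-Object -Average).Average
}

$List = 1..10000
$Average = Measure-Average -ScriptBlock { $List | Where-Object { $_ % 2 } } -Runs 5

"Average over 5 runs: {0:N2} ms" -f $Average
```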

Let’s see the data!

I am posting the source code that I used to produce the results below so you can take it and use it for your own testing.

I am going to run the tests against collections of the following sizes: 10, 100, 1000, 10000, and 100000, and pull only the odd numbers from each. Below are the results of the tests, with each count grouped together and sorted from fastest to slowest. The fastest has a green font while the slowest has a red font.

[Images: Filter1, Filter2]

Here we can see that the winners are split between using ForEach(){} and the TestFunction using a parameter, while Where-Object and a Function taking pipeline input turn out to be the slowest approaches. (The updated function still wasn’t the fastest approach, but it is no longer in the top two slowest after taking out the unneeded pipeline within the function.) Applying a parameter to our Filter also definitely slows it down as we start adding more data. The plain Filter performed admirably, as did using .{Process{}} to do the filtering. Some of these approaches, such as the top two winners, require that you have enough memory to hold all of the data before performing the filter. If you can live with slightly slower performance (and I do mean slightly), you can rely on the pipeline and save memory by using a Filter or .{Process{}} instead.

Of course, a graph can show just how these approaches scale out over the course of adding more data to each set.

[Image: FilterGraph]

Now let’s break this out to see how each of these handles more data so you can get a better idea as to what is going on.

[Images: FilterGraph1 through FilterGraph11]

Only a few graphs, right?

In the end, what we have seen is that while Where-Object is the best-known filtering approach in PowerShell, if you are really looking to squeeze every possible millisecond out of your commands, you might want to look at some alternative approaches to filtering your data, such as a Filter if you don’t want to exhaust memory. Some approaches, like building out a predicate, are probably just silly, but I wanted to use everything that I could think of in my tests, which also don’t really do anything all that complex.

Unless there is a pressing need to work with a ton of data, I will still rely mostly on Where-Object to accomplish what I need to do, because it is simple and gets the job done without much thought involved (unless your filtering queries are complex, of course).

The source code for my testing is available below. Give it a shot and let me know how the results look for you. Speaking of which, if you think I missed something or have other recommendations, feel free to let me know or post up your results here!

Source Code

 
Filter TestFilter {   
    If ($_%2){$_}
}
Function TestFunction {
    [cmdletbinding()]
    Param(
        [parameter(ValueFromPipeline)]
        $InputObject
    )
    Process {
        If ($_%2){$_}
    }
}
Function TestFunction_param {
    [cmdletbinding()]
    Param(
        $InputObject
    )
    ForEach ($item in $InputObject){
        If ($item%2){$item}
    }
}
Filter TestFilter_Predicate {
    Param ($Predicate)   
    If (&$Predicate){$_}
}
 
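# The Predicate test below needs the WPF assemblies loaded
Add-Type -AssemblyName PresentationFramework
[decimal[]]$Count = 1E1,1E2,1E3,1E4,1E5,1E6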
$Data = ForEach ($Item in $Count) {
    $List = 1..$Item
    Write-Verbose "Testing for Count: $($Item)" -Verbose
    $Seconds = (Measure-Command {$List|Where{$_%2}}).TotalSeconds
    [pscustomobject]@{
        Type = 'Where-Object Filter <Pipeline>'
        IsPipeline = $True
        Time_seconds =$Seconds
        Count = $Item
    }
    $Seconds = (Measure-Command {$List.Where({$_%2})}).TotalSeconds
    [pscustomobject]@{
        Type = '.Where() Filter'
        IsPipeline = $False
        Time_seconds =$Seconds
        Count = $Item
    }
 
    $Seconds = (Measure-Command {$List|ForEach{If($_%2){$_}}}).TotalSeconds
    [pscustomobject]@{
        Type = 'ForEach Filter <Pipeline>'
        IsPipeline = $True
        Time_seconds =$Seconds
        Count = $Item
    }
    $Seconds = (Measure-Command {$List.ForEach({If($_%2){$_}})}).TotalSeconds
    [pscustomobject]@{
        Type = '.ForEach Filter'
        IsPipeline = $False
        Time_seconds =$Seconds
        Count = $Item
    }
    # Use a separate loop variable so the outer $Item (the current count) is not overwritten
    $Seconds = (Measure-Command {ForEach ($Number in $List){If($Number%2){$Number}}}).TotalSeconds
    [pscustomobject]@{
        Type = 'ForEach Filter'
        IsPipeline = $False
        Time_seconds =$Seconds
        Count = $Item
    }
 
    $Seconds = (Measure-Command {$list | .{process{If($_%2){$_}}}}).TotalSeconds
    [pscustomobject]@{
        Type = '.{Process{}} Filter <pipeline>'
        IsPipeline = $True
        Time_seconds =$Seconds
        Count = $Item
    }
 
    $Seconds = (Measure-Command {$List|TestFunction}).TotalSeconds
    [pscustomobject]@{
        Type = 'TestFunction Filter <Pipeline>'
        IsPipeline = $True
        Time_seconds =$Seconds
        Count = $Item
    }

    $Seconds = (Measure-Command {TestFunction_param $List}).TotalSeconds
    [pscustomobject]@{
        Type = 'TestFunction Filter'
        IsPipeline = $False
        Time_seconds =$Seconds
        Count = $Item
    }
 
    $Seconds = (Measure-Command {$List|TestFilter}).TotalSeconds
    [pscustomobject]@{
        Type = 'TestFilter Filter <pipeline>'
        IsPipeline = $True
        Time_seconds =$Seconds
        Count = $Item
    }
    $Seconds = (Measure-Command {$List|TestFilter_Predicate -Predicate {$_%2}}).TotalSeconds
    [pscustomobject]@{
        Type = 'TestFilter_Predicate Filter'
        IsPipeline = $True
        Time_seconds =$Seconds
        Count = $Item
    }
    $Seconds = (Measure-Command {
        $t = [System.Windows.Data.CollectionViewSource]::GetDefaultView($List)
        $t.Filter = [Predicate[Object]]{
            Try {
                $args[0] % 2
            } Catch {$True}
        }
    }).TotalSeconds
    [pscustomobject]@{
        Type = 'Predicate Filter'
        IsPipeline = $False
        Time_seconds =$Seconds
        Count = $Item
    }
}
#Display the results grouped by count; fastest in green, slowest in red
Remove-Variable List,Count
$data|group count | ForEach {
   
    $temp = ((($_.Group|sort Time_Seconds |ft -auto|out-string) -split '\n')|?{$_ -match '\w|-'})
    For ($i=0;$i -lt $temp.count;$i++) {
        If ($i -eq 2) {
            Write-Host $temp[$i] -fore Green
        } ElseIf ($i -eq ($Temp.Count-1)) {
            Write-Host $temp[$i] -fore Red
        } Else {
            Write-Host $temp[$i]
        }
    }
    Write-Host "`n--------------------------------------------------------`n"
}
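The script above repeats the same Measure-Command block for every approach. If you want to add your own tests, one possible way to cut down on the repetition is a table of named script blocks, sketched below. The $Tests table, the three entries in it, and the smaller set of counts are choices made for this example only, not the script I ran.

```powershell
# Hypothetical table of tests: name -> script block that filters $List
$Tests = [ordered]@{
    'Where-Object Filter <Pipeline>' = { $List | Where-Object { $_ % 2 } }
    '.Where() Filter'                = { $List.Where({ $_ % 2 }) }
    'ForEach Filter'                 = { ForEach ($Number in $List) { If ($Number % 2) { $Number } } }
}

# Run every test against every collection size and collect one object per run
$Results = ForEach ($Size in 10, 100, 1000) {
    $List = 1..$Size
    ForEach ($Name in $Tests.Keys) {
        [pscustomobject]@{
            Type         = $Name
            Time_seconds = (Measure-Command $Tests[$Name]).TotalSeconds
            Count        = $Size
        }
    }
}

# Show the results grouped by count, fastest first
$Results | Sort-Object Count, Time_seconds | Format-Table -AutoSize
```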

4 Responses to A Look at Filtering Performance in PowerShell

  1. bjorn80 says:

    Excellent write up and very interesting. I work with a lot of data and need to do a lot of filtering. Thanks for the pointers!

  2. Dave Wyatt says:

    Your TestFunction results are probably much slower than they need to be, because of the second pipeline to ForEach-Object set up inside the process block. Creating a whole new pipeline for every input object is expensive.

    A fairer comparison would be to have the process block body exactly match the body of the Filter test. (Filters are still faster, but not by as much.)

    • Boe Prox says:

      Good catch Dave. I am going to update this and re-publish the results. That is what I get for trying to write when it is late at night. 🙂

    • Boe Prox says:

      Yep, that dropped it down to more accurate levels. Of course, the filter is still the quickest of the two, but at least we don’t have the pipeline throwing a wrench into the tests on the function now. Side note: it's fun to re-do this while you have a 3 year old putting you in a choke hold. 🙂
