Quick Hits: Adding Items to an Array and a Look at Performance

I had some comments on a recent performance comparison article that was well received, and they encouraged me to write more articles like it. So without further ado, let’s get started looking at adding items to an array and which approach is better in terms of performance.

The most common approach that I see for adding items to an array in PowerShell is the += operator.

 
$a = @()
$a += 'data'
$a
$a += 'test'
$a

I had to initialize the array first; otherwise, += would treat the values as strings and concatenate them, which is not what I am looking for in this example. As I said, this is the common approach that I see, but is it the fastest approach? Well, the answer is no.
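To make the difference concrete, here is a minimal sketch. The variable names $text and $items are illustrative, and setting $text to $null is my stand-in for a variable that was never initialized.

```powershell
# Assumption: $text = $null stands in for a variable that was never initialized.
$text = $null
$text += 'data'
$text += 'test'
$text                     # a single string: datatest
$text.GetType().Name      # String

# Initializing with @() first makes += append elements instead.
$items = @()
$items += 'data'
$items += 'test'
$items.Count              # 2
$items.GetType().Name     # Object[]
```

The only difference between the two halves is the starting value, which is what decides whether += concatenates or appends.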

Using an ArrayList

The answer to better performance is an ArrayList. As you look for better performance in PowerShell, you will typically see a move towards using .NET types to perform operations that could otherwise be done with actual PowerShell commands or operators. So what I did above can be done like the following:

 
$a = New-Object System.Collections.ArrayList
$a.Add('data')
$a.Add('test')
$a

The 0 and 1 that are output each time I add an item are the indexes at which the items were added to the collection: ‘data’ was added at index 0 and ‘test’ at index 1. This can be pretty annoying in my opinion, because it pollutes the pipeline and can bring undesirable results. You can get around this by sending that output to a null location.
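As a rough sketch, these are three common ways to discard the index that Add() returns. The [void] cast is the one used in the timing test in this article; the other two are alternatives I am adding for comparison.

```powershell
$list = New-Object System.Collections.ArrayList

# Option 1: cast the return value to [void]
[void]$list.Add('data')

# Option 2: assign the return value to $null
$null = $list.Add('test')

# Option 3: pipe the return value to Out-Null
$list.Add('more') | Out-Null

# Only the items themselves reach the pipeline now
$list
```

All three leave the collection unchanged and keep the index values out of the pipeline.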

We now have two options for building a collection of items, but the next question is: which one is quicker? Let’s find out!

 
@(1E1,1E2,1E3,1E4,1E5) | ForEach-Object {
    # Time the += operator against a standard array
    $Time = (Measure-Command {
        $array = @()
        1..$_ | ForEach-Object {
            $array += $_
        }
    }).TotalMilliseconds
    [pscustomobject]@{
        Type = '+='
        Time_ms = $Time
        Count = $_
    }

    # Time the Add() method against an ArrayList
    $Time = (Measure-Command {
        $list = New-Object System.Collections.ArrayList
        1..$_ | ForEach-Object {
            [void]$list.Add($_)
        }
    }).TotalMilliseconds
    [pscustomobject]@{
        Type = 'ArrayList'
        Time_ms = $Time
        Count = $_
    }
} | Sort-Object Count | Format-Table -AutoSize

I’m not saying that ArrayList won every time…wait, I am saying that! As the total number of items added to each collection increased, the time also increased for both approaches, with += increasing rather dramatically near the end. What is happening is that an array has a fixed size, so the += operator actually builds a new array each time you use it, copying the existing items over so it can add the new one. Not exactly efficient, but it gets the job done. You won’t find that with the ArrayList approach, as it adds the item right into the existing collection.
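One way to observe this behavior is to keep a second reference to the collection and check whether it still points at the same object after an item is added. This is a minimal sketch; the names $snapshot, $alias, $sameArray and $sameList are mine, chosen for illustration.

```powershell
# += on an array: a new array is built and the variable is pointed at it.
$array    = @(1, 2, 3)
$snapshot = $array          # second reference to the same array object
$array   += 4
$sameArray = [object]::ReferenceEquals($array, $snapshot)
$sameArray                  # False: $array now refers to a different object
$snapshot.Count             # 3: the original array was left untouched

# Add() on an ArrayList: the existing object is modified in place.
$list  = New-Object System.Collections.ArrayList
$alias = $list              # second reference to the same ArrayList
[void]$list.Add(4)
$sameList = [object]::ReferenceEquals($list, $alias)
$sameList                   # True: still the same object
$alias.Count                # 1: the new item is visible through both references
```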

As with any of these types of performance tests, I covered situations from the simple up to the extreme, so if you absolutely want to squeeze every possible millisecond out of your PowerShell scripts, then you would definitely want to look at the ArrayList approach.

8 Responses to Quick Hits: Adding Items to an Array and a Look at Performance

  1. jamesone111 says:

    A late comment … it might be obvious, but it is not the data type but the operation which makes the difference. You can declare a System.Collections.ArrayList and add items with += and the times are much the same as when you use += with the default array. When you use .Add() – which fails with fixed-size (default) arrays but works with System.Collections.ArrayList – THEN you get the speed up.

  2. kaynix says:

    Very interesting.
    Could you also explain why, when I filter multiple values in one column in Excel using PowerShell AutoFilter and then
    copy the result to another sheet,
    my current script makes the file grow from 3 MB to 250 MB,
    but if I copy the whole sheet it is only 2 MB bigger?

  4. jonathan says:

    Brilliant, Keep it up!

    Earlier you tested which method of creating an object was fastest, which actually made me convert much of a program I am currently working on from using New-Object to [Activator]::

    It turns out, though, that the PS_ISE cannot show the properties (IntelliSense) of an object made with the Activator; with New-Object, it could. It doesn’t matter much, because I use the MSDN page, but it is still relevant.

    I also kind of stole a bit of your code to test which of two methods is fastest.
    It seems a bit silly, but I found it interesting to test: Join-Path vs. string concatenation. String concatenation won.

    Sorry for rambling. 😛

  5. cantoris says:

    Both interesting and useful, thanks.
    Evidence of the Array having to be re-made when you add to it can be seen here:
    @().IsFixedSize
    = $True
    (I see this is not the case for a hashtable.)

    • cantoris says:

      Also interesting to see that an ArrayList has a Capacity property that starts at 4 (or 0 if the ArrayList is empty) and is doubled as required – so presumably a new ArrayList object is created each time you exceed the current one’s capacity. Vastly better, though, than a new one on every addition like with [Array].

  6. Larry Weiss says:

    Please examine the time posted for the Count of 10 for the += Type
    It seems like a typo that it is so large a number compared with the Count of 100 for Type +=
