
Why use foreach vs foreach-object.

I have been asked several times why I use the foreach statement instead of the foreach-object cmdlet. The simple answer is performance.

First, it is important to understand that foreach is not the same thing as foreach-object unless it is used in the pipeline, where foreach is an alias for foreach-object. I know that sounds confusing, so I will try to explain.

foreach-object: This is a looping cmdlet that processes (executes the script block) in the pipeline and uses $_ to reference the current item.
Syntax:

$objectCollection | foreach-object{"Do something to $_"}

foreach statement: This is a looping statement that processes each item in a collection ($objectCollection) and runs the script block once per item, referencing the current item through the variable you declare ($obj).
Syntax:

foreach($obj in $objectCollection){"Do Something to $obj"}

Rules:
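- foreach is treated as the foreach statement when it begins a statement. If you want to use foreach-object in that position, you must use the full name.
Example:

foreach-object -InputObject $objectCollection {"Do Something to $_"}
foreach($obj in $objectCollection){"Do Something to $obj"}

Note: with -InputObject, foreach-object receives the whole collection as a single object, so the script block runs once and $_ is the entire collection. Pipe the collection in if you want it processed item by item.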

- foreach is treated as foreach-object if it is used in the pipeline.

Correct: $objectCollection | foreach-object{"Do Something to $_"}
Wrong: $objectCollection | foreach($obj in $objectCollection){"Do Something $obj"}

- % and foreach are aliases for foreach-object.
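
The rules above can be checked with a short sketch. The sample values in $objectCollection are made up for illustration; any collection would behave the same way.

```powershell
# Hypothetical sample data; any collection would do.
$objectCollection = 'alpha', 'beta', 'gamma'

# foreach at the start of a statement is the language keyword.
$fromStatement = foreach ($obj in $objectCollection) { "Do something to $obj" }

# In a pipeline, foreach and % both resolve to the ForEach-Object cmdlet.
$fromAlias   = $objectCollection | foreach { "Do something to $_" }
$fromPercent = $objectCollection | % { "Do something to $_" }

# Get-Alias shows what each alias resolves to.
(Get-Alias -Name foreach).Definition
(Get-Alias -Name '%').Definition
```

All three variables should end up holding the same three strings; only the mechanism that produced them differs.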

More Detail on performance:
“foreach(){}” is faster because it compiles into a single expression tree that is evaluated in a single function call.

By contrast, foreach-object is effectively compiled into three expression trees. For example, get-childitem | foreach-object { $_.Name } breaks down into:

  • The value to pipe: Get-Childitem
  • The call to Foreach-Object: foreach-object
  • The ScriptBlock: {$_.Name}

Here is some sample code to show the performance difference:

    PS> $ds = 1..50000
    PS> measure-command {$ds | %{$_*9834} }

    Days              : 0
    Hours             : 0
    Minutes           : 0
    Seconds           : 16
    Milliseconds      : 551
    Ticks             : 165517175
    TotalDays         : 0.000191570804398148
    TotalHours        : 0.00459769930555556
    TotalMinutes      : 0.275861958333333
    TotalSeconds      : 16.5517175
    TotalMilliseconds : 16551.7175

    PS> measure-command {foreach($item in $ds){$item*9834}}

    Days              : 0
    Hours             : 0
    Minutes           : 0
    Seconds           : 0
    Milliseconds      : 734
    Ticks             : 7341052
    TotalDays         : 8.49658796296296E-06
    TotalHours        : 0.000203918111111111
    TotalMinutes      : 0.0122350866666667
    TotalSeconds      : 0.7341052
    TotalMilliseconds : 734.1052
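
If you want to repeat the comparison on your own machine, the following is a minimal sketch that runs both forms over the same in-memory range and prints the timings side by side. The variable names $pipeline and $statement are just for illustration, and the absolute numbers depend heavily on the PowerShell version and the hardware, so read the result as a ratio rather than as fixed figures.

```powershell
# Sample data, as in the post.
$ds = 1..50000

# Pipeline form: each item is handed to the ForEach-Object cmdlet.
$pipeline = Measure-Command { $ds | ForEach-Object { $_ * 9834 } }

# Statement form: the loop is a language construct over an in-memory collection.
$statement = Measure-Command { foreach ($item in $ds) { $item * 9834 } }

# Report both timings and the ratio between them.
'{0,-16} {1,10:N1} ms' -f 'ForEach-Object', $pipeline.TotalMilliseconds
'{0,-16} {1,10:N1} ms' -f 'foreach', $statement.TotalMilliseconds
'Ratio: {0:N1}x' -f ($pipeline.TotalMilliseconds / $statement.TotalMilliseconds)
```

Note that this only measures the case where the collection is already loaded in memory, which is the case the post is about.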

    12 Responses to “Why use foreach vs foreach-object.”

    1. on 04 Aug 2008 at 4:31 pm Andy

      Thanx. Ex as ever.
      A.

    2. on 05 Aug 2008 at 6:07 am marcus

      brandon –

      awesome. thanks for the information on this!

    3. on 10 Aug 2008 at 7:41 pm Poshoholic

      Hey Brandon,

      I think you might be over-simplifying this for your readers and not giving them all the necessary details to make an informed decision on foreach vs. Foreach-Object. I’ve blogged about the things to consider here:

      http://poshoholic.com/2007/08/21/essential-powershell-understanding-foreach/
      http://poshoholic.com/2007/08/31/essential-powershell-understanding-foreach-addendum/

      Of those two articles, the first one is by far the most important to read and understand.

      The biggest thing to consider is whether or not your collection is loaded into memory already and if not, whether you want to load it into memory or not. The foreach statement will not do anything until the entire collection it is working with is already loaded. That means if you run this:

      foreach ($item in (Get-ChildItem C:\ -recurse)) {$item}

      you won’t see any output in PowerShell until the entire directory structure has been retrieved and loaded into memory. There are many, many times when you wouldn’t want a large collection loaded in memory, so this is important.

      With respect to performance, the performance will be much faster if you already have a collection loaded in memory, but it will only be a little faster if your collection isn’t loaded in memory. See the performance numbers in the first of the two articles I refer to above if you want to know what I’m talking about. And performance is relative to what you’re doing. Waiting a long time for data to be displayed at all can be considered much less performant than having data displayed almost immediately even if the case where data is displayed immediately takes a little longer overall.

      Simply using the foreach statement instead of the ForEach-Object cmdlet isn’t necessarily giving you the best performance for your script like you might think it is.


      Kirk Munro [MVP]
      Poshoholic
      http://poshoholic.com

    4. on 10 Aug 2008 at 8:54 pm tshell

      That is a great point, Kirk, but I am not sure it would affect my end feeling that foreach() is a better option.

      You point out an edge case. Generally speaking you will want to do more parsing per cycle, but even in your case foreach(){} is still faster.

      PS> Measure-Command { foreach ($item in (Get-ChildItem C:\ -recurse -ea 0)) {$item} }
      TotalSeconds : 51.606183

      PS> Measure-Command { Get-ChildItem C:\ -recurse -ea 0 | %{$_}}
      TotalSeconds : 57.014471

      I also think the memory use could be a huge point, but to be honest… I saw no difference, and if you think about it, this makes sense. This data has to be stored regardless of what you do…. I mean, that is the point, right? So unless you pipe to $null there is no difference in regards to memory usage.

    5. on 11 Aug 2008 at 5:02 am Poshoholic

      Not exactly. What if you’re retrieving all users from an organization that contains 50000 users or more? Do you like loading the data for 50000 users into memory? I’m not particularly fond of that idea. I prefer letting the LDAP engine use paging to have a specific number of users loaded in memory at a time (e.g. 1000) and letting the objects be cleaned up by the garbage collector when it needs to.

      Sure, if you know you don’t need to be concerned about memory and if you don’t care when your scripts return data but instead you simply want them to run as fast as possible, then the foreach statement will fit the bill nicely. I think that’s often not the case though. If you are working with very large data sets (not an edge case in PowerShell automation at all) or if you want to be able to see data in the console and take appropriate action more quickly (personally I often do this in PowerShell, retrieving a large data set that I want to scan for something and then Ctrl+C out of the operation so that I can do what I need to do; again, not an edge case), or if you are working in a product like PowerGUI where you want to see data as it is returned so that you can do something with it (not an edge case), or if you are taking action on data in a script and you simply want to be able to use some other product to work with items that have already had action taken against them (not an edge case either), then ForEach-Object is a better bet. :)


      Kirk Munro [MVP]
      Poshoholic
      http://poshoholic.com

    6. on 12 Aug 2008 at 11:29 am tshell

      It is funny that you bring up that exact scenario, as I would say that proves my point. In such a case, you don’t need the information to output to the screen (it will slow you down significantly, and it will fly by without providing any useful information).

      The question is this:
      What are you doing with the 50k users, and how fast do you want it done? Regardless if you use foreach-object or foreach statement you are going to have to store them or what’s the point.

      In my environment I have 380k users… let’s take a look at my numbers

      PS D:\Scripts> measure-command {foreach($user in (Get-myADUser -Filter "samAccountName=*")){$user.distinguishedName}}
      TotalSeconds : 28.9051588

      PS D:\Scripts> measure-command { Get-myADUser -Filter "samAccountName=*" | %{$_.distinguishedName}}
      TotalSeconds : 70.0647409

      It is clearly 2x faster using the foreach statement than foreach-object, and if you do any further processing it could have an exponentially negative effect on foreach-object.

    7. on 12 Aug 2008 at 2:05 pm Poshoholic

      This is wrong:

      “Regardless if you use foreach-object or foreach statement you are going to have to store them or what’s the point.”

      If you use the foreach statement, you are loading the entire collection you are processing into memory *at one time*. If you use ForEach-Object, what is loaded in memory depends on things like page size for LDAP queries as well as other things internal to the cmdlets being used as the source of the pipeline. But if your page size is 1000, then when you are processing those items in a ForEach-Object cmdlet process block, you have those 1000 items loaded in memory. Once the pipeline has finished processing that page, another page is fetched and the first page is left for garbage collection. So in this case, the entire collection is loaded into memory *in chunks*. That’s my point about memory usage.

      Notice when running the foreach statement vs ForEach-Object and showing the output in the console how there is a significant pause before the first item is displayed? That’s the collection loading into memory. You can measure that too by retrieving the current date before the call to foreach or ForEach-Object and then creating a New-Timespan with that date for the first item once you have it. That shows you the difference. Even for small collections, you’re often looking at much less than 1 second before you see the first object inside the ForEach-Object process block while you’re looking at several seconds (or much more, depending on the size of the collection) when you use the foreach statement.

      If you’re retrieving 350000 AD user objects, you’re likely not getting very many properties for those objects or you would likely face memory issues pretty quickly. See the thread below for relevant discussion about that for a user trying to retrieve 300000 users with Get-QADUser. They were retrieving all ldap properties and storing them in memory and wondering why it wouldn’t fit. You’ll see the same issue if you do that with the foreach statement, but if you use ForEach-Object you won’t run out of memory.

      http://www.vistax64.com/powershell/78140-powershell-memory-problem-agianst-ad-quest-get-qaduser.html

      And even if you have enough memory, the more you consume the slower things will likely get.

      My recommendation is still to consider what you are doing before blindly picking the foreach statement or the ForEach-Object cmdlet. Are you processing a very large data set and trying to keep memory down? Use ForEach-Object. Do you want output of the first objects you are outputting as quickly as possible, or do you want actions you’re going to take in your loop to start happening as quickly as possible? Again, use ForEach-Object. Are you working with a dataset that is already loaded in memory? The foreach statement is what you want. Are you running a script that won’t have too much memory overhead and that you want to run as fast as possible? Again, the foreach statement is what you want.

      Playing the performance card (which is always relative) so firmly in the foreach statement camp just isn’t considering all of the factors.

      This discussion reminds me I really have to try this in v2. The PowerShell Team indicated they were going to improve performance, and I’m wondering if the overall runtime difference between foreach and ForEach-Object in a pipeline will reduce.


      Kirk Munro [MVP]
      Poshoholic
      http://poshoholic.com

    8. on 12 Aug 2008 at 4:33 pm tshell

      BS: First, for clarification, I am all on board that foreach-object is process as you go. There is no question about that, and if there were a case where process as you go was at issue, then I would suggest that. I have yet to see that happen.

      Kirk: If you use the foreach statement…

      BS: I get your point about memory usage, but you are missing mine. Just piping a VLDS (very large dataset) to foreach-object or foreach(){} is moot. You are not going to be able to view the items (unless you have super speed reading). So what do you do with it? You process it somehow and collect the results in a variable or output to a file. In such a case, foreach(){} will absolutely blow away foreach-object speed-wise. Will it take more memory during processing? Perhaps, but foreach-object will also have this hit… it is simply more gradual. Now, writing to a file, I am not sure how the piping works between foreach-object and out-file, and outside of out-file I have yet to find a scenario where foreach-object was even remotely usable for a VLDS because of the exponential performance hit. I find pipelines are simply not useful in those scenarios. Now, in a case where you are using where-object, that is a different story.

      Kirk: Notice when running the foreach statement vs ForEach-Object and showing the output in the console how there is a significant pause before the first item is displayed…

      BS: I notice the delay and I understand the delay, but again… what is the purpose of getting output on the screen if it is unreadable? In a small dataset (where it is worth seeing the output) this timeframe is not perceivable.

      Kirk: If you’re retrieving 350000 AD user objects…

      BS: Another funny thing to bring up: this is a limitation of both the Quest tools and the SDS namespace because of the reliance on ADSI. IMO, anything that relies on ADSI is not meant for, nor will it be useful in, a large environment. The only way I can get and process my users is using sds.protocols and controlling what gets returned and how. With large environments you have to be VERY mindful of memory usage and CPU processing; ADSI is mindful of neither.

      Kirk: My recommendation is still to consider what you are doing before blindly picking the foreach statement or the ForEach-Object cmdlet.
      BS: Agreed, but that doesn’t change the point of my post :)

    9. on 12 Aug 2008 at 6:35 pm tojo2000

      Does PowerShell do any kind of parallel processing as far as executing the initial command and passing objects down the pipe? When it passes an object down from the initial command, does it wait for it to finish, or does it immediately continue populating the input pipeline? If so, then it sounds off the top of my head like you would get a larger performance boost from operations that require more processing in the foreach-object part of the pipeline since it can begin processing without waiting for the pipeline, particularly if the initial command takes a long time to execute.

    10. on 12 Aug 2008 at 6:39 pm Poshoholic

      But, but, but! :)

      “Will it take more memory during processing? Perhaps, but foreach-object will also have this hit…”

      ForEach-Object actually won’t have the hit because the garbage collector can come along and clean up what is marked for collection, and when the objects going down a pipeline to ForEach-Object are finished with, they get marked for garbage collection. Look at this example:

      The following two commands are essentially equivalent in terms of memory, except that in the second one the collection won’t get marked for garbage collection until $collection goes out of scope.

      foreach ($item in (Get-SomeData)) { … }
      foreach ($item in ($collection = Get-SomeData)) { … }

      But these two commands are not equivalent:

      Get-SomeData | ForEach-Object { … }
      ($collection = Get-SomeData) | ForEach-Object { … }

      Why? Because ForEach-Object doesn’t add the overhead of loading the entire collection into memory unless you tell it to do so, so objects that are finished with can be cleaned up by the garbage collector earlier while it is processing more objects. In foreach this isn’t the case at all. In this exact scenario, where you could be pipelining a VLDS, ForEach-Object is very useful because it allows you to have a script that works on a VLDS without worrying about it causing out-of-memory issues on the machine, allowing you to do more than just run that script. That kind of reliability is great for scheduled tasks that will be run unattended, even if it means they take longer to complete. If you know you won’t hit memory issues, you can use foreach for these scheduled tasks. But if you don’t know for sure and don’t want to find out the hard way, ForEach-Object is best.

      Btw, I only mentioned the visible delay in output before data starts appearing to visually illustrate the delay in loading the data set into memory. I wasn’t insinuating that you should show data output unnecessarily.

      Back to you. :)

      Kirk Munro [MVP]
      Poshoholic
      http://poshoholic.com

    11. on 13 Aug 2008 at 6:47 am tshell

      @ tojo, PowerShell is strictly serial… no parallel processing at all. It will pass information down the pipe as it goes, so I guess in that respect it does do some parallel processing. Sounds like a good blog post :)

      @ Kirk
      You are continuing to argue a point I agree with. So for the record, we all agree with the following:
      1) foreach-object is process and “cleanup” as you go
      2) foreach(){} is collect and process. Cleanup at the end.
      We all understand this and it can be put in the “we know this” pile.

      Now… our difference is in application. I am saying there is not a pragmatic difference in application, and therefore the difference is moot.
      Why do I say that?
      1) foreach-object is not very useful for a VLDS by itself, meaning you have to do something with the data you collect or what’s the point.
      2) This argument is moot in small datasets.

    12. on 20 Aug 2008 at 11:13 am Poshoholic

      FYI, I didn’t intentionally leave this dangling, I just got busy.

      Anyhow, thanks for the discussion. It forced me to re-think some things I was doing. There are a few cases where I was using ForEach-Object where I don’t care about getting output faster, so I’ve started switching those over to foreach. Most of the time though I care about getting output faster, so I’ll likely stick with ForEach-Object in the majority of my work and use foreach where optimizing the overall execution time is beneficial towards the end result.

      ‘Nuff said.


      Kirk Munro [MVP]
      Poshoholic
      http://poshoholic.com
