Last time I got confused by the way PowerShell eagerly unrolls collections, Keith summarized its heuristic like so:

Putting the results (an array) within a grouping expression (or subexpression e.g. $()) makes it eligible again for unrolling.

I've taken that advice to heart, but still find myself unable to explain a few esoterica. In particular, the Format operator doesn't seem to play by the rules.

$lhs = "{0} {1}"

filter Identity { $_ }
filter Square { ($_, $_) }
filter Wrap { (,$_) }
filter SquareAndWrap { (,($_, $_)) }

$rhs = "a" | Square        
# 1. all succeed
$lhs -f $rhs
$lhs -f ($rhs)
$lhs -f $($rhs)
$lhs -f @($rhs)

$rhs = "a" | Square | Wrap       
# 2. all succeed
$lhs -f $rhs
$lhs -f ($rhs)
$lhs -f $($rhs)
$lhs -f @($rhs)

$rhs = "a" | SquareAndWrap       
# 3. all succeed
$lhs -f $rhs
$lhs -f ($rhs)
$lhs -f $($rhs)
$lhs -f @($rhs)

$rhs = "a", "b" | SquareAndWrap       
# 4. all succeed by coercing the inner array to the string "System.Object[]"
$lhs -f $rhs
$lhs -f ($rhs)
$lhs -f $($rhs)
$lhs -f @($rhs)

"a" | Square | % {
    # 5. all fail
    $lhs -f $_
    $lhs -f ($_)
    $lhs -f @($_)
    $lhs -f $($_)            
}

"a", "b" | Square | % {
    # 6. all fail
    $lhs -f $_
    $lhs -f ($_)
    $lhs -f @($_)
    $lhs -f $($_)            
}

"a" | Square | Wrap | % {
    # 7. all fail
    $lhs -f $_
    $lhs -f ($_)
    $lhs -f @($_)
    $lhs -f $($_)            
}

"a", "b" | Square | Wrap | % {
    # 8. all fail
    $lhs -f $_
    $lhs -f ($_)
    $lhs -f @($_)
    $lhs -f $($_)            
}

"a" | SquareAndWrap | % {
    # 9. only @() and $() succeed
    $lhs -f $_
    $lhs -f ($_)
    $lhs -f @($_)
    $lhs -f $($_)            
}

"a", "b" | SquareAndWrap | % {
    # 10. only $() succeeds
    $lhs -f $_
    $lhs -f ($_)
    $lhs -f @($_)
    $lhs -f $($_)            
}

Applying the same patterns we saw in the previous question, it's clear why cases like #1 and #5 behave differently: the pipeline operator signals the script engine to unroll another level, while the assignment operator does not. Put another way, everything that lies between two |'s is treated as a grouped expression, just as if it were inside ()'s.

# all of these output 2
("a" | Square).count                       # explicitly grouped
("a" | Square | measure).count             # grouped by pipes
("a" | Square | Identity).count            # pipe + ()
("a" | Square | Identity | measure).count  # pipe + pipe

For the same reason, case #7 is no improvement over #5. Any attempt to add an extra Wrap will be immediately subverted by the extra pipe. Ditto #8 vs #6. A little frustrating, but I'm totally on board up to this point.

Remaining questions:

  • Why doesn't case #3 suffer the same fate as #4? $rhs should hold the nested array (,("a", "a")) but its outer level is getting unrolled...somewhere...
  • What's going on with the various grouping operators in #9-10? Why do they behave so erratically, and why are they needed at all?
  • Why don't the failures in case #10 degrade gracefully like #4 does?
+1  A: 

Neither Square nor Wrap will do what you're attempting in #5 and #7. Whether you put an array within a grouping expression (), as you do in Square, or use the comma operator, as you do in Wrap, when these functions run in a pipeline their output is unrolled as it is fed to the next pipeline stage, one item at a time. Similarly, in #6 and #8 it doesn't matter that you pipe in multiple objects; both Square and Wrap will feed them out one at a time to your foreach stage.
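A minimal sketch of that unrolling, reusing the Square and Wrap filters from the question (without the redundant parens). The variable names $seen and $seenWrapped are illustrative only:

```powershell
filter Square { $_, $_ }
filter Wrap   { ,$_ }

# Square emits a 2-element array, but the pipeline unrolls it,
# so the script block runs twice and $_ is a plain string each time.
$seen = "a" | Square | % { $_.GetType().Name }
$seen          # two items, both "String"

# Wrap receives those strings one at a time; each 1-element wrapper
# array it emits is unrolled again on its way to the next stage.
$seenWrapped = "a" | Square | Wrap | % { $_.GetType().Name }
$seenWrapped   # again two items, both "String"
```

Since the foreach block only ever sees a single string, a format string with two placeholders has nothing to fill {1} with, which is why #5 through #8 fail.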

Cases 9 and 10 seem to indicate a bug in PowerShell. Take this modified snippet and try it:

"a" | SquareAndWrap | % {    
    # 9. only @() and $() succeed  
    $_.GetType().FullName
    $_.Length
    $lhs -f [object[]]$_
    $lhs -f [object[]]($_)    
    $lhs -f @($_)   
    $lhs -f $($_)            
}

It works. It also shows that the foreach already receives an object[] of size 2, so $_ should work without casting to [object[]] or wrapping in a subexpression or array subexpression. We have seen some V2 bugs related to psobjects not unwrapping correctly, and this appears to be another instance of that. If you unwrap the psobject manually, it works, e.g. $_.psobject.baseobject.
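For illustration, the manual unwrap might look like the sketch below. It reuses $lhs and SquareAndWrap from the question; $formatted is just an illustrative name:

```powershell
$lhs = "{0} {1}"
filter SquareAndWrap { ,($_, $_) }

$formatted = "a" | SquareAndWrap | % {
    # BaseObject is the raw object[] underneath the PSObject wrapper,
    # so -f sees two separate arguments instead of one wrapped object.
    $lhs -f $_.psobject.baseobject
}
$formatted   # expected: a a
```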

I "think" what you are shooting for in Wrap is this:

function Wrap2 { Begin {$coll = @();} Process {$coll += $_} End {,$coll} }

This will accumulate all pipeline input and then output it as a single array. This will work for case 8 but you still need to cast to [object[]] on the first two uses of the -f operator.
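A rough usage sketch for case 8, with the [object[]] cast applied as described. The Square filter is taken from the question and $out is an illustrative name:

```powershell
filter Square { $_, $_ }
function Wrap2 { Begin { $coll = @() } Process { $coll += $_ } End { ,$coll } }

$out = "a", "b" | Square | Wrap2 | % {
    $_.Count                     # the block runs once, with the whole array
    "{0} {1}" -f [object[]]$_    # uses the first two of the four elements
}
$out   # 4, then "a a"
```

Because Wrap2 emits its array only in the End block, behind a unary comma, the foreach stage runs a single time with all four strings rather than four times with one string each.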

BTW the parens in both Square and Wrap and the outer parens in SquareAndWrap are unnecessary.

Keith Hill
Cool. I'll file a Connect bug.
Richard Berg
+1  A: 

Well, there's a bug in that for sure. (I just wrote up a page on the PoshCode Wiki about it yesterday, actually, and there's a bug filed on Connect.)

Answers first, more questions later:

To get consistent behavior from arrays with the -f string formatting, you're going to need to make 100% sure they are PSObjects. My suggestion is to do that when assigning them. PowerShell is supposed to do this automatically, but for some reason it doesn't happen until you access a property or something similar (as documented in that wiki page and bug). E.g. (<##> is my prompt):

<##> $a = 1,2,3
<##> "$a"
1 2 3

<##> $OFS = "-"  # Set the Output field separator
<##> "$a"
1-2-3

<##> "{0}" -f $a
1 

<##> $a.Length
3 

<##> "{0}" -f $a
1-2-3

# You can enforce correct behavior by casting:
<##> [PSObject]$b = 1,2,3
<##> "{0}" -f $b
1-2-3

Note that when you've done that, they WILL NOT be unrolled when passing to -f but rather would be output correctly -- the way they would be if you placed the variable in the string directly.

Why doesn't case #3 suffer the same fate as #4? $rhs should hold the nested array (,("a", "a")) but its outer level is getting unrolled...somewhere...

The simple version of the answer is that BOTH #3 and #4 are getting unrolled. The difference is that in 4, the inner contents are an array (even after the outer array is unrolled):

$rhs = "a" | SquareAndWrap
$rhs[0].GetType()  # String

$rhs = "a","b" | SquareAndWrap
$rhs[0].GetType()  # Object[]

What's going on with the various grouping operators in #9-10? Why do they behave so erratically, and why are they needed at all?

As I said earlier, an array should count as a single parameter to the format and should be output using PowerShell's string-formatting rules (i.e. separated by $OFS), just as it would if you put the variable into the string directly. Therefore, when PowerShell is behaving correctly, $lhs -f $rhs will fail if $lhs contains two placeholders.

Of course, we've already observed that there's a bug in it.

I don't see anything erratic, however: @() and $() work the same for 9 and 10 as far as I can see. The main difference, in fact, is caused by the way the ForEach unrolls the top-level array:

> $rhs = "a", "b" | SquareAndWrap
> $rhs | % { $lhs -f @($_); " hi " }
a a
 hi 
b b
 hi 

> $rhs | % { $lhs -f $($_); " hi " }
a a
 hi 
b b
 hi     

# Is the same as:
> [String]::Format( "{0} {1}", $rhs[0] ); " hi "
a a
 hi 

> [String]::Format( "{0} {1}", $rhs[1] ); " hi "
b b
 hi

So you see the bug is that @() or $() will cause the array to be passed as [object[]] to the string format call instead of as a PSObject which has special to-string values.

Why don't the failures in case #10 degrade gracefully like #4 does?

This is basically the same bug, in a different manifestation. Arrays should never come out as "System.Object[]" in PowerShell unless you manually call their native .ToString() method, or pass them to String.Format() directly. The reason they do in #4 is that bug: PowerShell has failed to extend them as PSObjects before passing them to the String.Format call.

You can see this if you access a property of the array before passing it in, or cast it to PSObject as in my original examples. Technically, the errors in #10 are the correct output: you're only passing ONE thing (an array) to String.Format when it expected TWO things. If you changed your $lhs to just "{0}", you would see the array formatted with $OFS.


I wonder, though, which behavior you like and which you think is correct, considering my first example? I think the $OFS-separated output is correct, as opposed to unrolling the array, which is what happens if you @(wrap) it or cast it to [object[]]. (Incidentally, note that casting it to [int[]] produces yet another buggy behavior.)

> "{0}" -f [object[]]$a
1

> "{0}, {1}" -f [object[]]$a  # just to be clear...
1, 2

>  "{0}, {1}" -f [object[]]$a, "two"  # to demonstrate inconsistency
System.Object[], two

> "{0}" -f [int[]]$a
System.Int32[]

I'm sure lots of scripts have been written that unknowingly take advantage of this bug, but it still seems pretty clear to me that the unrolling in the "just to be clear" example is NOT the correct behavior. It happens because, on the call (inside PowerShell's core) to the .NET String.Format( "{0}", $a ), $a is an object[], which is exactly what String.Format expects as its params parameter.

I think that has to be fixed. If there's any desire to keep the "functionality" of unrolling the array it should be done using the @ splatting operator, right?

Jaykul
Great summary - marked as answer even though I disagree with your opinion. I think "{0} {1}" -f "a", "b" should be equivalent to $arr="a","b"; "{0} {1}" -f $arr. That is, the RHS of the -f operator should be a [params object[]], to mix C# and PS lingo. I don't see why $arr should be converted to a string using OFS, unless you explicitly quote it on the RHS of the operator.
Richard Berg
Well, the main reason it should be converted to a string is that that's what string formatting is supposed to do. That is, if you don't use formatting codes like { "{0:X}" -f 42 } then string formatting is supposed to behave like casting the thing to string. That's how .Net's string formatting works. Changing the rules for PowerShell is confusing. Of course, they've already changed it with $OFS ... but changing it AGAIN would mean that arrays would output at least three different ways, and you'd never know which. For instance, what should happen if I do: { "{0}{1}" -f $arr,"hello" }?
Jaykul
Honestly, part of me agrees with you, because it's convenient to be able to do that! But I want it to only happen explicitly so I can do BOTH of these: `{ $a = 1,"+",2; "{0} = {1}" -f $a,"three"; "{2}-{0}={3}" -f @a,"one" }` and have them come out as: "1 + 2 = three" and "2-1=one" ... right now you CAN do that, but only if you're crazy like a fox: `{ $a = 1,"+",2; "{0} = {1}" -f [PSObject]$a,"three"; "{2}-{0}={3}" -f [object[]]($a+"one") }` and THAT is just unknowable.
Jaykul
Actually, this works too: `[PSObject]$a = 1,"+",2; "{0} = {1}" -f $a,"three"; "{2}-{0}={3}" -f @($a+"one")` but it's only slightly less mysterious...
Jaykul
I'd settle for either, so long as you can switch modes easily (e.g. adding the splat operator). And so long as they give us some consistency and bona fide documentation! :)
Richard Berg