I've got a situation where I need to generate a class with a large string const. Code outside of my control causes my generated CodeDom tree to be emitted to C# source and then later compiled as part of a larger Assembly.

Unfortunately, I've run into a situation whereby if the length of this string exceeds 335440 chars in Win2K8 x64 (926240 in Win2K3 x86), the C# compiler exits with a fatal error:

fatal error CS1647: An expression is too long or complex to compile near 'int'

MSDN says CS1647 is "a stack overflow in the compiler" (no pun intended!). Looking more closely, I've determined that CodeDom "nicely" wraps my string const at 80 chars. This causes the compiler to concatenate over 4193 string chunks, which apparently is the stack depth of the C# compiler in x64 NetFx. CSC.exe must internally evaluate this expression recursively to "rehydrate" my single string.
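As a rough check of that arithmetic, here is a minimal sketch that assumes the generator wraps at exactly 80 characters per chunk. The `ChunkMath` class and `ChunkCount` method are illustrative names, not part of CodeDom.

```csharp
using System;

static class ChunkMath
{
    // Assumption: the generator starts a new literal chunk every 80 characters.
    const int WrapWidth = 80;

    // Approximate number of literal chunks the compiler has to concatenate.
    public static int ChunkCount(int length)
    {
        return (length + WrapWidth - 1) / WrapWidth;
    }

    static void Main()
    {
        Console.WriteLine(ChunkCount(335440)); // prints 4193 (x64 limit from the question)
        Console.WriteLine(ChunkCount(926240)); // prints 11578 (x86 limit from the question)
    }
}
```

If the x86 limit follows the same pattern, the compiler there would be tolerating roughly 11578 chunks before failing.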

My initial question is this: "does anyone know of a work-around to change how the code generator emits strings?" I cannot control the fact that the external system uses C# source as an intermediate and I want this to be a constant (rather than a runtime concatenation of strings).

Alternatively, how can I formulate this expression such that after a certain number of chars, I am still able to create a constant but it is composed of multiple large chunks?

Full repro is here:

// this string breaks CSC: 335440 is Win2K8 x64 max, 926240 is Win2K3 x86 max
string HugeString = new String('X', 926300);

CodeDomProvider provider = CodeDomProvider.CreateProvider("C#");
CodeCompileUnit code = new CodeCompileUnit();

// namespace Foo {}
CodeNamespace ns = new CodeNamespace("Foo");
code.Namespaces.Add(ns);

// public class Bar {}
CodeTypeDeclaration type = new CodeTypeDeclaration();
type.IsClass = true;
type.Name = "Bar";
type.Attributes = MemberAttributes.Public;
ns.Types.Add(type);

// public const string HugeString = "XXXX...";

CodeMemberField field = new CodeMemberField();
field.Name = "HugeString";
field.Type = new CodeTypeReference(typeof(String));
field.Attributes = MemberAttributes.Public|MemberAttributes.Const;
field.InitExpression = new CodePrimitiveExpression(HugeString);
type.Members.Add(field);

// generate class file
using (TextWriter writer = File.CreateText("FooBar.cs"))
{
    provider.GenerateCodeFromCompileUnit(code, writer, new CodeGeneratorOptions());
}

// compile class file
CompilerResults results = provider.CompileAssemblyFromFile(new CompilerParameters(), "FooBar.cs");

// output results
foreach (string msg in results.Output)
{
    Console.WriteLine(msg);
}

// output errors
foreach (CompilerError error in results.Errors)
{
    Console.WriteLine(error);
}
+2  A: 

So am I right in saying you've got the C# source file with something like:

public const string HugeString = "xxxxxxxxxxxx...." +
    "yyyyy....." +
    "zzzzz.....";

and you then try to compile it?

If so, I would try to edit the text file (in code, of course) before compiling. That should be relatively straightforward to do, as presumably they'll follow a rigidly-defined pattern (compared with human-generated source code). Convert it to have a single massive line for each constant. Let me know if you'd like some sample code to try this.

By the way, your repro succeeds with no errors on my box - which version of the framework are you using? (My box has the beta of 4.0 on, which may affect things.)

EDIT: How about changing it to not be a string constant? You'd need to break it up yourself, and emit it as a public static readonly field like this:

public static readonly string HugeString = "xxxxxxxxxxxxxxxx" + string.Empty +
    "yyyyyyyyyyyyyyyyyyy" + string.Empty +
    "zzzzzzzzzzzzzzzzzzz";

Crucially, string.Empty is a public static readonly field, not a constant. That means the C# compiler will just emit a call to string.Concat which may well be okay. It'll only happen once at execution time of course - slower than doing it at compile-time, but it may be an easier workaround than anything else.

Jon Skeet
The runtime is .NET 3.5, but I'm not sure whether it executes the 2.0 csc.exe or a newer one when it actually compiles the code. I bumped the string size in the repro so that it should fail in more circumstances. If it still succeeds, then either 4.0 increased the stack depth or the limit is more machine-dependent than I suspected. Yes, editing the file would do it, but unfortunately my code is called only to return the CodeDom tree. The external code determines where and when the intermediate files are emitted and compiled.
McKAMEY
Ah. Okay, editing with odd idea.
Jon Skeet
I'd vote you up but apparently I don't participate enough on SO. Interesting. CodeDom isn't full C#, so I can't actually emit readonly, but taking off the concat does allow it to compile. Now I need to see if this just pushes the overflow into runtime.
McKAMEY
Your suggestion gave me an idea though: when I said I can't emit readonly, I think that I *can* emit a direct string of C# source from CodeDom to emit "readonly". If I could do that, then I could go back to a simple const, but just emit as a single line of C#. Trying that...
McKAMEY
A lack of "readonly" does seem very strange. However, this blog article seems to confirm that: http://blogs.msdn.com/bclteam/archive/2005/03/16/396915.aspx
Jon Skeet
Escaping the string shouldn't be too hard if you use a verbatim string literal (i.e. put @ at the start) - then the *only* thing you have to do is replace " with "".
Jon Skeet
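A minimal sketch of the verbatim-literal quoting described in the comment above. `QuoteVerbatim` is a hypothetical helper name, not a CodeDom API, and the sketch assumes the result is then passed to something like a CodeSnippetExpression.

```csharp
using System;

static class VerbatimQuoting
{
    // Hypothetical helper: wraps a value in a C# verbatim string literal.
    // Inside @"...", the only character that needs escaping is the double quote,
    // which is written as two double quotes.
    public static string QuoteVerbatim(string value)
    {
        return "@\"" + value.Replace("\"", "\"\"") + "\"";
    }

    static void Main()
    {
        // Backslashes are left untouched because verbatim literals do not process escapes.
        Console.WriteLine(QuoteVerbatim("say \"hi\" in C:\\temp"));
    }
}
```

Because the literal is emitted as a single token, no concatenation is involved, though any compiler limit on line length would still apply to a value that contains no line breaks.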
McKAMEY
A: 

I have no idea how to change the behavior of the code generator, but you can change the stack size that the compiler uses with the /stack option of EditBin.EXE.

Example:

editbin /stack:100000,1000 csc.exe <options>

Following is an example of its use:

class App 
{
    private static long _Depth = 0;

    // recursive function to blow stack
    private static void GoDeep() 
    {
        if ((++_Depth % 10000) == 0) System.Console.WriteLine("Depth is " +
            _Depth.ToString());
        GoDeep();
        return;
    }

    public static void Main() {
        try 
        {
            GoDeep();
        } 
        finally 
        {
        }

        return;
    }
}




editbin /stack:100000,1000 q.exe
Depth is 10000
Depth is 20000

Unhandled Exception: StackOverflowException.

editbin /stack:1000000,1000 q.exe
Depth is 10000
Depth is 20000
Depth is 30000
Depth is 40000
Depth is 50000
Depth is 60000
Depth is 70000
Depth is 80000

Unhandled Exception: StackOverflowException.
Robert Harvey
Interesting suggestion. Unfortunately where I'm being called I don't have access to the csc.exe directly. Ideally I'd like to never have to question again if the string was too long. This work-around would require me to keep bumping the stack size as the string grew.
McKAMEY
+1  A: 

Using a CodeSnippetExpression and a manually quoted string, I was able to emit the source that I would have liked to have seen from Microsoft.CSharp.CSharpCodeGenerator.

So to answer the question above, replace this line:

field.InitExpression = new CodePrimitiveExpression(HugeString);

with this:

field.InitExpression = new CodeSnippetExpression(QuoteSnippetStringCStyle(HugeString));

Finally, take a copy of the private string-quoting method Microsoft.CSharp.CSharpCodeGenerator.QuoteSnippetStringCStyle and modify it so that it does not wrap after 80 chars:

private static string QuoteSnippetStringCStyle(string value)
{
    // CS1647: An expression is too long or complex to compile near '...'
    // happens if number of line wraps is too many (335440 is max for x64, 926240 is max for x86)

    // CS1034: Compiler limit exceeded: Line cannot exceed 16777214 characters
    // theoretically every character could be escaped unicode (6 chars), plus quotes, etc.

    const int LineWrapWidth = (16777214/6) - 4;
    StringBuilder b = new StringBuilder(value.Length+5);

    b.Append("\r\n\"");
    for (int i=0; i<value.Length; i++)
    {
     switch (value[i])
     {
      case '\u2028':
      case '\u2029':
      {
       int ch = (int)value[i];
       b.Append(@"\u");
       b.Append(ch.ToString("X4", CultureInfo.InvariantCulture));
       break;
      }
      case '\\':
      {
       b.Append(@"\\");
       break;
      }
      case '\'':
      {
       b.Append(@"\'");
       break;
      }
      case '\t':
      {
       b.Append(@"\t");
       break;
      }
      case '\n':
      {
       b.Append(@"\n");
       break;
      }
      case '\r':
      {
       b.Append(@"\r");
       break;
      }
      case '"':
      {
       b.Append("\\\"");
       break;
      }
      case '\0':
      {
       b.Append(@"\0");
       break;
      }
      default:
      {
       b.Append(value[i]);
       break;
      }
     }

     if ((i > 0) && ((i % LineWrapWidth) == 0))
     {
      if ((Char.IsHighSurrogate(value[i]) && (i < (value.Length - 1))) && Char.IsLowSurrogate(value[i + 1]))
      {
       b.Append(value[++i]);
      }
      b.Append("\"+\r\n");
      b.Append('"');
     }
    }
    b.Append("\"");
    return b.ToString();
}
McKAMEY
Thanks to Jon Skeet for the discussion which prompted this solution to come to mind. Also thanks to Robert Harvey for thinking outside the box.
McKAMEY
Another csc.exe limitation to consider when choosing not to wrap string constants: "error CS1034: Compiler limit exceeded: Line cannot exceed 16777214 characters". Apparently what's needed is a hybrid: wrap, but with really long chunk sizes.
McKAMEY
This answer allows *many* orders of magnitude longer string lengths (read: hundreds of millions of chars). Stress testing has shown that machine memory limits become the new bounding size.
McKAMEY
+1  A: 

Note that if you declare the string as const, it will be copied in each assembly that uses this string in its code.

You may be better off with static readonly.

Another way would be to declare a readonly property that returns the string.
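To illustrate the three options mentioned above, here is a small sketch. The class and member names are made up for the example, and the short literal stands in for the huge string.

```csharp
using System;

public static class Generated
{
    // Compile-time constant: the value is copied into each assembly that uses it.
    public const string AsConst = "xxxx";

    // Runtime field: other assemblies read it from this assembly at run time.
    public static readonly string AsReadonly = "xxxx";

    // Read-only property: other assemblies call the getter, so the literal
    // itself stays in this assembly.
    public static string AsProperty
    {
        get { return AsConst; }
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(Generated.AsConst == Generated.AsReadonly);   // prints True
        Console.WriteLine(Generated.AsProperty == Generated.AsReadonly); // prints True
    }
}
```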

codymanix
This is interesting. I hadn't heard of this. What constitutes "using this string"? Do you mean when another assembly references the constant member of the generated class? Wouldn't I be able to see the copied constant in the other assembly with Reflector? What I am actually doing in my code is satisfying an interface by returning this constant from a property getter. I am pretty sure the compiler would not be able to know that the getter always returns the constant, so it could not embed it in the referencing assembly. Where can I find more info?
McKAMEY
If you call Console.WriteLine(MyClass.HugeString) and look at the result in Reflector, you will only see Console.WriteLine("blah blah blubb.."); the reference is gone. Consts are compile-time constants, similar (but not identical) to #define in C++. With readonly this is not the case. Google for "const vs readonly" to find more information, or read the C# language spec.
codymanix
Thanks for the heads up and clarification. I can see how the compiler's constant folding wouldn't be able to cross assembly boundaries, so that makes sense. I think I'll be okay in this case, since the example code here is simplified from what I'm actually doing. I'm actually building up a string literal to return it from a property with only a getter: `property.GetStatements.Add(new CodeMethodReturnStatement(new CodeSnippetExpression(QuoteSnippetStringCStyle(str))));`
McKAMEY