
I have a web application that lets users upload entire .html files to my server. I wish to 'detect' the width/height of the uploaded html and store it in my DB.

So far, I have unsuccessfully tried using the System.Windows.Forms.WebBrowser control, reading the file into a string and loading it into the browser's Document:

 _browser = new WebBrowser();
 _browser.Navigate("about:Blank");
 _browser.Document.OpenNew(true);
 _browser.Document.Write(html);

Inspecting the various properties of the _browser object (Document, Window, etc.) always seems to report a size of 250x250.

I've tried putting various CSS size declarations in the .html file, with the same result.

Is the only option to inspect the HTML string and regex-match CSS properties? How would you reliably determine the rendered width/height of the document in question? Remember, the .html file may or may not contain CSS properties. The user might use older, deprecated attributes such as

<body width="500">

vs

<style>
 body{ width: 400px; }
</style>
<body>

etc.

A:

You will not be able to find the dimensions using regular expressions: remember that the document might not declare any, in which case you'd have to measure its elements yourself, which requires a complete HTML renderer.

Doing it with Internet Explorer raises security concerns; make sure that IE is always kept up to date on your server, and that its security settings for the ASP.NET account are as tight as possible. (I'm not sure how to do that.)

Try _browser.Document.Body.OffsetRectangle.Size.

EDIT: Note that, as other people have pointed out, the height will also depend on the width because of text wrapping, etc., so you should set the width of the IE control to an appropriate value.
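A minimal sketch of this approach, assuming a Windows host where System.Windows.Forms is available. `HtmlMeasurer` and `Measure` are hypothetical names. The sketch sets `DocumentText` instead of calling `Navigate`/`OpenNew`/`Write`, because the document is not ready synchronously after navigating; it waits for `DocumentCompleted` and then reads `Body.ScrollRectangle`, which covers the whole scrollable content rather than just the visible control (`OffsetRectangle`, suggested above, may be worth comparing). The `viewportWidth` parameter is the assumed page width that text wraps against.

```csharp
using System;
using System.Drawing;
using System.Threading;
using System.Windows.Forms;

static class HtmlMeasurer
{
    // Renders `html` in an off-screen WebBrowser that is `viewportWidth`
    // pixels wide and returns the size of the rendered <body>.
    // WebBrowser is an ActiveX control, so it needs an STA thread with a
    // running message loop; this method creates both and blocks until done.
    public static Size Measure(string html, int viewportWidth)
    {
        Size result = Size.Empty;
        Exception error = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScrollBarsEnabled = false;
                browser.ScriptErrorsSuppressed = true;
                // Fix the width so text wraps the same way on every run;
                // the height is arbitrary because we read the scroll size.
                browser.Size = new Size(viewportWidth, 100);

                browser.DocumentCompleted += (sender, e) =>
                {
                    try
                    {
                        // Full content size, not just the visible area.
                        result = browser.Document.Body.ScrollRectangle.Size;
                    }
                    catch (Exception ex)
                    {
                        error = ex;
                    }
                    finally
                    {
                        Application.ExitThread(); // stop the message loop
                    }
                };

                browser.DocumentText = html;
                Application.Run(); // pump messages until ExitThread
            }
        });

        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        if (error != null) throw error;
        return result;
    }
}
```

Calling `HtmlMeasurer.Measure(html, 800)` and `HtmlMeasurer.Measure(html, 400)` on the same text-heavy page should generally give different heights, which is exactly the width dependence described in the EDIT.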

SLaks
A:

Even if you could capture the declared width through inspection of CSS and/or HTML tag specifications, you'd be unlikely to get the rendered width. Height will be even worse, since text wraps.

I think you may want to consider a different approach. Do you really need this? What requirement are you trying to satisfy? Can it be done in a different way?

Randolpho
Good points, and some web pages will even adjust the layout dynamically to the width of the canvas. How would you handle that?
0xA3
A:

As you've discovered, you won't be able to use a WebBrowser control because the height and width reported are the height and width of the control itself, not the document inside the control.

What you'd really need to do is write your own HTML parsing and layout engine to calculate this yourself. You would need to lay out all of the lines, figure out the line height, etc.

Is this really worth the effort? You would need to make so many assumptions that the result would be pretty much worthless: differences in rendering between browsers, users who have set their text size to something other than the default, and probably dozens of other factors. Even the screen resolution would matter because, as you can see in this paragraph, text wraps. You need to calculate where the text will wrap in order to know how many lines of text will show up, and you need to factor in font sizes.
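To make the wrapping point concrete, here is a deliberately simplified sketch, assuming a monospaced font where every character is `charWidthPx` wide and text breaks only at spaces. `WrapEstimator`, `CountLines` and `EstimateHeight` are hypothetical names; a real engine would measure each glyph in the actual font (for example with `Graphics.MeasureString`) and would also have to handle markup, margins, images and so on.

```csharp
using System;

static class WrapEstimator
{
    // Counts the lines produced by greedy word wrapping, assuming a
    // monospaced font where every character is charWidthPx pixels wide.
    public static int CountLines(string text, int maxWidthPx, int charWidthPx)
    {
        int maxChars = Math.Max(1, maxWidthPx / charWidthPx);
        int lines = 0;

        foreach (string paragraph in text.Split('\n'))
        {
            lines++;      // each paragraph starts on a fresh line
            int used = 0; // characters already placed on the current line

            foreach (string word in paragraph.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            {
                // Width needed if the word joins the current line (plus a space).
                int needed = used == 0 ? word.Length : used + 1 + word.Length;
                if (needed <= maxChars)
                {
                    used = needed;
                    continue;
                }

                // Word doesn't fit: move it to a new line.
                if (used > 0) lines++;

                // Words longer than a whole line are hard-broken.
                int remaining = word.Length;
                while (remaining > maxChars)
                {
                    lines++;
                    remaining -= maxChars;
                }
                used = remaining;
            }
        }
        return lines;
    }

    // Estimated height = number of wrapped lines * line height.
    public static int EstimateHeight(string text, int maxWidthPx, int charWidthPx, int lineHeightPx)
    {
        return CountLines(text, maxWidthPx, charWidthPx) * lineHeightPx;
    }
}
```

With 10px characters and 20px lines, "the quick brown fox jumps over the lazy dog" wraps to 5 lines (100px tall) in a 100px-wide box but only 3 lines (60px) at 200px, so the same content has no single height until the width is fixed.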

All of that said, in theory this should be doable, and the mechanics of the calculation are the same concepts you would use when printing to a printer: calculating the page height and figuring out where you are on the page is standard operating procedure when printing manually.

Here's an article that explains the basics. It'll be up to you to see if it's worth the effort.

http://msdn.microsoft.com/en-us/magazine/cc188767.aspx

David Stratton