Hello friends, this is my code:

NSURL *url=[NSURL URLWithString:@"http://www.engadget.com"];
NSString *webPage=[[NSString alloc]initWithContentsOfURL:url
                          encoding:NSUTF8StringEncoding error:nil];

In the webPage string I get the HTML of the page at that link. The string contains lots of tags and text. I want only the body text, without any tags, so that I can display it in my UITextView.

Can anyone help? I have searched a lot. If you have sample code, please post it here.

Thanks in advance ...

+3  A: 

Is this what you are looking for?

Remove HTML Tags From an NSString on the iPhone

Nick Stamas
A: 
#include <libxml2/libxml/xmlmemory.h>
#include <libxml2/libxml/HTMLparser.h>

@implementation NSString (FlattenHTML)

static void charactersParsed(void* context,
      const xmlChar* ch, int len)
/*" Callback function for stringByStrippingHTML. "*/
{
  NSMutableString* result = context;
  NSString* parsedString;
  parsedString = [[NSString alloc] initWithBytesNoCopy:
      (xmlChar*) ch length: len encoding:
      NSUTF8StringEncoding freeWhenDone: NO];
  [result appendString: parsedString];
  [parsedString release];
}

/* GCS: custom error function to ignore errors */
static void structuredError(void * userData,
      xmlErrorPtr error)
{
   /* ignore all errors */
   (void)userData;
   (void)error;
}

- (NSString*) flattenHTML
/*" Interprets the receiver as HTML, removes all tags
    and returns the plain text. "*/
{
  int mem_base = xmlMemBlocks();
  NSMutableString* result = [NSMutableString string];
  xmlSAXHandler handler; bzero(&handler,
      sizeof(xmlSAXHandler));
  handler.characters = &charactersParsed;

  /* GCS: override structuredErrorFunc to mine so
      I can ignore errors */
  xmlSetStructuredErrorFunc(xmlGenericErrorContext,
      &structuredError);

  htmlSAXParseDoc((xmlChar*)[self UTF8String], "utf-8",
      &handler, result);

  if (mem_base != xmlMemBlocks()) {
    NSLog( @"Leak of %d blocks found in htmlSAXParseDoc",
      xmlMemBlocks() - mem_base);
  }
  return result;
}

@end

How can I use that code? It produces lots of errors. Where do I pass in my webpage string?
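
One way the category above might be called, as a minimal sketch rather than a tested recipe. It assumes the category is compiled into your target, that libxml2 is linked and its headers are on the header search path (a missing search path is a common cause of a long list of compile errors), and that you have a UITextView to show the result in. The function name ShowPageText is made up for illustration.

```objc
#import <UIKit/UIKit.h>

// Declaration for the category defined above, so the compiler knows the method.
@interface NSString (FlattenHTML)
- (NSString *)flattenHTML;
@end

// Hypothetical helper: downloads the page, strips the tags, shows the text.
static void ShowPageText(UITextView *textView) {
    NSURL *url = [NSURL URLWithString:@"http://www.engadget.com"];
    NSError *error = nil;
    // Synchronous fetch, as in the question; this blocks the calling thread.
    NSString *webPage = [NSString stringWithContentsOfURL:url
                                                 encoding:NSUTF8StringEncoding
                                                    error:&error];
    if (webPage == nil) {
        NSLog(@"Could not load %@: %@", url, error);
        return;
    }
    // The web page string is the receiver of flattenHTML.
    textView.text = [webPage flattenHTML];
}
```

In other words, the web page string is not passed as an argument: flattenHTML is sent to the string itself.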

A: 

This is the best answer and is exactly what you are looking for ...

Put the following line in the web view delegate method (webViewDidFinishLoad:):

NSString *myText = [webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.textContent"];
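
For context, a rough sketch of where that line could live. The controller class and the textView outlet are assumed names, not something from the question, and the web view's delegate has to be set to this controller for the callback to fire.

```objc
#import <UIKit/UIKit.h>

// Hypothetical controller; the class and outlet names are assumptions.
@interface PageTextViewController : UIViewController <UIWebViewDelegate>
@property (nonatomic, assign) IBOutlet UITextView *textView;
@end

@implementation PageTextViewController

@synthesize textView = _textView;

// Called by UIWebView once the page has finished loading.
- (void)webViewDidFinishLoad:(UIWebView *)webView {
    // Ask the loaded document for its text content via JavaScript.
    NSString *myText = [webView stringByEvaluatingJavaScriptFromString:
                           @"document.documentElement.textContent"];
    self.textView.text = myText;
}

@end
```

Note that textContent of the whole document may also include the text inside script and style elements, so the result can need some cleanup.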

Biranchi
A: 

From what I tried, this did the job best. Even though NSScanner is not the smartest solution for this, if the HTML/XML is well formed you should be fine.

Dimitris
A: 

Better Solution:

- (NSString *)flattenHTML:(NSString *)html {
    NSScanner *theScanner = [NSScanner scannerWithString:html];
    NSString *text = nil;

    while ([theScanner isAtEnd] == NO) {
        // find start of tag
        [theScanner scanUpToString:@"<" intoString:NULL];

        // find end of tag
        [theScanner scanUpToString:@">" intoString:&text];

        // replace the found tag with a space
        // (you can filter multi-spaces out later if you wish)
        html = [html stringByReplacingOccurrencesOfString:
                         [NSString stringWithFormat:@"%@>", text]
                                               withString:@" "];
    } // while

    return html;
}
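
For the "filter multi-spaces out later" step mentioned in the code comment, one possible sketch using only Foundation. The helper name CollapseWhitespace is hypothetical and not part of the code above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collapses runs of spaces, tabs and newlines
// into single spaces and trims the ends.
static NSString *CollapseWhitespace(NSString *text) {
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    // Splitting on every whitespace character leaves empty strings
    // wherever two separators were adjacent.
    NSArray *parts = [text componentsSeparatedByCharactersInSet:ws];
    NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"SELF != ''"];
    NSArray *words = [parts filteredArrayUsingPredicate:nonEmpty];
    return [words componentsJoinedByString:@" "];
}
```

It could be applied to the string returned by flattenHTML: before assigning it to the text view.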

Jimit
This would appear to be a direct copy and paste from this blog post: http://rudis.net/content/2009/01/21/flatten-html-content-ie-strip-tags-cocoaobjective-c You should provide attribution when copying someone else's code.
Nick Forge
My bad. I was on a train when I wrote it.
Jimit