Hi,

I've built an iPhone application using the parsing code from the TopSongs sample iPhone application. I've hit a problem though: the feed I'm trying to parse doesn't have a separate field for every piece of information. For example, if it were a feed about dogs, all the information such as dog type, dog age and dog price would be contained in the single `description` field of the feed. However, the TopSongs app relies on each piece of information having its own tag, so instead of using `description` it uses separate tags such as `itms:artist` and `itms:album`.

So my question is this: how do I extract this information from the description field so that it can be parsed using the TopSongs parser? Can you somehow extract the dog age, price and type information using Yahoo Pipes and use the resulting RSS feed as the app's feed? Or is there code that I can add to do it in the application?

Update: To view the code of my application's parser (based on Apple's TopSongs Core Data sample application), see below.

Here's a sample of one item from the actual RSS feed I'm using (the description is longer, and has status, size, and a couple of other fields, but they're all formatted the same way):

<item>
<title>MOE, MARGRET STREET</title>
<description> <b>District/Region:</b>&nbsp;REGION 09</br><b>Location:</b>&nbsp;MOE</br><b>Name:</b>&nbsp;MARGRET STREET</br></description>
<pubDate>Thu,11 Mar 2010 05:43:03 GMT</pubDate>
<guid>1266148</guid>
</item>



   /*
     File: iTunesRSSImporter.m
 Abstract: Downloads, parses, and imports the iTunes top songs RSS feed into Core Data.
  Version: 1.1

 Disclaimer: IMPORTANT:  This Apple software is supplied to you by Apple
 Inc. ("Apple") in consideration of your agreement to the following
 terms, and your use, installation, modification or redistribution of
 this Apple software constitutes acceptance of these terms.  If you do
 not agree with these terms, please do not use, install, modify or
 redistribute this Apple software.

 In consideration of your agreement to abide by the following terms, and
 subject to these terms, Apple grants you a personal, non-exclusive
 license, under Apple's copyrights in this original Apple software (the
 "Apple Software"), to use, reproduce, modify and redistribute the Apple
 Software, with or without modifications, in source and/or binary forms;
 provided that if you redistribute the Apple Software in its entirety and
 without modifications, you must retain this notice and the following
 text and disclaimers in all such redistributions of the Apple Software.
 Neither the name, trademarks, service marks or logos of Apple Inc. may
 be used to endorse or promote products derived from the Apple Software
 without specific prior written permission from Apple.  Except as
 expressly stated in this notice, no other rights or licenses, express or
 implied, are granted by Apple herein, including but not limited to any
 patent rights that may be infringed by your derivative works or by other
 works in which the Apple Software may be incorporated.

 The Apple Software is provided by Apple on an "AS IS" basis.  APPLE
 MAKES NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION
 THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS
 FOR A PARTICULAR PURPOSE, REGARDING THE APPLE SOFTWARE OR ITS USE AND
 OPERATION ALONE OR IN COMBINATION WITH YOUR PRODUCTS.

 IN NO EVENT SHALL APPLE BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL
 OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 INTERRUPTION) ARISING IN ANY WAY OUT OF THE USE, REPRODUCTION,
 MODIFICATION AND/OR DISTRIBUTION OF THE APPLE SOFTWARE, HOWEVER CAUSED
 AND WHETHER UNDER THEORY OF CONTRACT, TORT (INCLUDING NEGLIGENCE),
 STRICT LIABILITY OR OTHERWISE, EVEN IF APPLE HAS BEEN ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.

 Copyright (C) 2009 Apple Inc. All Rights Reserved.

 */

#import "iTunesRSSImporter.h"
#import "Song.h"
#import "Category.h"
#import "CategoryCache.h"
#import <libxml/tree.h>

// Function prototypes for SAX callbacks. This sample implements a minimal subset of SAX callbacks.
// Depending on your application's needs, you might want to implement more callbacks.
static void startElementSAX(void *context, const xmlChar *localname, const xmlChar *prefix, const xmlChar *URI, int nb_namespaces, const xmlChar **namespaces, int nb_attributes, int nb_defaulted, const xmlChar **attributes);
static void endElementSAX(void *context, const xmlChar *localname, const xmlChar *prefix, const xmlChar *URI);
static void charactersFoundSAX(void *context, const xmlChar *characters, int length);
static void errorEncounteredSAX(void *context, const char *errorMessage, ...);

// Forward reference. The structure is defined in full at the end of the file.
static xmlSAXHandler simpleSAXHandlerStruct;

// Class extension for private properties and methods.
@interface iTunesRSSImporter ()

@property BOOL storingCharacters;
@property (nonatomic, retain) NSMutableData *characterBuffer;
@property BOOL done;
@property BOOL parsingASong;
@property NSUInteger countForCurrentBatch;
@property (nonatomic, retain) Song *currentSong;
@property (nonatomic, retain) NSURLConnection *rssConnection;
@property (nonatomic, retain) NSDateFormatter *dateFormatter;
// The autorelease pool property is assign because autorelease pools cannot be retained.
@property (nonatomic, assign) NSAutoreleasePool *importPool;

@end

static double lookuptime = 0;

@implementation iTunesRSSImporter

@synthesize iTunesURL, delegate, persistentStoreCoordinator;
@synthesize rssConnection, done, parsingASong, storingCharacters, currentSong, countForCurrentBatch, characterBuffer, dateFormatter, importPool;

- (void)dealloc {
    [iTunesURL release];
    [characterBuffer release];
    [currentSong release];
    [rssConnection release];
    [dateFormatter release];
    [persistentStoreCoordinator release];
    [insertionContext release];
    [songEntityDescription release];
    [theCache release];
    [super dealloc];
}

- (void)main {
    self.importPool = [[NSAutoreleasePool alloc] init];
    if (delegate && [delegate respondsToSelector:@selector(importerDidSave:)]) {
        [[NSNotificationCenter defaultCenter] addObserver:delegate selector:@selector(importerDidSave:) name:NSManagedObjectContextDidSaveNotification object:self.insertionContext];
    }
    done = NO;
    self.dateFormatter = [[[NSDateFormatter alloc] init] autorelease];
    [dateFormatter setDateStyle:NSDateFormatterLongStyle];
    [dateFormatter setTimeStyle:NSDateFormatterNoStyle];
    // necessary because iTunes RSS feed is not localized, so if the device region has been set to other than US
    // the date formatter must be set to US locale in order to parse the dates
    [dateFormatter setLocale:[[[NSLocale alloc] initWithLocaleIdentifier:@"US"] autorelease]];
    self.characterBuffer = [NSMutableData data];
    NSURLRequest *theRequest = [NSURLRequest requestWithURL:iTunesURL];
    // create the connection with the request and start loading the data
    rssConnection = [[NSURLConnection alloc] initWithRequest:theRequest delegate:self];
    // This creates a context for "push" parsing in which chunks of data that are not "well balanced" can be passed
    // to the context for streaming parsing. The handler structure defined above will be used for all the parsing. 
    // The second argument, self, will be passed as user data to each of the SAX handlers. The last three arguments
    // are left blank to avoid creating a tree in memory.
    context = xmlCreatePushParserCtxt(&simpleSAXHandlerStruct, self, NULL, 0, NULL);
    if (rssConnection != nil) {
        do {
            [[NSRunLoop currentRunLoop] runMode:NSDefaultRunLoopMode beforeDate:[NSDate distantFuture]];
        } while (!done);
    }
    // Display the total time spent finding a specific object for a relationship
    NSLog(@"lookup time %f", lookuptime);
    // Release resources used only in this thread.
    xmlFreeParserCtxt(context);
    self.characterBuffer = nil;
    self.dateFormatter = nil;
    self.rssConnection = nil;
    self.currentSong = nil;
    [theCache release];
    theCache = nil;
    NSError *saveError = nil;
    NSAssert1([insertionContext save:&saveError], @"Unhandled error saving managed object context in import thread: %@", [saveError localizedDescription]);
    if (delegate && [delegate respondsToSelector:@selector(importerDidSave:)]) {
        [[NSNotificationCenter defaultCenter] removeObserver:delegate name:NSManagedObjectContextDidSaveNotification object:self.insertionContext];
    }
    if (self.delegate != nil && [self.delegate respondsToSelector:@selector(importerDidFinishParsingData:)]) {
        [self.delegate importerDidFinishParsingData:self];
    }
    [importPool release];
    self.importPool = nil;
}

- (NSManagedObjectContext *)insertionContext {
    if (insertionContext == nil) {
        insertionContext = [[NSManagedObjectContext alloc] init];
        [insertionContext setPersistentStoreCoordinator:self.persistentStoreCoordinator];
    }
    return insertionContext;
}

- (void)forwardError:(NSError *)error {
    if (self.delegate != nil && [self.delegate respondsToSelector:@selector(importer:didFailWithError:)]) {
        [self.delegate importer:self didFailWithError:error];
    }
}

- (NSEntityDescription *)songEntityDescription {
    if (songEntityDescription == nil) {
        songEntityDescription = [[NSEntityDescription entityForName:@"Song" inManagedObjectContext:self.insertionContext] retain];
    }
    return songEntityDescription;
}

- (CategoryCache *)theCache {
    if (theCache == nil) {
        theCache = [[CategoryCache alloc] init];
        theCache.managedObjectContext = self.insertionContext;
    }
    return theCache;
}

- (Song *)currentSong {
    if (currentSong == nil) {
        currentSong = [[Song alloc] initWithEntity:self.songEntityDescription insertIntoManagedObjectContext:self.insertionContext];
    }
    return currentSong;
}

#pragma mark NSURLConnection Delegate methods

// Forward errors to the delegate.
- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error {
    [self performSelectorOnMainThread:@selector(forwardError:) withObject:error waitUntilDone:NO];
    // Set the condition which ends the run loop.
    done = YES;
}

// Called when a chunk of data has been downloaded.
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
    // Process the downloaded chunk of data.
    xmlParseChunk(context, (const char *)[data bytes], [data length], 0);
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
    // Signal the context that parsing is complete by passing "1" as the last parameter.
    xmlParseChunk(context, NULL, 0, 1);
    context = NULL;
    // Set the condition which ends the run loop.
    done = YES; 
}

#pragma mark Parsing support methods

static const NSUInteger kImportBatchSize = 20;

- (void)finishedCurrentSong {
    parsingASong = NO;
    self.currentSong = nil;
    countForCurrentBatch++;
    // Periodically purge the autorelease pool and save the context. The frequency of this action may need to be tuned according to the 
    // size of the objects being parsed. The goal is to keep the autorelease pool from growing too large, but 
    // taking this action too frequently would be wasteful and reduce performance.
    if (countForCurrentBatch == kImportBatchSize) {
        [importPool release];
        self.importPool = [[NSAutoreleasePool alloc] init];
        NSError *saveError = nil;
        NSAssert1([insertionContext save:&saveError], @"Unhandled error saving managed object context in import thread: %@", [saveError localizedDescription]);
        countForCurrentBatch = 0;
    }
}

/*
 Character data is appended to a buffer until the current element ends.
 */
- (void)appendCharacters:(const char *)charactersFound length:(NSInteger)length {
    [characterBuffer appendBytes:charactersFound length:length];
}

- (NSString *)currentString {
    // Create a string with the character data using UTF-8 encoding. UTF-8 is the default XML data encoding.
    NSString *currentString = [[[NSString alloc] initWithData:characterBuffer encoding:NSUTF8StringEncoding] autorelease];
    [characterBuffer setLength:0];
    return currentString;
}

@end

#pragma mark SAX Parsing Callbacks

// The following constants are the XML element names and their string lengths for parsing comparison.
// The lengths include the null terminator, to ensure exact matches.
static const char *kName_Item = "item";
static const NSUInteger kLength_Item = 5;
static const char *kName_Title = "title";
static const NSUInteger kLength_Title = 6;
static const char *kName_Category = "category";
static const NSUInteger kLength_Category = 9;
static const char *kName_Itms = "itms";
static const NSUInteger kLength_Itms = 5;
static const char *kName_Artist = "description";
static const NSUInteger kLength_Artist = 7;
static const char *kName_Album = "description";
static const NSUInteger kLength_Album = 6;
static const char *kName_ReleaseDate = "releasedate";
static const NSUInteger kLength_ReleaseDate = 12;

/*
 This callback is invoked when the importer finds the beginning of a node in the XML. For this application,
 our parsing needs are relatively modest - we need only match the node name. An "item" node is a record of
 data about a song. In that case we create a new Song object. The other nodes of interest are several of the
 child nodes of the Song currently being parsed. For those nodes we want to accumulate the character data
 in a buffer. Some of the child nodes use a namespace prefix. 
 */
static void startElementSAX(void *parsingContext, const xmlChar *localname, const xmlChar *prefix, const xmlChar *URI, 
                            int nb_namespaces, const xmlChar **namespaces, int nb_attributes, int nb_defaulted, const xmlChar **attributes) {
    iTunesRSSImporter *importer = (iTunesRSSImporter *)parsingContext;
    // The second parameter to strncmp is the name of the element, which we know from the XML schema of the feed.
    // The third parameter to strncmp is the number of characters in the element name, plus 1 for the null terminator.
    if (prefix == NULL && !strncmp((const char *)localname, kName_Item, kLength_Item)) {
        importer.parsingASong = YES;
    } else if (importer.parsingASong && ( (prefix == NULL && (!strncmp((const char *)localname, kName_Title, kLength_Title) || !strncmp((const char *)localname, kName_Category, kLength_Category))) || ((prefix != NULL && !strncmp((const char *)prefix, kName_Itms, kLength_Itms)) && (!strncmp((const char *)localname, kName_Artist, kLength_Artist) || !strncmp((const char *)localname, kName_Album, kLength_Album) || !strncmp((const char *)localname, kName_ReleaseDate, kLength_ReleaseDate))) )) {
        importer.storingCharacters = YES;
    }
}

/*
 This callback is invoked when the parse reaches the end of a node. At that point we finish processing that node,
 if it is of interest to us. For "item" nodes, that means we have completed parsing a Song object. We pass the song
 to a method in the superclass which will eventually deliver it to the delegate. For the other nodes we
 care about, this means we have all the character data. The next step is to create an NSString using the buffer
 contents and store that with the current Song object.
 */
static void endElementSAX(void *parsingContext, const xmlChar *localname, const xmlChar *prefix, const xmlChar *URI) {    
    iTunesRSSImporter *importer = (iTunesRSSImporter *)parsingContext;
    if (importer.parsingASong == NO) return;
    if (prefix == NULL) {
        if (!strncmp((const char *)localname, kName_Item, kLength_Item)) {
            [importer finishedCurrentSong];
        } else if (!strncmp((const char *)localname, kName_Title, kLength_Title)) {
            importer.currentSong.title = importer.currentString;
        } else if (!strncmp((const char *)localname, kName_Category, kLength_Category)) {
            double before = [NSDate timeIntervalSinceReferenceDate];
            Category *category = [importer.theCache categoryWithName:importer.currentString];
            double delta = [NSDate timeIntervalSinceReferenceDate] - before;
            lookuptime += delta;
            importer.currentSong.category = category;
        }
    } else if (!strncmp((const char *)prefix, kName_Itms, kLength_Itms)) {
        if (!strncmp((const char *)localname, kName_Artist, kLength_Artist)) {
            NSString *string = importer.currentSong.artist;
            NSArray *strings = [string componentsSeparatedByString: @", "];
            //importer.currentSong.artist = importer.currentString;
        } else if (!strncmp((const char *)localname, kName_Album, kLength_Album)) {
            importer.currentSong.album = importer.currentString;
        } else if (!strncmp((const char *)localname, kName_ReleaseDate, kLength_ReleaseDate)) {
            NSString *dateString = importer.currentString;
            importer.currentSong.releaseDate = [importer.dateFormatter dateFromString:dateString];
        }
    }
    importer.storingCharacters = NO;
}

/*
 This callback is invoked when the parser encounters character data inside a node. The importer class determines how to use the character data.
 */
static void charactersFoundSAX(void *parsingContext, const xmlChar *characterArray, int numberOfCharacters) {
    iTunesRSSImporter *importer = (iTunesRSSImporter *)parsingContext;
    // A state variable, "storingCharacters", is set when nodes of interest begin and end. 
    // This determines whether character data is handled or ignored. 
    if (importer.storingCharacters == NO) return;
    [importer appendCharacters:(const char *)characterArray length:numberOfCharacters];
}

/*
 A production application should include robust error handling as part of its parsing implementation.
 The specifics of how errors are handled depends on the application.
 */
static void errorEncounteredSAX(void *parsingContext, const char *errorMessage, ...) {
    // Handle errors as appropriate for your application.
    NSCAssert(NO, @"Unhandled error encountered during SAX parse.");
}

// The handler struct has positions for a large number of callback functions. If NULL is supplied at a given position,
// that callback functionality won't be used. Refer to libxml documentation at http://www.xmlsoft.org for more information
// about the SAX callbacks.
static xmlSAXHandler simpleSAXHandlerStruct = {
NULL,                       /* internalSubset */
NULL,                       /* isStandalone   */
NULL,                       /* hasInternalSubset */
NULL,                       /* hasExternalSubset */
NULL,                       /* resolveEntity */
NULL,                       /* getEntity */
NULL,                       /* entityDecl */
NULL,                       /* notationDecl */
NULL,                       /* attributeDecl */
NULL,                       /* elementDecl */
NULL,                       /* unparsedEntityDecl */
NULL,                       /* setDocumentLocator */
NULL,                       /* startDocument */
NULL,                       /* endDocument */
NULL,                       /* startElement*/
NULL,                       /* endElement */
NULL,                       /* reference */
charactersFoundSAX,         /* characters */
NULL,                       /* ignorableWhitespace */
NULL,                       /* processingInstruction */
NULL,                       /* comment */
NULL,                       /* warning */
errorEncounteredSAX,        /* error */
NULL,                       /* fatalError //: unused error() get all the errors */
NULL,                       /* getParameterEntity */
NULL,                       /* cdataBlock */
NULL,                       /* externalSubset */
XML_SAX2_MAGIC,             //
NULL,
startElementSAX,            /* startElementNs */
endElementSAX,              /* endElementNs */
NULL,                       /* serror */
};

Thanks.

+1  A: 

Pull out the description data and use the </br> string as a separator with the NSString method -componentsSeparatedByString:

From each NSString in that array, use : as a separator once again to recover the type, age and price.
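
The two-step split described above can be sketched roughly as follows. This is only an illustration under some assumptions: the helper name `FieldsFromDescription` is hypothetical, the input is assumed to already be plain text with `</br>` as the only markup, and the second split is done at the first colon only (via `-rangeOfString:`) rather than with `-componentsSeparatedByString:`, so that values containing a colon are not cut short.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of TopSongs): turns a string of the form
// "Key: value</br>Key: value</br>" into a dictionary of key/value pairs.
static NSDictionary *FieldsFromDescription(NSString *description) {
    NSMutableDictionary *fields = [NSMutableDictionary dictionary];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    // First split: one "Key: value" entry per "</br>" separator.
    for (NSString *entry in [description componentsSeparatedByString:@"</br>"]) {
        // Second split: at the first colon only, so values may contain colons.
        NSRange colon = [entry rangeOfString:@":"];
        if (colon.location == NSNotFound) {
            continue; // skips the empty entry after the trailing "</br>"
        }
        NSString *key = [[entry substringToIndex:colon.location]
                            stringByTrimmingCharactersInSet:whitespace];
        NSString *value = [[entry substringFromIndex:NSMaxRange(colon)]
                              stringByTrimmingCharactersInSet:whitespace];
        if ([key length] > 0) {
            [fields setObject:value forKey:key];
        }
    }
    return fields;
}
```

Calling `FieldsFromDescription(@"Dog Type: Border Collie</br>Dog Age: 11</br>Dog Price: $234</br>")` should give a dictionary with the keys `Dog Type`, `Dog Age` and `Dog Price`, which can then be copied into whatever properties the model object has.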

Alex Reynolds
OK, some of that makes sense to me, other parts not so much. I'm trying to get used to the code in the TopSongs parser - where would you use -componentsSeparatedByString in the parser? Where does it fit in the parser (i.e. after you've defined the XML fields but before you store the information in Core Data)?
Graeme
You might need to fix the feed to use `\n` instead of `</br>` but, either way: Pull out the data between the `description` nodes like you do for all the other nodes. That description is basically one long string: `Dog Type: Border Collie\nDog Age: 11\nDog Price: $234\n`. Then apply the method I mentioned to split this string with the `\n` string. Then you have an array of strings: `Dog Type: Border Collie`, `Dog Age: 11`, `Dog Price: $234`. Now you iterate through each element in this array and apply the exact same method, instead using `:` as the separator, to get a final set of arrays.
Alex Reynolds
I'm struggling to find where to implement this method and how to do it (i.e. the code to use) because it needs to store using core data. I've uploaded the code at http://techmosis.typepad.com/techmosis/2010/03/sample-iphone-app-code.html - are you able to provide me with some more specific details? Thanks so much for your help.
Graeme
Look at the method `endElementSAX`. This is called when the parser finishes parsing a particular node. Let's say your node is the `description` node. Then you split the description node's contents in here and set properties for type, age and price from the split arrays.
Alex Reynolds
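
As a rough illustration of the suggestion above, the work done when the `description` node ends might look something like the sketch below. Everything named here is an assumption for the sake of the example: `FeedItem` is a hypothetical stand-in for the Core Data entity (`Song` in TopSongs), its three properties are invented to match the labels in the sample feed item, the description text is assumed to have already been reduced to `Label: value` lines separated by `\n`, and the code assumes ARC rather than the manual retain/release used in the question's listing.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the managed object; property names are assumptions.
@interface FeedItem : NSObject
@property (nonatomic, copy) NSString *district;
@property (nonatomic, copy) NSString *location;
@property (nonatomic, copy) NSString *name;
@end

@implementation FeedItem
@end

// Roughly what a "description" branch in endElementSAX could do with the
// buffered text (importer.currentString) and the object being built.
static void ApplyDescription(NSString *description, FeedItem *item) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    for (NSString *line in [description componentsSeparatedByString:@"\n"]) {
        NSArray *parts = [line componentsSeparatedByString:@":"];
        if ([parts count] < 2) {
            continue; // blank or malformed line
        }
        NSString *label = [[parts objectAtIndex:0] stringByTrimmingCharactersInSet:whitespace];
        NSString *value = [[parts objectAtIndex:1] stringByTrimmingCharactersInSet:whitespace];
        if ([label isEqualToString:@"District/Region"]) {
            item.district = value;
        } else if ([label isEqualToString:@"Location"]) {
            item.location = value;
        } else if ([label isEqualToString:@"Name"]) {
            item.name = value;
        }
    }
}
```

In the real importer the assignments would go to attributes of the managed object instead, and the branch would only be reached if the `description` element is actually matched in both `startElementSAX` and `endElementSAX`.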
OK, so would the following code do the job? It doesn't seem to be storing in Core Data though. `NSString *path = [[NSBundle mainBundle] pathForResource:@"DisplayStrings" ofType:@"strings"]; NSString *string = [NSString stringWithContentsOfFile:path encoding:NSUTF16BigEndianStringEncoding error:NULL]; self.displayStrings = [string componentsSeparatedByString:@"\n"]; displayStringsIndex = 0; [self setupNextDisplayString];`
Graeme
Use `NSLog` statements to troubleshoot what you're getting back.
Alex Reynolds
Hmm I think I'm doing something wrong. Are you able to offer some more specific code as to what my code should look like - because mine isn't working? Thanks once again.
Graeme
If your Core Data storage remains unchanged or the values are nil, I recommend putting `NSLog` statements in your code to determine what the values of these variables are.
Alex Reynolds
Yeah, I can do that, but I'm struggling to work out what the code actually needs to be in the first place. At the moment it's `NSString *string = importer.currentSong.artist; NSArray *strings = [string componentsSeparatedByString: @", "];` but I get the feeling that's wrong, seeing as it's telling me "string is undeclared".
Graeme
Can you post an update to your question with all the new code (and an updated feed example, if there are changes to that, as well)? It is kind of difficult to read code in comments.
Alex Reynolds
OK have done so above. At top is the RSS feed item, and below the code for the parser.
Graeme
The problem might be this: `<description> <b>District/Region:</b> REGION 09</br><b>Location:</b> MOE</br><b>Name:</b> MARGRET STREET</br></description>` in that you have HTML tags within the XML `description` node. The whole thing should probably be wrapped in a `CDATA` section, because otherwise those `b` and `br` tags would be treated like new nodes. Presumably you don't want this.
Alex Reynolds
You may want to edit the RSS feed to either strip those unnecessary tags or wrap the content of the `description` node in a `CDATA` section. Search Google on "xml cdata" for an explanation. If you use the `CDATA` approach, you'll need to use RegexKitLite or `NSScanner` to strip out the unwanted elements.
Alex Reynolds
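
For the `NSScanner` route mentioned above, a minimal sketch might look like this. The function name `StripTags` is hypothetical, and the choices to turn `br` tags into newlines and to replace the literal text `&nbsp;` with a space are assumptions based on the sample feed item, not something the feed or TopSongs prescribes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes anything between "<" and ">", converting
// <br>, </br> and <br/> into newlines so the fields stay on separate lines.
static NSString *StripTags(NSString *html) {
    NSMutableString *result = [NSMutableString string];
    NSCharacterSet *slashOrSpace = [NSCharacterSet characterSetWithCharactersInString:@"/ "];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    [scanner setCharactersToBeSkipped:nil]; // keep whitespace inside the text

    while (![scanner isAtEnd]) {
        // Copy the text that precedes the next tag, if there is any.
        NSString *text = nil;
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        // Consume the tag itself.
        if ([scanner scanString:@"<" intoString:NULL]) {
            NSString *tag = nil;
            [scanner scanUpToString:@">" intoString:&tag];
            [scanner scanString:@">" intoString:NULL];
            NSString *tagName = [tag stringByTrimmingCharactersInSet:slashOrSpace];
            if ([tagName isEqualToString:@"br"]) {
                [result appendString:@"\n"];
            }
        }
    }
    // Inside CDATA the entity arrives as literal text, so replace it by hand.
    return [result stringByReplacingOccurrencesOfString:@"&nbsp;" withString:@" "];
}
```

Given the `description` content from the sample item, this should produce one `Label: value` line per field (for example `Location: MOE`), which is a convenient shape for splitting on `\n` and then on `:`.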
OK so once I've done that, then what do I do? (I.e. back to the parsing of the separate description fields). Question: Would it be easier to separate the fields using something like Yahoo Pipes rather than trying to do it in an application?
Graeme
You can use the method I outlined, but you want to verify the parsing is working, first. Can you post the output of an `NSLog` statement that shows the output when the `description` tag parsing is finished?
Alex Reynolds