Hey folks. I'm trying to parse RSS/Atom feeds with the ROME library and need some help. I am new to Java, so I am not in tune with many of its intricacies.

2 things.


  1. Does ROME automatically use its modules to handle different feeds as it comes across them, or do I have to ask it to use them? If so, any direction on how?
  2. How do I get to the correct 'source'? I was trying to use item.getSource(), but it is giving me fits. I guess I am using the wrong interface. Some direction would be much appreciated.

Here is the meat of what I have for collecting my data. I noted two areas where I am having problems, both revolving around getting the source information of the feed. By source, I mean CNN, or FoxNews, or whoever, not the author. Judging from my reading, .getSource() is the correct method.

Thanks ahead of time.

List<String> feedList = theFeeds.getFeeds();
List<FeedData> feedOutput = new ArrayList<FeedData>();
for (String sites : feedList ) {
  URL feedUrl = new URL(sites);
  SyndFeedInput input = new SyndFeedInput();
  SyndFeed feed = input.build(new XmlReader(feedUrl));
  List<SyndEntry> entries = feed.getEntries();
  for (SyndEntry item : entries){
    String title = item.getTitle();              
    String link = item.getUri();
    Date date = item.getPublishedDate();
    SyndEntry source = item.getSource(); // <-- problem here
    String description;
    if (item.getDescription()== null){
      description = "";
    } else {
      description = item.getDescription().getValue();
    }
    String cleanDescription = description.replaceAll("\\<.*?>","").replaceAll("\\s+", " ");
    FeedData feedData = new FeedData(); 
    feedData.setTitle(title);
    feedData.setLink(link);
    feedData.setSource(link); // <-- and here
    feedData.setDate(date);
    feedData.setDescription(cleanDescription);
    String preview =createPreview(cleanDescription);
    feedData.setPreview(preview);
    feedOutput.add(feedData);
    // lets print out my pieces.
    System.out.println("Title: " + title);
    System.out.println("Date: " + date);
    System.out.println("Text: " + cleanDescription);
    System.out.println("Preview: " + preview);
    System.out.println("*****");
  }
}
A: 

getSource() is definitely wrong: it returns the SyndFeed to which the entry in question belongs. Perhaps what you want is getContributors()?

As far as modules go, they should be selected automatically. You can even write your own and plug it in as described here

ChssPly76
So what do I do to get the actual source of the RSS? Say, for an RSS feed from Yahoo, something that will give me YAHOO, or CNN, or ESPN, or whatever. I cannot figure that part out.
ButtersB
I'm not sure what you mean. Is `getAuthors()` / `getContributors()` not giving you what you want? Where's that field you're looking for located in actual RSS?
ChssPly76
getAuthors() gets the actual author, not the source (say, CNN). It gives me the journalist's name, or foxnews@foxnewsonline if they happen to attribute it that way. I guess I could iterate over the parent feed and get it.
ButtersB
A: 

What about trying to extract the source from the URL with a regex, without using the API?
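A rough sketch of that idea, in case it helps. It parses the host with java.net.URI rather than a hand-written regex (my substitution, since it is less fragile), and the guessSource helper and the sample URL are only illustrative. It assumes the publisher name is the label just before the top-level domain, which will be wrong for hosts like feeds.bbci.co.uk or for feeds served through a third-party host.

```java
import java.net.URI;
import java.util.Locale;

class FeedSourceFromUrl {
    // Hypothetical helper: guesses a publisher label from the feed URL's host.
    // Assumption: the label just before the top-level domain names the publisher.
    static String guessSource(String feedUrl) {
        String host = URI.create(feedUrl).getHost();
        if (host == null) {
            return ""; // no host component, nothing to guess from
        }
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        // "rss.cnn.com" splits into [rss, cnn, com]; take the second-to-last label.
        return labels.length >= 2 ? labels[labels.length - 2] : labels[0];
    }

    public static void main(String[] args) {
        System.out.println(guessSource("http://rss.cnn.com/rss/edition.rss"));
    }
}
```

You would call this once per feed URL in the outer loop and pass the result to feedData.setSource(...) instead of the link.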

That was my first thought. Anyway, I checked the RSS specification itself to see whether this option is actually available at that level, so that I could then trace its implementation upwards...

In RSS 2.0 I found the source element; however, it appears that it doesn't exist in previous versions of the spec. Not good news for us!

<source> is an optional sub-element of <item>. [1]

Its value is the name of the RSS channel that the item came from, derived from its <title>. It has one required attribute, url, which links to the XMLization of the source.
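To see what that element looks like in practice, here is a minimal sketch that reads the source element's text and url attribute straight from the XML with the JDK's DOM parser, without going through ROME. The class name, the firstSource helper and the sample feed are all made up for illustration; a real feed may well have no source element at all, in which case this returns null.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RssSourceElement {
    // Made-up sample feed containing one item with an RSS 2.0 <source> element.
    static final String SAMPLE =
        "<rss version=\"2.0\"><channel><title>Example Aggregator</title>"
        + "<item><title>Story</title>"
        + "<source url=\"http://example.com/rss.xml\">Example News</source>"
        + "</item></channel></rss>";

    // Hypothetical helper: returns {name, url} of the first <source>, or null if none exists.
    static String[] firstSource(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            NodeList sources = doc.getElementsByTagName("source");
            if (sources.getLength() == 0) {
                return null; // the element is optional, so many feeds omit it
            }
            Element source = (Element) sources.item(0);
            return new String[] { source.getTextContent().trim(), source.getAttribute("url") };
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse RSS", e);
        }
    }

    public static void main(String[] args) {
        String[] s = firstSource(SAMPLE);
        System.out.println(s[0] + " <" + s[1] + ">");
    }
}
```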

tranced_UT3