It's hard to make the code better without knowing what GetLinks() does. In any event, this avoids recursion. The standard idiom is you don't alter a collection when you're enumerating over it. While the runtime could have let you do it, the reasoning is that it's a source of error, so better to create a new collection or control the iteration yourself.
- create a queue with all urls.
- when dequeueing, we're pretty much saying we've processed it, so add it to result.
- If GetLinks() returns anything, add those to the queue and process them as well.
.
public List<string> ExpandLinksOrSomething(List<string> urls)
{
List<string> result = new List<string>();
Queue<string> queue = new Queue<string>(urls);
while (queue.Any())
{
string url = queue.Dequeue();
result.Add(url);
foreach( string newResult in GetLinks(url) )
{
queue.Enqueue(newResult);
}
}
return result;
}
The naive implementation assumes that GetLinks()
will not return circular references. e.g. A returns B, and B returns A. This can be fixed by:
List<string> newItems = GetLinks(url).Except(result).ToList();
foreach( string newResult in newItems )
{
queue.Enqueue(newResult);
}
* As others point out using a dictionary may be more efficient depending on how many items you process.
I find it strange that GetLinks() would return a value, and then later resolve that to more Url's. Maybe all you want to do is 1-level expansion. If so, we can get rid of the Queue altogether.
public static List<string> StraightProcess(List<string> urls)
{
List<string> result = new List<string>();
foreach (string url in urls)
{
result.Add(url);
result.AddRange(GetLinks(url));
}
return result;
}
I decided to rewrite it because while other answers used queues, it wasn't apparent that they didn't run forever.