Hi Guys,
I'm still working with this huge list of URLs; all the help I've received so far has been great.
At the moment the list looks like this (the real file has around 17,000 URLs):
http://www.domain.com/page?CONTENT_ITEM_ID=1
http://www.domain.com/page?CONTENT_ITEM_ID=3
http://www.domain.com/page?CONTENT_ITEM_ID=2
http://www.domain.com/page?CONTENT_ITEM_ID=1
http://www.domain.com/page?CONTENT_ITEM_ID=2
http://www.domain.com/page?CONTENT_ITEM_ID=3
http://www.domain.com/page?CONTENT_ITEM_ID=3
I can filter out the duplicates without any problem using a couple of methods, awk etc. (a rough sketch follows the sample output below). What I'm really looking to do is remove the duplicate URLs while also keeping a count of how many times each URL appears in the list, and print that count next to the URL with a pipe separator. After processing, the list should look like this:
url | count
http://www.domain.com/page?CONTENT_ITEM_ID=1 | 2
http://www.domain.com/page?CONTENT_ITEM_ID=2 | 2
http://www.domain.com/page?CONTENT_ITEM_ID=3 | 3
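For reference, the dedup on its own is easy enough with an awk one-liner (a rough sketch, assuming the list lives in a file called urls.txt; the filename is just for illustration):

awk '!seen[$0]++' urls.txt

and the closest I can imagine for the counted version is an associative-array count along these lines, though I have no idea whether it's the fastest option for a list this size:

awk '{count[$0]++} END {print "url | count"; for (url in count) print url, "|", count[url]}' urls.txt

(The for-in loop doesn't guarantee any particular order, so the output might still need a sort afterwards.)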
What would be the fastest way to achieve this?
Cheers