I am curious about the technology behind a search engine like torrentz.com. From what I could observe, it doesn't host any torrent files, but rather connects you to other servers that do.
- you search for keywords, and it brings up a list of potential titles matching your search.
- then you pick one of these and it provides you with another list of potential servers hosting the corresponding torrent file.
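
So internally I'd guess the aggregated index maps a title to the external sites that host its torrent file. Here's a toy sketch of that guess in Python (the data and the matching logic are pure speculation on my part, not how torrentz.com actually works):

```python
# Pure speculation: a toy version of what the aggregated index might look like,
# mapping a title to the external sites hosting the corresponding torrent file.
index = {
    "ubuntu 22.04 desktop amd64": [
        {"site": "siteA.example", "url": "https://siteA.example/t/12345"},
        {"site": "siteB.example", "url": "https://siteB.example/torrent/9876"},
    ],
}

def search(keywords):
    """Return every indexed title that contains all of the search terms."""
    terms = keywords.lower().split()
    return [title for title in index if all(term in title for term in terms)]

print(search("ubuntu desktop"))  # -> ['ubuntu 22.04 desktop amd64']
```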
What I'm particularly interested in is the strategy behind gathering and indexing all that content:
How do they collect and then aggregate the data?
Is it a submission-based service, where each of these servers submits its content for indexing?
Is it a crawling algorithm? If so, how do you even start crawling a site like piratebay.org? (I've sketched below what I imagine such a crawler might look like.)
Do they have access to these other servers' databases?
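
For the crawling option, this is roughly what I picture: fetch listing pages, parse out titles and links, and store them. A minimal Python sketch follows; the base URL, the `/recent?page=` pattern, and the CSS selectors are all just my guesses, not the layout of any real site:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical sketch only: the base URL, the /recent?page= pattern, and every
# CSS selector below are guesses, not how any real torrent index is laid out.
BASE = "https://example-torrent-index.org"

def crawl_recent(pages=3):
    """Fetch a few listing pages and pull out (title, magnet link) pairs."""
    results = []
    for page in range(1, pages + 1):
        resp = requests.get(f"{BASE}/recent?page={page}", timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        for row in soup.select("tr.torrent-row"):            # guessed markup
            title_el = row.select_one("a.title")
            magnet_el = row.select_one("a[href^='magnet:']")
            if title_el and magnet_el:
                results.append({"title": title_el.get_text(strip=True),
                                "magnet": magnet_el["href"]})
    return results

if __name__ == "__main__":
    for entry in crawl_recent():
        print(entry["title"], entry["magnet"])
```

Is it really as simple as scraping the HTML of each source site like this, or do these engines use something smarter?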
My knowledge of the BitTorrent protocol isn't very deep, and the documentation I found online pointed me more toward the processes involved in building a tracker service, which isn't exactly what I'm interested in. Any insight or recommended reading material is appreciated.