I have a rather simple Hadoop question, which I'll try to present with an example.
Say you have a list of strings and a large file, and you want each mapper to process one piece of the file and one of the strings in a grep-like program.
How are you supposed to do that? I am under the impression that the number of mappers is determined by the InputSplits produced. I could run subsequent jobs, one for each string (roughly the driver loop sketched below), but it seems kinda... messy?
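To be concrete, the one-job-per-string approach would look roughly like this driver loop. This is just a minimal sketch against the org.apache.hadoop.mapreduce API; GrepDriver, GrepMapper and the paths are made-up names, and the mapper is assumed to read search.string from the Configuration in its setup():

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MultiJobDriver {
        public static void main(String[] args) throws Exception {
            String[] searchStrings = { "foo", "bar" };   // placeholder strings
            // one full MapReduce job per search string
            for (String s : searchStrings) {
                Configuration conf = new Configuration();
                conf.set("search.string", s);            // the mapper would read this in setup()
                Job job = Job.getInstance(conf, "grep-" + s);
                job.setJarByClass(MultiJobDriver.class);
                job.setMapperClass(GrepMapper.class);    // hypothetical mapper class
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(LongWritable.class);
                FileInputFormat.addInputPath(job, new Path("/data/bigfile.txt"));
                FileOutputFormat.setOutputPath(job, new Path("/out/" + s));
                job.waitForCompletion(true);             // the whole file is rescanned on every iteration
            }
        }
    }

The last comment is what bothers me: the big file gets re-read once per string, plus the per-job startup overhead.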
edit: I am not actually trying to build a MapReduce version of grep; I used it as an example of having 2 different inputs to a mapper. Let's just say that I have lists A and B and would like each mapper to work on 1 element from list A and 1 element from list B.
So, given that the problem has no data dependency that would require chaining jobs, is my only option to somehow share all of list A with every mapper and then input 1 element of list B to each mapper?
What I am trying to do is build some type of prefix look-up structure for my data. So I have a giant text and a set of strings. This process has a strong memory bottleneck, which is why I was after 1 chunk of text / 1 string per mapper.
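In case it clarifies the question, here is a minimal sketch of the fallback I described above: all of list A is shipped to every mapper through the distributed cache, and NLineInputFormat hands each mapper exactly one line of list B. The paths, the #listA symlink name and the pair logic in map() are placeholders, and Job.addCacheFile assumes the Hadoop 2 API (older versions would use DistributedCache instead):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PairDriver {

        public static class PairMapper extends Mapper<LongWritable, Text, Text, Text> {
            private final List<String> listA = new ArrayList<String>();

            @Override
            protected void setup(Context context) throws IOException {
                // list A was shipped to every task via the distributed cache;
                // the #listA fragment below makes it visible under this local name
                BufferedReader in = new BufferedReader(new FileReader("listA"));
                String line;
                while ((line = in.readLine()) != null) listA.add(line);
                in.close();
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // with numLinesPerSplit = 1, this mapper's entire input
                // is a single element of list B
                String b = value.toString();
                for (String a : listA) {
                    context.write(new Text(a), new Text(b)); // placeholder for the real pair logic
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "pairAB");
            job.setJarByClass(PairDriver.class);
            job.setMapperClass(PairMapper.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setInputFormatClass(NLineInputFormat.class);
            NLineInputFormat.setNumLinesPerSplit(job, 1);         // one line of B per mapper
            job.addCacheFile(new URI("/shared/listA.txt#listA")); // all of A to every mapper
            NLineInputFormat.addInputPath(job, new Path("/shared/listB.txt"));
            FileOutputFormat.setOutputPath(job, new Path("/out/pairs"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

With numLinesPerSplit set to 1, the number of mappers equals the number of lines in listB.txt, and each task only ever holds list A plus a single element of B in memory. Is that the idiomatic way to do this, or is there something better?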