Many job sites have broken searches that don't let you narrow down jobs by experience level. Even when they do, the filter is usually wrong. This forces you to wade through hundreds of postings that you can't apply for before finding a relevant one, which is quite tedious. Since I'd rather focus on writing cover letters etc., I want to write a program that looks through a large number of postings and saves the URLs of just those jobs that don't require years of experience.

I don't need help writing the scraper that gets the HTML bodies of possibly relevant job posts. The issue is accurately detecting the level of experience required for the job. This should not be too difficult, as job posts are usually very explicit about this ("must have 5 years experience in..."), but overly simple solutions may run into problems.

In my case, I'm looking for entry-level positions. Often they don't say "entry-level", but when a posting does include those words, the job should probably be saved.

Next, I can safely exclude a job that says it requires "5 years" of experience in whatever, so a regex like /\d\syears/ seems reasonable for excluding jobs. But then I realized some jobs say they'll take 0-2 years of experience, which matches the exclusion regex but is clearly a job I want to take a look at. Hmmm, I can handle that with another regex. But some say "less than 2 years" or "fewer than 2 years". I can handle that too, but it makes me wonder what other patterns I'm not thinking of, and whether I'm wrongly excluding many jobs. That's what brings me here: to find a better way to do this than regexes, if there is one.
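As a rough illustration of the cases listed above, here is a minimal Python sketch that extracts the lowest number of years a posting asks for, rather than excluding on any match. The names `min_years_required` and `keep`, the `max_years=2` threshold, and the extra "1 to 3 years", "5+ years" and "yrs" spellings are assumptions for illustration, not patterns taken from real postings.

```python
import re

# Hypothetical pattern: a 1-2 digit number, optionally the low end of a
# range ("0-2", "1 to 3"), optionally followed by "+", then "year(s)"/"yr(s)".
YEARS = re.compile(
    r"(?<!\d)(?P<low>\d{1,2})(?:\s*(?:-|to)\s*(?P<high>\d{1,2}))?"
    r"\s*\+?\s*(?:years?|yrs?)\b",
    re.IGNORECASE,
)
# "less than" / "fewer than" directly before the number means an upper bound.
UPPER_BOUND = re.compile(r"(?:less|fewer)\s+than\s*$", re.IGNORECASE)


def min_years_required(text):
    """Return the smallest lower bound on years found in text, or None."""
    lows = []
    for m in YEARS.finditer(text):
        prefix = text[max(0, m.start() - 15):m.start()]
        if m.group("high") is None and UPPER_BOUND.search(prefix):
            lows.append(0)  # "less than 2 years" places no lower bound
        else:
            lows.append(int(m.group("low")))
    return min(lows) if lows else None


def keep(text, max_years=2):
    """Keep a posting unless every stated requirement exceeds max_years."""
    low = min_years_required(text)
    # No number found: keep it, to avoid wrongly discarding a relevant job.
    return low is None or low <= max_years
```

Taking the minimum over all matches is deliberately lenient: a posting that mentions both "5 years" and "2 years" is kept, which errs on the side of saving too many jobs rather than too few.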

I'd like to minimize the false negative rate and save all the jobs that seem like they might not require many years of experience. Does excluding anything that matches /[3-9]\syears|1\d\syears/ seem reasonable? Or is there a better way? Training a Bayesian filter, maybe?
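One way to judge the proposed exclusion regex is to run it against a few phrasings. The sketch below translates the pattern to Python unchanged; the helper name `is_excluded` and the sample phrases other than those quoted in the question are assumptions for illustration.

```python
import re

# The exclusion pattern from the question, translated to Python as-is.
EXCLUDE = re.compile(r"[3-9]\syears|1\d\syears")


def is_excluded(text):
    """Return True if the posting text matches the exclusion pattern."""
    return EXCLUDE.search(text) is not None


# Phrasings from the question behave as intended:
print(is_excluded("must have 5 years experience"))  # True
print(is_excluded("0-2 years of experience"))       # False
print(is_excluded("less than 2 years"))             # False

# Hypothetical phrasings that expose gaps in the pattern:
print(is_excluded("0-3 years of experience"))       # True
print(is_excluded("5+ years of experience"))        # False
print(is_excluded("20 years of experience"))        # False
```

So the pattern as written discards a "0-3 years" range, and lets "5+ years" and "20 years" through because it requires whitespace directly after a digit in 3-9 or a two-digit number starting with 1. It is also case-sensitive, so "5 Years" would not match.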

Edit: There's a similar, but harder problem, which would probably be more useful to solve. There are lots of jobs that just require an "engineering degree", as you just have to understand a few technical things. But searching for "engineering" gives you thousands of jobs, mostly irrelevant.

How do I narrow this down to just those jobs that require any engineering degree, rather than particular degrees, without looking at each myself?

+1  A: 

Ok, this answer is probably not going to be helpful -- I will say that up front. But, in my opinion, merely thinking about the problem in this way is enough to get you hired at most places I've worked. My suggestion? Contact the hiring manager at any of the postings in which you have interest, tell them this is what you are doing. Tell them generically what you have coded so far, and ask for assistance in learning the patterns they use when writing their adverts.

If I were on the receiving end of this letter, I think I would invite the person in for an interview.

MJB
You're lucky to have worked in places like that. It might work for me if I was looking for a software engineering position. But programming is my hobby; I'm looking for a job in mechanical engineering (and it has to be due to silly regulations on international students trying to work in the US). I would love to get a software job instead though, but my status makes it practically impossible currently. If you know anybody who needs a mechanical engineer who can program, please let me know :)
ehsanul
@ehsanul: have you posted on careers.stackoverflow.com?
Ether
@ether See my comment above yours. I can't take a software-related job right now (not legally anyways). Mechanical engineering jobs are what I'm looking for.
ehsanul
@ehsanul: no, I don't know anyone currently hiring, but I have confidence that you'll do fine once you get the interview. Good luck.
MJB
Thanks a lot MJB.
ehsanul
+1  A: 

Hi - I developed a good parse-and-email routine for a couple of job websites when I was looking for work for myself and a couple of friends. I agree with the other posts: this is a great way to look at the problem. Just to drop a little info, I did it mostly in Ruby, and used Tor proxies and some other methods to make sure that I wouldn't be iced out of the job site. This sort of project is unlike the usual scraping, as you really can't afford to get kicked off a job board. In any case, I just have one piece of advice: forget about sorting and fine-tuning this too intensely. Let the HR department do that for you, and get your resume and credentials out everywhere. It's a statistical game, and you want to broadcast yourself and throw that net as widely as possible.

riva
Thanks for the advice.
ehsanul
A: 

Here's some sample code if you're interested. It's for looking for a flat, not a job, but the concepts should be similar enough. http://github.com/agrimm/Easy-Roommate-parser

Andrew Grimm