ansaurus

Question

Answer 1

A:

I would use regular expressions to parse each row of data: first split on commas (,), then remove any text held within brackets, along with the spaces leading up to those brackets. As for removing junk phrases, perhaps compare each one against an accepted word list?

I also notice that, going by your desired output, the keyword 'AND' separates two distinct skills. Results from this kind of processing may be a bit sketchy, since the data is not necessarily all in the same format.
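A minimal sketch of the steps above, assuming rows look like the samples in the question. The function name split_skills and the exact patterns are illustrative assumptions, not a tested solution for every row format:

```php
<?php
// Hypothetical helper: split one raw row into candidate skill phrases.
function split_skills(string $row): array
{
    // Remove bracketed notes such as " (good level)" plus the spaces before them.
    $row = preg_replace('/\s*\([^)]*\)/', '', $row);

    // Split on commas, or on the standalone word "and" (case-insensitive).
    $parts = preg_split('/\s*,\s*|\s+and\s+/i', $row);

    // Trim each piece and drop any empty ones.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, static function ($part) {
        return $part !== '';
    }));
}

print_r(split_skills('PHP (good level), Java (intermediaite), C++'));
print_r(split_skills('project management and quality management'));
```

Because "and" is only matched with whitespace on both sides, words that merely contain those letters are left alone. Junk phrases such as "dfsdf zerze rzer" still pass through and would need the word-list check.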

Seidr 2010-03-31 13:10:19

Answer 2

A:

It would be very hard to start from scratch.

I'd pull a list of skill sets from somewhere, load it into a table, and use that table as a reference, trying to match your data against it. Otherwise you have no way to determine whether the words or phrases are meaningful or not.

For each phrase I'd use the following algorithm.

Say you have a phrase of 5 words

 "one two three four five"

First I'd check whether the whole phrase exists in my table. If so, keep it and move on to the next phrase; if not, check

 "one two three four" and "two three four five"

and if those don't match either, check

 "one two three", "two three four", "three four five"

etc...

I know it is a bit messy and long-winded, but it is the first thing that came to mind.
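As a rough sketch of that shrinking-window check: here the reference table is assumed to be already loaded into a PHP array of lowercase skill names, and the function name and sample entries are made up for illustration.

```php
<?php
// Hypothetical helper: return the longest word sequences of $phrase found in $known.
// $known stands in for the reference table, as an array of lowercase skill names.
function longest_known_subphrases(string $phrase, array $known): array
{
    $words = preg_split('/\s+/', trim($phrase), -1, PREG_SPLIT_NO_EMPTY);
    $count = count($words);

    // Try the full phrase first, then every window one word shorter, and so on.
    for ($len = $count; $len >= 1; $len--) {
        $matches = array();
        for ($start = 0; $start + $len <= $count; $start++) {
            $candidate = implode(' ', array_slice($words, $start, $len));
            if (in_array(strtolower($candidate), $known, true)) {
                $matches[] = $candidate;
            }
        }
        if ($matches) {
            return $matches; // stop at the longest length that matched
        }
    }
    return array(); // nothing in the phrase is a known skill
}

$known = array('javascript', 'project management'); // made-up sample entries
print_r(longest_known_subphrases('begining Javascript', $known));
```

With a real table, the in_array() call would probably become a database lookup, and for a phrase of n words the number of windows checked grows roughly with n squared, which should be fine for short skill phrases.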

Hope it helps

marvin 2010-03-31 13:17:38

Answer 3

A:

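<?php
$white_list = array(); // Add acceptable words and/or characters
$black_list = array(); // Add unacceptable words and/or characters

$s = '"PHP (good level), Java (intermediaite), C++" "PHP5" "project management and quality management" "begining Javascript" "water engineering" "dfsdf zerze rzer" "cibling customers"';

// Split on spaces; note that this breaks multi-word phrases into single words.
$words = explode(" ", $s);

$primary = array();   // words found in the white list
$secondary = array(); // everything else, kept for later review
foreach ($words as $word) {
    // Remove blacklisted fragments, then strip surrounding quotes, commas and whitespace.
    $new_word = trim(str_replace($black_list, "", $word), " \t\n\r\"',");
    if ($new_word === "") {
        continue; // nothing left after cleaning
    }
    if (in_array($new_word, $white_list, true)) {
        $primary[] = $new_word;
    } else {
        $secondary[] = $new_word;
    }
}

// Rebuild a quoted, space-separated list of the accepted words.
$collected = '"' . implode('" "', $primary) . '"';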

You could use something like this to build up a table of white and black lists over time. In the long run you'll have better control over what counts as a valid skill and what does not.

Brant 2010-03-31 14:18:33

Datamining on a mysql database
