textmatching

How to match URIs in text?

How would one go about spotting URIs in a block of text? The idea is to turn such runs of text into links. This is pretty simple to do if one only considers the http(s) and ftp(s) schemes; however, I am guessing the general problem (considering tel, mailto and other URI schemes) is much more complicated (if it is even possible). I wo...
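A common compromise is to recognize only a fixed whitelist of schemes. The generic scheme syntax in RFC 3986 (a letter followed by letters, digits, "+", "-" or ".", then ":") also matches ordinary prose such as "Note:this". The sketch below assumes a hypothetical whitelist (`SCHEMES`) and a simple end-of-URI heuristic. All names are illustrative, and this is not a complete URI grammar.

```python
import html
import re

# Hypothetical whitelist of schemes to recognize; extend as needed.
# Matching any RFC 3986 scheme (ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":")
# would also catch ordinary prose such as "Note:this".
SCHEMES = ("https", "http", "ftps", "ftp", "mailto", "tel")

# A known scheme at a word boundary, then a run of characters that are
# not whitespace, angle brackets, or double quotes.
URI_RE = re.compile(r"\b(?:" + "|".join(SCHEMES) + r"):[^\s<>\"]+", re.IGNORECASE)

# Punctuation that usually ends a sentence rather than the URI itself.
TRAILING = ".,;:!?)'"


def _iter_uris(text):
    """Yield (start, uri) pairs, trimming trailing punctuation from each match."""
    for m in URI_RE.finditer(text):
        uri = m.group(0).rstrip(TRAILING)
        _scheme, _, rest = uri.partition(":")
        if rest:  # skip a bare "scheme:" with nothing after it
            yield m.start(), uri


def find_uris(text):
    """Return the URIs found in text, in order of appearance."""
    return [uri for _, uri in _iter_uris(text)]


def linkify(text):
    """Wrap each URI in an <a> element and HTML-escape everything else."""
    out, pos = [], 0
    for start, uri in _iter_uris(text):
        out.append(html.escape(text[pos:start]))
        out.append(f'<a href="{html.escape(uri)}">{html.escape(uri)}</a>')
        pos = start + len(uri)
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(find_uris("Call tel:+1-555-0100, or mail mailto:a@example.com."))
```

Trimming trailing punctuation is only a heuristic. It will also cut a closing parenthesis that genuinely belongs to the URI.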

How to determine if a record in every source represents the same person

I have several sources of tables with personal data, like this:
SOURCE 1
ID, FIRST_NAME, LAST_NAME, FIELD1, ...
1, jhon, gates ...
SOURCE 2
ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ...
1, jon, gate ...
SOURCE 3
ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ...
2, jhon, ballmer ...
So, assuming that records with ID 1, from sources 1 a...
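This problem is usually called record linkage or entity resolution. The minimal sketch below swaps in the standard library's `difflib.SequenceMatcher` as the string-similarity measure. It averages the per-field similarity of the name columns and compares the result with a threshold. The `THRESHOLD` value and the choice of fields are assumptions to tune against hand-checked pairs. The code compares records pairwise, which is quadratic in the number of records. With large tables one would normally first group likely candidates (blocking), and that step is not shown.

```python
from difflib import SequenceMatcher

# Hypothetical threshold; tune it against pairs you have checked by hand.
THRESHOLD = 0.8


def similarity(a, b):
    """Similarity in [0, 1] between two strings after trimming and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def same_person(rec1, rec2, fields=("FIRST_NAME", "LAST_NAME")):
    """Average the per-field similarities and compare against THRESHOLD."""
    scores = [similarity(rec1[f], rec2[f]) for f in fields]
    return sum(scores) / len(scores) >= THRESHOLD


source1 = {"ID": 1, "FIRST_NAME": "jhon", "LAST_NAME": "gates"}
source2 = {"ID": 1, "FIRST_NAME": "jon", "LAST_NAME": "gate"}
source3 = {"ID": 2, "FIRST_NAME": "jhon", "LAST_NAME": "ballmer"}

print(same_person(source1, source2))  # True
print(same_person(source1, source3))  # False
```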

Data Comparison

We have a SQL Server table containing Company Name, Address, and Contact name (among others). We regularly receive data files from outside sources that require us to match up against this table. Unfortunately, the data is slightly different since it is coming from a completely different system. For example, we have "123 E. Main St." a...
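One way to handle formatting differences like these is to normalize both sides before comparing them. The normalization lowercases the text, strips punctuation and expands common abbreviations. In the minimal sketch below, the abbreviation table is hypothetical and deliberately tiny. The second address is an invented example, not the one from the question. Some abbreviations are ambiguous: "St." can mean "Saint" as well as "Street", so a real table needs context rules or manual review.

```python
import re

# Hypothetical, deliberately small abbreviation table.
ABBREVIATIONS = {
    "e": "east", "w": "west", "n": "north", "s": "south",
    "st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard",
}


def normalize_address(address):
    """Lowercase, drop punctuation, and expand known abbreviations token by token."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


print(normalize_address("123 E. Main St."))       # 123 east main street
print(normalize_address("123 East Main Street"))  # 123 east main street
```

Records whose normalized forms still differ slightly could then be scored with a fuzzy string comparison instead of strict equality.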

Is recognizing an email address with a regexp hard?

I recently read somewhere that writing a regexp to match an email address, taking into account all the variations and possibilities of the standard, is extremely hard and significantly more complicated than one would initially assume. Can anyone provide some insight into why that is? Are there any known and proven regexps tha...
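Part of the difficulty is that the RFC 5322 address grammar allows forms most people never see. These include quoted local parts, domain literals in square brackets, and comments and folding whitespace. The sketch below uses a typical "practical" pattern, which is an assumption rather than anything taken from a standard. It shows that such a pattern rejects some valid addresses and accepts at least one invalid one.

```python
import re

# A typical "practical" pattern (an assumption, not taken from any standard).
SIMPLE_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def looks_like_email(s):
    """True if the whole string matches the simple pattern."""
    return SIMPLE_EMAIL.fullmatch(s) is not None


# Valid under RFC 5322 but rejected by the simple pattern:
print(looks_like_email('"john doe"@example.com'))  # False (quoted local part)
print(looks_like_email("user@[192.0.2.1]"))        # False (domain literal)
# Invalid (consecutive dots in an unquoted local part) but accepted:
print(looks_like_email("a..b@example.com"))        # True
print(looks_like_email("jane@example.com"))        # True
```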

Representing a text file as a single unit in Java, and matching strings in the text

How can I represent a text file (or XML file) as a single string and search for (or match) a particular string in it? I have created a BufferedReader object: BufferedReader input = new BufferedReader(new FileReader(aFile)); and then I tried to use the Scanner class with its option to specify different delimiters,...

How can I match string order between two documents in Perl?

I have a problem writing a Perl program that matches the words in two documents. Let's say there are documents A and B. I want to delete the words in document A that are not in document B.
Example 1:
A: I eat pizza
B: She go to the market and eat pizza
Result: eat pizza
Example 2:
A: eat pizza
B: pizza eat
Result: pizza (...
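Example 2 shows that word order matters. A plain "is this word in B?" filter would return "eat pizza" there, not "pizza". One reading that fits both examples treats the task as a word-level longest common subsequence (LCS). That interpretation is an assumption, as is the tie-breaking rule: when two choices give equally long results, the code drops words from A first, which reproduces example 2. The sketch is in Python. In Perl, the CPAN module Algorithm::Diff provides an `LCS` function that could serve the same role.

```python
def common_in_order(a_text, b_text):
    """Keep the words of A that form a longest common subsequence with B.

    Ties are broken by dropping words from A first (an assumption).
    """
    a, b = a_text.split(), b_text.split()
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one LCS.
    result, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # drop the current word of A
        else:
            j += 1  # skip the current word of B
    return " ".join(result)


print(common_in_order("I eat pizza", "She go to the market and eat pizza"))  # eat pizza
print(common_in_order("eat pizza", "pizza eat"))  # pizza
```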