string-matching

I need a regex to match general URLs.

I need to test for general URLs using any protocol (http, https, shttp, ftp, svn, mysql and things I don't know about). My first pass is this: \w+://(\w+\.)+[\w+](/[\w]+)(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)? (PCRE and .NET, so nothing too fancy) ...
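
For illustration only, here is a looser sketch of the same idea in Python's `re` module. It is not the asker's pattern: it assumes the scheme rule from RFC 3986 (a letter followed by letters, digits, `+`, `-` or `.`), requires a non-empty host part, and accepts anything without whitespace after that. The names `URL_PATTERN` and `looks_like_url` are made up for this example.

```python
import re

# Scheme per RFC 3986: a letter, then letters, digits, "+", "-" or ".".
# After "://" we require at least one host character, then allow any
# run of non-whitespace characters (path, query, fragment).
URL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s/?#]+[^\s]*")


def looks_like_url(text):
    """Return True if the whole string has the shape scheme://host[rest]."""
    return URL_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["http://example.com", "svn://host/repo?rev=5", "example.com"]:
        print(candidate, looks_like_url(candidate))
```

This only checks the overall shape; it does not validate host names, ports or escaping, so it will accept some strings a stricter parser would reject.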

Test for a syntactically correct path

In .NET, is there a function that tests whether a string is a syntactically correct path? I specifically don't want it to test whether the path actually exists. My current take on this is a regex: ([a-zA-Z]:|\\)?\\?([^/\\:*?"<>|]+[/\\])*[^/\\:*?"<>|]* matches: c:\ bbbb \\bob/john\ ..\..\ rejects: xy: c:\\bob ...

Closest match for Full Text Search

Hello, I am trying to implement an internal search for my website that can point users in the right direction in case they mistype a word, something like the "Did you mean:" feature in Google search. Does anybody have an idea how such a search can be done? How can we establish the relevance of the word or the phrase we assume the user intended t...

Finding how similar two strings are

I'm looking for an algorithm that takes 2 strings and will give me back a "factor of similarity". Basically, I will have an input that may be misspelled, have letters transposed, etc, and I have to find the closest match(es) in a list of possible values that I have. This is not for searching in a database. I'll have an in-memory list o...
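
One possible starting point for an in-memory list is Python's standard `difflib` module, sketched below. `difflib` scores pairs with a matching-blocks ratio rather than an edit distance, so this is one option among several, not a recommendation from the thread. The helper names and the sample list are hypothetical.

```python
import difflib


def similarity(a, b):
    """Return a case-insensitive similarity factor between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest_matches(query, candidates, limit=3, cutoff=0.6):
    """Return up to `limit` candidates, best match first."""
    # Compare lower-cased strings but hand back the original spellings.
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[h] for h in hits]


if __name__ == "__main__":
    values = ["apple", "pineapple", "application", "nation"]  # made-up data
    print(closest_matches("aplpe", values))
```

The `cutoff` parameter drops weak candidates; lowering it returns more, noisier suggestions.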

Fuzzy matching of product names

I need to automatically match product names (cameras, laptops, TVs etc.) that come from different sources to a canonical name in the database. For example "Canon PowerShot a20IS", "NEW powershot A20 IS from Canon" and "Digital Camera Canon PS A20IS" should all match "Canon PowerShot A20 IS". I've worked with Levenshtein distance with s...

A better similarity ranking algorithm for variable length strings

I'm looking for a string similarity algorithm that yields better results on variable length strings than the ones that are usually suggested (Levenshtein distance, Soundex, etc). For example, given string A: "Robert", string B: "Amy Robertson" would be a better match than string C: "Richard". Also, preferably, this algorithm sh...

Which one is a more reliable matching scheme, EREGI or STRIPOS?

Which do you consider the better scheme for matching: eregi, stripos, or some other method? ...

Representing a text file as single unit in Java, and matching strings in the text

Hello, how can I have a text file (or XML file) represented as a whole string, and search for (or match) a particular string in it? I have created a BufferedReader object: BufferedReader input = new BufferedReader(new FileReader(aFile)); and then I have tried to use the Scanner class with its option to specify different delimiters,...

XPath partial of attribute known

I know part of the value of an attribute in a document, but not the whole thing. Is there a character I can use to represent any value? For example, a value of a label for an input is "A. Choice 1". I know it says "Choice 1", but not whether it will say "A. " or "B. " before the "Choice 1". Below is the relevant HTML. There are oth...

Are Regular Expressions a must for programming?

Are Regular Expressions a must for doing programming? ...

How should I print out a particular character in the file after reading the file?

Hi, I am reading a file using a Perl script. This file consists of strings with different characters, and I am supposed to identify strings containing the character 'X'. I want to know how I should (1) print this string (containing 'X'), (2) write this string to another file, and (3) count the number of 'X' characters in the whole file....

MySQL, select records with at least X characters matching

Hello, I am trying to accomplish the following. Let's say we have a table that contains these fields (ID, content): 1 | apple, 2 | pineapple, 3 | application, 4 | nation. Now I am looking for a function that will tell me all possible common matches. For example, if the argument is "3", the function will return all possible strings from...

String matching technique(s) by converting to number?

I have strings of various lengths made up of Base64 characters. They are actually audio recognition data that differ from song to song. To compare parts of those strings easily, I divide them into 16-character substrings (each about 1 second of a song). But in some cases I just can't compare these head to head; I should be measuring t...

Using Rabin-Karp to search for multiple patterns in a string

According to the Wikipedia entry on the Rabin-Karp string matching algorithm, it can be used to look for several different patterns in a string at the same time while still maintaining linear complexity. It is clear that this is easily done when all the patterns are of the same length, but I still don't get how we can preserve O(n) complexit...
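
The same-length case that the question calls easy can be sketched as follows. This is a minimal Python illustration, not an answer to the different-length part of the question. It assumes a polynomial rolling hash with arbitrary `base` and `mod` values, and it verifies every hash hit against the actual patterns so that collisions cannot produce false matches. The function name `rabin_karp_multi` is made up.

```python
def rabin_karp_multi(text, patterns, base=256, mod=1_000_000_007):
    """Find every occurrence of any of several equal-length patterns.

    Returns a list of (index, pattern) pairs in order of position.
    """
    patterns = set(patterns)
    if not patterns:
        return []
    m = len(next(iter(patterns)))
    if any(len(p) != m for p in patterns):
        raise ValueError("all patterns must have the same length")
    if m == 0 or m > len(text):
        return []

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Group patterns by hash so one dictionary lookup covers all of them.
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(full_hash(p), set()).add(p)

    high = pow(base, m - 1, mod)  # weight of the window's leading character
    h = full_hash(text[:m])
    matches = []
    for i in range(len(text) - m + 1):
        if h in by_hash:
            window = text[i:i + m]
            if window in by_hash[h]:  # confirm, in case of a collision
                matches.append((i, window))
        if i + m < len(text):
            # Slide the window: drop text[i], append text[i + m].
            h = ((h - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


if __name__ == "__main__":
    print(rabin_karp_multi("abracadabra", ["abr", "cad"]))
    # → [(0, 'abr'), (4, 'cad'), (7, 'abr')]
```

Each window costs one hash update and one dictionary lookup regardless of how many patterns there are, which is where the expected linear behaviour for same-length patterns comes from.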

I want to capture a single match

I want to capture only the first match of the expression <p>.*?</p>. I have tried <p>.*?</p>{1}, but it is not working: it returns all the p tags in the HTML document. Please help ...
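
The excerpt does not say which regex engine is in use, so the following is only a sketch of the general idea in Python: most engines offer a "find first" call separate from "find all", and the pattern itself does not need a `{1}` quantifier. The function name and the sample HTML are made up.

```python
import re


def first_paragraph(html):
    """Return the first <p>...</p> span in the string, or None if there is none."""
    # re.search stops at the first match; re.DOTALL lets "." cross newlines.
    match = re.search(r"<p>.*?</p>", html, flags=re.DOTALL)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = "<p>first</p><div>x</div><p>second</p>"  # hypothetical input
    print(first_paragraph(sample))  # → <p>first</p>
```

In the original attempt, `{1}` applies only to the final `>` character, so it does not change how many matches are returned.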

Hibernate case-insensitive utf-8/unicode collation that works on multiple DBMS

I'm looking for a Hibernate annotation or .hbm.xml setting that allows me to specify a table column as a case-insensitive string, working in a unicode/utf-8/locale-independent manner on multiple database engines. Is there any such thing? So that I can do a query using Restrictions.eq("column_name", "search_string") efficiently. ...

String matching in python with re

I have a file in this structure: 009/foo/bar/hi23123/foo/bar231123/foo/bar/yo232131 What I need is to find the exact match of a string, e.g. only /foo/bar among /foo/bar/hi and /foo/bar/yo. One solution that came to mind is to check for an ending "/" in the input string, because if there is an ending "/" in the possible results, that...

Best string-matching algorithm for same-length strings?

I need to implement a string-matching algorithm to determine which strings most closely match. I see that the Hamming distance is a good matching algorithm when the strings have a fixed, equal length. Is there any advantage in the quality of matching if I were to use the Levenshtein distance formula instead? I know this method is less effic...
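
To make the comparison concrete, here is a minimal Python sketch of both distances using their textbook definitions (the function names are made up). For equal-length strings the Levenshtein distance is never larger than the Hamming distance, and it can be much smaller when one string is a shifted copy of the other.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # A one-character rotation: every position differs, yet two edits suffice.
    print(hamming("abcdef", "bcdefa"))      # → 6
    print(levenshtein("abcdef", "bcdefa"))  # → 2
```

Whether that difference matters depends on the data: if the strings can be misaligned by an inserted or dropped character, Hamming distance penalises every later position, while Levenshtein does not.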

Method for matching any string in SQL

I have a simple SQL query: SELECT * FROM phones WHERE manu='$manuf' AND price BETWEEN $min AND $max. The issue is that all of the variable fields are connected to fields that will sometimes be empty, and thus I need a way to make them match any value that their respective field could take if they are empty. I tried $min=$_REQUEST['min...

How to do fast prefix string matching in Scala

I'm using some Java code to do fast prefix lookups with java.util.TreeSet. Could I use Scala's TreeSet instead, or a different solution? /** A class that uses a TreeSet to do fast prefix matching */ class PrefixMatcher { private val _set = new java.util.TreeSet[String] def add(s: String) = _set.add(s) def findMatches(pre...
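
The sorted-set idea behind that class can be sketched independently of Scala. The Python version below is an analogue, not a translation of the truncated code: it assumes a sorted list plus the standard `bisect` module in place of a tree set, and relies on the fact that all strings sharing a prefix sit next to each other in sorted order. The method name `find_matches` is made up.

```python
import bisect


class PrefixMatcher:
    """Keep strings sorted so all matches for a prefix are contiguous."""

    def __init__(self):
        self._items = []

    def add(self, s):
        # Insert in sorted position, skipping duplicates like a set would.
        i = bisect.bisect_left(self._items, s)
        if i == len(self._items) or self._items[i] != s:
            self._items.insert(i, s)

    def find_matches(self, prefix):
        # Binary-search to the first string >= prefix, then walk forward
        # until a string no longer starts with the prefix.
        i = bisect.bisect_left(self._items, prefix)
        matches = []
        while i < len(self._items) and self._items[i].startswith(prefix):
            matches.append(self._items[i])
            i += 1
        return matches


if __name__ == "__main__":
    matcher = PrefixMatcher()
    for word in ["apple", "application", "apply", "banana", "app"]:
        matcher.add(word)
    print(matcher.find_matches("app"))
    # → ['app', 'apple', 'application', 'apply']
```

Lookups are a binary search plus one step per match. Inserts into a Python list are linear in the worst case, which a balanced tree set avoids.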