ansaurus

Question

Answer 1

+3 A:

If the substring can appear at any position in another address, you can do a self-join like:

select a.id, a.addr, b.id as b_id, b.addr as b_addr from t a, t b where
    b.addr like concat('%', substr(a.addr, 1, 5), '%') and
    b.id <> a.id;

This returns every pair of records in which the first 5 characters of one address appear anywhere inside the other address. Note that substr positions start at 1, not 0.
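To try the self-join idea without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and the query is adapted to SQLite (an assumption, not the dialect of the answer): `||` stands in for concat(), and the comma join is written as an explicit JOIN.

```python
import sqlite3

# Hypothetical sample data; table t and its columns mirror the query above.
rows = [
    (1, "Bob", "12 Main Street, Springfield"),
    (2, "Bob", "Apt 4, 12 Main Street"),
    (3, "Ann", "99 Oak Avenue"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, firstname TEXT, addr TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# SQLite spelling of the self-join: || replaces concat(), and substr()
# counts from 1, so substr(addr, 1, 5) is the first five characters.
pairs = conn.execute(
    """
    SELECT a.id, a.addr, b.id AS b_id, b.addr AS b_addr
    FROM t a
    JOIN t b ON b.id <> a.id
    WHERE b.addr LIKE '%' || substr(a.addr, 1, 5) || '%'
    ORDER BY a.id, b.id
    """
).fetchall()

for row in pairs:
    print(row)
```

With this data only row 1's prefix ("12 Ma") occurs inside another address (row 2), so a single pair comes back. One caveat: if a prefix itself contains % or _, LIKE treats those characters as wildcards unless they are escaped.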

Or you can ignore everything after the fifth character by grouping on the prefix:

select firstname, substr(addr, 1, 5) from t group by firstname,
  substr(addr, 1, 5);
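As a rough illustration of the prefix grouping, the sketch below runs the same kind of query against an in-memory SQLite database through Python's sqlite3 module. The sample rows, the addr_prefix alias, and the COUNT(*) column are assumptions added for the demo and are not part of the query above.

```python
import sqlite3

# Hypothetical sample data: two spellings of the same address for Bob.
rows = [
    (1, "Bob", "12 Main Street"),
    (2, "Bob", "12 Main St."),
    (3, "Bob", "7 Elm Road"),
    (4, "Ann", "12 Main Street"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, firstname TEXT, addr TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Group on firstname plus the first five characters of the address.
# COUNT(*) is added here only to show how many rows fall into each group.
groups = conn.execute(
    """
    SELECT firstname, substr(addr, 1, 5) AS addr_prefix, COUNT(*) AS n
    FROM t
    GROUP BY firstname, substr(addr, 1, 5)
    ORDER BY firstname, addr_prefix
    """
).fetchall()

for row in groups:
    print(row)
```

Bob's two spellings of "12 Main ..." share the prefix "12 Ma" and collapse into one group, while "7 Elm Road" stays separate.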

You might also want to normalize the postal addresses first with a tool such as:

https://webgis.usc.edu/Services/AddressNormalization/Default.aspx

(free up to 2500 records)

jspcal 2010-01-07 09:15:42

Answer 2

+1 A:

If you want to group by the first 5 characters of the address too, you can do this:

select firstname, MAX(address) AS Address
from t
group by firstname, SUBSTRING(address,1,5)

Is that what you want?

AdaTheDev 2010-01-07 09:26:26

Not really. I'd like to use a LIKE pattern so I can filter over a broader range.

Luca Matteis 2010-01-07 09:36:44

SQL group by LIKE pattern