tags:

views:

49

answers:

2

I have to do a group by firstname and address on this sort of data

| firstname | address
+-----------+----------
| alex      | 338 s. michigan 
| mark      | finger
| alex      | 338 south michigan
| ted       | port

But the group by will not return similar values of the address field.

Is there a pattern I can apply to the group by? I want to use the LIKE pattern system so I can do for example

select firstname, address from t group by firstname, '%'+SUBSTRING(address,0,5)+'%' 

but that syntax doesn't seem to work.

Any ideas? Thanks!

+3  A: 

if the substring can appear in any position in another address, you can do a join like:

select a.id, a.addr, b.id as b_id, b.addr as b_addr from t a, t b where
    b.addr like concat('%', substr(a.addr, 0, 5), '%') and
    b.id <> a.id;

will return all records whose first 5 address chars appear anywhere inside another address.

or you can just ignore the stuff after char 5 with a group:

select firstname, substr(addr, 0, 5) from t group by firstname,
  substr(addr, 0, 5);

you might want to use a tool to normalize postal addresses, such as:

https://webgis.usc.edu/Services/AddressNormalization/Default.aspx

(free up to 2500 records)

jspcal
+1  A: 

If you want to group by the first 5 characters of the address too, you can do this:

select firstname, MAX(address) AS Address
from t 
group by firstname, SUBSTRING(address,0,5)

Is that what you want?

AdaTheDev
not really, i'd like to use the LIKE pattern so I can filter at a broader range.
Luca Matteis