tags:

views:

333

answers:

8

I have a simple table with just name and email called name_email.

I am trying to fetch data out of it so that: If two rows have the same name, but one has an email which is ending with ‘@yahoo.com’ and the other has a different email, then the one with the ‘@yahoo.com’ email should be discarded.

what would be best way to get this data out?

A: 

You could use something like the following to exclude invalid email addresses:

SELECT name, email
FROM name_email
WHERE email NOT LIKE '%@yahoo.com' // % symbol is a wildcard so [email protected] and [email protected] both match this query.
AND name = 'Joe Guy';

Or do it like this to include only the valid email address or domain:

SELECT name, email
FROM name_email
WHERE email LIKE '%@gmail.com'
AND name = 'Joe Guy';

This works well if you know ahead of time what specific names you are querying for and what email addresses or domains you want to exclude or include.

Or if you don't care which email address you return but only want to return one, you could use something like this:

SELECT DISTINCT (name, email)
FROM name_email;
Noah Goodrich
thats the thing. i dont know the names ahead of time :) I am looking for duplicate names.
Omnipresent
Will the second one work any better using the DISTINCT operator?
Noah Goodrich
If you have additional criteria such as every name that has two email addresses should always return x email address then please edit your question to reflect that.
Noah Goodrich
A: 

You could do

SELECT TOP 1 email
FROM name_email
WHERE name = 'Joe Guy'
ORDER BY case when email like '%yahoo.com' then 1 else 0 end

So sort them by *@yahoo.com last and anything else first, and take the first one.

EDIT: sorry, misread the question - you want a list of each name, with only one email, and a preference for non-yahoo emails. Probably can use the above along with a group by, I'll have to rethink it.

MikeW
That'll work but I'm not a big fan of select-side per-row processing. This is the sort of stuff that should be done using a separate column and triggers. Most tables in my experience are read far more often than written. An insert/update-side process of setting the sort order is better IMNSHO :-).
paxdiablo
It has to do with my mainframe background where even our configuration tables dwarf the transaction tables of those "lesser" databases :-)
paxdiablo
+3  A: 

Okay, I'm not going to get involved in yet another fight with those who say I shouldn't advocate database schema changes (yes, you know who you are :-), but here's how I'd do it.

1/ If you absolutely cannot change the schema, I would solve it with code (either real honest-to-goodness procedural code outside the database or as a stored procedure in whatever language your DBMS permits).

This would check the database for a non-yahoo name and return it, if there. If not there, it would attempt to return the yahoo name. If neither are there, it would return an empty data set.

2/ If you can change the schema and you want an SQL query to do the work, here's how I'd do it. Create a separate column in your table called CLASS which is expected to be set to 0 for non-yahoo addresses and 1 for yahoo addresses.

Create insert/update triggers to examine each addition or change of a row, setting the CLASS based on the email address (what it ends in). This guarantees that CLASS will always be set correctly.

When you query your table, order it by name and class, and only select the first row. This will give you the email address in the following preference: non-yahoo, yahoo, empty dataset.

Something like:

select name, email
from tbl
where name = '[name]'
order by name, class
fetch first row only;

If your DBMS doesn't have an equivalent to the DB2 "fetch first row only" clause, you'll probably still need to write code to only process one record.

If you want to process all names but only the specific desired email for that name, a program such as this will suffice (my views on trying to use a relational algebra such as SQL in a procedural way are pretty brutal, so I won't inflict them on you here):

# Get entire table contents sorted in name/class order.
resultSet = execQuery "select name, email from tbl order by name, class"

# Ensure different on first row
lastName = resultSet.value["name"] + "X"

# Process every single row returned.
while not resultSet.endOfFile:
    # Only process the first in each name group (lower classes are ignored).
    if resultSet.value["name"] != lastName:
        processRow resultSet.value["name"] resultSet.value["email"]
    # Store the last name so we can detect next name group.
    lastName = resultSet.value["name"]
paxdiablo
This is the Right Way To Solve This Problem.
kquinn
A: 

Grabbing all the rows from the database, knowing not what the names are (and not needing to care about that really), but just want them to show, and if matching, skip a match if the email contains, in this case, @yahoo.com

SELECT DISTINCT name, email FROM name_email 
  WHERE email NOT LIKE '%@yahoo.com' 
  GROUP BY name;

Doing that will grab all the rows, but only one of a record if the names match with another row. But then, if there are two rows with matching names, junk the one with @yahoo.com in the email.

random
If you group by name, then you would have to have an aggregate function like min(email)Also, I believe if the person only has one email and it's a yahoo email, he wants it returned. It's only if there are duplicates and one of them is yahoo he wants the other one.
MikeW
OPQ reads like he wants to grab all distinct rows based on name, which he doesn't know beforehand. In the case of a row with the same name, wants to check to see what emails they're using. If one of them is with yahoo.com, then to ditch that and show the non-yahoo one.
random
+1  A: 
select ne.*
from name_email ne
where ne.email not like '%@yahoo.com' escape '\' or
    not exists(
               select 1 from name_email
               where name = ne.name and
               email not like '%@yahoo.com' escape '\'
              )
Juris
A: 

Not very pretty, but I believe it should work

select 
    ne.name 
    ,ne.email
from 
    name_email ne
    inner join (
     select 
      name 
      ,count(*) as emails_per_name
     from 
      name_email
     group by name
    ) nec 
     on ne.name = nec.name
where
    nec.emails_per_name = 1
    or (nec.emails_per_name > 1 and ne.email not like ('%@yahoo.com'))

That is assuming that the duplicate emails would be in yahoo.com domain - as specified in your question, and those would be excluded if there is more than one email per name

kristof
A: 

If you are working with SQL Server 2005 or Oracle, you can easily solve your problem with a ranking (analytical) function.

select a.name, a.name_email
from (select name, name_email,
             row_number() over (partition by name
                                order by case
                                           when name_email like '%@yahoo.com' then 1
                                           when name_email like '%@gmail.com' then 1
                                           when ... (other 'generic' email) then 1
                                           else 0
                                         end) as rn) as a
where a.rn = 1

By assigning different values to the various generic email names you can even have 'preferences'. As it is written here, if you have both a yahoo and a gmail address, you can't predict which one will be picked up.

A: 

You could use a UNION for this. Select everything without the yahoo.com and then just select the records that have yahoo.com and is not in the first list.

SELECT DISTINCT (name, name_email) FROM TABLE WHERE name_email NOT '%yahoo.com' UNION SELECT DISTINCT (name, name_email) FROM TABLE WHERE name NOT IN (SELECT DISTINCT (name, name_email) FROM TABLE WHERE name_email NOT '%yahoo.com')

Diago