Wednesday, March 7, 2012

How to find duplicate rows in SQL server

I would like to locate duplicate rows within a specific table. This table has 12 diffrent rows.

BASENO - POSITION - SEQ - PROD -STYL - DESCR - FIELD01 THRU FIELD05 - VALUE01 THRU VALUE05 - FORMTYPE - ANSWER

I have been playing with the following query but can't seem to get it perfect to locate my dups within sql. Can someone help me with the querry?

I'm playing with the following querry.
SELECT
<list of all columns>
FROM
tablename
GROUP BY
<list of all columns>
HAVING
Count(*) > 1

Can somone possible input my column names into this querry that would possibly get it to locate my dups? I'm missing somehting and not sure what.

Thanks for any help

SQL NewbieIf you simply want to eliminate all of the rows that have a duplicate (every one of them, leaving none behind at all), you can use:DELETE FROM tablename
WHERE 1 < (SELECT Count(*)
FROM tablename AS z
WHERE z.baseno = tablename.baseno
AND z.position = tablename.position
-- continue for all columns
)This is rarely what people want, since they usually want to keep one of the rows. That is a tougher challenge.

-PatP|||I had a situation where I needed to delete the duplicate values (there may be more than one) and keep the minimum (or first) value. Assuming you have an id value and BRANCHNO is your duplicate field:

SELECT id
FROM tablename
WHERE id
IN (SELECT a.id
FROM tablename AS a, tablename AS b
WHERE a.BRANCHNO = b.BRANCHNO
AND a.id > b.id);

Not 100% sure but give it a shot.

ddave|||To simply identify the duplicated values, this is the syntax:

SELECT BASENO, POSITION, SEQ, PROD, STYL, DESCR, FIELD01, FIELD02, FIELD03, FIELD04, FIELD05, VALUE01, VALUE02, VALUE03, VALUE04, VALUE05, FORMTYPE, ANSWER
FROM tablename
GROUP BY
BASENO, POSITION, SEQ, PROD, STYL, DESCR, FIELD01, FIELD02, FIELD03, FIELD04, FIELD05, VALUE01, VALUE02, VALUE03, VALUE04, VALUE05, FORMTYPE, ANSWER
HAVING Count(*) > 1|||Ooops, my bad. I misread the question thinking that you wanted to DELETE the rows, not just see them. Sorry.

-PatP|||If you simply want to eliminate all of the rows that have a duplicate (every one of them, leaving none behind at all), you can use:DELETE FROM tablename
WHERE 1 < (SELECT Count(*)
FROM tablename AS z
WHERE z.baseno = tablename.baseno
AND z.position = tablename.position
-- continue for all columns
)This is rarely what people want, since they usually want to keep one of the rows. That is a tougher challenge.

-PatP

I used your information above and I recieved a relply that (0 row(s) affected).
However when I try and set my primary keys and allow NUULS options I get an error message about duplicates. (below)

'TableName' table
-unable to create index 'PK_TableName'.
ODBC error CREATE UNIQUE INDEX terminated because a duplicate key was found for index ID 1. Most significant primary key is 415162|||Ah, that is quite differentthan the original question though. Now you want to find duplicates based on just the PK column(s)!

To find the rows with duplicate key values, you need to use something like:SELECT pk_col1, pk_col2, pk_colN
FROM tablename
WHERE 1 < (SELECT Count(*)
FROM tablename AS z
WHERE z.pk_col1 = tablename.pk_col1
AND z.pk_col2 = tablename.pk_col2
AND z.pk_colN = tablename.pk_colN)Which column(s) are your candidate key?

-PatP|||Ah, that is quite differentthan the original question though. Now you want to find duplicates based on just the PK column(s)!

To find the rows with duplicate key values, you need to use something like:SELECT pk_col1, pk_col2, pk_colN
FROM tablename
WHERE 1 < (SELECT Count(*)
FROM tablename AS z
WHERE z.pk_col1 = tablename.pk_col1
AND z.pk_col2 = tablename.pk_col2
AND z.pk_colN = tablename.pk_colN)Which column(s) are your candidate key?

-PatP

Thanks for your help. I really appreciate your time. i will give it a try.

No comments:

Post a Comment