Friday, February 24, 2012

How to find duplicate rows in Flat File ?

In the FLAT FILE source, I have to find the duplicate rows based on the two fields say, "bill number" & "invoice date".

The rows within flat file has like "bill number" which is duplicated on the same "invoice date".

If duplicate rows found then move the duplicate rows into another Flat File.

If not found then move the rows into Sql Server Table.

Pls provide the solution. Thank you

Since, for duplicate records, the bill numbers would be the same as would the invoice dates, how do you chose which record to keep and which to move on to another flat file? Does it matter?|||

Example

BillNo, invoicedate, billamt, shiptype,

920203348313,08/12/07,1000,A

920203348313,08/12/07,1000,A

792309510000, 08/13/07,867,C

First two rows to be moved into another FLAT FILE

Third row to be moved into SQL SERVER table.

Thanks

|||

You could use a Rank() function to identify duplicates and to select which rows are going to be discarded/kept (you could load the files to an staging table).

If you were using a relational DB as source you do it like this. Jamie has also a Rank transformation that you may want to check. You could also write your own solution in a script component.,

|||

is there any direct solution by using any component within SSIS ?

|||

Antony Kumar wrote:

is there any direct solution by using any component within SSIS ?

A script component could do this pretty easily, provided you have the data sorted by the two keys and it would also likely require a two-pass approach. The first pass to number the rows, the second to move the rows where count (rownumber) > 1

|||

Antony Kumar wrote:

Example

BillNo, invoicedate, billamt, shiptype,

920203348313,08/12/07,1000,A

920203348313,08/12/07,1000,A

792309510000, 08/13/07,867,C

First two rows to be moved into another FLAT FILE

Third row to be moved into SQL SERVER table.

Thanks

Based on this example you could use an agregation transformation to count the rows group by BillNo; that should provide an extra column with the count. Then use a conditional split to filter out rows where count>1...

|||

Rafael Salas wrote:


Based on this example you could use an agregation transformation to count the rows group by BillNo; that should provide an extra column with the count. Then use a conditional split to filter out rows where count>1...

Oh sure! Go the easy route! Not sure what I was thinking earlier.

No comments:

Post a Comment