Monday, March 26, 2012

How to flatten Relational Data for Analysis

Hello everyone, this is my first post here so hopefully I am not asking a common question.

I am trying to create a flat dataset in SQL 2005. Basically I run a query and I get multiple rows for the same primary key. The query I am running is quite large and has several different tables connected to it, here is a small sample of what it looks like...

Typeid(Primary Key) Individual Address

1 Sam 912 Ave. J

1 Sam 913 Ave. Q

1 Sam 914 Ave. R

2 Mike 1000 Ave. O

3 Jill 1001 Ave. O

I want it to kind of look like this

TypeID Individual Address_1 Address_2 Address_3

1 Sam 912 Ave. J 913 Ave. Q 914 Ave. R

2 Mike 1000 Ave. O

3 Jill 1001 Ave. O

As I said before, this query is pretty big, and has several variables like Address where multiple rows are being taken by one Primary Key.

If it is not possible to do this in SQL 2005, is there a program that may be able to? Right now we are using SPSS as sort of a bandaid... we run the query in small portions like the one in the sample and then restructure the in sections but this takes several hours.

Anyways, thanks for any help that you may be able to provide.

-John

It seems like you 'should' be able to create a VIEW that has the flattened data (results of a multi-table JOIN).

Then you can export to your DH using bcp, SQLCmd, or SSIS.

|||

You could use a Common Table Expression to generate a result set that would be more like

ID Name Addresses

1 Sam 912 Ave J., 913 Ave J., 914 Ave J.

2 Tom 1 A St

3 Henry 4 Fifth St

Would that be desirable?

|||

That would be great, but I am not sure how to go about doing that. Could you explain or provide an example?

Thanks

-John

|||Have you considered using the PIVOT "relational operator" in SQL 2K5? If you have a predictable maximum number of addresses, it would be easier, but you might be able to do something dynamic.|||

Try this one out, though you might have to change it a bit for your particular situation.

DECLARE @.ID int

DECLARE @.address nvarchar(50)

DECLARE @.ALLaddress nvarchar(MAX)

IF EXISTS ( SELECT * FROM sys.tables where [name] = 'Addresses_Flat')

DROP TABLE Addresses_Flat

CREATE TABLE Addresses_Flat

(ID int, [Name] nvarchar(50), [Address] nvarchar(MAX))

DECLARE IDCursor CURSOR FOR

SELECT DISTINCT id

FROM MyTest

OPEN IDCursor

WHILE 1=1 BEGIN

FETCH next FROM IDCursor INTO @.ID

IF @.@.FETCH_STATUS <> 0 BREAK

DECLARE AddressCursor CURSOR FOR

SELECT [Address]

FROM MyTest

WHERE ID = @.ID

SET @.Alladdress = ''

OPEN AddressCursor

WHILE 1=1 BEGIN

FETCH next FROM AddressCursor INTO @.address

IF @.@.FETCH_STATUS <> 0 BREAK

SET @.ALLaddress = @.ALLaddress + ', ' + @.address

END

CLOSE AddressCursor

DEALLOCATE AddressCursor

INSERT INTO Addresses_Flat

SELECT DISTINCT id, [name], @.ALLaddress as [Address]

FROM MyTest

WHERE ID = @.ID

END

CLOSE IDCursor

DEALLOCATE IDCursor

SELECT * FROM Addresses_Flat

DROP TABLE Addresses_Flat

No comments:

Post a Comment