Showing posts with label flatten. Show all posts
Showing posts with label flatten. Show all posts

Monday, March 26, 2012

How to flatten Relational Data for Analysis

Hello everyone, this is my first post here so hopefully I am not asking a common question.

I am trying to create a flat dataset in SQL 2005. Basically I run a query and I get multiple rows for the same primary key. The query I am running is quite large and has several different tables connected to it, here is a small sample of what it looks like...

Typeid(Primary Key) Individual Address

1 Sam 912 Ave. J

1 Sam 913 Ave. Q

1 Sam 914 Ave. R

2 Mike 1000 Ave. O

3 Jill 1001 Ave. O

I want it to kind of look like this

TypeID Individual Address_1 Address_2 Address_3

1 Sam 912 Ave. J 913 Ave. Q 914 Ave. R

2 Mike 1000 Ave. O

3 Jill 1001 Ave. O

As I said before, this query is pretty big, and has several variables like Address where multiple rows are being taken by one Primary Key.

If it is not possible to do this in SQL 2005, is there a program that may be able to? Right now we are using SPSS as sort of a bandaid... we run the query in small portions like the one in the sample and then restructure the in sections but this takes several hours.

Anyways, thanks for any help that you may be able to provide.

-John

It seems like you 'should' be able to create a VIEW that has the flattened data (results of a multi-table JOIN).

Then you can export to your DH using bcp, SQLCmd, or SSIS.

|||

You could use a Common Table Expression to generate a result set that would be more like

ID Name Addresses

1 Sam 912 Ave J., 913 Ave J., 914 Ave J.

2 Tom 1 A St

3 Henry 4 Fifth St

Would that be desirable?

|||

That would be great, but I am not sure how to go about doing that. Could you explain or provide an example?

Thanks

-John

|||Have you considered using the PIVOT "relational operator" in SQL 2K5? If you have a predictable maximum number of addresses, it would be easier, but you might be able to do something dynamic.|||

Try this one out, though you might have to change it a bit for your particular situation.

DECLARE @.ID int

DECLARE @.address nvarchar(50)

DECLARE @.ALLaddress nvarchar(MAX)

IF EXISTS ( SELECT * FROM sys.tables where [name] = 'Addresses_Flat')

DROP TABLE Addresses_Flat

CREATE TABLE Addresses_Flat

(ID int, [Name] nvarchar(50), [Address] nvarchar(MAX))

DECLARE IDCursor CURSOR FOR

SELECT DISTINCT id

FROM MyTest

OPEN IDCursor

WHILE 1=1 BEGIN

FETCH next FROM IDCursor INTO @.ID

IF @.@.FETCH_STATUS <> 0 BREAK

DECLARE AddressCursor CURSOR FOR

SELECT [Address]

FROM MyTest

WHERE ID = @.ID

SET @.Alladdress = ''

OPEN AddressCursor

WHILE 1=1 BEGIN

FETCH next FROM AddressCursor INTO @.address

IF @.@.FETCH_STATUS <> 0 BREAK

SET @.ALLaddress = @.ALLaddress + ', ' + @.address

END

CLOSE AddressCursor

DEALLOCATE AddressCursor

INSERT INTO Addresses_Flat

SELECT DISTINCT id, [name], @.ALLaddress as [Address]

FROM MyTest

WHERE ID = @.ID

END

CLOSE IDCursor

DEALLOCATE IDCursor

SELECT * FROM Addresses_Flat

DROP TABLE Addresses_Flat

How to Flatten ?

Is there an easy to understand method of bringing this into a flat structure
(all elements related to a line item on one line) ? I do not care if the
xsl is "optimal" with respect to efficiency, I just want it to work with as
little complexity as possible. Please note the multiple line items. I
have seen and understand examples where there is a one to one relationship
between the header and the detail line, but not when there are 3 levels
involved and multiple detail lines.
Thanks,
Rob
<Invoices>
<Invoice id="ABC">
<Supplier>123</Supplier>
<InvoiceItems>
<InvoiceItem
line="1"><NetAmount>2.88</NetAmount><Codes><Code1>ABC</Code1><Code2>DEF</Code2></Codes>
</InvoiceItem>
<InvoiceItem
line="2"><NetAmount>1.88</NetAmount><Codes><Code1>VBH</Code1><Code2>LKJ</Code2></Codes>
</InvoiceItem>
</InvoiceItems>
</Invoice>
<Invoice id="DEF">
<Supplier>568</Supplier>
<InvoiceItems>
<InvoiceItem
line="1"><NetAmount>7.00</NetAmount><Codes><Code1>VPP</Code1><Code2>LRE</Code2></Codes>
</InvoiceItem>
<InvoiceItem
line="2"><NetAmount>9.00</NetAmount><Codes><Code1>ZAQ</Code1><Code2>WIJ</Code2></Codes>
</InvoiceItem>
</InvoiceItems>
</Invoice>
</Invoices>
"Rob C" <rwc1960@.bellsouth.net> wrote in message
news:%KB7d.14101$yp.4743@.bignews1.bellsouth.net...
> Is there an easy to understand method of bringing this into a flat
structure
> (all elements related to a line item on one line) ? I do not care if
the
Rob,
Does this work for you:
DECLARE @.xml varchar(2000)
set @.xml =
'<Invoices>
<Invoice id="ABC">
<Supplier>123</Supplier>
<InvoiceItems>
<InvoiceItem
line="1"><NetAmount>2.88</NetAmount><Codes><Code1>ABC</Code1><Code2>DEF</Cod
e2></Codes>
</InvoiceItem>
<InvoiceItem
line="2"><NetAmount>1.88</NetAmount><Codes><Code1>VBH</Code1><Code2>LKJ</Cod
e2></Codes>
</InvoiceItem>
</InvoiceItems>
</Invoice>
<Invoice id="DEF">
<Supplier>568</Supplier>
<InvoiceItems>
<InvoiceItem
line="1"><NetAmount>7.00</NetAmount><Codes><Code1>VPP</Code1><Code2>LRE</Cod
e2></Codes>
</InvoiceItem>
<InvoiceItem
line="2"><NetAmount>9.00</NetAmount><Codes><Code1>ZAQ</Code1><Code2>WIJ</Cod
e2></Codes>
</InvoiceItem>
</InvoiceItems>
</Invoice>
</Invoices>'
DECLARE @.hDoc int
EXEC sp_xml_preparedocument @.hDoc output, @.xml
SELECT * FROM OPENXML(@.hdoc, '/Invoices/Invoice/InvoiceItems/InvoiceItem',
2)
WITH (
InvoiceId char(3) '../../@.id',
Supplier varchar(20) '../../Supplier',
line char(1) '@.line',
Code1 char(3) 'Codes/Code1',
Code2 char(3) 'Codes/Code2',
NetAmount numeric(9,2) 'NetAmount')
EXEC sp_xml_removedocument @.hDoc
sql