I like little challenges. The ones that don’t take all day to figure out, but are enough to capture your interest for a little while. Yesterday I had such a problem to solve.
I had some data in a table that was basically stored as ‘pairs’ of rows. My real data was different to the example below, but here we’ll use a ‘Message’ table containing requests and replies that are linked by a common identifier.
Our simple example looks like this (my actual table had more fields).
CREATE TABLE Message
(
MessageID INT NOT NULL IDENTITY,
MessageType CHAR(1) NOT NULL, -- 'Q' = request, 'R' = reply
TransactionID INT NOT NULL, -- links a request to its reply
MessageBody VARCHAR(30),
CreatedDate DATETIME DEFAULT GETDATE()
)
We’ll add a bit of sample data (script generated by my insert generator stored proc).
SET IDENTITY_INSERT Message ON
INSERT Message(MessageID,MessageType,TransactionID,MessageBody,CreatedDate) VALUES(1,'Q',1,'Request Message 1',CONVERT(datetime,'2012-08-30 13:55:07.213',121))
INSERT Message(MessageID,MessageType,TransactionID,MessageBody,CreatedDate) VALUES(2,'R',1,'Reply Message 1',CONVERT(datetime,'2012-08-30 13:55:37.680',121))
INSERT Message(MessageID,MessageType,TransactionID,MessageBody,CreatedDate) VALUES(3,'Q',2,'Request Message 2',CONVERT(datetime,'2012-08-30 13:55:51.183',121))
INSERT Message(MessageID,MessageType,TransactionID,MessageBody,CreatedDate) VALUES(4,'R',2,'Reply Message 2',CONVERT(datetime,'2012-08-30 13:56:04.020',121))
SET IDENTITY_INSERT Message OFF
SELECT * FROM Message
MessageID   MessageType TransactionID MessageBody                    CreatedDate
----------- ----------- ------------- ------------------------------ -----------------------
1           Q           1             Request Message 1              2012-08-30 13:55:07.213
2           R           1             Reply Message 1                2012-08-30 13:55:37.680
3           Q           2             Request Message 2              2012-08-30 13:55:51.183
4           R           2             Reply Message 2                2012-08-30 13:56:04.020
We can see that some of the fields are shared by both rows of a pair (TransactionID), while others are unique to each row. My challenge was to represent each pair of messages as a single row.
On the face of it, this seems simple: just group by TransactionID (the field that links the two rows). The problem is that you can’t get the row-specific information from both rows without making some assumptions (which may not be solid).
For example, this will happily give you the MessageIDs of both sides of the transaction (assuming that the request comes before the reply, and that there are exactly two messages per transaction)…
SELECT TransactionID, MIN(MessageID) AS RequestID, MAX(MessageID) AS ReplyID
FROM [Message]
GROUP BY TransactionID HAVING COUNT(*) = 2
TransactionID RequestID   ReplyID
------------- ----------- -----------
1             1           2
2             3           4
But it doesn’t give you the rest of the data for each ID. You’d need to correlate the MessageBody with the right MessageID, and MIN(MessageBody) won’t necessarily come from the ‘Request’ row.
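Here is a quick sketch against the four sample rows that makes this concrete. CreatedDate is left out for brevity. So that the query runs on its own, the rows are inlined in a CTE named Message, which stands in for the real table within this one statement. Drop the WITH clause to run the same SELECT against the table itself. The RequestBody alias is deliberately optimistic:

```sql
-- Sample rows inlined so the query is self-contained (CreatedDate omitted).
WITH Message (MessageID, MessageType, TransactionID, MessageBody) AS (
    SELECT 1, 'Q', 1, 'Request Message 1' UNION ALL
    SELECT 2, 'R', 1, 'Reply Message 1'   UNION ALL
    SELECT 3, 'Q', 2, 'Request Message 2' UNION ALL
    SELECT 4, 'R', 2, 'Reply Message 2'
)
-- Each aggregate is evaluated independently, so MIN(MessageBody)
-- is not tied to the row that supplied MIN(MessageID).
SELECT TransactionID,
       MIN(MessageID)   AS RequestID,
       MIN(MessageBody) AS RequestBody -- not necessarily the request's body!
FROM Message
GROUP BY TransactionID
HAVING COUNT(*) = 2;
```

RequestID comes back as 1 and 3, as expected. RequestBody, however, comes back as 'Reply Message 1' and 'Reply Message 2', because 'Reply…' sorts before 'Request…' ('p' comes before 'q'). Nothing ties the two MIN() values to the same row.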
So we need to think about how to correlate the data to get the result we want. There are a few options…
1. Use a temporary table and get the result in two steps (reusing the query above).
-- 1 - Two-step process
-- Step 1: find the request and reply MessageID for each transaction
SELECT TransactionID, MIN(MessageID) AS RequestID, MAX(MessageID) AS ReplyID
INTO #MessagePair
FROM [Message]
GROUP BY TransactionID HAVING COUNT(*) = 2
-- Step 2: join back to Message to pick up each side's details
SELECT REQ.MessageID AS RequestMessageID,
REQ.TransactionId,
REQ.MessageBody AS RequestBody,
REQ.CreatedDate AS RequestDate,
RPY.MessageID AS ReplyMessageID,
RPY.MessageBody AS ReplyBody,
RPY.CreatedDate AS ReplyDate
FROM #MessagePair MP
INNER JOIN [Message] REQ
ON REQ.MessageID = MP.RequestID
INNER JOIN [Message] RPY
ON RPY.MessageID = MP.ReplyID

DROP TABLE #MessagePair -- tidy up the temp table
RequestMessageID TransactionId RequestBody                    RequestDate             ReplyMessageID ReplyBody                      ReplyDate
---------------- ------------- ------------------------------ ----------------------- -------------- ------------------------------ -----------------------
1                1             Request Message 1              2012-08-30 13:55:07.213 2              Reply Message 1                2012-08-30 13:55:37.680
3                2             Request Message 2              2012-08-30 13:55:51.183 4              Reply Message 2                2012-08-30 13:56:04.020
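If you'd rather not manage a temp table, the same two steps can be folded into one statement by turning the grouped query into a CTE (SQL Server 2005 and later). This is a sketch of that variation rather than what I ran. MessagePair is just an illustrative CTE name, and CreatedDate is left out for brevity. The sample rows are inlined in a second CTE named Message, which stands in for the real table within this statement; delete it to query the table itself.

```sql
-- Sample rows inlined (CreatedDate omitted); remove this CTE to use the real table.
WITH Message (MessageID, MessageType, TransactionID, MessageBody) AS (
    SELECT 1, 'Q', 1, 'Request Message 1' UNION ALL
    SELECT 2, 'R', 1, 'Reply Message 1'   UNION ALL
    SELECT 3, 'Q', 2, 'Request Message 2' UNION ALL
    SELECT 4, 'R', 2, 'Reply Message 2'
),
-- Step 1 as a CTE instead of a #MessagePair temp table.
MessagePair AS (
    SELECT TransactionID, MIN(MessageID) AS RequestID, MAX(MessageID) AS ReplyID
    FROM Message
    GROUP BY TransactionID
    HAVING COUNT(*) = 2
)
-- Step 2: join back to pick up each side's row.
SELECT REQ.MessageID   AS RequestMessageID,
       REQ.TransactionID,
       REQ.MessageBody AS RequestBody,
       RPY.MessageID   AS ReplyMessageID,
       RPY.MessageBody AS ReplyBody
FROM MessagePair MP
INNER JOIN Message REQ ON REQ.MessageID = MP.RequestID
INNER JOIN Message RPY ON RPY.MessageID = MP.ReplyID;
```

It pairs the same rows as the temp-table version and carries the same assumptions: the request has the lower MessageID, and there are exactly two messages per transaction.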
2. A nasty correlated subquery and joins (not even going there).
3. A single query that relies on the assumption that a request happens before its reply (meaning its MessageID will be lower).
SELECT REQ.MessageID AS RequestMessageID,
REQ.TransactionId,
REQ.MessageBody AS RequestBody,
REQ.CreatedDate AS RequestDate,
RPY.MessageID AS ReplyMessageID,
RPY.MessageBody AS ReplyBody,
RPY.CreatedDate AS ReplyDate
FROM [Message] REQ
INNER JOIN [Message] RPY
ON REQ.TransactionID = RPY.TransactionID
AND REQ.MessageID < RPY.MessageID
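The sample table also has a MessageType column ('Q' on the request rows, 'R' on the replies). That suggests another option that doesn't rely on MessageID ordering: join on the type instead. This is a sketch assuming every request is typed 'Q' and every reply 'R', as in the sample data. The rows are inlined in a CTE named Message that stands in for the real table within this statement, and CreatedDate is left out for brevity:

```sql
-- Sample rows inlined (CreatedDate omitted); remove this CTE to use the real table.
WITH Message (MessageID, MessageType, TransactionID, MessageBody) AS (
    SELECT 1, 'Q', 1, 'Request Message 1' UNION ALL
    SELECT 2, 'R', 1, 'Reply Message 1'   UNION ALL
    SELECT 3, 'Q', 2, 'Request Message 2' UNION ALL
    SELECT 4, 'R', 2, 'Reply Message 2'
)
SELECT REQ.MessageID   AS RequestMessageID,
       REQ.TransactionID,
       REQ.MessageBody AS RequestBody,
       RPY.MessageID   AS ReplyMessageID,
       RPY.MessageBody AS ReplyBody
FROM Message REQ
INNER JOIN Message RPY
    ON  RPY.TransactionID = REQ.TransactionID
    AND RPY.MessageType = 'R'   -- reply side identified by type, not by ID order
WHERE REQ.MessageType = 'Q';    -- request side
```

Like the MessageID < version, this returns one row per request/reply combination, so a transaction with extra messages would produce extra rows. With the < comparison, for example, three messages in one transaction would give three rows. Changing the INNER JOIN to a LEFT JOIN would also keep requests that have no reply yet, with NULL reply columns.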
This produces the same result as above, and is what I ended up going with. I reckon there are probably a few more viable solutions, so I’d be interested to see anyone’s alternatives.
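One more alternative, for what it's worth, is conditional aggregation. It keeps the GROUP BY from the first query but uses MessageType to decide which row each value comes from. This is a sketch assuming requests are typed 'Q' and replies 'R'. The HAVING clause only keeps transactions with exactly one of each, so each MAX() sees a single non-NULL value. The sample rows are inlined in a CTE named Message that stands in for the real table within this statement, with CreatedDate left out:

```sql
-- Sample rows inlined (CreatedDate omitted); remove this CTE to use the real table.
WITH Message (MessageID, MessageType, TransactionID, MessageBody) AS (
    SELECT 1, 'Q', 1, 'Request Message 1' UNION ALL
    SELECT 2, 'R', 1, 'Reply Message 1'   UNION ALL
    SELECT 3, 'Q', 2, 'Request Message 2' UNION ALL
    SELECT 4, 'R', 2, 'Reply Message 2'
)
SELECT TransactionID,
       -- CASE returns NULL for the other side's row; MAX() ignores NULLs.
       MAX(CASE WHEN MessageType = 'Q' THEN MessageID   END) AS RequestMessageID,
       MAX(CASE WHEN MessageType = 'Q' THEN MessageBody END) AS RequestBody,
       MAX(CASE WHEN MessageType = 'R' THEN MessageID   END) AS ReplyMessageID,
       MAX(CASE WHEN MessageType = 'R' THEN MessageBody END) AS ReplyBody
FROM Message
GROUP BY TransactionID
-- Exactly one request and one reply, so each value comes from a single row.
HAVING COUNT(CASE WHEN MessageType = 'Q' THEN 1 END) = 1
   AND COUNT(CASE WHEN MessageType = 'R' THEN 1 END) = 1;
```

SQL Server may print "Warning: Null value is eliminated by an aggregate or other SET operation." here. That's expected, since the CASE expressions deliberately produce NULLs for the other side's row.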