Thursday, March 29, 2012

Date range problem

Hello all,

I am trying to write a query that compares a member's enrollment period
with the products that their group has had during that period (all
members belong to a group and the products that the member has are
based on that group). I need to get the date range for all products
that the member had during their enrollment.

Here are a few rules:
- In the source table there are some group products that have two
ranges that are really contiguous. This is because another column that
we don't care about may have changed between those two periods. If the
end_date = DATEADD(dy, 1, start_date) then the two periods are actually
contiguous. These should only appear as one row in the output.
- If the gap is greater than one day then two rows should result
- If the product changes, of course it should be two rows in the output
- If a group has a product from before the start of the member's
enrollment then the start_date for the row should be the member's
start_date. If the product extends past the member leaving the group
then the end_date should be that of the member.
- In my sample data below I only have as many as two rows back to back
for the same product that are contiguous. In reality there could be
even more than that.

I have SQL that will join the two tables based on either the start or
the end date of the group product falling in the member's enrollment
period, but I'm not sure of the best way to merge the contiguous date
ranges into single rows. Any suggestions?

Erland, despite it being late on a Friday afternoon, the SQL and sample
output are below. ;-)

Thanks,
-Tom.

CREATE TABLE Members (
group_id INT NOT NULL,
member_id INT NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL )
GO

ALTER TABLE Membership ADD CONSTRAINT PRIMARY KEY PK_Members PRIMARY
KEY (group_id, member_id)
GO

CREATE TABLE Group_Products (
group_id INT NOT NULL,
product_id INT NOT NULL,
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL )
GO

ALTER TABLE Group_Products ADD CONSTRAINT PRIMARY KEY PK_Group_Products
PRIMARY KEY (group_id, product_id, start_date)
GO

INSERT INTO Members VALUES (1, 1, '2002-01-01', '9999-12-31')
INSERT INTO Members VALUES (1, 2, '2004-11-01', '9999-12-31')
INSERT INTO Members VALUES (1, 3, '2000-10-01', '2004-12-31')
INSERT INTO Members VALUES (2, 4, '2002-01-01', '2005-01-15')
INSERT INTO Members VALUES (2, 5, '2004-10-01', '9999-12-31')
GO

INSERT INTO Group_Products VALUES (1, 1, '2001-01-01', '2003-12-31')
INSERT INTO Group_Products VALUES (1, 1, '2004-01-01', '2004-11-15')
INSERT INTO Group_Products VALUES (1, 2, '2004-11-16', '9999-12-31')
INSERT INTO Group_Products VALUES (2, 1, '2002-01-01', '2004-11-01')
INSERT INTO Group_Products VALUES (2, 1, '2004-11-15', '9999-12-31')
GO

Expected Output:

group_id member_id product_id start_date end_date
--- --- ---- ---- ----
1 1 1 2002-01-01 2004-11-15
1 1 2 2004-11-16 9999-12-31
1 2 1 2004-11-01 2004-11-15
1 2 2 2004-11-16 9999-12-31
1 3 1 2001-01-01 2004-11-15
1 3 2 2004-11-16 2004-12-31
2 4 1 2002-01-01 2004-11-01
2 4 1 2004-11-15 9999-12-31
2 5 1 2004-10-01 2004-11-01
2 5 1 2004-11-15 9999-12-31Thomas R. Hummel (tom_hummel@.hotmail.com) writes:
> I have SQL that will join the two tables based on either the start or
> the end date of the group product falling in the member's enrollment
> period, but I'm not sure of the best way to merge the contiguous date
> ranges into single rows. Any suggestions?
> Erland, despite it being late on a Friday afternoon, the SQL and sample
> output are below. ;-)

Hey, here on my side it's not late Friday afternoon, it's just about
midnight!

Anyway, this looks like it will work, but please test further:

SELECT m.group_id, m.member_id, gp.product_id, gp.start_date, gp.end_date
FROM Members m
JOIN (SELECT a.group_id, a.product_id, a.start_date,
end_date = MIN(b.end_date)
FROM Group_Products a
JOIN Group_Products b ON a.group_id = b.group_id
AND a.product_id = b.product_id
WHERE a.start_date <= b.start_date
AND NOT EXISTS
(SELECT *
FROM Group_Products c
WHERE a.group_id = c.group_id
AND a.product_id = c.product_id
AND dateadd(DAY, -1, a.start_date) = c.end_date)
AND NOT EXISTS
(SELECT *
FROM Group_Products d
WHERE b.group_id = d.group_id
AND b.product_id = d.product_id
AND b.end_date = dateadd(DAY, -1, d.start_date))
GROUP BY a.group_id, a.product_id, a.start_date) gp
ON m.group_id = gp.group_id
ORDER BY m.group_id, m.member_id, gp.product_id, gp.start_date

All the fun goes on the derived table. First I make partial self-join
between two instances of Group_Products, so I get all combinations
of start_date, end_date for a group/product combo. The first WHERE
condition filters away those where the left table has start_date later
then than to the right. Then the two NOT EXISTS does away with the
rows where end_date to the left has an adjacent start_date and
vice versa.

Here is some augmented test data:

INSERT INTO Group_Products VALUES (1, 1, '2001-01-01', '2003-12-31')
INSERT INTO Group_Products VALUES (1, 1, '2004-01-01', '2004-01-05')
INSERT INTO Group_Products VALUES (1, 1, '2004-01-06', '2004-11-15')
INSERT INTO Group_Products VALUES (1, 1, '2005-01-06', '2005-11-15')
INSERT INTO Group_Products VALUES (1, 2, '2004-11-16', '9999-12-31')
INSERT INTO Group_Products VALUES (2, 1, '2002-01-01', '2004-11-01')
INSERT INTO Group_Products VALUES (2, 1, '2004-11-15', '9999-12-31')

There's three adjacent periods for 1/1, and then there is a non-adjcent
rerun as well.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||On 18 Mar 2005 13:01:32 -0800, Thomas R. Hummel wrote:

>Erland, despite it being late on a Friday afternoon, the SQL and sample
>output are below. ;-)

Hi Tom,

I hope this doesn't mean that only Erland may reply?

Let's break this down in easy steps.

First, create a query that uses lots of self-joins and not exists
queries to join contiguous group_products. The logic is: only take rows
that are the start of a contiguous set (no row has end_date equal to
start_date - 1); join these to all rows with same group/product that
don't start earlier and that are not themselves the start of a
contiguous set. In SQL:

SELECT a.group_id, a.product_id, a.start_date, MAX(b.end_date)
FROM Group_Products AS a
INNER JOIN Group_Products AS b
ON b.group_id = a.group_id
AND b.product_id = a.product_id
AND b.start_date >= a.start_date
AND NOT EXISTS (SELECT *
FROM Group_Products AS c
WHERE c.group_id = a.group_id
AND c.product_id = a.product_id
AND c.start_date > a.start_date
AND c.start_date < b.end_date
AND NOT EXISTS (SELECT *
FROM Group_Products AS d
WHERE d.group_id = c.group_id
AND d.product_id =
c.product_id
AND d.end_date =
DATEADD(day, -1, c.start_date)))
WHERE NOT EXISTS (SELECT *
FROM Group_Products AS e
WHERE e.group_id = a.group_id
AND e.product_id = a.product_id
AND e.end_date = DATEADD(day, -1,
a.start_date))
GROUP BY a.group_id, a.product_id, a.start_date

Converting NOT EXISTS to outer joins can be interesting. It sometimes
speeds up the query. And if you do it with nested NOT EXISTS, you end up
with nested outer joins - always fun, at friday midnight! <g
SELECT a.group_id, a.product_id, a.start_date, MAX(b.end_date)
FROM Group_Products AS a
INNER JOIN Group_Products AS b
ON b.group_id = a.group_id
AND b.product_id = a.product_id
AND b.start_date >= a.start_date
LEFT JOIN Group_Products AS c
LEFT JOIN Group_Products AS d
ON d.group_id = c.group_id
AND d.product_id = c.product_id
AND d.end_date = DATEADD(day, -1, c.start_date)
ON c.group_id = a.group_id
AND c.product_id = a.product_id
AND c.start_date > a.start_date
AND c.start_date < b.end_date
AND d.group_id IS NULL
LEFT JOIN Group_Products AS e
ON e.group_id = a.group_id
AND e.product_id = a.product_id
AND e.end_date = DATEADD(day, -1, a.start_date)
WHERE e.group_id IS NULL
AND c.group_id IS NULL
GROUP BY a.group_id, a.product_id, a.start_date

In this case, I saw no performance difference. Test it on your own data
to see how it works for you. I'll stick with this version for the rest
of the post (if only to prevent linewrapping). you use whichever suits
you best.

Once we have this result, the rest is easy. Just use the query above as
derived table (or you could create a view) and join that to the members
table. Use CASE to find out which start_date and end_date to display:

SELECT m.group_id, m.member_id, g.product_id,
CASE WHEN m.start_date < g.start_date
THEN g.start_date
ELSE m.start_date END AS start_date,
CASE WHEN m.end_date > g.end_date
THEN g.end_date
ELSE m.end_date END AS end_date
FROM Members AS m
INNER JOIN (SELECT a.group_id, a.product_id, a.start_date,
MAX(b.end_date) AS end_date
FROM Group_Products AS a
INNER JOIN Group_Products AS b
ON b.group_id = a.group_id
AND b.product_id = a.product_id
AND b.start_date >= a.start_date
LEFT JOIN Group_Products AS c
LEFT JOIN Group_Products AS d
ON d.group_id = c.group_id
AND d.product_id = c.product_id
AND d.end_date = DATEADD(day, -1, c.start_date)
ON c.group_id = a.group_id
AND c.product_id = a.product_id
AND c.start_date > a.start_date
AND c.start_date < b.end_date
AND d.group_id IS NULL
LEFT JOIN Group_Products AS e
ON e.group_id = a.group_id
AND e.product_id = a.product_id
AND e.end_date = DATEADD(day, -1, a.start_date)
WHERE e.group_id IS NULL
AND c.group_id IS NULL
GROUP BY a.group_id, a.product_id, a.start_date) AS g
ON g.group_id = m.group_id
AND g.start_date <= m.end_date
AND g.end_date >= m.start_date
ORDER BY m.group_id, m.member_id, g.product_id, start_date

This query produces the requested output when I run it on your test
data. I didn't test it on other input data.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||On Fri, 18 Mar 2005 23:03:10 +0000 (UTC), Erland Sommarskog wrote:

>Anyway, this looks like it will work, but please test further:

Drat! You did manage to beat me to it...

Have a nice weekend!

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||"Thomas R. Hummel" <tom_hummel@.hotmail.com> wrote in message
news:1111179692.412743.218250@.f14g2000cwb.googlegr oups.com...
> Hello all,
> I am trying to write a query that compares a member's enrollment period
> with the products that their group has had during that period (all
> members belong to a group and the products that the member has are
> based on that group). I need to get the date range for all products
> that the member had during their enrollment.
> Here are a few rules:
> - In the source table there are some group products that have two
> ranges that are really contiguous. This is because another column that
> we don't care about may have changed between those two periods. If the
> end_date = DATEADD(dy, 1, start_date) then the two periods are actually
> contiguous. These should only appear as one row in the output.
> - If the gap is greater than one day then two rows should result
> - If the product changes, of course it should be two rows in the output
> - If a group has a product from before the start of the member's
> enrollment then the start_date for the row should be the member's
> start_date. If the product extends past the member leaving the group
> then the end_date should be that of the member.
> - In my sample data below I only have as many as two rows back to back
> for the same product that are contiguous. In reality there could be
> even more than that.
> I have SQL that will join the two tables based on either the start or
> the end date of the group product falling in the member's enrollment
> period, but I'm not sure of the best way to merge the contiguous date
> ranges into single rows. Any suggestions?
> Erland, despite it being late on a Friday afternoon, the SQL and sample
> output are below. ;-)
> Thanks,
> -Tom.
> CREATE TABLE Members (
> group_id INT NOT NULL,
> member_id INT NOT NULL,
> start_date DATETIME NOT NULL,
> end_date DATETIME NOT NULL )
> GO
> ALTER TABLE Membership ADD CONSTRAINT PRIMARY KEY PK_Members PRIMARY
> KEY (group_id, member_id)
> GO
> CREATE TABLE Group_Products (
> group_id INT NOT NULL,
> product_id INT NOT NULL,
> start_date DATETIME NOT NULL,
> end_date DATETIME NOT NULL )
> GO
> ALTER TABLE Group_Products ADD CONSTRAINT PRIMARY KEY PK_Group_Products
> PRIMARY KEY (group_id, product_id, start_date)
> GO
> INSERT INTO Members VALUES (1, 1, '2002-01-01', '9999-12-31')
> INSERT INTO Members VALUES (1, 2, '2004-11-01', '9999-12-31')
> INSERT INTO Members VALUES (1, 3, '2000-10-01', '2004-12-31')
> INSERT INTO Members VALUES (2, 4, '2002-01-01', '2005-01-15')
> INSERT INTO Members VALUES (2, 5, '2004-10-01', '9999-12-31')
> GO
> INSERT INTO Group_Products VALUES (1, 1, '2001-01-01', '2003-12-31')
> INSERT INTO Group_Products VALUES (1, 1, '2004-01-01', '2004-11-15')
> INSERT INTO Group_Products VALUES (1, 2, '2004-11-16', '9999-12-31')
> INSERT INTO Group_Products VALUES (2, 1, '2002-01-01', '2004-11-01')
> INSERT INTO Group_Products VALUES (2, 1, '2004-11-15', '9999-12-31')
> GO
> Expected Output:
> group_id member_id product_id start_date end_date
> --- --- ---- ---- ----
> 1 1 1 2002-01-01 2004-11-15
> 1 1 2 2004-11-16 9999-12-31
> 1 2 1 2004-11-01 2004-11-15
> 1 2 2 2004-11-16 9999-12-31
> 1 3 1 2001-01-01 2004-11-15
> 1 3 2 2004-11-16 2004-12-31
> 2 4 1 2002-01-01 2004-11-01
> 2 4 1 2004-11-15 9999-12-31
> 2 5 1 2004-10-01 2004-11-01
> 2 5 1 2004-11-15 9999-12-31

The following view will coalesce ranges. It's been separated out from
the main query for readability and other applicability.

CREATE VIEW GroupProductsCoalesced
(group_id, product_id, start_date, end_date)
AS
SELECT P.group_id, P.product_id,
MIN(P.start_date) AS start_date, P.end_date
FROM (SELECT P2.group_id, P2.product_id,
P2.start_date, MIN(P1.end_date) AS end_date
FROM (SELECT group_id, product_id, start_date, end_date
FROM Group_Products AS P1
WHERE NOT EXISTS
(SELECT *
FROM Group_Products AS P2
WHERE P1.group_id = P2.group_id AND
P1.product_id = P2.product_id AND
P1.end_date = P2.start_date - 1))
AS P1
INNER JOIN
Group_Products AS P2
ON P1.group_id = P2.group_id AND
P1.product_id = P2.product_id AND
P1.start_date >= P2.start_date
GROUP BY P2.group_id, P2.product_id, P2.start_date) AS P
GROUP BY P.group_id, P.product_id, P.end_date

SELECT M.group_id, M.member_id, P.product_id,
CASE WHEN P.start_date >= M.start_date
THEN P.start_date
ELSE M.start_date
END AS start_date,
CASE WHEN P.end_date <= M.end_date
THEN P.end_date
ELSE M.end_date
END AS end_date
FROM Members AS M
INNER JOIN
GroupProductsCoalesced AS P
ON M.group_id = P.group_id AND
P.start_date < M.end_date AND
P.end_date > M.start_date
ORDER BY M.group_id, M.member_id, P.product_id, start_date

group_id member_id product_id start_date end_date
1 1 1 2002-01-01 00:00:00.000 2004-11-15 00:00:00.000
1 1 2 2004-11-16 00:00:00.000 9999-12-31 00:00:00.000
1 2 1 2004-11-01 00:00:00.000 2004-11-15 00:00:00.000
1 2 2 2004-11-16 00:00:00.000 9999-12-31 00:00:00.000
1 3 1 2001-01-01 00:00:00.000 2004-11-15 00:00:00.000
1 3 2 2004-11-16 00:00:00.000 2004-12-31 00:00:00.000
2 4 1 2002-01-01 00:00:00.000 2004-11-01 00:00:00.000
2 4 1 2004-11-15 00:00:00.000 2005-01-15 00:00:00.000
2 5 1 2004-10-01 00:00:00.000 2004-11-01 00:00:00.000
2 5 1 2004-11-15 00:00:00.000 9999-12-31 00:00:00.000

--
JAG|||I ended up using a method similar to this, but encapsulating the main
logic for determining the ranges within a view as John had suggested. I
checked about a dozen different cases and it looks like it is working
correctly. Performance isn't too bad either.

Thanks to everyone for the suggestions.

-Tom.|||Well, for some reason my original reply to this never seemed to get
posted, so...

>On 18 Mar 2005 13:01:32 -0800, Thomas R. Hummel wrote:
>>Erland, despite it being late on a Friday afternoon, the SQL and
sample
>>output are below. ;-)
> I hope this doesn't mean that only Erland may reply?

Not at all. Last week I posted a question to the newsgroup and I
neglected to include all of the CREATE TABLE, INSERT, and expected
outcome information. Erland gave me a (minor) hard time about it and my
excuse was that it was late on a Friday. I was just making a reference
to that.

Thanks for your solution!

-Tom.|||On 22 Mar 2005 13:27:41 -0800, Thomas R. Hummel wrote:

>> I hope this doesn't mean that only Erland may reply?
>Not at all.
(snip)

Hi Tom,

Woops, I appear to have caused confusion.

I meant to write a smiley on that line, but I apparently forgot it. Here
is is: :-)

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment