sql - How to find overlapping records and select the latest record? - Stack Overflow

admin2025-04-17  2

I am trying to get on record from a list of overlapping records in Presto(Trillo) DB. There are columns like loc_id, prod_id, line_id, start_dt, end_dt, update_dt.

The records are like this

( source records)

loc_id prod_id line_id qty start_dt end_dt upd_dt
200 10 1 500 30-1-2025:9:00 30-1-2025:15:00 30-1-2025:15:05
200 10 1 600 30-1-2025:11:00 30-1-2025:16:00 30-1-2025:16:05
200 10 1 550 30-1-2025:12:00 30-1-2025:18:00 30-1-2025:18:05
200 10 1 350 30-1-2025:18:00 30-1-2025:22:00 30-1-2025:22:05

I am trying to get on record from a list of overlapping records in Presto(Trillo) DB. There are columns like loc_id, prod_id, line_id, start_dt, end_dt, update_dt.

The records are like this

( source records)

loc_id prod_id line_id qty start_dt end_dt upd_dt
200 10 1 500 30-1-2025:9:00 30-1-2025:15:00 30-1-2025:15:05
200 10 1 600 30-1-2025:11:00 30-1-2025:16:00 30-1-2025:16:05
200 10 1 550 30-1-2025:12:00 30-1-2025:18:00 30-1-2025:18:05
200 10 1 350 30-1-2025:18:00 30-1-2025:22:00 30-1-2025:22:05

There is no specific key column. In the above data, the first 3 records overlap based on start and end date timestamp. But the start date of 4th record is the end date of 3rd record.

As an output I want to select only the latest record from the first 3 records and the 4th record.
So select the latest record from the overlapping records.
If the records are not overlapping , select that records.

Expected output

loc_id prod_id line_id qty start_dt end_dt upd_dt
200 10 1 550 30-1-2025:12:00 30-1-2025:18:00 30-1-2025:18:05
200 10 1 350 30-1-2025:18:00 30-1-2025:22:00 30-1-2025:22:05

select 
loc_id
prod_id, 
line_id, 
start_dt,
end_dt, 
qty, 
upd_dt
from
(select *, 
rank() over (partition by a.loc_id , 
                          a.prod_id, 
                          a.line_id 
                          order by  cast(upd_dt as timestamp )desc) as rnk
from 
tbl a
) tt
--where tt.rnk = 1
order by tt.rnk

I tried using a rank function but it is difficult to rank it with time stamp. The overlapping records are an issue.

A similar sql as above. I cannot display exact SQL or data due to restrictions

Share Improve this question edited Jan 31 at 7:10 Guru Stron 144k11 gold badges172 silver badges212 bronze badges asked Jan 31 at 6:03 ParagParag 231 silver badge4 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 0

If I understand the task correctly then you can rephrase it as "I want data split into groups in a way that for every row previous row ended later then the current one and from each group I need the latest record". Then you can use approach derived from gaps-and-islands:

-- sample data
WITH dataset(loc_id, prod_id, line_id, qty, start_dt, end_dt, upd_dt) as (
    values (200, 10, 1, 500, timestamp '2025-01-25 09:00', timestamp '2025-01-30 15:00', '30-1-2025:15:05'),
        (200, 10, 1, 600, timestamp '2025-01-25 11:00', timestamp '2025-01-30 16:00', '30-1-2025:16:05'),
        (200, 10, 1, 550, timestamp '2025-01-25 12:00', timestamp '2025-01-30 18:00', '30-1-2025:18:05'),
        (200, 10, 1, 350, timestamp '2025-01-25 18:00', timestamp '2025-01-30 22:00', '30-1-2025:22:05')
),

-- query parts
with_leads as (
    select *,
           lead(start_dt) over (partition by loc_id, prod_id, line_id order by start_dt) next_start -- determine when next row starts
    from dataset
),
with_groups as (
    select *,
        sum(if(next_start < end_dt, 0, 1)) over (partition by loc_id, prod_id, line_id order by start_dt) grp -- form groups
    from with_leads
),
with_rns as(
    select *,
       row_number() over(partition by loc_id, prod_id, line_id, grp order by start_dt desc) rn -- prepare to select latest started in the group
    from with_groups
)

select loc_id, prod_id, line_id, qty, start_dt, end_dt, upd_dt
from with_rns
where rn = 1;

Output:

loc_id prod_id line_id qty start_dt end_dt upd_dt
200 10 1 550 2025-01-25 12:00:00 2025-01-30 18:00:00 30-1-2025:18:05
200 10 1 350 2025-01-25 18:00:00 2025-01-30 22:00:00 30-1-2025:22:05

Note that this will break if you have gaps in end data (i.e. if you change the end_dt for second row to '2025-01-25 11:30')

转载请注明原文地址:http://anycun.com/QandA/1744878609a88908.html