postgresql - SQL for returning number of days since last availability, grouped by category? - Stack Overflow

admin, 2025-05-02

I'm using SQL and I have some dummy car dealership data that, for a given dealership, tells me whether a particular make is currently being sold there and, if not, when that make was last available at that dealership. An example of a few rows of data for one dealership looks like this, focusing only on the fields of interest:

dealership_ID  make           Available?  brought_in_date  sold_date
612            BMW            Yes         2024-11-23       NULL
612            BMW            No          2024-09-13       2024-12-05
612            Audi           No          2024-10-15       2024-10-28
612            Audi           No          2024-09-06       2024-11-03
612            Mercedes Benz  Yes         2024-10-20       NULL

What I'm trying to do is return one row per dealership that tells me, for given car makes, whether they are currently available and, if not, how many days it has been since they were last available.

Using dealership ID 612 as an example, it would return something like this:

dealership_ID  BMW  Audi  Mercedes Benz  Ford
612            0    61    0              NULL

Where BMW and Mercedes Benz are 0 since at least one of each is available right now, Audi is 61 as it has been that many days since an Audi was last available at the dealership (the Audi with the most recent sold_date), and Ford is NULL as a Ford has never been sold at this particular dealership before.

asked Jan 2 at 13:53 by dom_2108, edited Jan 2 at 19:40 by jarlh
  • The SQL language has a strict rule: you MUST know the number of columns in the output before looking at any data. If you need to generate columns dynamically from the data, you will NOT be able to do this in one query. You will have to run a separate query first to find out about your columns, and then use that to dynamically generate a new query with the columns listed out in the query text. – Joel Coehoorn, commented Jan 2 at 14:49

3 Answers

You can keep this simple by using a WITH clause (common table expressions), as in:

WITH LatestAvailability AS (
    SELECT 
        dealership_ID,
        make,
        MAX(CASE WHEN Available = 'Yes' THEN 1 ELSE 0 END) AS is_available,
        MAX(sold_date) AS last_sold_date
    FROM dealership_data
    GROUP BY dealership_ID, make
),
DaysSinceLastAvailable AS (
    SELECT
        dealership_ID,
        make,
        CASE
            WHEN is_available = 1 THEN 0
            WHEN last_sold_date IS NOT NULL THEN current_date - last_sold_date  -- PostgreSQL: date - date gives whole days (assumes sold_date is of type date)
            ELSE NULL
        END AS days_since_last_available
    FROM LatestAvailability
),
PivotedData AS (
    SELECT
        dealership_ID,
        make,
        days_since_last_available
    FROM DaysSinceLastAvailable
)
SELECT
    pd.dealership_ID,
    MAX(CASE WHEN pd.make = 'BMW' THEN pd.days_since_last_available ELSE NULL END) AS BMW,
    MAX(CASE WHEN pd.make = 'Audi' THEN pd.days_since_last_available ELSE NULL END) AS Audi,
    MAX(CASE WHEN pd.make = 'Mercedes Benz' THEN pd.days_since_last_available ELSE NULL END) AS "Mercedes Benz",
    MAX(CASE WHEN pd.make = 'Ford' THEN pd.days_since_last_available ELSE NULL END) AS Ford
FROM PivotedData pd
GROUP BY pd.dealership_ID;

Breakdown:

  1. Aggregate per dealership and make: determine whether any car of that make is currently in stock, and find the most recent sold_date.
  2. Days since last in stock: for makes that are not currently in stock, calculate the number of days between the most recent sold_date and today.
  3. Pivot the data: turn rows into one column per car make, holding 0, the number of days, or NULL.

Result:

dealership_ID  BMW  Audi  Mercedes Benz  Ford
612            0    61    0              NULL
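
The three steps above can be checked against the sample rows from the question. This is only a minimal sketch, not the answer's own code: it runs on Python's built-in sqlite3 module rather than PostgreSQL, so the date arithmetic goes through SQLite's julianday() instead of date subtraction. The reference date AS_OF (2025-01-03) is an assumption that stands in for "today" so the run is repeatable, and the CTE names per_make and days are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (dealership_ID, make, available, brought_in_date, sold_date)
rows = [
    (612, "BMW", "Yes", "2024-11-23", None),
    (612, "BMW", "No", "2024-09-13", "2024-12-05"),
    (612, "Audi", "No", "2024-10-15", "2024-10-28"),
    (612, "Audi", "No", "2024-09-06", "2024-11-03"),
    (612, "Mercedes Benz", "Yes", "2024-10-20", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dealership_data ("
    " dealership_ID INTEGER, make TEXT, available TEXT,"
    " brought_in_date TEXT, sold_date TEXT)"
)
conn.executemany("INSERT INTO dealership_data VALUES (?, ?, ?, ?, ?)", rows)

# Assumption: a fixed reference date instead of the current date.
AS_OF = "2025-01-03"

query = """
WITH per_make AS (            -- step 1: one row per dealership and make
    SELECT dealership_ID, make,
           MAX(CASE WHEN available = 'Yes' THEN 1 ELSE 0 END) AS is_available,
           MAX(sold_date) AS last_sold_date   -- ISO date strings sort chronologically
    FROM dealership_data
    GROUP BY dealership_ID, make
),
days AS (                     -- step 2: 0 if in stock, else days since last sale
    SELECT dealership_ID, make,
           CASE WHEN is_available = 1 THEN 0
                ELSE CAST(julianday(:as_of) - julianday(last_sold_date) AS INTEGER)
           END AS days_since
    FROM per_make
)
SELECT dealership_ID,         -- step 3: pivot with conditional aggregation
       MAX(CASE WHEN make = 'BMW' THEN days_since END) AS "BMW",
       MAX(CASE WHEN make = 'Audi' THEN days_since END) AS "Audi",
       MAX(CASE WHEN make = 'Mercedes Benz' THEN days_since END) AS "Mercedes Benz",
       MAX(CASE WHEN make = 'Ford' THEN days_since END) AS "Ford"
FROM days
GROUP BY dealership_ID
"""

result = conn.execute(query, {"as_of": AS_OF}).fetchall()
print(result)
```

With the assumed reference date, the Audi column comes out as 61, and Ford is NULL (None in Python) because no Ford row exists for the dealership.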

Rather than messing with dynamic pivots, you should just return the result in long form.

You can use some fairly simple left-join and aggregation logic for this, but you need a table which lists all possible Makes, or put them in a VALUES clause.

SELECT
  m.make,
  CASE WHEN COUNT(*) FILTER (WHERE available AND sold_date IS NULL) > 0 THEN 0
       ELSE current_date - MAX(sold_date)
       END AS days_since_last_available
FROM Make m
LEFT JOIN CarAvailability ca ON ca.make = m.make
  AND ca.dealership_id = 612
GROUP BY
  m.make;

db<>fiddle
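
The VALUES variant mentioned above, where the list of makes is written inline instead of coming from a Make table, might look like the following sketch. It is an illustration under assumptions rather than the answer's code: it uses Python's sqlite3 module, so a SUM(CASE ...) replaces the FILTER clause and julianday() replaces PostgreSQL date subtraction. The table name car_availability, the 0/1 encoding of available, and the fixed AS_OF date are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car_availability ("
    " dealership_id INTEGER, make TEXT, available INTEGER, sold_date TEXT)"
)
conn.executemany(
    "INSERT INTO car_availability VALUES (?, ?, ?, ?)",
    [
        (612, "BMW", 1, None),
        (612, "BMW", 0, "2024-12-05"),
        (612, "Audi", 0, "2024-10-28"),
        (612, "Audi", 0, "2024-11-03"),
        (612, "Mercedes Benz", 1, None),
    ],
)

# Assumption: a fixed reference date instead of the current date.
AS_OF = "2025-01-03"

sql = """
WITH makes(make) AS (
    -- Inline list of the makes to report on; Ford has no rows in the data.
    VALUES ('BMW'), ('Audi'), ('Mercedes Benz'), ('Ford')
)
SELECT m.make,
       CASE WHEN SUM(CASE WHEN ca.available = 1 AND ca.sold_date IS NULL
                          THEN 1 ELSE 0 END) > 0
            THEN 0
            ELSE CAST(julianday(:as_of) - julianday(MAX(ca.sold_date)) AS INTEGER)
       END AS days_since_last_available
FROM makes m
LEFT JOIN car_availability ca
       ON ca.make = m.make
      AND ca.dealership_id = :dealership_id   -- filter in ON so unmatched makes survive
GROUP BY m.make
ORDER BY m.make
"""

long_rows = conn.execute(sql, {"as_of": AS_OF, "dealership_id": 612}).fetchall()
for make, days in long_rows:
    print(make, days)
```

Keeping the dealership filter in the ON clause rather than in WHERE is what lets a make with no matching rows, such as Ford, still appear with a NULL value.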

While it's undefined which car makes to report on, a pivoted form like you display is hard to come by. SQL demands to know result columns beforehand. See:

  • Execute a dynamic crosstab query
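
The two-step approach described in the comment under the question (query the makes first, then generate the pivot query text) could be sketched on the client side roughly as follows. This is only an illustration under assumptions: it uses Python's sqlite3 module instead of PostgreSQL, and the table availability_long, with precomputed days_since values, and the helper quote_ident are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical long-form table: one row per dealership and make.
conn.execute(
    "CREATE TABLE availability_long ("
    " dealership_id INTEGER, make TEXT, days_since INTEGER)"
)
conn.executemany(
    "INSERT INTO availability_long VALUES (?, ?, ?)",
    [(612, "BMW", 0), (612, "Audi", 61), (612, "Mercedes Benz", 0)],
)


def quote_ident(name):
    """Quote a value for use as an SQL identifier (doubles embedded quotes)."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: a separate query discovers which columns the pivot needs.
makes = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT make FROM availability_long ORDER BY make"
    )
]

# Step 2: generate the query text with one output column per make.
# Make names are bound as parameters; only the quoted aliases are interpolated.
columns = ", ".join(
    f"MAX(CASE WHEN make = ? THEN days_since END) AS {quote_ident(m)}"
    for m in makes
)
sql = (
    f"SELECT dealership_id, {columns} "
    "FROM availability_long GROUP BY dealership_id"
)

cur = conn.execute(sql, makes)
header = [d[0] for d in cur.description]
pivot_rows = cur.fetchall()
print(header)
print(pivot_rows)
```

Because the column list is derived from the data, a make that never appears (Ford in the question) gets no column here; reporting it would require a separate list or table of makes.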

Assuming a table of car makes, this basic, un-pivoted query with a LATERAL subquery should be as fast as it gets:

SELECT m.make, d.*
FROM   car_make m
LEFT   JOIN LATERAL (
   SELECT COALESCE(current_date - sold_date, 0) AS on_sale_indicator
   FROM   deal d
   WHERE  d.dealership_id = 612
   AND    d.make_id = m.make_id
   ORDER  BY available DESC, sold_date DESC  -- NULL comes first by design
   LIMIT  1
   ) d ON true
ORDER  BY make_id;  -- ?

fiddle

Assumes a proper table design with some columns defined NOT NULL.
And an index on deal(dealership_id, available, sold_date).

(But it would seem the column available is redundant to begin with.)

Note that true sorts before false, and null sorts before other values in descending order. See:

  • ORDER BY column, with specific notnull value LAST
  • Sort by column ASC, but NULL values first?

About the base technique:

  • Optimize GROUP BY query to retrieve latest row per user
When reposting, please cite the original address: http://anycun.com/QandA/1746116817a91902.html