sql - Update with limit and offset - Stack Overflow


Simplified scenario:

I want to update one table A (10 million rows) with a value from another table B (20 million rows). The two tables are linked by their ids.

Updating everything in one go took more than 7 hours. (I don't know exactly how long, as I stopped the script partway through.)
So my idea is to update table A in batches using OFFSET and LIMIT clauses. So far, no luck.

Wrapped up in a procedure, the code looks like below:

DECLARE 
    offset_number integer := 0;
    batch_size integer := 1000; 

BEGIN
    LOOP 
        UPDATE TableA temp1
        SET TableA_column_value_to_be_updated = (
            SELECT 
                tableB_column_value
            FROM 
                TableB temp2
            WHERE 
                temp2.id = temp1.id 
                AND some_other_conditions_in_TableB    
            ) 
        WHERE    
             some_other_conditions_in_Table
        OFFSET offset_number 
        LIMIT batch_size ; 
            
        COMMIT; 
    
        offset_number := offset_number + batch_size;
        EXIT WHEN NOT FOUND; 
    END LOOP;
END;

The engine reports an error with exception:

org.jkiss.dbeaver.model.sql.DBSQLException:
SQL Error [42601]: ERROR: syntax error at or near "OFFSET"

I have no idea what it is. Notably, it seems to work without OFFSET and LIMIT.

Any ideas why this would happen? Should I use a different loop statement?

asked Jan 2 at 6:04 by UpLevel; edited Jan 2 at 7:07 by Erwin Brandstetter
  • Please show the complete procedure. The header is an essential part to get the full picture. And your version of Postgres. – Erwin Brandstetter Commented Jan 2 at 6:13
  • The "why" bit is fairly simple to answer: the update statement does not take limit or offset clauses! – Ture Pålsson Commented Jan 2 at 6:24
  • Didn't you ask this exact question just the other day? – Dale K Commented Jan 2 at 8:23
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Bot Commented Jan 2 at 11:18
  • @DaleK, this is the only question I have asked, so you may be confusing me with someone else. I did search for related questions before I posted this one; there were no direct answers to my query. – UpLevel Commented Jan 2 at 22:38

2 Answers

LIMIT and OFFSET are not in the syntax of an SQL UPDATE statement. You need SELECT for that.
Also, OFFSET scales poorly to "paginate" a big table. Remember the upper bound from the last iteration instead.

Something like this could work:

CREATE OR REPLACE PROCEDURE upd_in_batches(_batch_size int = 1000)
  LANGUAGE plpgsql AS
$proc$
DECLARE
   _id_bound int = 0;  -- or whatever?
BEGIN
   LOOP
      WITH sel AS (
         SELECT a.id  -- id = PK!
         FROM   tablea a
         WHERE  a.id > _id_bound
      -- AND    <some other conditions in Table A>
         ORDER  BY a.id
         LIMIT  _batch_size
         FOR    UPDATE
         )
      , upd AS (
         UPDATE tablea a
         SET    target_col = b.b_source_col
         FROM   sel s
         JOIN   tableb b USING (id)
         WHERE  a.id = s.id
         AND    a.target_col IS DISTINCT FROM b.b_source_col
         )
      SELECT max(id)  -- always returns a row
      FROM   sel
      INTO   _id_bound;

      IF _id_bound IS NULL THEN
         EXIT;  -- no more rows found; we're done, exit loop
      ELSE
         COMMIT;
      END IF;
   END LOOP;
END
$proc$;

Use a SELECT statement instead to apply your LIMIT. To avoid race conditions with concurrent writes, throw in a locking clause (FOR UPDATE). You may or may not need that.
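
For completeness, a minimal sketch of how the procedure above might be invoked. It assumes PostgreSQL 11 or later (where procedures and COMMIT inside them are available) and a session in autocommit mode. If CALL runs inside an explicit transaction block, or the client has autocommit switched off, the COMMIT in the procedure fails with "invalid transaction termination".

```sql
-- Run from the top level, not inside BEGIN ... COMMIT.
CALL upd_in_batches();        -- uses the default _batch_size of 1000
CALL upd_in_batches(10000);   -- larger batches, fewer commits
```

The batch size is a trade-off: larger batches mean fewer commits but longer-held row locks, so the right value depends on concurrent load.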

You might be able to UPDATE directly and just increment the lower and upper bounds of the filter on id instead, which is cheaper. It depends on the details of your setup and requirements. Each approach has its caveats.
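
A rough sketch of that range-based variant, under stated assumptions: the procedure name upd_by_id_range and the parameter _step are made up for illustration, id is assumed to be an integer primary key on tablea, and the table and column names follow the procedure above. It is not tested against the asker's schema.

```sql
-- Hypothetical variant: walk the id range in fixed-width windows
-- instead of selecting each batch with LIMIT.
CREATE OR REPLACE PROCEDURE upd_by_id_range(_step int = 1000)
  LANGUAGE plpgsql AS
$proc$
DECLARE
   _lo  int;  -- lower bound of the current window (inclusive)
   _max int;  -- highest id present when the procedure starts
BEGIN
   SELECT min(id), max(id) INTO _lo, _max FROM tablea;

   -- For an empty table both bounds are NULL and the loop body never runs.
   WHILE _lo <= _max LOOP
      UPDATE tablea a
      SET    target_col = b.b_source_col
      FROM   tableb b
      WHERE  a.id >= _lo
      AND    a.id <  _lo + _step
      AND    b.id = a.id
   -- AND    <some other conditions in Table A / Table B>
      AND    a.target_col IS DISTINCT FROM b.b_source_col;

      COMMIT;
      _lo := _lo + _step;
   END LOOP;
END
$proc$;
```

Among the caveats: large gaps in the id sequence make batch sizes uneven (some windows may update no rows at all), and rows inserted with an id above _max while the procedure runs are not covered. If id is bigint, the variables should be declared bigint as well.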

See:

  • Postgres UPDATE ... LIMIT 1
  • How to force COMMIT inside function so other sessions can see updated row?
  • Optimize query with OFFSET on large table
  • How do I (or can I) SELECT DISTINCT on multiple columns?

The issue is the use of OFFSET and LIMIT directly in the UPDATE statement. PostgreSQL does not support these clauses in UPDATE; they belong to SELECT.

Here’s an alternative solution:

DECLARE 
    offset_number INTEGER := 0;
    batch_size INTEGER := 1000; 

BEGIN
    LOOP 
        -- Update rows in batches using a subquery to limit the rows processed
        WITH cte AS (
            SELECT temp1.id
            FROM TableA temp1
            WHERE some_other_conditions_in_Table
            ORDER BY temp1.id
            OFFSET offset_number
            LIMIT batch_size
        )
        UPDATE TableA temp1
        SET TableA_column_value_to_be_updated = (
            SELECT tableB_column_value
            FROM TableB temp2
            WHERE temp2.id = temp1.id AND some_other_conditions_in_TableB
        )
        WHERE temp1.id IN (SELECT id FROM cte);

        -- Commit after each batch
        COMMIT;

        -- Exit loop if no more rows are returned
        EXIT WHEN NOT FOUND;

        -- Increment the offset for the next batch
        offset_number := offset_number + batch_size;
    END LOOP;
END;
