performance - Pentaho Data Migration Job Running but Not Migrating Data - Stack Overflow

admin2025-05-01  0

I am working on optimizing a data migration task using Pentaho 9.4. My job, parallel-mandate-migration, is designed to migrate data in partitions for improved performance. Although the job executes successfully without errors, no data is being migrated.

Here are the details of the job structure and configurations:

Job Structure

Job Name: parallel-mandate-migration

Start

  • Set Variables Transformation
  • Reads a config.properties file.
  • Sets environment variables like START_ID, END_ID, etc.

Generate ID Pagination Sequence Transformation

  • SQL query to divide MANDATE_MASTER table into partitions:
WITH numbered_rows AS 
(
    SELECT 
        MM_ID, 
        ROW_NUMBER() OVER (ORDER BY MM_ID) AS row_num, 
        (SELECT COUNT(*) FROM MANDATE_MASTER 
         WHERE MM_STATUS IS NOT NULL) AS total_count 
    FROM 
        MANDATE_MASTER 
    WHERE 
        MM_STATUS IS NOT NULL 
),
partition_calc AS 
(
    SELECT 
        MM_ID, 
        FLOOR((row_num - 1) * 4.0 / total_count) as PARTITION_NUMBER 
    FROM 
        numbered_rows
)
SELECT 
    PARTITION_NUMBER, 
    MIN(MM_ID) AS START_ID, MAX(MM_ID) AS END_ID, 
    COUNT(*) AS PARTITION_SIZE 
FROM 
    partition_calc 
GROUP BY 
    PARTITION_NUMBER 
ORDER BY 
    PARTITION_NUMBER;

Result is passed to the next step via Copy rows to result.

Process Partitions Transformation

  • Uses Job Executor to execute the subjob migrate-mandate-job.kjb for each partition.
  • Passes START_ID and END_ID as parameters to the subjob.

Success

Subjob Name: migrate-mandate-job.kjb

Start

Transformation SQL Query in Table Input step:

SELECT * 
FROM MANDATE_MASTER
WHERE MM_ID >= ${START_ID} 
  AND MM_ID <= ${END_ID} 
  AND MM_STATUS IS NOT NULL

Data is processed and written to the target table using Table Output.

Success

Issue

The job completes successfully without errors, but no data is migrated to the target database.

Logs show no significant issues, and all steps report

success.

Debugging Steps Taken 1.Verified SQL queries in Generate ID Pagination Sequence and migrate-mandate-job.kjb manually. They produce the correct results in the database. 2.Confirmed that variables like START_ID and END_ID are set correctly in the logs. 3.Tested the subjob independently with sample START_ID and END_ID. It works as expected. 4.Enabled verbose logging but found no errors or warnings indicating a failure.

Request What could be the reason for the data not migrating, and how can I debug or resolve this issue? Are there any specific configurations or settings in Pentaho that could affect partitioned or parallel execution?

Environment

  • Pentaho 9.4
  • Source Database: SQL Server
  • Target Database: SQL Server

Flowchart

  1. Main Job: parallel-mandate-migration

    • Start

    • Set Variables Transformation

      • Reads config.properties and sets necessary variables (e.g., START_ID, END_ID).

    • Generate ID Pagination Sequence

      • Executes SQL query to divide the MANDATE_MASTER table into partitions.

      • Outputs:

        • PARTITION_NUMBER, START_ID, END_ID, PARTITION_SIZE.
      • Sends rows to the result.

    • Process Partitions (Job Executor)

      • Iterates over each partition.

      • Passes parameters:

        • START_ID, END_ID to the subjob migrate-mandate-job.kjb.
    • Success


  1. Subjob: migrate-mandate-job.kjb

    • Start

    • Transformation

      • Table Input: SQL query fetches data using START_ID and END_ID.

      • Modified JavaScript (if any): Transforms data (optional).

      • Table Output: Writes data to the target PostgreSQL table.

    • Success

转载请注明原文地址:http://anycun.com/QandA/1746111974a91835.html