I am working on optimizing a data migration task using Pentaho 9.4. My job, parallel-mandate-migration, is designed to migrate data in partitions for improved performance. Although the job executes successfully without errors, no data is being migrated.
Here are the details of the job structure and configurations:
Job Structure
Job Name: parallel-mandate-migration
Start
Generate ID Pagination Sequence Transformation
MANDATE_MASTER
table into partitions:WITH numbered_rows AS
(
SELECT
MM_ID,
ROW_NUMBER() OVER (ORDER BY MM_ID) AS row_num,
(SELECT COUNT(*) FROM MANDATE_MASTER
WHERE MM_STATUS IS NOT NULL) AS total_count
FROM
MANDATE_MASTER
WHERE
MM_STATUS IS NOT NULL
),
partition_calc AS
(
SELECT
MM_ID,
FLOOR((row_num - 1) * 4.0 / total_count) as PARTITION_NUMBER
FROM
numbered_rows
)
SELECT
PARTITION_NUMBER,
MIN(MM_ID) AS START_ID, MAX(MM_ID) AS END_ID,
COUNT(*) AS PARTITION_SIZE
FROM
partition_calc
GROUP BY
PARTITION_NUMBER
ORDER BY
PARTITION_NUMBER;
Result is passed to the next step via Copy rows to result.
Process Partitions Transformation
Success
Subjob Name: migrate-mandate-job.kjb
Start
Transformation SQL Query in Table Input step:
SELECT *
FROM MANDATE_MASTER
WHERE MM_ID >= ${START_ID}
AND MM_ID <= ${END_ID}
AND MM_STATUS IS NOT NULL
Data is processed and written to the target table using Table Output.
Success
Issue
The job completes successfully without errors, but no data is migrated to the target database.
Logs show no significant issues, and all steps report
success.
Debugging Steps Taken 1.Verified SQL queries in Generate ID Pagination Sequence and migrate-mandate-job.kjb manually. They produce the correct results in the database. 2.Confirmed that variables like START_ID and END_ID are set correctly in the logs. 3.Tested the subjob independently with sample START_ID and END_ID. It works as expected. 4.Enabled verbose logging but found no errors or warnings indicating a failure.
Request What could be the reason for the data not migrating, and how can I debug or resolve this issue? Are there any specific configurations or settings in Pentaho that could affect partitioned or parallel execution?
Environment
Flowchart
Main Job: parallel-mandate-migration
Start
Set Variables Transformation
Reads config.properties
and sets necessary variables (e.g., START_ID
, END_ID
).
↓
Generate ID Pagination Sequence
Executes SQL query to divide the MANDATE_MASTER
table into partitions.
Outputs:
PARTITION_NUMBER
, START_ID
, END_ID
, PARTITION_SIZE
.Sends rows to the result.
↓
Process Partitions (Job Executor)
Iterates over each partition.
Passes parameters:
START_ID
, END_ID
to the subjob migrate-mandate-job.kjb
.↓
Success
Subjob: migrate-mandate-job.kjb
Start
Transformation
Table Input: SQL query fetches data using START_ID
and END_ID
.
Modified JavaScript (if any): Transforms data (optional).
Table Output: Writes data to the target PostgreSQL table.
↓
Success