I am struggling with implementing pagination for a REST API in a Microsoft Fabric Data Factory pipeline. I have read the official Microsoft documentation and have tried various things but to no avail. In my scenario, I am using a Copy Data activity which connects to the API endpoint and outputs the resultant JSON file to a lakehouse folder.
The API endpoint URL is in the following format (example):
https://api.somecompany.com/data?start={startdatetime}&end={enddatetime}&limit=200&page={page}
The REST API returns the following values in the response body:
{
  "total_count": 325,
  "page": 1,
  "total_pages": 2,
  "result": [ array of result objects ]
}
The Source connection details are as follows (I have not specified a value for the page parameter in the query string):
Relative URL: https://api.somecompany.com/data?start=2025-01-29T00:00:00Z&end=2025-01-29T00:00:00Z&limit=200&page=
Pagination rules:
My assumption was that Data Factory would dynamically increase the value of the page parameter by 1 until no more data was returned. I have tried variations where the 'page' parameter is wrapped in curly braces, like so: {page}, but no matter what I only ever get the first page of results returned.
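To be clear about what I'm expecting: as I understand them, the built-in pagination rules should replicate something like the loop below (an illustrative Python sketch against the example endpoint above, not something I intend to run in the pipeline):

import requests

BASE_URL = "https://api.somecompany.com/data"  # example endpoint from above

def fetch_all_pages(start, end, limit=200):
    """Keep requesting pages until the API reports there are no more."""
    results = []
    page = 1
    while True:
        response = requests.get(
            BASE_URL,
            params={"start": start, "end": end, "limit": limit, "page": page},
        )
        response.raise_for_status()
        body = response.json()
        results.extend(body.get("result", []))
        # 'total_pages' comes back in the response body, so stop after the last page.
        if page >= body.get("total_pages", page):
            break
        page += 1
    return results

rows = fetch_all_pages("2025-01-29T00:00:00Z", "2025-01-29T00:00:00Z")
print(len(rows))  # e.g. 325 for the example response above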
I'd like to avoid having to wrap the activity in a loop, as it is already inside a loop because I need to fetch data for a range of days, and nested loops are not supported in Data Factory. I'm aware that I could use a child pipeline for the inner loop, but I want to see if the pagination will work first before I go down that route.
You can use the below configuration in the Copy activity's REST source to achieve your requirement.
First, pass the page number as a parameter wrapped in {} in the dataset's Relative URL, as shown below (in your case this would be ...&limit=200&page={Offset}):
?page={Offset}&size=20
In the Copy activity source, set the End condition and Range pagination rules as below.
End condition - $.<your_object> - Nonexist
Query parameter - {Offset} - Range - <your_page_start_value>, End, 1
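In effect, the Range rule combined with that end condition makes the connector behave like the loop below (a minimal Python sketch, assuming $.result is the object checked by the end condition and using the example endpoint from the question):

import requests

BASE_URL = "https://api.somecompany.com/data"  # example endpoint, replace with yours

def copy_with_range_pagination(start_value=1, step=1):
    """Mimic the Range pagination rule: increment {Offset} by the step value
    until the end-condition object ($.result here) no longer exists or is empty."""
    pages = []
    offset = start_value
    while True:
        response = requests.get(BASE_URL, params={"page": offset, "size": 20})
        response.raise_for_status()
        body = response.json()
        # End condition "$.result - Nonexist": stop when the key is missing (or empty).
        if not body.get("result"):
            break
        pages.append(body["result"])
        offset += step
    return pages

If no fixed end value is set for the range, the end condition is what stops the copy once there are no more pages, which is why it should point at an object that disappears (or comes back empty) when the pages run out.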
Configure your sink and debug the pipeline. All data from the REST API pages will then be copied to the target location.