Copying Azure Data Factory SFTP data to ADLS - Stack Overflow


I am trying to use the Get Metadata activity in such a way that I can retrieve all files based on a last_modified_timestamp parameter and, based on that output, copy the data to ADLS.

Below is the directory structure on the SFTP server. There are files in the root path itself, plus a bunch of subdirectories. I want to capture all the CSV files, ZIP files, and anything else whose type is File.

rootpath/
        test.csv
        test1.csv
        dir1/
            test2.csv
            test3.zip
        dir2/
            dir3/
                test4.csv
                test5.csv
                test6.zip

I've tried a wildcard path in the dataset I defined to capture all possible files starting from the root path:

rootpath/**/**/**

However, this did not return the files that sit directly in the root path, i.e. test.csv and test1.csv.

Is there a dynamic and efficient way to do this? I saw some posts about using a ForEach activity, but that seems like overkill.

My desired output sample is below:

"childItems": [
        {
            "name": "test.csv",
            "type": "File"
        },
        {
            "name": "test1.csv",
            "type": "File"
        },
        {
            "name": "test2.csv",
            "type": "File"
        },
        {
            "name": "test3.zip",
            "type": "File"
        },

...

        {
            "name": "test6.zip",
            "type": "File"
        }
]


asked Jan 7 at 0:31 by JasonM, edited Jan 7 at 4:18 by Daniel Mann
  • Are you facing any error? – Pratik Lad Commented Jan 7 at 3:33
  • No errors so far. – JasonM Commented Jan 7 at 4:44

1 Answer


In Azure Data Factory, the Get Metadata activity is not recursive: its childItems output lists only the files and folders directly under the specified path. To reach nested files, you have to iterate over the directories that Get Metadata returns. Using a ForEach loop over that output, you can go one level deep (ForEach activities cannot be nested), as sketched below.
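A minimal sketch of that one-level pattern, assuming a folder dataset named SftpFolder with a folderPath parameter (both names are hypothetical):

{
    "name": "GetRootItems",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "SftpFolder",
            "type": "DatasetReference",
            "parameters": { "folderPath": "rootpath" }
        },
        "fieldList": [ "childItems" ]
    }
},
{
    "name": "ForEachChildFolder",
    "type": "ForEach",
    "dependsOn": [ { "activity": "GetRootItems", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
        "items": {
            "value": "@activity('GetRootItems').output.childItems",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "GetChildItems",
                "type": "GetMetadata",
                "typeProperties": {
                    "dataset": {
                        "referenceName": "SftpFolder",
                        "type": "DatasetReference",
                        "parameters": {
                            "folderPath": {
                                "value": "@concat('rootpath/', item().name)",
                                "type": "Expression"
                            }
                        }
                    },
                    "fieldList": [ "childItems" ]
                }
            }
        ]
    }
}

In practice you would put a Filter activity between the two (condition @equals(item().type, 'Folder')) so that the inner Get Metadata only runs against directories, not files.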

Richard Swinbank's post Get Metadata recursively in Azure Data Factory describes a workaround for exactly this situation, using an Until loop and a few pipeline variables.
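The general shape of that workaround is a work queue held in pipeline variables (this is a sketch of the idea, not his exact pipeline; the variable names are illustrative):

"variables": {
    "directoryQueue": { "type": "Array", "defaultValue": [ "rootpath" ] },
    "queueTail": { "type": "Array", "defaultValue": [] },
    "fileList": { "type": "Array", "defaultValue": [] }
}

The Until activity loops until the queue is empty:

"expression": {
    "value": "@equals(length(variables('directoryQueue')), 0)",
    "type": "Expression"
}

Inside the loop, a Get Metadata activity reads @first(variables('directoryQueue')), Append Variable activities add each returned file to fileList and each returned folder to directoryQueue, and the processed head is dequeued in two Set Variable steps (queueTail = @skip(variables('directoryQueue'), 1), then directoryQueue = @variables('queueTail')), because a Set Variable activity cannot reference the variable it is setting.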

You can then use these paths to retrieve each file's last_modified_timestamp with a further Get Metadata activity.
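For example, assuming a file dataset named SftpFile with a filePath parameter (both hypothetical), inside a ForEach over the collected file paths:

{
    "name": "GetFileTimestamp",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "SftpFile",
            "type": "DatasetReference",
            "parameters": {
                "filePath": { "value": "@item()", "type": "Expression" }
            }
        },
        "fieldList": [ "itemName", "lastModified" ]
    }
}

You can then compare @activity('GetFileTimestamp').output.lastModified against your last_modified_timestamp parameter in an If Condition activity before running the copy to ADLS.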
