python - Hydra does not allow any command line script in Azure ML - Stack Overflow


I am trying to pass data from one component to the next in an Azure ML pipeline. I am able to do it with simple code.

I have 2 components and I am defining them as below:

components_dir = "."
prep = load_component(source=f"{components_dir}/preprocessing_config.yml")
middle = load_component(source=f"{components_dir}/middle_config.yml")

Then I am defining a pipeline as below:

@pipeline(
    display_name="test_pipeline3",
    tags={"authoring": "sdk"},
    description="test pipeline to test things just like all other test pipelines."
)
def data_pipeline(
    # raw_data: Input,
    compute_train_node: str,
):
   
    prep_node = prep()

    prep_node.outputs.Y_df = Output(type="uri_folder", mode="rw_mount", path="path/testing/")
    prep_node.outputs.S_df = Output(type="uri_folder", mode="rw_mount", path="path/testing/")


    transform_node = middle(Y_df=prep_node.outputs.Y_df,
                            S_df=prep_node.outputs.S_df)

The prep node runs a script that uses Hydra to read parameters from a config file. The component also has a YAML config file whose command kicks off the script from the command line as below:

  python preprocessing_script.py
  --Y_df ${{outputs.Y_df}} 
  --S_df ${{outputs.S_df}}

I try to get the values of Y_df.path and S_df.path in the main function of the prep script as below:

@hydra.main(version_base=None, config_path=".", config_name="config_file")
def main(cfg: DictConfig):

    parser = argparse.ArgumentParser("prep")
    parser.add_argument("--Y_df", type=str, help="Path of prepped data")
    parser.add_argument("--S_df", type=str, help="Path of prepped data")
    args = parser.parse_args()
   
    # Call the preprocessing function with Hydra configurations
    df1,df2 = processing_func(cfg.data_name,cfg.prod_filter)
    df1.to_csv(Path(cfg.Y_df) / "Y_df.csv")
    df2.to_csv(Path(cfg.S_df) / "S_df.csv")

When I run all of this, I get an error in the prep component itself saying:

Execution failed. User process 'python' exited with status code 2. Please check log file 'user_logs/std_log.txt' for error details. Error: /bin/bash: /azureml-envs/azureml_bbh34278yrnrfuehn78340/lib/libtinfo.so.6: no version information available (required by /bin/bash)
usage: data_processing.py [--help] [--hydra-help] [--version]
                          [--cfg {job,hydra,all}] [--resolve]
                          [--package PACKAGE] [--run] [--multirun]
                          [--shell-completion] [--config-path CONFIG_PATH]
                          [--config-name CONFIG_NAME]
                          [--config-dir CONFIG_DIR]
                          [--experimental-rerun EXPERIMENTAL_RERUN]
                          [--info [{all,config,defaults,defaults-tree,plugins,searchpath}]]
                          [overrides ...]
data_processing.py: error: unrecognized arguments: --Y_df --S_df /mnt/azureml/cr/j/ffyh7fs984ryn8f733ff3/cap/data-capability/wd/S_df

The code runs fine and data is transferred between the components when there is no Hydra involved, but when Hydra is involved, I get this error. Why is that so?

Edit: Below is the data component config file for prep:

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: preprocessing24
display_name: preprocessing24


outputs:
  Y_df:
    type: uri_folder

  S_df:
    type: uri_folder



code: ./preprocessing_final


environment: azureml:datapipeline-environment:4

command: >-
  python data_processing.py

The data preprocessing config file just contains a bunch of variables, but I have added two more:

Y_df:
  random_txt

S_df:
  random_txt

The main function of the data processing script is shown above.


asked Jan 15 at 22:01 by Ameya Bhave; edited Mar 12 at 13:35 by Daraan
  • add the config file and code you are running in data_processing.py. – JayashankarGS Commented Jan 16 at 3:35
  • ok, editing the question – Ameya Bhave Commented Jan 16 at 14:31
  • Pass the args like this: python data_processing.py --Y_df ${{outputs.Y_df}} --S_df ${{outputs.S_df}}. Also, only defining the main function doesn't work; you need to call it inside the python file. – JayashankarGS Commented Jan 16 at 14:55
  • When I pass the args like that, I get the error I mentioned above – Ameya Bhave Commented Jan 16 at 15:21

2 Answers

Hydra and argparse are not natively compatible, as Hydra handles the argument parsing itself.

If you want to combine both, the easiest approach is to not use @hydra.main but Hydra's Compose API. It takes care of some but not all setup features; if I recall correctly, the custom logger output was not included the last time I used it.

The arguments for the Compose API align with hydra.main; for the argument parser, use ArgumentParser.parse_known_args:

import sys
import argparse
from hydra import compose, initialize
from omegaconf import OmegaConf  # optional, for printing

def main():
    parser = argparse.ArgumentParser("prep")
    parser.add_argument("--Y_df", type=str, help="Path of prepped data")
    parser.add_argument("--S_df", type=str, help="Path of prepped data")
    args, unparsed_args = parser.parse_known_args()  # <- ignore unknown args

    # Before initializing Hydra, remove the already-parsed arguments
    sys.argv[1:] = unparsed_args

    # Global initialization
    initialize(version_base=None, config_path="conf", job_name="test_app")
    cfg = compose(config_name="config", overrides=["db=mysql", "db.user=me"])
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()

Alternatively, you can parse the args before @hydra.main in a similar way.

import sys
import argparse
import hydra
from omegaconf import DictConfig

# guard with if __name__ == "__main__": if needed
parser = argparse.ArgumentParser("prep")
parser.add_argument("--Y_df", type=str, help="Path of prepped data")
parser.add_argument("--S_df", type=str, help="Path of prepped data")
args, unparsed_args = parser.parse_known_args()
sys.argv[1:] = unparsed_args  # leave only the Hydra overrides for Hydra

@hydra.main(version_base=None, config_path=".", config_name="config_file")
def main(cfg: DictConfig):
    # work with cfg and args, or merge them
    ...

if __name__ == "__main__":
    main()
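To make the split concrete, here is a minimal, self-contained demo of the parse_known_args pattern using only the standard library (no Hydra needed; the paths are made up, and the flag names mirror the question): argparse consumes only the flags it knows, and everything else, such as Hydra overrides, is handed back untouched.

```python
import argparse


def split_args(argv):
    # Parse only the Azure ML output flags; return the rest unchanged.
    parser = argparse.ArgumentParser("prep")
    parser.add_argument("--Y_df", type=str)
    parser.add_argument("--S_df", type=str)
    args, unparsed = parser.parse_known_args(argv)
    return args, unparsed


if __name__ == "__main__":
    # Simulate the command line the component would produce,
    # plus a Hydra-style override that argparse must not touch.
    argv = ["--Y_df", "/mnt/out/Y", "--S_df", "/mnt/out/S", "db=mysql"]
    args, leftover = split_args(argv)
    print(args.Y_df)   # /mnt/out/Y
    print(leftover)    # ['db=mysql'] -> safe to assign to sys.argv[1:]
```

Assigning the leftover list to sys.argv[1:] before Hydra initializes is what keeps Hydra from ever seeing the --Y_df/--S_df flags it would otherwise reject.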

OK, here is what was happening.

This notation in the component's command did not work:

  python preprocessing_script.py
  --Y_df ${{outputs.Y_df}} 
  --S_df ${{outputs.S_df}}

That's because Hydra does not accept that notation (I think).

Instead, this notation worked:

  python data_processing.py '+Y_df=${{outputs.Y_df}}' '+S_df=${{outputs.S_df}}'

What this does is append two new variables, Y_df and S_df, to the config. These variables can then be accessed in the program just like all other config variables, via cfg.Y_df or cfg.S_df.
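For illustration only, here is a toy model of that append semantics: a plain key=value override changes an existing config key, while the + prefix adds a key that is not already there. This is not Hydra's real override parser; apply_overrides and the example keys are invented for the sketch.

```python
def apply_overrides(cfg, overrides):
    # Toy model of Hydra's override grammar: '+key=value' appends a new
    # key, while a bare 'key=value' may only change an existing one.
    cfg = dict(cfg)
    for ov in overrides:
        append = ov.startswith("+")
        key, _, value = ov.lstrip("+").partition("=")
        if not append and key not in cfg:
            raise KeyError(f"unknown key {key!r}; use '+{key}=...' to add it")
        cfg[key] = value
    return cfg


if __name__ == "__main__":
    base = {"data_name": "sales", "prod_filter": "all"}
    cfg = apply_overrides(base, ["+Y_df=/mnt/out/Y", "+S_df=/mnt/out/S"])
    print(cfg["Y_df"])  # /mnt/out/Y
```

This also explains the original failure mode: --Y_df is not a valid override in this grammar at all, which is why Hydra reported it as an unrecognized argument.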
