I am using Hydra for config management in my experiments, and I am trying to use the Hydra Submitit Launcher plugin to submit jobs automatically to a Slurm cluster.
The main config file looks like this:
defaults:
- override hydra/launcher: slurm
foo: 1
The Slurm launcher config file looks like this:
defaults:
- submitit_slurm
_target_: hydra_plugins.hydra_submitit_launcher.submitit_launcher.SlurmLauncher
submitit_folder: ${hydra.sweep.dir}/.submitit/%j
name: ${hydra.job.name}
The project structure is like this:
|--project
| |--src
| | |--main.py
| | |--models
| | | |--__init__.py
| | | |--file1.py
| | | |--file2.py
| |--scripts
| | |--run.sh
In models/__init__.py I define what to import from models, and in main.py I use the import as usual: from models import func
THE PROBLEM:
When I comment out the Submitit launcher override in the main config file, everything works smoothly, but when I uncomment it I get this error message:
submitit ERROR (2025-01-04 21:31:32,789) - Submitted job triggered an exception
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/a/.conda/envs/hinet/lib/python3.12/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/home/a/.conda/envs/hinet/lib/python3.12/site-packages/submitit/core/submission.py", line 76, in submitit_main
    process_job(args.folder)
  File "/home/a/.conda/envs/hinet/lib/python3.12/site-packages/submitit/core/submission.py", line 69, in process_job
    raise error
  File "/home/a/.conda/envs/hinet/lib/python3.12/site-packages/submitit/core/submission.py", line 52, in process_job
    delayed = utils.DelayedSubmission.load(paths.submitted_pickle)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/a/.conda/envs/hinet/lib/python3.12/site-packages/submitit/core/utils.py", line 153, in load
    obj = pickle_load(filepath)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/a/.conda/envs/hinet/lib/python3.12/site-packages/submitit/core/utils.py", line 232, in pickle_load
    return pickle.load(ifile)
           ^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'models'
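If I read the traceback correctly, the failure happens while the worker unpickles the submitted job, before any of my own code runs. My understanding is that pickle stores functions from importable modules by reference (module name plus function name), so loading needs the module to be importable in the loading process. The sketch below reproduces that behaviour with the standard pickle module only. The package name models_demo and the temporary directory are hypothetical stand-ins for my models package and src folder, and I am assuming (not certain) that the serialization used by Submitit behaves the same way for this case.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for src/models: a throwaway package with one function.
src_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(src_dir, "models_demo")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("def func():\n    return 'ok'\n")

# Submitting side: src_dir is on sys.path, so the import succeeds.
sys.path.insert(0, src_dir)
importlib.invalidate_caches()
import models_demo

# The payload holds only a reference to models_demo.func, not its code.
payload = pickle.dumps(models_demo.func)

# Worker side (simulated): same payload, but src_dir is no longer importable.
sys.path.remove(src_dir)
del sys.modules["models_demo"]
importlib.invalidate_caches()
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = str(exc)
print("load without src_dir on sys.path:", load_error)

# Making the directory importable again lets the same payload load.
sys.path.insert(0, src_dir)
importlib.invalidate_caches()
restored = pickle.loads(payload)
print("load with src_dir on sys.path:", restored())
```

So my guess is that the directory containing models is on sys.path when I launch main.py locally, but not in the process that Submitit starts on the cluster node.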
Do you know what a possible reason for this error is and how to fix it?
Thanks!