Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have fsurdat_modifier system test use conda environment rather than ncar_pylib #1798

Merged
merged 22 commits into from
Feb 6, 2023

Conversation

ekluzek
Copy link
Collaborator

@ekluzek ekluzek commented Jul 6, 2022

Description of changes

Use the manage_python_env script to setup the conda environment and use it rather than ncar_pylib

Specific notes

Contributors other than yourself, if any: None

CTSM Issues Fixed (include github issue #):
Fixes #1786
Fixes #1925

Are answers expected to change (and if so in what way)? No

Any User Interface Changes (namelist or namelist defaults changes)? No

Testing performed, if any:
FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel runs and PASSes on cheyenne

@ekluzek ekluzek added enhancement new capability or improved behavior of existing capability testing additions or changes to tests labels Jul 6, 2022
@ekluzek ekluzek self-assigned this Jul 6, 2022
@billsacks
Copy link
Member

Thanks for working on this. It's not as pretty or simple as I hoped, but it's still an improvement over what we had before - and I really hope any such mechanism can go away before too long.

Can you confirm two things:

  • Does this work both for a user's shell being csh and bash? (given the shell-specific logic you have here)
  • Does it work to run FSURDATMODIFYCTSM as one of a suite of tests? I'm both wanting to confirm that earlier tests don't mess anything up about the environment that would interfere with this one (which I think I've seen before) and that this one doesn't mess anything up with the environment that could impact following tests.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Jul 7, 2022

I think I can improve on what I have here, so I'll try to do that. But, I also really like your suggestions on tests as well, so I'll do that.

To simplify this further we'll need conda to be loaded on machines in ccs_config/cime rather than python. That should be OK as when you activate an environment with conda you get a python version with it. But, we've got to convince CESM people that doing that would be OK.

@billsacks
Copy link
Member

I'm not sure it's worth spending much more time on this. (Sorry: my last comment probably suggested that it would be good to improve this, but I don't really think that should be a priority.) The vision we have discussed in CSEG is that users will load an appropriate python environment before doing anything with CESM / CIME. We don't want CIME to be responsible for controlling the python environment. Once we are comfortable with that (maybe the switch to derecho will be a good time???), I believe we can remove any 'module load python' from CIME. (I don't think we want a conda load in ccs_config/cime: I think we want to leave this up to the user - at least based on previous discussions.) So I feel like a less-than-elegant solution is okay for now as long as it works, and then we can hopefully remove this soon.

@billsacks
Copy link
Member

Also: I appreciate that this new solution works on multiple machines, not just cheyenne.

@billsacks
Copy link
Member

One other thing to confirm, if you haven't already, is: if there are failures in various aspects of the subprocess call, make sure that the test stops at that point with a FAIL result noted in the TestStatus file. You should definitely confirm this is the case for a possible failure in running fsurdat_modifier (e.g., deliberately introduce an error there and make sure the test aborts appropriately), but I'd also want to know that a failure in an earlier step in the subprocess call (where you have strung a bunch of commands together) correctly results in an error being reported. (I see you use check=True, but I think that will only work right if you get an error return code from the subprocess. Given the complexity of the subprocess call, it isn't immediately obvious to me that that is guaranteed to be the case.)

Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with the testing caveats noted in my earlier comments. Thanks for your work on changing this so it will work on other systems as well as not relying on the deprecated ncar_pylib!

@slevis-lmwg
Copy link
Contributor

Thanks for working on this. It's not as pretty or simple as I hoped, but it's still an improvement over what we had before

Thanks, @ekluzek
It's not a lot of code, but I doubt that I would have figured this out without help :-)
Let me know if you'd like me to try any or all of the testing that @billsacks recommended.

ekluzek added 8 commits July 28, 2022 12:36
 The tools modify_fsurdat and subset_data have similar options
 modify_fsurdat: zero_nonveg
 subset_data: include-nonveg

 modify_fsurdat: dom_plant
 subset_data: dom-pft

 Make more similar and use README file for documenting modify_fsurdat.
 There's potential for additional changes to be addressed later.

 Change name of manage_python_env to py_env_create as agreed in CTSM software meeting
…m_pylib to match ncar_pylib and to use a different name from the Hammon script that uses a ctsm_py environment name
@ekluzek
Copy link
Collaborator Author

ekluzek commented Dec 14, 2022

Oh, I had forgotten about this one too. Thanks @billsacks

@ekluzek
Copy link
Collaborator Author

ekluzek commented Dec 15, 2022

@slevisconsulting I'm not going to be able to get back to working on this one for awhile, I wonder if you shouldn't merge this into your PR and shepherd bringing it in to CTSM main?

This set of changes allows CTSM to continue API compatability with changes to the FATES API. FATES has updated its nutrient dynamics routine, and required a modification to the test environment, some minor updates to variable dimensions in the history, and a call to a new FATES history routine.  Implicitly, the updating of the FATES tag introduces new content in the FATES model since the last API update (mostly bug fixes).

Resolved conflicts:
cime_config/SystemTests/fsurdatmodifyctsm.py
test/tools/test_driver.sh
@slevis-lmwg
Copy link
Contributor

@slevisconsulting I'm not going to be able to get back to working on this one for awhile, I wonder if you shouldn't merge this into your PR and shepherd bringing it in to CTSM main?

@ekluzek looks as though I can push directly to your PR, so I will work here for now, and we will likely close the redundant PR that I opened (#1925)

@ekluzek
Copy link
Collaborator Author

ekluzek commented Dec 15, 2022

@slevisconsulting I think some of the documentation changes you have in #1925, might still apply. So you could merge this into that, or the other way around, whichever you prefer. I'm glad you'll be able to work on it though, it would be nice to see this come in.

@slevis-lmwg

This comment was marked as resolved.

FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel
@slevis-lmwg
Copy link
Contributor

Okay, all sounds good - thank you @slevisconsulting and @ekluzek . It might be good to confirm that this works in the context of a larger test suite, if you haven't already. This wouldn't necessarily need the full aux_clm, but you could create a text-based testlist file containing this test and a few others – with one or more tests before this one and one or more after it – to make sure this doesn't interfere with the python environment loaded by the test system and vice versa.

I ran aux_clm on cheyenne and it passed other than expected failures. I did not get motivated to try a custom list of tests :-)

@billsacks
Copy link
Member

I ran aux_clm on cheyenne and it passed other than expected failures. I did not get motivated to try a custom list of tests :-)

Awesome, thanks @slevisconsulting

@slevis-lmwg
Copy link
Contributor

Tested the latest commit in clm_pymods, as well as with ./create_test and both passed.
@ekluzek please let me know if you agree with my implementation.

@slevis-lmwg
Copy link
Contributor

@ekluzek and @slevisconsulting agreed this morning (2023/1/24) that this PR will likely get merged/tagged as dev118, pending successful testing once I update to dev117.

@slevis-lmwg
Copy link
Contributor

@ekluzek I think something has changed in my permissions here. I updated to dev117, but the push to the PR has been failing. If nothing has changed on your side, we may need to troubleshoot tomorrow morning after Stand-up.

@slevis-lmwg
Copy link
Contributor

slevis-lmwg commented Feb 3, 2023

Meanwhile
PASS make all from the /python directory
but
FAIL ./run_sys_tests -s clm_pymods -c ctsm5.1.dev117 --skip-generate
so I'm trying this PR in ctsm5.1.dev115 where it used to work:
PASS ./run_sys_tests -s clm_pymods -c ctsm5.1.dev115 --skip-generate
and vanilla ctsm5.1.dev117
PASS ./run_sys_tests -s clm_pymods -c ctsm5.1.dev116 --skip-generate

So now I need to difference this PR from vanilla dev117 to figure out what causes the above failure...

Resolved conflicts:
cime_config/SystemTests/fsurdatmodifyctsm.py
test/tools/test_driver.sh
@slevis-lmwg
Copy link
Contributor

I have tried once more doing everything from scratch and the FSURDAT system test still fails.

This is how I resolved the conflicts:

  1. In test/tools/test_driver.sh I'm keeping escomp/master:
# Setup conda environement
<<<<<<< HEAD
\$CLM_ROOT/py_env_create
conda activate ctsm_pylib
||||||| 17e2acb6a
\$CLM_ROOT/py_env_create
conda activate ctsm_py
=======
conda activate ctsm_py
if [ \$? -ne 0 ]; then
   echo "ERROR: Trouble activating the ctsm_py conda environment, be sure it's setup with \$CLM_ROOT/py_env_create, then rerun"
   exit 4
fi
>>>>>>> escomp/master

but changing ctsm_py to ctsm_pylib as I did elsewhere in this PR.

  1. In cime_config/SystemTests/fsurdatmodifyctsm.py I'm keeping HEAD:
<<<<<<< HEAD

        self._case.load_env(reset=True)
        conda_env = ". "+self._get_caseroot()+"/.env_mach_specific.sh; "
        # Preprend the commands to get the conda environment for python first
        conda_env += self._get_conda_env()
        # Source the env
        try:
            subprocess.run( conda_env+"python3 "+tool_path+" "+self._cfg_file_path, shell=True, check=True)
        except subprocess.CalledProcessError as error:
            print("ERROR while getting the conda environment and/or ")
            print("running the fsurdat_modifier tool: ")
            print("(1) If your ctsm_pylib environment is out of date or you ")
            print("have not created the ctsm_pylib environment, yet, you may ")
            print("get past this error by running ./py_env_create ")
            print("in your ctsm directory and trying this test again. ")
            print("(2) If conda is not available, install and load conda, ")
            print("run ./py_env_create, and then try this test again. ")
            print("(3) If (1) and (2) are not the issue, then you may be ")
            print("getting an error within the fsurdat_modifier tool itself. ")
            print("Default error message: ")
            print(error.output)
        except:
            print("ERROR trying to run fsurdat_modifier tool.")
            raise
||||||| 17e2acb6a
        # Need to specify a specific python version that has the required
        # dependencies
        python_path = _get_python_path()
        subprocess.check_call([python_path, tool_path, self._cfg_file_path])
=======
        # Need to specify a specific python version that has the required
        # dependencies
        python_path = _get_python_path()
        subprocess.check_call([python_path, tool_path, self._cfg_file_path, "--verbose", "--overwrite", "--allow_ideal_and_include_non_veg"])
>>>>>>> escomp/master

I now suspect that one or more new things in this PR fail for some reason. New items include:

  • load_env
  • .env_mach_specific.sh
  • which conda
  • module unload python; module load conda
  • conda activate ctsm_pylib

Also, I still get an error when I try to push to the PR:

ERROR: Authentication error: Authentication required: You must have push access to verify locks
error: failed to push some refs to 'https://github.com/ekluzek/CTSM.git'

@billsacks
Copy link
Member

Regarding the permissions: I have noticed that there is a bug with git-lfs that can prevent you from pushing to other people's forks. The workaround is to do a git lfs uninstall, then git push, then restore with git lfs install.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Feb 3, 2023

I think the permission is just the problem with making @slevisconsulting a collaborator on my fork, which we've resolved now.

In test driver you should change the use of ctsm_py to ctsm_pylib as follows...

conda activate ctsm_pylib
if [ \$? -ne 0 ]; then
   echo "ERROR: Trouble activating the ctsm_py conda environment, be sure it's setup with \$CLM_ROOT/py_env_create, then rerun"
   exit 4
fi

in fsurdatmodifyctsm.py you do need to change the line in

        try:
            subprocess.run( conda_env+"python3 "+tool_path+" "+self._cfg_file_path, shell=True, check=True)
        except subprocess.CalledProcessError as error:

to

        try:
            subprocess.check_call(conda_env+"python3 "+tool_path+" "+self._cfg_file_path+"--verbose --overwrite --allow_ideal_and_include_non_veg", shell=True, check=True)
        except subprocess.CalledProcessError as error:

Do you see why those need to change? Just want to make sure this makes sense to you...

@slevis-lmwg
Copy link
Contributor

@ekluzek I think I have found the source of the problem. For some reason it eluded me until now. I have discovered this message, and wanted to ask whether you had a specific reason to add this requirement or if simply it made sense to you:
ERROR: idealized AND include_nonveg can NOT both be on, pick one or the other

@slevis-lmwg
Copy link
Contributor

If you had a specific reason, then I will keep it. If not, then I would rather remove this requirement.

@slevis-lmwg
Copy link
Contributor

I now see (probably saw in the review, as well, but forgot) that you had added this option:
--allow_ideal_and_include_non_veg
I will try to resolve using this option.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Feb 3, 2023

@slevisconsulting the reason I thought that idealized and include_nonveg should be on together is because idealized explicitly sets the dominant PFT to zero, while include_nonveg (actually false) zeros' out the PFT's. So they contradict each other. It looks like I got the logic wrong though -- and that combination should be allowed. First I got it wrong about it being about include_nonveg True, when it should be False. But, also I got it wrong that include_nonveg=False isn't zeroing vegetation -- just all the other landunit types -- so that's actually fine.

So you should remove that restriction and option I added.

The one that still makes sense is idealized and dom_pft, because idealized explicitly sets the dominant PFT to bare-soil, while dom_pft sets it to the value the user asks for. What behavior you get is then dependent on how it's implemented in the code -- but since it's ambiguous that means it's something that should be checked for.

Sorry I added a non useful restriction there. It is good to think these things through and make sure options make sense together and restrict ones that don't.

@slevis-lmwg
Copy link
Contributor

@ekluzek I thought about it some more and decided that I also disagree about keeping the second restriction. The comments in the configure file explain why:

# PFT/CFT to be set to 100% according to user-defined mask.
# If idealized = True and dom_pft = UNSET, the latter defaults to 0
# (bare soil). Valid values range from 0 to a max value (int) that one can
# obtain from the fsurdat_in file using ncdump (or method preferred by user).
# The max valid value will equal (lsmpft - 1) and will also equal the last
# value of cft(cft).
dom_pft = UNSET

In other words, dom_pft is meant to be zero if the user leaves it UNSET or if the user explicitly sets it to zero.

@slevis-lmwg
Copy link
Contributor

Regarding the permissions: I have noticed that there is a bug with git-lfs that can prevent you from pushing to other people's forks. The workaround is to do a git lfs uninstall, then git push, then restore with git lfs install.

Thanks @billsacks .
@ekluzek made me collaborator, so I am able to push again.
My only confusion is that I had pushed to this PR previously (e.g. Dec 15-19). Do collaborator permissions expire?

@billsacks
Copy link
Member

@ekluzek made me collaborator, so I am able to push again.
My only confusion is that I had pushed to this PR previously (e.g. Dec 15-19). Do collaborator permissions expire?

I really think this is due to the lfs issue. There are two ways you can have permission to push to someone else's fork:

(1) If they make you a collaborator. This works even if you have git lfs installed.

(2) If you are a "maintainer" of CTSM (which I think means you have write permission), then by default you are given permission to push to any branch for which there is a PR. (Authors can choose not to give this permission, but the default is to give it. For this pr, it is allowed: I see the message "Maintainers are allowed to edit this pull request" in the metadata on the top right of the PR.) This feature is broken by git lfs for some reason.

@ekluzek
Copy link
Collaborator Author

ekluzek commented Feb 3, 2023

@slevisconsulting that makes sense, and nicely documented in the file. Thanks for pointing that out.

It might be good to add a comment in the code that the dom_pft section HAS to be after the the idealized part for those reasons above. You could also add a comment to the idealized section that this should be done first, so that other options could override the behavior there. The idea that the idealized section is done and other things override its behavior then becomes an expectation of how the script is supposed to function. It's good to make all the expectations on how the script is intended to behavior to be as clear as possible.

@slevis-lmwg
Copy link
Contributor

  • Tests now PASS:
    clm_pymods test-suite and make all in /python directory
    I linked the dev117 baselines to dev118: ln -s ctsm5.1.dev117 ctsm5.1.dev118
  • I have now also updated the ChangeLog
  • @ekluzek I have added comments in the code as you suggested. Pls let me know if you have further concerns. Otherwise I think this is ready for the merge.

@ekluzek ekluzek merged commit e099103 into ESCOMP:master Feb 6, 2023
@ekluzek ekluzek deleted the fsurdatsystest_conda branch February 6, 2023 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement new capability or improved behavior of existing capability testing additions or changes to tests
Projects
Status: Done (non release/external)
Development

Successfully merging this pull request may close these issues.

ncar_pylib
3 participants