Add remaining reproduction scripts #34

Merged: 1 commit, Jun 3, 2024


forsyth2
Collaborator

@forsyth2 forsyth2 commented Jan 11, 2024

Add remaining reproduction scripts. Follow-up to #32, #33. Resolves #23.

@forsyth2
Collaborator Author

Currently running sbatch test_reproduction_scripts.bash on Chrysalis.

@forsyth2 forsyth2 mentioned this pull request Jan 11, 2024
@forsyth2
Collaborator Author

Relevant output from ./check_results.bash:

v2.LR.historical_0101_bonus
./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.historical_0101_bonus/tests: No such file or directory
Line count test passed
Checksum test failed
42ffbf170db587dc25d84d5d2ec7bc12 atm_XS_1x10_ndays.txt
d23e455ba5bef0bf87211468570b6835 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.historical_0101_bonus/tests

v2.LR.amip_0301
./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.amip_0301/tests: No such file or directory
Line count test passed
Checksum test failed
64e0fae59c1f6a48da0cae534c8be4a1 atm_XS_1x10_ndays.txt
6ae0ba340ef42b945c8573e9e5d7a0c7 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.amip_0301/tests

v2.LR.amip_0101_bonus
./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.amip_0101_bonus/tests: No such file or directory
Line count test passed
Checksum test failed
64e0fae59c1f6a48da0cae534c8be4a1 atm_XS_1x10_ndays.txt
c4b1c7337e89134fca7420437992ea97 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.amip_0101_bonus/tests

v2.NARRM.abrupt-4xCO2_0101
./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.abrupt-4xCO2_0101/tests: No such file or directory
Line count test passed
Checksum test failed
c18df3c0834abd2b5c63899e37559ccd atm_XS_1x10_ndays.txt
1eb5423d852764bbcd1bf67b180efc43 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.abrupt-4xCO2_0101/tests

v2.NARRM.1pctCO2_0101
./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.1pctCO2_0101/tests: No such file or directory
Line count test passed
Checksum test failed
c18df3c0834abd2b5c63899e37559ccd atm_XS_1x10_ndays.txt
80e6c83b39d58cb00876506deabfd8c2 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.1pctCO2_0101/tests

v2.NARRM.amip_0101
./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0101/tests: No such file or directory
Line count test passed
Checksum test failed
24147fbb5d601e1bd6fcae6ace72968c atm_XS_1x10_ndays.txt
930b7fc7e946910c3c8e716f733d0f31 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0101/tests

v2.NARRM.amip_0201
./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0201/tests: No such file or directory
Line count test passed
Checksum test failed
24147fbb5d601e1bd6fcae6ace72968c atm_XS_1x10_ndays.txt
a8326dd3922cbf32dccedb494fcedffb atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0201/tests

v2.NARRM.amip_0301
./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0301/tests: No such file or directory
Line count test passed
Checksum test failed
24147fbb5d601e1bd6fcae6ace72968c atm_XS_1x10_ndays.txt
f8bcd50a7e9c5ef8253908b73ee7471c atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0301/tests

@forsyth2
Collaborator Author

forsyth2 commented Jan 11, 2024

Seeing Globus errors in test_reproduction_scripts.o454938, which must have been why the test script finished so quickly. Authenticated for "LCRC Improv DTN" and "NERSC HPSS" in the Globus file manager. Re-running sbatch test_reproduction_scripts.bash on Chrysalis.

@forsyth2
Collaborator Author

forsyth2 commented Jan 11, 2024

There was still a Globus error, so I tried to update the authentications using the script at E3SM-Project/zstash#302 (comment). (I believe a small zstash transfer would have had the same effect.) This seems to be the recurring issue where the Globus consents sometimes need to be entered manually.

@forsyth2
Collaborator Author

Still running into a Globus error. Attempting the fix at E3SM-Project/zstash#322 (comment).

@forsyth2
Collaborator Author

I'm not seeing the same error now and I see the auth code prompt in the output file. Re-running the fix from E3SM-Project/zstash#302 (comment)

@forsyth2
Collaborator Author

I added the following lines:

  pwd
  echo zstash extract -v --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/${resolution}/${case_name} "init/*"
  exit 1

That gave:

/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.historical_0101_bonus
zstash extract -v --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/LR/v2.LR.historical_0101_bonus init/*

So, I went to that directory and ran that command. It prompted me twice for auth codes.

I then commented out those lines and re-ran the testing script.

@forsyth2
Collaborator Author

From #32 (comment):

Line count test passed, Checksum test failed (3):

v2.LR.amip_0301
v2.NARRM.abrupt-4xCO2_0101 (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.abrupt-4xCO2_0101/tests: No such file or directory)
v2.NARRM.1pctCO2_0101 (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.1pctCO2_0101/tests: No such file or directory)

Line count test failed, Checksum test failed (5):

v2.LR.historical_0101_bonus (gzip: XS_1x10_ndays/run/atm.log.*.gz: No such file or directory)
v2.LR.amip_0101_bonus (gzip: XS_1x10_ndays/run/atm.log.*.gz: No such file or directory)
v2.NARRM.amip_0101 (gzip: XS_1x10_ndays/run/atm.log.*.gz: No such file or directory)
v2.NARRM.amip_0201 (gzip: XS_1x10_ndays/run/atm.log.*.gz: No such file or directory)
v2.NARRM.amip_0301 (gzip: XS_1x10_ndays/run/atm.log.*.gz: No such file or directory)

From running ./check_results.bash today:

Line count test passed, Checksum test failed (8):

v2.LR.historical_0101_bonus (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.historical_0101_bonus/tests: No such file or directory)
v2.LR.amip_0301 (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.amip_0301/tests: No such file or directory)
v2.LR.amip_0101_bonus (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.amip_0101_bonus/tests: No such file or directory)
v2.NARRM.abrupt-4xCO2_0101 (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.abrupt-4xCO2_0101/tests: No such file or directory)
v2.NARRM.1pctCO2_0101 (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.1pctCO2_0101/tests: No such file or directory)
v2.NARRM.amip_0101 (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0101/tests: No such file or directory)
v2.NARRM.amip_0201 (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0201/tests: No such file or directory)
v2.NARRM.amip_0301 (./check_results.bash: line 22: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0301/tests: No such file or directory)

So, all tests now pass the line count test. We have 8 checksum failures to debug.
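For reference, the two checks reported above can be approximated as follows. This is a minimal sketch, not the actual contents of check_results.bash: the function name `check_one`, its arguments, and the throwaway demo file are all hypothetical.

```shell
# Hypothetical helper (not taken from check_results.bash):
# compare a file's line count and MD5 checksum against expected values.
# Usage: check_one <file> <expected_line_count> <expected_md5>
check_one() {
  local file=$1 expected_lines=$2 expected_md5=$3
  local actual_lines actual_md5 status=0
  actual_lines=$(wc -l < "${file}" | tr -d ' ')
  actual_md5=$(md5sum "${file}" | cut -d ' ' -f 1)
  if [ "${actual_lines}" -eq "${expected_lines}" ]; then
    echo "Line count test passed"
  else
    echo "Line count test failed (${actual_lines} vs ${expected_lines})"
    status=1
  fi
  if [ "${actual_md5}" = "${expected_md5}" ]; then
    echo "Checksum test passed"
  else
    echo "Checksum test failed (${actual_md5} vs ${expected_md5})"
    status=1
  fi
  return "${status}"
}

# Demo on a throwaway two-line file, checked against its own checksum.
demo=$(mktemp)
printf 'a\nb\n' > "${demo}"
demo_md5=$(md5sum "${demo}" | cut -d ' ' -f 1)
check_one "${demo}" 2 "${demo_md5}"
```

Returning a non-zero status on any failure makes it easy to count failures in a loop over simulations.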

@forsyth2
Collaborator Author

Follow-up for the table in #32 (comment):

  1. https://e3sm-project.github.io/e3sm_data_docs/_build/html/v2/reproducing_simulations.html
  2. https://github.com/E3SM-Project/e3sm_data_docs/blob/main/utils/check_results.bash
  3. Results from ./check_results.bash
  4. Previously showed "No such file or directory" (in Add reproduction scripts #32 (comment))? (Note that ALL of the remaining scripts showed "No such file or directory" on the latest run.)

| Simulation | (1) checksum from html | (2) expected checksum from test | (3) actual checksum from test | (4) Missing files previously? |
|---|---|---|---|---|
| v2.LR.historical_0101_bonus | d23e455 | d23e455 | 42ffbf1 | gzip: XS_1x10_ndays/run/atm.log.*.gz |
| v2.LR.amip_0301 | 6ae0ba3 | 6ae0ba3 | 64e0fae | No |
| v2.LR.amip_0101_bonus | c4b1c73 | c4b1c73 | 64e0fae | gzip: XS_1x10_ndays/run/atm.log.*.gz |
| v2.NARRM.abrupt-4xCO2_0101 | 1eb5423 | 1eb5423 | c18df3c | cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.abrupt-4xCO2_0101/tests |
| v2.NARRM.1pctCO2_0101 | 80e6c83 | 80e6c83 | c18df3c | cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.1pctCO2_0101/tests |
| v2.NARRM.amip_0101 | 930b7fc | 930b7fc | 24147fb | gzip: XS_1x10_ndays/run/atm.log.*.gz |
| v2.NARRM.amip_0201 | a8326dd | a8326dd | 24147fb | gzip: XS_1x10_ndays/run/atm.log.*.gz |
| v2.NARRM.amip_0301 | f8bcd50 | f8bcd50 | 24147fb | gzip: XS_1x10_ndays/run/atm.log.*.gz |

@forsyth2
Collaborator Author

It appears the files are missing because the script couldn't find the reproduction scripts. And that is because I didn't run ./update_reproduction_scripts.bash first. I just ran that and am now re-running sbatch test_reproduction_scripts.bash

@forsyth2
Collaborator Author

Latest results:

  1. https://e3sm-project.github.io/e3sm_data_docs/_build/html/v2/reproducing_simulations.html
  2. https://github.com/E3SM-Project/e3sm_data_docs/blob/main/utils/check_results.bash
  3. Results from ./check_results.bash
  4. "No such file or directory" appears?

Both line count test and checksum test failed (5):

| Simulation | (1) checksum from html | (2) expected checksum from test | (3) actual checksum from test | (4) Missing files? |
|---|---|---|---|---|
| v2.LR.historical_0101_bonus | d23e455 | d23e455 | d41d8cd | gzip: XS_1x10_ndays/run/atm.log.*.gz |
| v2.LR.amip_0101_bonus | c4b1c73 | c4b1c73 | d41d8cd | gzip: XS_1x10_ndays/run/atm.log.*.gz |
| v2.NARRM.amip_0101 | 930b7fc | 930b7fc | d41d8cd | gzip: XS_1x10_ndays/run/atm.log.*.gz |
| v2.NARRM.amip_0201 | a8326dd | a8326dd | d41d8cd | gzip: XS_1x10_ndays/run/atm.log.*.gz |
| v2.NARRM.amip_0301 | f8bcd50 | f8bcd50 | d41d8cd | gzip: XS_1x10_ndays/run/atm.log.*.gz |

Line count test passed, checksum test failed (1):

| Simulation | (1) checksum from html | (2) expected checksum from test | (3) actual checksum from test | (4) Missing files? |
|---|---|---|---|---|
| v2.LR.amip_0301 | 6ae0ba3 | 6ae0ba3 | a6cff5e | No |

Both line count test and checksum test passed (2):

v2.NARRM.abrupt-4xCO2_0101
v2.NARRM.1pctCO2_0101

@forsyth2
Collaborator Author

forsyth2 commented Jan 12, 2024

@golaz Update on reproduction scripts:

#32 added 11 reproduction scripts to E3SM Data docs. Based on the above results, we can add 2 more.

Still, there are 6 reproduction scripts that are failing:

  • v2.LR.amip_0301 is generating results, but their checksum doesn't match the expected value. The actual checksum matches the value listed for v2.LR.amip_0101 on https://e3sm-project.github.io/e3sm_data_docs/_build/html/v2/reproducing_simulations.html. That seems strange: it implies the v2.LR.amip_0301 reproduction script is reproducing v2.LR.amip_0101 instead.
  • 2 scripts aren't generating results at all, which is presumably why their atm_XS_1x10_ndays.txt files are empty. Their /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/<case_name>/tests/XS_1x10_ndays/run directories are missing log files and nc files. I'm not sure why these aren't generating results.
      • v2.LR.historical_0101_bonus
      • v2.LR.amip_0101_bonus
  • 3 scripts are producing empty atm_XS_1x10_ndays.txt files, despite having log files and nc files in /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/<case_name>/tests/XS_1x10_ndays/run. However, the log files are not gzipped. grep -i "error" *log* shows a time limit being hit, so I will try to re-run these.
      • v2.NARRM.amip_0101
      • v2.NARRM.amip_0201
      • v2.NARRM.amip_0301
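
The d41d8cd value reported as the actual checksum for the five failing simulations is consistent with empty files: d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of zero bytes of input. A quick way to confirm this, plus a guard that could flag empty output before any checksum comparison (`is_empty_output` is a hypothetical helper, and the file is a throwaway created for the demo):

```shell
# MD5 of empty input; this is a fixed, well-known value.
empty_md5=$(printf '' | md5sum | cut -d ' ' -f 1)
echo "${empty_md5}"   # d41d8cd98f00b204e9800998ecf8427e

# Hypothetical guard: succeed when the given file is missing or has zero size.
is_empty_output() {
  [ ! -s "$1" ]   # -s is true only when the file exists and is non-empty
}

demo=$(mktemp)   # mktemp creates an empty file
if is_empty_output "${demo}"; then
  echo "empty output: the test probably produced no atm.log.*.gz"
fi
rm -f "${demo}"
```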

Then, there are the 52 remaining simulations listed on https://e3sm-project.github.io/e3sm_data_docs/_build/html/v2/reproducing_simulations.html that don't seem to have been in my original script to generate reproduction scripts:

Water Cycle (low-resolution) > DECK (1)
v2.LR.piControl_land

Water Cycle (low-resolution) > Historical LE (16)
<all>

Water Cycle (low-resolution) > SSP370 LE (21)
<all>

Water Cycle (low-resolution) > Single-forcing (DAMIP-like) (3)
v2.LR.hist-GHG_0151
v2.LR.hist-aer_0151 <somehow has a reproduction script listed, but not a checksum>
v2.LR.hist-all-xGHG-xaer_0151

Water Cycle (low-resolution) > RFMIP (2)
v2.LR.piClim-histall_0031
v2.LR.piClim-histaer_0031

Water Cycle (low-resolution) > Other (2)
<all>

Water Cycle (NARRM) > Historical (4)
v2.NARRM.historical_0151
v2.NARRM.historical_0201
v2.NARRM.historical_0251
v2.NARRM.historical_0101_bonus

Water Cycle (NARRM) > AMIP (1)
v2.NARRM.amip_0101_bonus

Water Cycle (NARRM) > Other (2)
<all>

@forsyth2
Collaborator Author

> shows a time limit being hit

Re-running sbatch test_reproduction_scripts.bash after manually editing run_scripts/v2/reproduce/run.v2.NARRM.amip_{0101,0201,0301}.sh files to double the walltime from 20 minutes to 40 minutes. If that works, patch_helper.py will probably need to be updated to conditionally extend the walltime (that way these reproduction scripts can be generated automatically, without manual input).
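
The repeated doubling could be scripted. The following is only a sketch: it assumes walltimes are written as HH:MM:SS, and `double_walltime` is a hypothetical helper, not something that exists in patch_helper.py or in the run scripts.

```shell
# Hypothetical helper: double an HH:MM:SS walltime string.
double_walltime() {
  local h m s total
  IFS=: read -r h m s <<< "$1"
  # Force base 10 so fields like "08" are not parsed as octal.
  total=$(( 2 * (10#${h} * 3600 + 10#${m} * 60 + 10#${s}) ))
  printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

double_walltime 00:20:00   # 00:40:00
double_walltime 02:00:00   # 04:00:00
```

Working in total seconds keeps minute and second overflow correct, so the helper also handles values that are not whole hours.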

@forsyth2
Collaborator Author

2 hours seemed sufficient for run.v2.NARRM.amip_0201, but I'm doubling the walltime to 4 hours for 0101 and 0301. 0201 passed the line count test but still fails the checksum test:

Checksum test failed
1961efb78abebca230f7f4fe738c7ba7 atm_XS_1x10_ndays.txt
a8326dd3922cbf32dccedb494fcedffb atm_XS_1x10_ndays.txt

@forsyth2
Collaborator Author

Re: the remaining simulations. The following 9 simulations should also have reproduction scripts. The others listed above are either extra simulations or ones generated on Cori, which is no longer available.

Water Cycle (low-resolution) > DECK (1)
v2.LR.piControl_land

Water Cycle (low-resolution) > Single-forcing (DAMIP-like) (3)
v2.LR.hist-GHG_0151
v2.LR.hist-aer_0151 <somehow has a reproduction script listed, but not a checksum>
v2.LR.hist-all-xGHG-xaer_0151

Water Cycle (low-resolution) > RFMIP (2)
v2.LR.piClim-histall_0031
v2.LR.piClim-histaer_0031

Water Cycle (NARRM) > Historical (3)
v2.NARRM.historical_0151
v2.NARRM.historical_0201
v2.NARRM.historical_0251

@forsyth2
Collaborator Author

The test script failed early again due to a Globus issue. This appears to have fixed it:

$ cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.historical_0101_bonus
$ rm ~/.globus-native-apps.cfg
$ zstash extract -v --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/LR/v2.LR.historical_0101_bonus "init/*"
# Enter the TWO authentication codes
$ cd -

@forsyth2
Collaborator Author

Summary of remaining reproduction scripts to add:
From #34 (comment):

v2.LR.amip_0301
v2.NARRM.amip_0101 # I still need to run ./check_results.bash on these ones.
v2.NARRM.amip_0201
v2.NARRM.amip_0301

From #34 (comment):

Water Cycle (low-resolution) > DECK (1)
v2.LR.piControl_land

Water Cycle (low-resolution) > Single-forcing (DAMIP-like) (3)
v2.LR.hist-GHG_0151
v2.LR.hist-aer_0151 <somehow has a reproduction script listed, but not a checksum>
v2.LR.hist-all-xGHG-xaer_0151

Water Cycle (low-resolution) > RFMIP (2)
v2.LR.piClim-histall_0031
v2.LR.piClim-histaer_0031

Water Cycle (NARRM) > Historical (3)
v2.NARRM.historical_0151
v2.NARRM.historical_0201
v2.NARRM.historical_0251

That's 13 more scripts. Also, as noted in #34 (comment), these 2 are good to add already:

v2.NARRM.abrupt-4xCO2_0101
v2.NARRM.1pctCO2_0101

@forsyth2
Collaborator Author

forsyth2 commented Jan 30, 2024

> I'm doubling the walltime to 4 hours for 0101 and 0301

That was apparently sufficient for 0101, but not for 0301. Doubling 0301's walltime to 8 hours. 0101, like 0201, now only fails the checksum test:

Checksum test failed
4738a146984d03725d32ede945690c41 atm_XS_1x10_ndays.txt
930b7fc7e946910c3c8e716f733d0f31 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0101/tests
v2.NARRM.amip_0101

Checksum test failed
1961efb78abebca230f7f4fe738c7ba7 atm_XS_1x10_ndays.txt
a8326dd3922cbf32dccedb494fcedffb atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0201/tests
v2.NARRM.amip_0201

gzip: XS_1x10_ndays/run/atm.log.*.gz: No such file or directory
Line count test failed
0 atm_XS_1x10_ndays.txt
482 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0301/tests
Checksum test failed
d41d8cd98f00b204e9800998ecf8427e atm_XS_1x10_ndays.txt
f8bcd50a7e9c5ef8253908b73ee7471c atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0301/tests
v2.NARRM.amip_0301

@forsyth2
Collaborator Author

It can be confusing keeping track of all the directories and scripts. For reference:

  • /home/ac.forsyth2/e3sm_data_docs/utils: run the utility scripts, like sbatch test_reproduction_scripts.bash and ./check_results.bash
  • /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/<case-name>/tests/XS_1x10_ndays/run: look at logs, with grep -i "error" *log*
  • run_scripts/v2/reproduce/run.<case-name>.sh: manually edit the reproduction scripts. NOTE: do this with caution, since any call to /home/ac.forsyth2/e3sm_data_docs/utils/update_reproduction_scripts.bash will overwrite it. (Once you know the manually edited script works, you can try to edit the updater scripts to make the changes you made manually).

@forsyth2
Collaborator Author

0301 finished. Again, checksum test failed. zgrep error *log* doesn't show anything interesting (just as in #32 (comment))

@forsyth2
Collaborator Author

forsyth2 commented Mar 6, 2024

I'm still not seeing why those amip runs are failing. I'm currently testing the remaining scripts.

$ rm ~/.globus-native-apps.cfg # Handle ClientError.AuthenticationFailed
$ cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.historical_0101_bonus
$ source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh # Load Unified
$ zstash extract -v --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/LR/v2.LR.historical_0101_bonus init/*
# Manually enter the two Auth Codes
$ cd /home/ac.forsyth2/e3sm_data_docs/utils
$ sbatch test_reproduction_scripts.bash # Test scripts
$ tail -n 6 test_reproduction_scripts.bash # Running these scripts:
for simulation_name in piControl_land hist-GHG_0151 hist-aer_0151 hist-all-xGHG-xaer_0151 piClim-histall_0031 piClim-histaer_0031; do
  test_reproduction E3SMv2_test LR ${simulation_name} false false
done
for simulation_name in historical_0151 historical_0201 historical_0251; do
  test_reproduction E3SMv2_test NARRM ${simulation_name} false false
done

@forsyth2
Collaborator Author

Coming back to this after a while:

Which reproduction scripts are ready to be included?

I ran ./check_results.bash, commenting out scripts that are already listed on https://docs.e3sm.org/e3sm_data_docs/_build/html/v2/reproducing_simulations.html. That left 6 checks:

check_test_results E3SMv2_test LR amip_0301 6ae0ba340ef42b945c8573e9e5d7a0c7
check_test_results E3SMv2_test NARRM abrupt-4xCO2_0101 1eb5423d852764bbcd1bf67b180efc43
check_test_results E3SMv2_test NARRM 1pctCO2_0101 80e6c83b39d58cb00876506deabfd8c2
check_test_results E3SMv2_test NARRM amip_0101 930b7fc7e946910c3c8e716f733d0f31
check_test_results E3SMv2_test NARRM amip_0201 a8326dd3922cbf32dccedb494fcedffb
check_test_results E3SMv2_test NARRM amip_0301 f8bcd50a7e9c5ef8253908b73ee7471c

The output is:

Checksum test failed
a6cff5ea277dd3a08be6bbc4b1c84a69 atm_XS_1x10_ndays.txt
6ae0ba340ef42b945c8573e9e5d7a0c7 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.amip_0301/tests
v2.LR.amip_0301

Checksum test failed
4738a146984d03725d32ede945690c41 atm_XS_1x10_ndays.txt
930b7fc7e946910c3c8e716f733d0f31 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0101/tests
v2.NARRM.amip_0101

Checksum test failed
1961efb78abebca230f7f4fe738c7ba7 atm_XS_1x10_ndays.txt
a8326dd3922cbf32dccedb494fcedffb atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0201/tests
v2.NARRM.amip_0201

Checksum test failed
f0ee236696536f75f8288b5ffd1b4c77 atm_XS_1x10_ndays.txt
f8bcd50a7e9c5ef8253908b73ee7471c atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.amip_0301/tests
v2.NARRM.amip_0301


Failed line count:
0
Failed checksum:
4

That means these 2 reproduction scripts are ready to be added to https://docs.e3sm.org/e3sm_data_docs/_build/html/v2/reproducing_simulations.html:

v2.NARRM.abrupt-4xCO2_0101
v2.NARRM.1pctCO2_0101

That matches the last note in #34 (comment).

Where do those expected checksums come from?

They're listed on the "10 day checksum" column of https://docs.e3sm.org/e3sm_data_docs/_build/html/v2/reproducing_simulations.html. But where do those come from?

They're taken from the original simulation pages (specifically the "sanity checks" sections) which are linked from https://acme-climate.atlassian.net/wiki/spaces/ED/pages/2766340117/V2+Simulation+Planning. (The corresponding v3 page is linked on https://docs.e3sm.org/running-e3sm-guide/guide-long-term-archiving/#4-document).

Full summary of remaining scripts

(Update on #34 (comment))

| Simulation | Is there an expected checksum?* | Reproduction script matches checksum? |
|---|---|---|
| Set (1) | | |
| v2.LR.amip_0301 | yes | no |
| v2.NARRM.amip_0101 | yes | no |
| v2.NARRM.amip_0201 | yes | no |
| v2.NARRM.amip_0301 | yes | no |
| Set (2) | | |
| v2.LR.piControl_land | no | N/A |
| v2.LR.hist-GHG_0151 | no | N/A |
| v2.LR.hist-aer_0151** | no | N/A |
| v2.LR.hist-all-xGHG-xaer_0151 | no | N/A |
| v2.LR.piClim-histall_0031 | no | N/A |
| v2.LR.piClim-histaer_0031 | no | N/A |
| v2.NARRM.historical_0151 | no | N/A |
| v2.NARRM.historical_0201 | no | N/A |
| v2.NARRM.historical_0251 | no | N/A |
| Set (3) | | |
| v2.NARRM.abrupt-4xCO2_0101 | yes | yes |
| v2.NARRM.1pctCO2_0101 | yes | yes |

*listed on https://docs.e3sm.org/e3sm_data_docs/_build/html/v2/reproducing_simulations.html
**somehow has a reproduction script listed, but not a checksum. Implies the reproduction script shouldn't be included on the site yet.
(1) Have expected checksums, but the reproduction scripts' checksums don't match up. We need to fix the reproduction scripts so that the checksum matches.
(2) No expected checksums. We need to get an expected checksum somehow... by re-running the small tests on the original simulations??
(3) Reproduction scripts' checksums match up with expected checksums. We could add these to the table at https://docs.e3sm.org/e3sm_data_docs/_build/html/v2/reproducing_simulations.html immediately.

@forsyth2
Collaborator Author

> (2) No expected checksums. We need to get an expected checksum somehow... by re-running the small tests on the original simulations??

On Perlmutter hsi:

| Simulation | Directory exists?* | I have access to the directory? |
|---|---|---|
| Set (a) | | |
| v2.LR.piControl_land | no | N/A |
| Set (b) | | |
| v2.LR.hist-GHG_0151 | yes | no |
| v2.LR.hist-aer_0151 | yes | no |
| v2.LR.hist-all-xGHG-xaer_0151 | yes | no |
| v2.LR.piClim-histall_0031 | yes | no |
| v2.LR.piClim-histaer_0031 | yes | no |
| Set (c) | | |
| v2.NARRM.historical_0151 | yes | yes |
| v2.NARRM.historical_0201 | yes | yes |
| v2.NARRM.historical_0251 | yes | yes |

*In /home/projects/e3sm/www/WaterCycle/E3SMv2/LR or /home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM
(a) There doesn't seem to be an archive of this simulation. We probably won't be able to find the original checksum.
(b) I will need permissions to these simulation archives.
(c) I have permissions to these simulation archives. In theory, if I run zstash extract on them and then run the following checksum generator code block, we could find the expected checksums:

cd <extracted_simulation>/tests
for test in *_*_ndays
do
  gunzip -c "${test}"/run/atm.log.*.gz | grep '^ nstep, te ' | uniq > "atm_${test}.txt"
done
md5sum atm_*_ndays.txt
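
To see what that pipeline keeps, here is a small self-contained sketch. The log contents are synthetic and purely illustrative (real atm.log lines are longer and formatted differently); only the `grep '^ nstep, te '` and `uniq` steps are taken from the block above, and the temporary directory stands in for an extracted simulation's tests/ directory.

```shell
# Build a throwaway tests/ layout with one gzipped, made-up atm log.
workdir=$(mktemp -d)
mkdir -p "${workdir}/XS_1x10_ndays/run"
# Synthetic log: two energy lines (the first one duplicated) plus unrelated output.
printf '%s\n' \
  ' nstep, te 1 0.1E+10' \
  ' nstep, te 1 0.1E+10' \
  ' some other log line' \
  ' nstep, te 2 0.2E+10' \
  | gzip > "${workdir}/XS_1x10_ndays/run/atm.log.0001.gz"

cd "${workdir}"
for test in *_*_ndays
do
  # Keep only the "nstep, te" lines and collapse adjacent duplicates.
  gunzip -c "${test}"/run/atm.log.*.gz | grep '^ nstep, te ' | uniq > "atm_${test}.txt"
done
wc -l < atm_XS_1x10_ndays.txt   # 2 lines survive the filter
md5sum atm_*_ndays.txt
```

If no atm.log.*.gz file matches, gunzip reports an error, the output file is written empty, and the checksum degenerates to the MD5 of empty input.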

@forsyth2
Collaborator Author

> (c) I have permissions to these simulation archives. In theory, if I run zstash extract on them and then run the following checksum generator code block, we could find the expected checksums

Running:

$ screen
$ cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/zstash_extractions
$ mkdir v2.NARRM.historical_0151
$ cd v2.NARRM.historical_0151
$ source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh
$ zstash extract -v --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151/tests

Got globus_sdk.services.transfer.errors.TransferAPIError.

Logged into Globus with NERSC credentials. Authenticated with LCRC credentials for the LCRC endpoint in the file manager.

Rerunning

$ rm -rf zstash
$ zstash extract -v --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151/tests

Still got globus_sdk.services.transfer.errors.TransferAPIError.

$ rm ~/.globus-native-apps.cfg
$ rm -rf zstash
$ zstash extract -v --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151/test
# Prompted for auth code once

Got globus_sdk.services.transfer.errors.TransferAPIError --> Error listing directory '/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151/test

$ rm -rf zstash
$ zstash extract -v --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151/tests

Got globus_sdk.services.transfer.errors.TransferAPIError --> Error listing directory '/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151/tests

$ rm -rf zstash
$ zstash ls --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151
# Prompted for auth code
# "Consents added, please re-run the previous command to start transfer"
$ rm -rf zstash
$ zstash ls --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151
# Quite a few files listed -- not just direct subdirectories. 
# Interestingly, `tests` is included...
$ rm -rf zstash
$ zstash ls --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151/tests

This is because /home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151 has the tars in it... it's not an exact copy of the simulation's directory structure. I should have just listed the files I wanted per https://docs.e3sm.org/zstash/_build/html/main/usage.html#extract

$ rm -rf zstash
$ zstash ls --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151 tests/*
# This actually lists just the files in `tests/`
$ rm -rf zstash
$ zstash extract --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/NARRM/v2.NARRM.historical_0151 tests/*
$ ls v2.NARRM.historical_0151/tests/XL2_1x5_ndays/run
# Files are listed here

Now, we can try to find the expected checksum:

$ cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/zstash_extractions/v2.NARRM.historical_0151/tests
$ for test in *_*_ndays
> do
>   gunzip -c ${test}/run/atm.log.*.gz | grep '^ nstep, te ' | uniq > atm_${test}.txt
> done
$ md5sum atm_*_ndays.txt
668fb58e3da9070640cf1ec907ac66c0  atm_XL2_1x5_ndays.txt

Added

check_test_results E3SMv2_test/zstash_extractions/ NARRM historical_0151 668fb58e3da9070640cf1ec907ac66c0

to check_results.bash. That gives:

Line count test failed
242 atm_XL2_1x5_ndays.txt
482 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/zstash_extractions//v2.NARRM.historical_0151/tests
Checksum test failed
668fb58e3da9070640cf1ec907ac66c0 atm_XL2_1x5_ndays.txt
668fb58e3da9070640cf1ec907ac66c0 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/zstash_extractions//v2.NARRM.historical_0151/tests
v2.NARRM.historical_0151

It looks like v2.NARRM.historical_0151 only ran the XL2_1x5_ndays test. The checksum test fails because the test name differs -- the checksum itself is actually identical. I presume the line count also differs because it's not the exact same test.

@forsyth2
Collaborator Author

Note to self -- the checksum may be matching up because the result checker is calculating the checksum on the newly extracted tests data rather than the reproduction script data. Need to check that.

@forsyth2
Collaborator Author

Why does the checksum match even though the number of days differs?

zstash_extractions should not have been added in check_test_results E3SMv2_test/zstash_extractions/ NARRM historical_0151 668fb58e3da9070640cf1ec907ac66c0. That made the checksum test check against itself...

We actually needed to run check_test_results E3SMv2_test NARRM historical_0151 668fb58e3da9070640cf1ec907ac66c0, which would check /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.historical_0151. I presume /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.NARRM.historical_0151/tests doesn't exist because sbatch test_reproduction_scripts.bash wasn't run for the case of v2.NARRM.historical_0151.

And that hasn't been run yet because ./update_reproduction_scripts.bash has not yet been run for the case of v2.NARRM.historical_0151.

What's the status of all scripts?

| Simulation | Has a reproduction script been generated?* | Is there an expected checksum?** | Reproduction script checksum matches expected checksum? |
|---|---|---|---|
| Set (1) | | | |
| v2.LR.amip_0301 | yes | yes | no |
| v2.NARRM.amip_0101 | yes | yes | no |
| v2.NARRM.amip_0201 | yes | yes | no |
| v2.NARRM.amip_0301 | yes | yes | no |
| Set (2a) | | | |
| v2.LR.piControl_land | no | no | N/A |
| Set (2b) | | | |
| v2.LR.hist-GHG_0151 | no | no | N/A |
| v2.LR.hist-aer_0151 | no | no | N/A |
| v2.LR.hist-all-xGHG-xaer_0151 | no | no | N/A |
| v2.LR.piClim-histall_0031 | no | no | N/A |
| v2.LR.piClim-histaer_0031 | no | no | N/A |
| Set (2c) | | | |
| v2.NARRM.historical_0151 | no | no | N/A |
| v2.NARRM.historical_0201 | no | no | N/A |
| v2.NARRM.historical_0251 | no | no | N/A |
| Set (3) | | | |
| v2.NARRM.abrupt-4xCO2_0101 | yes | yes | yes |
| v2.NARRM.1pctCO2_0101 | yes | yes | yes |

*listed as added on the diff for this PR: https://github.com/E3SM-Project/e3sm_data_docs/pull/34/files (this implies ./update_reproduction_scripts.bash has been run to include this simulation)
**listed on https://docs.e3sm.org/e3sm_data_docs/_build/html/v2/reproducing_simulations.html

So, we can see that all simulations in set (2) need to have ./update_reproduction_scripts.bash run.

Sets (2b) and (2c) are effectively one set now that permissions have been opened up. We will probably drop set (2a), since it's not archived on HPSS. I just ran ./update_reproduction_scripts.bash, set up as:

resolution=LR
for case_name in hist-GHG_0151 hist-aer_0151 hist-all-xGHG-xaer_0151 piClim-histall_0031 piClim-histaer_0031; do
  ./generate_reproduction_script.bash ${resolution} ${case_name}
  diff ../run_scripts/v2/reproduce/run.v2.${resolution}.${case_name}.sh run.v2.${resolution}.${case_name}.sh
  mv run.v2.${resolution}.${case_name}.sh ../run_scripts/v2/reproduce/run.v2.${resolution}.${case_name}.sh
done

resolution=NARRM
for case_name in historical_0151 historical_0201 historical_0251; do
  ./generate_reproduction_script.bash ${resolution} ${case_name}
  diff ../run_scripts/v2/reproduce/run.v2.${resolution}.${case_name}.sh run.v2.${resolution}.${case_name}.sh
  mv run.v2.${resolution}.${case_name}.sh ../run_scripts/v2/reproduce/run.v2.${resolution}.${case_name}.sh
done

I'm now running sbatch test_reproduction_scripts.bash set up as:

for simulation_name in hist-GHG_0151 hist-aer_0151 hist-all-xGHG-xaer_0151 piClim-histall_0031 piClim-histaer_0031; do
  test_reproduction E3SMv2_test LR ${simulation_name} false false
done
for simulation_name in historical_0151 historical_0201 historical_0251; do
  test_reproduction E3SMv2_test NARRM ${simulation_name} false false
done

@forsyth2
Collaborator Author

Confirming reproduction script generation

Added:

	run_scripts/v2/reproduce/run.v2.LR.hist-GHG_0151.sh
	run_scripts/v2/reproduce/run.v2.LR.hist-all-xGHG-xaer_0151.sh
	run_scripts/v2/reproduce/run.v2.LR.piClim-histaer_0031.sh
	run_scripts/v2/reproduce/run.v2.LR.piClim-histall_0031.sh
	run_scripts/v2/reproduce/run.v2.NARRM.historical_0151.sh
	run_scripts/v2/reproduce/run.v2.NARRM.historical_0201.sh
	run_scripts/v2/reproduce/run.v2.NARRM.historical_0251.sh

Modified:

run_scripts/v2/reproduce/run.v2.LR.hist-aer_0151.sh

(Recall that v2.LR.hist-aer_0151 had a reproduction script listed before, but no associated checksum, implying it was previously added in error).

Thus, we've now generated the reproduction scripts for the 8 simulations of sets (2b-c).

Testing new scripts

We now have the actual checksums for these 8 reproduction scripts, to include in ./check_results.bash. However, we still need to find the correct expected checksums to check against. (run.v2.NARRM.historical_0151.sh, for instance, only has the XL2_1x5_ndays test, so we cannot compare that directly to the XS_1x10_ndays tests that were run by the reproduction scripts.)

@forsyth2
Collaborator Author

forsyth2 commented May 29, 2024

Extracting simulations as follows (as part of the process to find the expected checksums):

$ cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/zstash_extractions
$ mkdir <simulation-name>
$ cd <simulation-name>
$ zstash ls --hpss=globus://nersc/home/projects/e3sm/www/WaterCycle/E3SMv2/<LR or NARRM>/<simulation-name> > out.txt; grep tests/ out.txt

Simulation Files in tests/?
v2.LR.hist-GHG_0151 tests/ does not exist
v2.LR.hist-aer_0151 tests/ does not exist
v2.LR.hist-all-xGHG-xaer_0151 tests/ does not exist
v2.LR.piClim-histaer_0031 tests/ does not exist
v2.LR.piClim-histall_0031 tests/ does not exist
v2.NARRM.historical_0151 tests/XL2_1x5_ndays
v2.NARRM.historical_0201 tests/XL2_1x5_ndays
v2.NARRM.historical_0251 tests/XL2_1x5_ndays

So, for the 3 NARRM simulations, we either have to run 10-day tests or change the reproduction scripts to run 5-day tests. The 5 LR simulations will need to have tests run in the first place, for comparison. So, it seems the best step forward is to run a 10-day test on all 8 of these original simulation scripts to get the expected checksums.
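The per-simulation check above comes down to asking whether any archived path contains a tests/ component. A minimal sketch, assuming out.txt holds one archived path per line as written by the zstash ls command above; the function names list_archived_tests and report_tests are hypothetical:

```shell
# Print the distinct tests/<name> entries found in a zstash listing.
# Assumes one archived path per line, e.g. tests/XL2_1x5_ndays/run/atm.log.1.gz
list_archived_tests() {
  grep -o 'tests/[^/ ]*' "$1" | sort -u
}

# Print either the archived tests or the "does not exist" status for one listing.
report_tests() {
  local found
  found="$(list_archived_tests "$1")"
  if [ -n "${found}" ]; then
    echo "${found}"
  else
    echo "tests/ does not exist"
  fi
}
```

Calling report_tests out.txt in each extraction directory would give the second column of the table above.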

@forsyth2
Collaborator Author

To run the tests with the original scripts, we'll first need to find those. They are located in this repo.

In theory, we'd just need to set readonly run='XS_1x10_ndays', and follow along with https://docs.e3sm.org/running-e3sm-guide/guide-prior-to-production/#running-short-tests-an-example.

@forsyth2
Collaborator Author

forsyth2 commented May 29, 2024

I copied down run.v2.LR.hist-GHG_0151.sh from this repo to /home/ac.forsyth2/E3SMv2_test/data_docs_scripts and changed:

readonly RUN_REFDIR="/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.hist-GHG_0151/init"
readonly run='XS_1x10_ndays'

But I'm getting

./run.v2.LR.hist-GHG_0151.sh: line 311: /home/ac.forsyth2/E3SMv2/code/20210813/cime/scripts/create_newcase: No such file or directory

Apparently I have no /home/ac.forsyth2/E3SMv2/code/20210813/, which I assume is because I didn't run that particular simulation originally and therefore didn't happen to check out the code on that exact date.

I believe setting do_fetch_code=true will fix that: even though it fetches the latest E3SM code, it then checks out the code as of 2021-08-13:

readonly BRANCH="37959275bf3384157264e45a8d9c7c43f2be1d56" # master as of 2021-08-13  

@forsyth2
Collaborator Author

Debugging mismatched checksums

I thought perhaps the v2.LR.amip_0301 run script failed to incorporate an important diff, and that would explain the mismatched checksum.

However, I ran ./generate_reproduction_script.bash LR amip_0301 and all of the diffs reported in run.v2.LR.amip_0301.sh.rej seem accounted for in the generated reproduction script and in patch_helper.py.

So, we still don't have a clear answer on why the checksums aren't matching up.

Generating expected checksums

From /home/ac.forsyth2/E3SMv2_test/data_docs_scripts, I ran ./run.v2.LR.hist-GHG_0151.sh. Important diffs from the original run script:

readonly RUN_REFDIR="/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.hist-GHG_0151/init" # Changed
readonly run='XS_1x10_ndays' # Changed
do_fetch_code=true # Changed
do_case_build=true # Changed

Now, /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.hist-GHG_0151/init exists, as does /lcrc/group/e3sm/ac.forsyth2/E3SMv2/v2.LR.hist-GHG_0151, which includes the tests/ subdirectory.

for test in *_*_ndays
do
   gunzip -c ${test}/run/atm.log.*.gz | grep '^ nstep, te ' | uniq > atm_${test}.txt
done
$ md5sum atm_*_ndays.txt
c9aff4fd826f18d0872135b845090a6b  atm_XS_1x10_ndays.txt

This is our newly generated expected checksum. Now, we can check against the reproduction script's checksum. We add this line:

check_test_results E3SMv2_test LR hist-GHG_0151 c9aff4fd826f18d0872135b845090a6b

to utils/check_results.bash.

That shows:

./check_results.bash: line 21: cd: /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.hist-GHG_0151/tests: No such file or directory
gzip: *_*_ndays/run/atm.log.*.gz: No such file or directory
Line count test failed
0 atm_*_*_ndays.txt
482 atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.hist-GHG_0151/tests
Checksum test failed
d41d8cd98f00b204e9800998ecf8427e atm_*_*_ndays.txt
c9aff4fd826f18d0872135b845090a6b atm_XS_1x10_ndays.txt
Debug:
/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.hist-GHG_0151/tests
v2.LR.hist-GHG_0151

The message "/lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/v2.LR.hist-GHG_0151/tests: No such file or directory" implies that the sbatch test_reproduction_scripts.bash command failed.
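One detail in the output above supports this reading: d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of empty input, so atm_*_*_ndays.txt was created but nothing was written to it. A small sketch that flags this case before comparing checksums (the function name is hypothetical and is not part of check_results.bash):

```shell
# MD5 digest of zero bytes of input; seeing it means the extracted file is empty.
EMPTY_MD5="d41d8cd98f00b204e9800998ecf8427e"

# Return 0 if the given file hashes to the empty-input digest.
is_empty_checksum() {
  local actual
  actual="$(md5sum "$1" | cut -d' ' -f1)"
  [ "${actual}" = "${EMPTY_MD5}" ]
}
```

A check like this separates "the test never produced logs" from "the test ran but gave different results", which are the two distinct failure modes seen in this thread.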

@forsyth2
Collaborator Author

I wrote #39 to explain, in detail, the complicated process of producing reproduction scripts. Using the steps there, I'm able to produce a clearer picture of the status of the remaining reproduction scripts:

Milestones:

  1. Reproduction script is in /home/ac.forsyth2/ez/e3sm_data_docs/run_scripts/v2/reproduce?
  2. Copied reproduction script is in /home/ac.forsyth2/E3SMv2_test/scripts?
  3. /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/<simulation>/init exists and is non-empty? [Check with cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test; ls */init]
  4. /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/<simulation>/tests exists and is non-empty? [Check with cd /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test; ls */tests]
  5. Expected checksum found? If not, which point on step 3's checklist in "My process for setting up reproduction scripts" (#39) are we at?
  6. ./check_results.bash is passing (i.e., the line counts match and the checksums match)?

Milestone (5) is independent of (1-4). All 5 must be completed to get to the point where we can check (6).
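Milestones (3) and (4) can be checked mechanically rather than by reading ls output. A minimal sketch, assuming the directory layout described above; the function names and the status strings are hypothetical, chosen to mirror the wording used in the table below:

```shell
# Return 0 if the directory exists and contains at least one entry.
dir_is_nonempty() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Print the status of milestones (3) init and (4) tests for one simulation.
# base_dir would be e.g. /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test
report_milestones() {
  local base_dir="$1"
  local simulation="$2"
  local sub
  for sub in init tests; do
    if dir_is_nonempty "${base_dir}/${simulation}/${sub}"; then
      echo "${simulation} ${sub}: ok"
    elif [ -d "${base_dir}/${simulation}/${sub}" ]; then
      echo "${simulation} ${sub}: Empty"
    else
      echo "${simulation} ${sub}: Does not exist"
    fi
  done
}
```

For example, report_milestones /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test v2.LR.hist-GHG_0151 would print one line each for init and tests.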

Simulation 1 2 3 4 5 6
v2.NARRM.1pctCO2_0101 Empty ✅ (1)
v2.NARRM.abrupt-4xCO2_0101 Empty ✅ (1)
v2.LR.amip_0301 Empty ✅ (1) ❌ (checksum only)
v2.NARRM.amip_0101 Empty ✅ (1) ❌ (checksum only)
v2.NARRM.amip_0201 Empty ✅ (1) ❌ (checksum only)
v2.NARRM.amip_0301 Empty ✅ (1) ❌ (checksum only)
v2.LR.hist-GHG_0151 Does not exist ✅ (4) ❌ (line count & checksum)
v2.LR.hist-aer_0151 Does not exist ❌ (try method 4) Can't run yet
v2.LR.hist-all-xGHG-xaer_0151 Does not exist ❌ (try method 4) Can't run yet
v2.LR.piClim-histall_0031 Does not exist ❌ (try method 4) Can't run yet
v2.LR.piClim-histaer_0031 Does not exist ❌ (try method 4) Can't run yet
v2.NARRM.historical_0151 Does not exist Does not exist ❌ (try method 4) Can't run yet
v2.NARRM.historical_0201 Does not exist Does not exist ❌ (try method 4) Can't run yet
v2.NARRM.historical_0251 Does not exist Does not exist ❌ (try method 4) Can't run yet

Note that initial conditions (milestone 3) might not actually be necessary if a simulation used another simulation's initial conditions.

I now notice on https://docs.e3sm.org/e3sm_data_docs/_build/html/v2/reproducing_simulations.html that v2.NARRM.historical_{0151,0201,0251} were run on cori-knl, not chrysalis, and therefore won't be included in the reproduction script generation.

The v2.NARRM.1pctCO2_0101 and v2.NARRM.abrupt-4xCO2_0101 reproduction scripts, as mentioned previously, are ready to include. That leaves 9 remaining simulations that need reproduction scripts.

Failing the tests -- milestone (6)

4 are failing the checksum test (and passing the line count test). It is very unclear why this is the case.

v2.LR.amip_0301
v2.NARRM.amip_0101
v2.NARRM.amip_0201
v2.NARRM.amip_0301

1 is failing both the checksum test and the line count test. It seems this is because /lcrc/group/e3sm/ac.forsyth2/E3SMv2_test/<simulation>/tests failed to generate. That needs to be debugged.

v2.LR.hist-GHG_0151

Need to find an expected checksum -- milestone (5)

4 need to have an expected checksum calculated by running from scratch the original script's test on a 10-day period.

v2.LR.hist-aer_0151
v2.LR.hist-all-xGHG-xaer_0151
v2.LR.piClim-histall_0031
v2.LR.piClim-histaer_0031

This will require running the original run script. It's possible we can get away without fetching the code again, but if we need to (e.g., a different branch is used), that would mean budgeting an hour per script for that.

Need test_reproduction_scripts.bash debugged -- milestones (2) and (4)

5 need to have test_reproduction_scripts.bash successfully copy over the reproduction script so it can run it (milestone 2) and produce a tests/ directory (milestone 4).

v2.LR.hist-GHG_0151
v2.LR.hist-aer_0151
v2.LR.hist-all-xGHG-xaer_0151
v2.LR.piClim-histall_0031
v2.LR.piClim-histaer_0031

@forsyth2
Collaborator Author

Need test_reproduction_scripts.bash debugged -- milestones (2) and (4)

cp /home/ac.forsyth2/ez/e3sm_data_docs/run_scripts/v2/reproduce/${script_name} ${script_name} needed to be updated to include the ez in the file path. Re-running.

@forsyth2
Collaborator Author

I tried running sbatch test_reproduction_scripts.bash but it appears to have failed due to a Globus error. I will need to check the authentications.

Relatedly, it appears my simultaneous run of the modified original run script for hist-aer_0151 (to get the expected checksum) failed because its initial conditions were wiped by test_reproduction_scripts.bash, which runs rm -rf /lcrc/group/e3sm/${USER}/${test_subdir}/${case_name}.
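One way to avoid this kind of collision would be for the wipe step to refuse to run while another job is using the case directory. A minimal sketch, assuming a hypothetical marker file named .in_use that the other job creates on start and removes on completion; neither the marker nor this function exists in test_reproduction_scripts.bash:

```shell
# Remove a case directory unless a (hypothetical) .in_use marker is present.
# The :? expansions abort if an argument is unset or empty, which prevents
# an accidental rm -rf of a parent directory.
safe_wipe_case() {
  local root="${1:?root directory required}"
  local case_name="${2:?case name required}"
  local target="${root}/${case_name}"
  if [ -e "${target}/.in_use" ]; then
    echo "Refusing to remove ${target}: in use by another run" >&2
    return 1
  fi
  rm -rf "${target}"
}
```

An alternative with the same effect would be to point RUN_REFDIR at a directory outside the test subdirectory, so the two workflows never share a path.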

@forsyth2
Collaborator Author

The re-running of sbatch test_reproduction_scripts.bash was successful. Milestones (2) and (4) are now met for the 5 new reproduction scripts.

I originally couldn't compute the expected checksum for v2.LR.hist-aer_0151. The log files never ended up gzipped in /lcrc/group/e3sm/ac.forsyth2/E3SMv2/v2.LR.hist-aer_0151/tests/XS_1x10_ndays/run, which suggested an error. However, grep -i error *log* showed nothing relevant. I checked again later, however, and the logs were in fact gzipped. (Perhaps the launched job was still running and I forgot to check its completion?)

$ for test in *_*_ndays
do
   gunzip -c ${test}/run/atm.log.*.gz | grep '^ nstep, te ' | uniq > atm_${test}.txt
done
$ md5sum atm_*_ndays.txt
1a85a01b55fa91abdf9983a17f24e774  atm_XS_1x10_ndays.txt
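Because the gzipped logs only appear once the launched job has finished, it can help to confirm they exist before computing a checksum. A sketch of the same extraction with that guard added (the function name is hypothetical; the gunzip, grep, and uniq pipeline is the one used above):

```shell
# Extract the energy lines for every *_*_ndays test under the current directory.
# Skips a test, with a warning, if no gzipped atm log is present yet.
extract_energy_lines() {
  local test log found
  for test in *_*_ndays; do
    [ -d "${test}" ] || continue   # the glob did not match any directory
    found=false
    for log in "${test}"/run/atm.log.*.gz; do
      [ -e "${log}" ] && found=true
    done
    if [ "${found}" = false ]; then
      echo "No gzipped atm log yet in ${test}/run" >&2
      continue
    fi
    gunzip -c "${test}"/run/atm.log.*.gz | grep '^ nstep, te ' | uniq > "atm_${test}.txt"
  done
}
```

Run from the tests/ directory, this leaves no atm_*.txt file for a test whose logs are missing, so a later md5sum cannot silently hash an empty file.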

Running ./check_results.bash, I see that now v2.LR.hist-GHG_0151 and v2.LR.hist-aer_0151 are passing, thus meeting milestones (5) and (6). These scripts can now join v2.NARRM.1pctCO2_0101 and v2.NARRM.abrupt-4xCO2_0101 in being added to the reproduction scripts table.

@forsyth2
Collaborator Author

All the newer scripts pass the tests. At this point, we can add the following 7 reproduction scripts officially:

v2.NARRM.1pctCO2_0101
v2.NARRM.abrupt-4xCO2_0101 
v2.LR.hist-GHG_0151
v2.LR.hist-aer_0151
v2.LR.hist-all-xGHG-xaer_0151
v2.LR.piClim-histaer_0031
v2.LR.piClim-histall_0031

These 4 reproduction scripts need to be debugged to get the checksum tests passing:

v2.LR.amip_0301
v2.NARRM.amip_0101
v2.NARRM.amip_0201
v2.NARRM.amip_0301

It may be good at this point to merge in the 7 working reproduction scripts (and also update the table on the website), and address the remaining 4 in another pull request.

@forsyth2 forsyth2 force-pushed the issue-23-more-scripts branch 2 times, most recently from 8ef7a46 to a50e60d Compare June 3, 2024 17:40
@forsyth2 forsyth2 force-pushed the issue-23-more-scripts branch from a50e60d to bc6e292 Compare June 3, 2024 17:42
@forsyth2 forsyth2 marked this pull request as ready for review June 3, 2024 17:45
@forsyth2
Collaborator Author

forsyth2 commented Jun 3, 2024

Going to merge the 7 working reproduction scripts.

@forsyth2 forsyth2 merged commit 1fa290a into main Jun 3, 2024
1 check passed
@forsyth2 forsyth2 deleted the issue-23-more-scripts branch June 3, 2024 17:46
Successfully merging this pull request may close these issues.

Finish adding reproduction scripts