This section of the tutorial will go over the basics of setting up a 3DVAR with a plain static background error covariance.
The following tutorial assumes you have already set up your environment, compiled SOCA, and run the initialization steps in the previous section.
Important
Within your main soca-tutorial directory, perform the following actions to create the directories and link in the required files, including the initialized diffusion from the previous tutorial:
mkdir -p tutorial/3dvar
cd tutorial/3dvar
ln -s ../../input_data/* .
cp ../../3dvar/files/* .
ln -s ../init/diffusion_{hz,vt}.nc .
ln -s <PATH_TO_SOCA_BUILD_DIR>/bin/soca_{hofx3d,var}.x .
mkdir obs_out
The length scales used in the previous section were intentionally set a little larger than they should be, in order to make the Dirac tests easier to visualize. To use more realistic length scales, do the following:
Tip
Rerun the diffusion calibration in the tutorial/init directory with smaller scales:
- change diffusion_setscales.yaml so that HZ_ROSSBY_MULT: 1.0 and HZ_MIN_GRID_MULT: 1.5
- rerun ./calc_scales.py
- rerun the diffusion calibration
Note how there are fewer iterations of the diffusion operator required now (since the length scales are shorter). This will help the 3DVAR in this part of the tutorial run faster!
Before actually performing a data assimilation cycle, it's helpful to look at the HofX application (soca_hofx3d.x), which reads in a background state and observation files and calculates the model equivalent at the given observation locations.
Look at the hofx.yaml configuration file and note the following sections:
The state section specifies the background state, which variables should be loaded, and the date of the background. Note that the date given by state.date overwrites anything that is present in the ocean or ice restart files. In our case here we are pretending these backgrounds are for Aug 1, 2021 at 12Z (even though they actually came from a previous data assimilation run in 2019).
There are several state variables listed. If you're curious about how the state variable names are mapped from the input restart files to what is in SOCA, and what is expected by other parts of JEDI such as UFO, look at the field mapping file (fields_metadata.yml). In short, the variables listed here are:
From the file given by ice_filename:
- cice - sea ice concentration
- hicen - sea ice thickness
- hsnon - sea ice snow thickness

From the file given by ocn_filename:
- tocn - ocean temperature
- socn - ocean salinity
- ssh - sea surface height
- hocn - ocean layer thickness

And the following fields, which are not directly present in the input files, but rather are calculated by SOCA, and are needed (for now) by some of the variable changes:
- mld - mixed layer depth
- layer_depth - model layer depth
Unused variables can be removed. For example, if you are only doing ocean DA, without sea ice, you can remove everything ice related from this list.
The time window section specifies the time window in which observations are considered valid. Note that the date of the background state is in the center of this time window.
The observations.observers section is where one or more observation input files are specified. Each observation has the following sections:
- obs space - this section has an arbitrary name (name) that can be whatever you want, the input file (obsdatain), the optional output file (obsdataout), and the variables that should be used from the files (simulated variables)
- obs filters - optional quality control filters (this will be discussed more later)
- obs operator - the observation operator. There are many types of observation operators available in UFO (see the observation operator documentation for more extensive information). We will discuss the three major ones that are important for the ocean (excluding radiances).
For observations that directly observe some variable in the model state, the Identity observation operator type is used. Here we have SST observations. Our simulated variable name is seaSurfaceTemperature, which maps to the model variable sea_surface_temperature. This simple variable name mapping is listed in the obsop_name_map.yml file.
- obs space:
    name: sst_avhrr_metop-b
    ...
    simulated variables: [seaSurfaceTemperature]
  obs operator:
    name: Identity
    observation alias file: ./obsop_name_map.yml
[Figure: Metop-B SST observations 2021-08-01]
The ADT operator (absolute dynamic topography) is close to being an identity operator; however, there is a special step that removes the global offset between the model SSH and the observed ADT. This offset is calculated individually for each ADT obs space present.
- obs space:
    name: ADT
    ...
    simulated variables: [absoluteDynamicTopography]
  obs operator:
    name: ADT
[Figure: Cryosat-2 observations 2021-08-01]
The Composite operator is used for insitu T/S profiles so that we may apply different observation operators to different variables in the file. Here, the VertInterp operator is used for salinity. It is similar to the Identity operator, but performs vertical interpolation in addition to horizontal interpolation. The InsituTemperature operator is used for waterTemperature. It is similar to the VertInterp operator, but it also converts between potential temperature (model) and insitu temperature (observation).
- obs space:
    name: insitu_ts
    ...
    simulated variables: [waterTemperature, salinity]
  obs operator:
    name: Composite
    components:
    - name: InsituTemperature
      variables:
      - name: waterTemperature
    - name: VertInterp
      observation alias file: ./obsop_name_map.yml
      variables:
      - name: salinity
      vertical coordinate: sea_water_depth
      observation vertical coordinate: depth
      interpolation method: linear
[Figure: Insitu observations 2021-08-01]
Important
Run the observation operators:
mpirun -n 10 ./soca_hofx3d.x hofx.yaml
If you look at the output log, you'll notice several sections that let you know what is happening. There is an H(x) section that shows you the number of input observations that are valid for the given time window, as well as some statistics on the observation operator values that were calculated from the given background state.
H(x):
sst_avhrr_metop-b nobs= 775202 Min=-1.86137, Max=33.1543, RMS=20.5008
adt_c2 nobs= 40476 Min=-1.50592, Max=1.86233, RMS=0.791879
insitu_ts nobs= 730386 Min=-1.90618, Max=38.0376, RMS=26.3217
You'll notice that, unfortunately, the temperature and salinity statistics are combined for the insitu_ts obs space, but that is just a side effect of how the diagnostics are presented in the log file.
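If you do want per-variable statistics, a short python script can compute them from the output files written to obs_out/ (discussed below). Here is a minimal sketch; the file name is an assumption based on the naming pattern of the other output files, and it requires the netCDF4 and numpy packages:

import numpy as np
import netCDF4

# assumed file name; adjust to match what is actually written to obs_out/
fname = "obs_out/hofx.insitu_ts.20210801T120000Z.nc4"

with netCDF4.Dataset(fname) as ds:
    hofx = ds.groups["hofx"]  # forward operator values written by soca_hofx3d.x
    for var in ("waterTemperature", "salinity"):
        vals = np.ma.compressed(hofx.variables[var][:])  # drop any missing values
        rms = np.sqrt(np.mean(vals ** 2))
        print(f"{var:18s} nobs={vals.size:8d} "
              f"Min={vals.min():.4f} Max={vals.max():.4f} RMS={rms:.4f}")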
The log file will also present details on how many observations passed the quality control filters, and how many failed for each filter. (We will discuss basic QC filters in more detail in the 3DVAR section below.) There is a long list of generic quality control filters in UFO; see the online documentation for more information on the filters available. Here is the log output with the QC filters that were given in hofx.yaml:
QC sst_avhrr_metop-b seaSurfaceSkinTemperature: 809406 rejected as processed but not assimilated.
QC sst_avhrr_metop-b seaSurfaceSkinTemperature: 0 passed out of 809406 observations.
QC sst_avhrr_metop-b seaSurfaceTemperature: 5 out of bounds.
QC sst_avhrr_metop-b seaSurfaceTemperature: 34203 H(x) failed.
QC sst_avhrr_metop-b seaSurfaceTemperature: 775198 passed out of 809406 observations.
QC adt_c2 absoluteDynamicTopography: 81 out of bounds.
QC adt_c2 absoluteDynamicTopography: 28 H(x) failed.
QC adt_c2 absoluteDynamicTopography: 40476 passed out of 40585 observations.
QC insitu_ts salinity: 1 missing values.
QC insitu_ts salinity: 22936 H(x) failed.
QC insitu_ts salinity: 365192 passed out of 388129 observations.
QC insitu_ts waterTemperature: 178 missing values.
QC insitu_ts waterTemperature: 22879 H(x) failed.
QC insitu_ts waterTemperature: 149223 rejected by first-guess check.
QC insitu_ts waterTemperature: 215849 passed out of 388129 observations.
Output observation files are generated as defined in the obsdataout section of each observation space. Look at the contents of one of the output files in obs_out/, for example:
ncdump -h obs_out/hofx.adt_c2.20210801T120000Z.nc4
The contents of this file should look similar to the input file, except now several additional variable groups have been added:
- EffectiveError - this is the calculated observation error that would be used for data assimilation. The input file may have had the ObsError variable, which is the default used for determining the observation error. However, there are various filters that can either modify or explicitly set the final error (see UFO filter actions). If an observation is QC'ed out, the EffectiveError is missing.
- EffectiveQC - non-zero values here indicate the observation was removed for some reason and will not be used in the data assimilation. The value indicates which filter (listed in the UFO QCflags.h) was responsible for removing the observations.
- hofx - this is the end product of the forward operator
You can examine these output files to get detailed information of which observations are or aren't used, and the resulting hofx values.
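For example, here is a small sketch that tallies how many observations were kept or rejected according to the EffectiveQC group (the file and variable names below are just examples; point it at any file in obs_out/):

import numpy as np
import netCDF4

fname = "obs_out/hofx.adt_c2.20210801T120000Z.nc4"   # example file from the hofx run
varname = "absoluteDynamicTopography"                # variable to check

with netCDF4.Dataset(fname) as ds:
    qc = np.ma.filled(ds.groups["EffectiveQC"].variables[varname][:], -1)

# flag 0 means the observation passed QC; other values correspond to the
# filters listed in UFO's QCflags.h (-1 here marks missing entries)
for flag, n in zip(*np.unique(qc, return_counts=True)):
    print(f"EffectiveQC={int(flag):3d}: {int(n)} obs")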
The variational data assimilation methods are available via the soca_var.x executable. In addition to 3DVAR, the other variational methods are available via this executable, including 3DFGAT, 3DEnVAR, 4DEnVAR, and (if we had a model TL/ADJ) 4DVAR. The effective variational method used depends on the cost function type and the background error type specified in the configuration file. First, we will look at a plain 3DVAR.
Look at the 3dvar.yaml configuration file in your working directory. It is set up to perform data assimilation with insitu T/S profiles only. You'll notice the cost function type is set to 3DVAR:
cost function:
  cost type: 3D-Var
The background error section looks similar to what was done in the previous section for the Dirac tests (indicating a plain 3DVAR, and not a 3DEnVAR, which will be covered in the next section), and the observations section should look similar to what was in the hofx.yaml configuration.
Important
Run the 3DVAR and see what happens:
mpirun -n 10 ./soca_var.x 3dvar.yaml
The output log contains some useful information. It has the same observation hofx and QC information that the hofx application produced. It also shows the final analysis and increment values; let's look at those to make sure everything is working:
CostFunction::addIncrement: Increment:
Valid time: 2021-08-01T12:00:00Z
socn min= -52.225874 max= 2.759373 mean= -0.002550
tocn min= -42.685815 max= 55.446582 mean= -0.005559
ssh min= -2.106387 max= 11.213957 mean= 0.002550
...
CostFunction::addIncrement: Analysis:
Valid time: 2021-08-01T12:00:00Z
socn min= -17.546187 max= 40.756473 mean= 34.283437
tocn min= -14.931892 max= 69.766894 mean= 7.859047
ssh min= -2.176699 max= 10.284270 mean= -0.308790
...
Wow, that looks horrible! The increments are huge, and the resulting analysis is crazy. An SSH increment of 11m, and salinity analysis of -17, what happened??
You could look at the observation output files. Similar to the hofx application, the observation space statistics are saved, and if you run ncdump -h ... on a file in the obs_out/ directory you'll see the additional variables that are created for each observation space. We don't provide tools (yet) to easily look at the statistics in these files, but you could easily write a python script to do so (see the sketch after the list below).
- hofx0 - the observation operator applied to the background
- hofx1 - the observation operator applied after the first outer iteration of 3DVAR (i.e. the observation operator applied to the analysis in this case, since we only have a single outer loop)
- EffectiveQC0 - the same meaning as EffectiveQC from the hofx application, applied to the initial background
- EffectiveQC1 - the same as EffectiveQC0, though after the first outer iteration of 3DVAR
- ombg - observation minus background, effectively the same as (ObsValue - hofx0)
- oman - observation minus analysis, effectively the same as (ObsValue - hofx1)
The executable produces two other output files: an increment (ocn.3dvar.incr.2021-08-01T12:00:00Z.nc) and an analysis (ocn.3dvar.an.2021-08-01T12:00:00Z.nc), which is just the increment plus the background. If you open those you'll see that there are some large increments and a bad analysis in a few localized spots.
[Figure: bad SSH analysis without QC]
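A quick way to look at this yourself is to plot a field from the increment file with a few lines of python. In the sketch below the variable name ssh is an assumption; run ncdump -h on the increment file and adjust the name as needed:

import matplotlib.pyplot as plt
import netCDF4

fname = "ocn.3dvar.incr.2021-08-01T12:00:00Z.nc"

with netCDF4.Dataset(fname) as ds:
    ssh = ds.variables["ssh"][:].squeeze()   # assumed variable name for the SSH increment

plt.pcolormesh(ssh, cmap="RdBu_r", vmin=-1.0, vmax=1.0)
plt.colorbar(label="SSH increment (m)")
plt.title("3DVAR SSH increment (no QC)")
plt.savefig("ssh_increment.png", dpi=150)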
It's time to add some basic QC to those observations!
For each observation space in the config file you can enable one or more QC filters. These are the same filters that can be enabled in the hofx application. There is a long list of filters available that can remove obs, change obs values, set or inflate obs errors, create derived variables, etc. You can read more about them in the UFO obs filters documentation. We'll look at three basic QC filters here. Enable these filters by uncommenting the appropriate lines in your 3dvar.yaml file for the insitu T/S observations.
Important
Enable the following sections of quality control filters in 3dvar.yaml. (If you're curious about how complicated QC filters can get, check out what we are currently using for the full set of QC for the insitu obs in ocean_profile.yaml!)
The Bounds Check Filter simply rejects observations that lie outside specific min/max values. Here, temperature obs outside -1.9 to 40.0 C are rejected, and salinity observations outside 2.0 to 41.0 PSU are rejected.
- filter: Bounds Check
  filter variables: [{name: waterTemperature}]
  minvalue: -1.9
  maxvalue: 40.0
- filter: Bounds Check
  filter variables: [{name: salinity}]
  minvalue: 2.0
  maxvalue: 41.0
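As a quick sanity check, you can count how many observations would fail these bounds directly from the input file with a few lines of python. The file name below is hypothetical; use the actual insitu obs file in input_data/obs/:

import numpy as np
import netCDF4

# hypothetical input file name; use the actual insitu obs file from input_data/obs/
fname = "input_data/obs/insitu_ts.nc4"

with netCDF4.Dataset(fname) as ds:
    temp = np.ma.compressed(ds.groups["ObsValue"].variables["waterTemperature"][:])

# count temperature obs outside the Bounds Check limits used above
nbad = int(np.sum((temp < -1.9) | (temp > 40.0)))
print(f"{nbad} of {temp.size} temperature obs are outside [-1.9, 40.0] C")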
The Domain Check Filter only keeps observations that pass all of the given where sections listed. The tests in the where section can be based on any variable that is in the observation file, or interpolated from the model state (e.g. GeoVaLs/sea_surface_temperature). Here, observations that do not have a proper observation error assigned to them from the input file are rejected.
- filter: Domain Check
  filter variables: [{name: waterTemperature}]
  where:
  - variable: {name: ObsError/waterTemperature}
    minvalue: 0.001
The Background Check Filter rejects observations that are too far from the background, either as an absolute value (absolute threshold) or as a multiple of the observation error (threshold). Here, we remove temperature observations that are more than 5 times the observation error away from the background.
- filter: Background Check
  filter variables: [{name: waterTemperature}]
  threshold: 5
If you rerun the var now with the QC filters in place, you'll see a much more reasonable analysis.
Important
Run the 3DVAR again, after having enabled the QC filters in 3dvar.yaml:
mpirun -n 10 ./soca_var.x 3dvar.yaml
The output log will indicate that additional T/S observations are now being rejected by the filters; most notably, quite a few observations are removed by the bounds check and are shown as "out of bounds" in the log:
QC insitu_ts salinity: 1 missing values.
QC insitu_ts salinity: 815 out of bounds.
QC insitu_ts salinity: 22814 H(x) failed.
QC insitu_ts salinity: 85380 rejected by first-guess check.
QC insitu_ts salinity: 279119 passed out of 388129 observations.
QC insitu_ts waterTemperature: 178 missing values.
QC insitu_ts waterTemperature: 654 out of bounds.
QC insitu_ts waterTemperature: 23042 H(x) failed.
QC insitu_ts waterTemperature: 148349 rejected by first-guess check.
QC insitu_ts waterTemperature: 215906 passed out of 388129 observations.
Turn on all the observations that are in 3dvar.yaml and the results should look like the following:
[Figures: SSH increment | SST increment | SSS increment]
Using the resulting analysis for the next forecast as part of a cycling system will be covered in a later tutorial. But, in short, you can either use the increment file to drive IAU for MOM6, or you can use the analysis file to replace the given state variables in the original background restart file. (The analysis file only contains the small number of variables used in the data assimilation; it does not contain all the variables needed for a MOM6 restart.)
Tip
(optional) Here are some exercises to try on your own:
Add observations and the appropriate QC filters for the other observation files that are in input_data/obs/ but are not yet included in 3dvar.yaml. In order to add the ice concentration observations, you'll need to:
- add ice to the background state
- add the ice vars to the state, analysis variables, and the diffusion operator variables
There are several other options in the variational section of 3dvar.yaml that are important.
First is the variational.iterations section. Each item of this list defines an outer loop. We normally only do a single outer loop with 3DVAR for the ocean; however, multiple outer loops, where everything is re-linearized at the beginning of each loop, are possible. The geometry is specified again because, in theory, you could use a lower resolution geometry for the variational minimizer, though we don't exercise that option yet within SOCA.
The number of iterations of the minimizer is controlled by two parameters. Minimization stops when one of the two conditions is met:
- iterations.ninner - the maximum number of inner iterations
- iterations.gradient norm reduction - the target norm for the minimization of the gradient
But how do I know what to set for these two parameters? When you run the 3DVAR the output log will show diagnostics for each iteration of the minimization, including the value of the norm reduction for the iterations, which will let you know how it is converging:
DRPCG end of iteration 1
Gradient reduction ( 1) = 2598709.78851671
Norm reduction ( 1) = 0.5090873539758356
Quadratic cost function: J ( 1) = 660206600.3306482
Quadratic cost function: Jb ( 1) = 38.85576871046547
Quadratic cost function: JoJc( 1) = 660206561.4748795
...
DRPCG end of iteration 2
Gradient reduction ( 2) = 1928596.023566572
Norm reduction ( 2) = 0.3778120399839757
Quadratic cost function: J ( 2) = 647740573.7450265
Quadratic cost function: Jb ( 2) = 139.8621227075966
Quadratic cost function: JoJc( 2) = 647740433.8829038
If you parse the output and plot the cost function (J) over the iterations, you can see that there is a big decrease in the cost function at first, but after 20 or 30 iterations you get diminishing returns as the convergence slows. This is why, for this tutorial, the number of iterations was set to only 20. The number of iterations required will vary depending on the configuration of SOCA and the observations used.
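Such a plot can be made with a short python script; here is a minimal sketch, assuming the output of soca_var.x was redirected to a file named var.log:

import re
import matplotlib.pyplot as plt

costs = []
with open("var.log") as f:                     # assumed log file name
    for line in f:
        # matches lines like "Quadratic cost function: J ( 1) = 660206600.3306482"
        m = re.search(r"Quadratic cost function: J\s*\(\s*\d+\)\s*=\s*(\S+)", line)
        if m:
            costs.append(float(m.group(1)))

plt.plot(range(1, len(costs) + 1), costs, marker="o")
plt.xlabel("inner iteration")
plt.ylabel("quadratic cost function J")
plt.savefig("cost_function.png", dpi=150)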
Tip
(optional) Here are some exercises to try on your own:
Add a second outer loop to the 3DVAR (you can simply duplicate the existing outer loop). Make sure to give the increment output file a different name for the second loop; otherwise the increment from the first loop will be overwritten. Bonus points for plotting the cost function over the two outer loops.
The minimizer algorithm specified in 3dvar.yaml can be set to any of the long list of minimizers available in OOPS; however, there are only two that you should be concerned with:
- DRPCG - Derber-Rosati Preconditioned Conjugate Gradients. For details see S. Gurol, PhD Manuscript, 2013. Use this if your model space is smaller than your observation space.
- RPCG - Augmented Restricted Preconditioned Conjugate Gradients. Based on the algorithm proposed in Gratton and Tshimanga, QJRMS, 135: 1573-1585 (2009). Use this if your observation space is smaller than your model space.
These two minimizers should produce nearly the same results; the difference will simply be in runtime and memory usage. Since with ocean DA we nearly always have fewer observations than 3D model points, you should use RPCG unless you have reason to do otherwise.
For example, for 30 iterations, you can see that RPCG uses much less memory and CPU time with the given observations and grid size used in this tutorial:
- DRPCG:
  OOPS_STATS ------------------------ Parallel Timing Statistics ( 30 MPI tasks) -----------------------------------
  OOPS_STATS Run end - Runtime: 105.91 sec, Memory: total: 40.34 Gb, per task: min = 1.23 Gb, max = 1.43 Gb
- RPCG:
  OOPS_STATS ------------------------ Parallel Timing Statistics ( 30 MPI tasks) -----------------------------------
  OOPS_STATS Run end - Runtime: 89.46 sec, Memory: total: 22.23 Gb, per task: min = 696.48 Mb, max = 770.02 Mb
One variation on the previously shown 3DVAR is the 3DVAR-FGAT (First Guess at Appropriate Time). This differs from the standard 3DVAR mainly in how the hofx is calculated. Instead of a single background at the center of the window, multiple background time slots can be given so that the hofx is calculated using the background time slot that is closest to the actual observation time. The ocean changes slowly, so if using short windows (e.g. 6 hrs) it's uncertain how much benefit FGAT would have, though it might be beneficial in the mixed layer where there is a strong diurnal signal, or with a high resolution grid.
Tip
You can run 3DFGAT with the sample yaml given.
mpirun -n 10 ./soca_var.x 3dfgat.yaml
Looking inside the configuration file, to enable FGAT with the variational application you'll need to change the cost function type to:
cost function:
  cost type: 3D-FGAT
and add a model section. We will not be running an actual model from the initial background as part of the variational application (this is possible with other JEDI systems, namely FV3-JEDI or MPAS-JEDI, but not so much with SOCA). Instead, we will be using a "pseudo model" which will load in several time slots that have already been saved from a model run.
model:
  name: PseudoModel
  tstep: PT6H
  states:
  - read_from_file: 1
    date: 2021-08-01T06:00:00Z
    basename: ./bkg/
    ocn_filename: ocn.bkg.2019080112.nc
  - read_from_file: 1
    date: 2021-08-01T12:00:00Z
    basename: ./bkg/
    ocn_filename: ocn.bkg.2019080112.nc
  - read_from_file: 1
    date: 2021-08-01T18:00:00Z
    basename: ./bkg/
    ocn_filename: ocn.bkg.2019080112.nc
  - read_from_file: 1
    date: 2021-08-02T00:00:00Z
    basename: ./bkg/
    ocn_filename: ocn.bkg.2019080112.nc
One thing to notice is that the background time is no longer at the center of the window, but rather at the beginning.
Also, if you actually ran the 3DFGAT and looked at the output, you should notice that the output is the same as in the simple 3DVAR case; this is because the same background file was used for each time slot! (I didn't have multiple time slots available in the input data.)
The next tutorial will present how to add a hybrid ensemble background error covariance to run a 3DEnVAR.