Preparation of initial and boundary files

Introduction

HARMONIE can be coupled with external models such as IFS, ARPEGE or HIRLAM. Internally it is possible to nest the different ALADIN/ALARO/AROME configurations, with some restrictions. In the following we describe how the initial and boundary files are generated depending on the different configurations. Boundary file preparation basically includes two parts: forecast file fetching and boundary file generation.

The ECFLOW tasks for initial and boundary preparation

Boundary strategies

There are a number of ways to choose which forecast lengths you use as boundaries. The strategy is determined by BDSTRATEGY in ecf/config_exp.h, and a number of strategies are implemented.

  • available : Search for available files in BDDIR and try to keep forecast consistency. This is meant to be used operationally, since it will at least keep your run going, albeit with old boundaries, if no new boundaries are available.
  • simulate_operational : Mimic the behaviour of the operational runs using ECMWF 6h old boundaries.
  • same_forecast : Use all boundaries from the same forecast, start from analysis
  • analysis_only : Use only analyses as boundaries. Note that BDINT cannot be shorter than the frequency of the analyses.
  • latest : Use the latest possible boundary with the shortest forecast length
  • jb_ensemble : Same as same_forecast but used for JB-statistics generation. With this you should export JB_ENS_MEMBER=some_number
  • eps_ec_oper : ECMWF EPS members (on reduced Gaussian grid). It is only meaningful with ENSMSEL non-empty, i.e., ENSSIZE > 0
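
To make the strategies concrete, the following is a minimal Python sketch, not the actual Perl logic in scr/Boundary_strategy.pl, of how two of the strategies map DTG, LL, BDINT and BDCYCLE to a boundary cycle and a forecast length per coupling hour. The assumption that simulate_operational couples to the cycle started BDCYCLE hours before DTG matches the example output shown below.

# Minimal illustration of two boundary strategies; the real logic lives in
# scr/Boundary_strategy.pl and handles many more cases (availability checks,
# ensembles, analysis-only boundaries, ...).
from datetime import datetime, timedelta

def boundary_list(dtg, ll, bdint, bdcycle, strategy):
    """Return (boundary_cycle, forecast_length) pairs for each coupling hour."""
    start = datetime.strptime(dtg, "%Y%m%d%H")
    pairs = []
    for hh in range(0, ll + 1, bdint):
        if strategy == "same_forecast":
            cycle, offset = start, 0                  # everything from the DTG cycle
        elif strategy == "simulate_operational":
            cycle = start - timedelta(hours=bdcycle)  # e.g. 6 h old ECMWF cycle
            offset = bdcycle
        else:
            raise ValueError("strategy not covered by this sketch")
        pairs.append((cycle.strftime("%Y%m%d%H"), offset + hh))
    return pairs

# DTG 2011090618, LL 36, BDINT 3, BDCYCLE 6 gives
# [('2011090612', 6), ('2011090612', 9), ...], i.e. fc20110906_12+006, +009, ...
print(boundary_list("2011090618", 36, 3, 6, "simulate_operational")[:2])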

All the strategies are defined in scr/Boundary_strategy.pl. The script generates a file bdstrategy in your working directory that could look like:

 Boundary strategy

       DTG: 2011090618
        LL: 36
     BDINT: 3
   BDCYCLE: 6
  STRATEGY: simulate_operational
     BDDIR: /scratch/snh/hm_home/alaro_37h1_trunk/ECMWF/archive/@YYYY@/@MM@/@DD@/@HH@
HOST_MODEL: ifs
INT_BDFILE: /scratch/snh/hm_home/alaro_37h1_trunk/20110906_18/ELSCFHARMALBC@NNN@

# The output bdstrategy file has the format of 
# NNN|YYYYMMDDHH INT_BDFILE BDFILE BDFILE_REQUEST_METHOD 
# where 
# NNN        is the input hour
# YYYYMMDDHH is the valid hour for this boundary
# INT_BDFILE is the final boundary file
# BDFILE                is the input boundary file
# BDFILE_REQUEST_METHOD is the method to the request BDFILE from e.g. MARS, ECFS or via scp

SURFEX_INI| /scratch/snh/hm_home/alaro_37h1_trunk/20110906_18/SURFXINI.lfi 
000|2011090618 /scratch/snh/hm_home/alaro_37h1_trunk/20110906_18/ELSCFHARMALBC000 /scratch/snh/hm_home/alaro_37h1_trunk/ECMWF/archive/2011/09/06/12/fc20110906_12+006 MARS_umbrella -d 20110906 -h 12 -l 6 -t
003|2011090621 /scratch/snh/hm_home/alaro_37h1_trunk/20110906_18/ELSCFHARMALBC001 /scratch/snh/hm_home/alaro_37h1_trunk/ECMWF/archive/2011/09/06/12/fc20110906_12+009 MARS_umbrella -d 20110906 -h 12 -l 9 -t
...

This means that if the boundary file is not found under BDDIR, the command MARS_umbrella -d YYYYMMDD -h HH -l LLL -t BDDIR will be executed. A local interpretation could be to search for external data if your file is not in BDDIR, like in the following example from SMHI:

 Boundary strategy

       DTG: 2011090112
        LL: 24
     BDINT: 3
   BDCYCLE: 06
  STRATEGY: latest
     BDDIR: /nobackup/smhid9/sm_esbol/hm_home/ice_36h1_4/g05a/archive/@YYYY@/@MM@/@DD@/@HH@
HOST_MODEL: hir
INT_BDFILE: /nobackup/smhid9/sm_esbol/hm_home/ice_36h1_4/20110901_12/ELSCFHARMALBC@NNN@
 EXT_BDDIR: smhi_file:/data/arkiv/field/f_archive/hirlam/G05_60lev/@YYYY@@MM@/G05_@YYYY@@MM@@DD@@HH@00+@LLL@H00M
EXT_ACCESS: scp

# The output bdstrategy file has the format of 
# NNN|YYYYMMDDHH INT_BDFILE BDFILE BDFILE_REQUEST_METHOD 
# where 
# NNN        is the input hour
# YYYYMMDDHH is the valid hour for this boundary
# INT_BDFILE is the final boundary file
# BDFILE                is the input boundary file
# BDFILE_REQUEST_METHOD is the method to the request BDFILE from e.g. MARS, ECFS or via scp

# hh_offset is 0 ; DTG is  
SURFEX_INI| /nobackup/smhid9/sm_esbol/hm_home/ice_36h1_4/20110901_12/SURFXINI.lfi 
000|2011090112 /nobackup/smhid9/sm_esbol/hm_home/ice_36h1_4/20110901_12/ELSCFHARMALBC000 /nobackup/smhid9/sm_esbol/hm_home/ice_36h1_4/g05a/archive/2011/09/01/12/fc20110901_12+000 scp smhi:/data/arkiv/field/f_archive/hirlam/G05_60lev/201109/G05_201109011200+000H00M 
003|2011090115 /nobackup/smhid9/sm_esbol/hm_home/ice_36h1_4/20110901_12/ELSCFHARMALBC001 /nobackup/smhid9/sm_esbol/hm_home/ice_36h1_4/g05a/archive/2011/09/01/12/fc20110901_12+003 scp smhi:/data/arkiv/field/f_archive/hirlam/G05_60lev/201109/G05_201109011200+003H00M 

In this example an scp from smhi will be executed if the expected file is not in BDDIR. There are a few environment variables in ecf/config_exp.h that one can play with that deal with the initial and boundary files:

  • HOST_MODEL : Tells the origin of your boundary data
    • ifs : ECMWF data
    • hir : HIRLAM data
    • ald : Output from ALADIN physics; this also covers ARPEGE data after fullpos processing
    • ala : Output from ALARO physics
    • aro : Output from AROME physics
  • BDINT : Interval of boundaries in hours
  • BDLIB : Name of the forcing experiment. Set
    • ECMWF to use MARS data
    • RCRa to use RCRa data from ECFS
    • Other HARMONIE/HIRLAM experiment
  • BDDIR : The path to the boundary files. With the default location BDDIR=$HM_DATA/${BDLIB}/archive/@YYYY@/@MM@/@DD@/@HH@ the files retrieved from e.g. MARS will be stored in a separate directory. One could also configure this so that all the retrieved files are located in your working directory $WRK. Locally this points to the directory where you keep all your common boundary HIRLAM or ECMWF files (see the sketch after this list).
  • INT_BDFILE : The full path of the interpolated boundary files. The default setting lets the boundary files be removed automatically by directing them to $WRK.
  • INT_SINI_FILE : The full path of the initial surfex file.
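
The @YYYY@/@MM@/@DD@/@HH@ placeholders in BDDIR (and EXT_BDDIR) are replaced by the date components of the boundary cycle. As a rough illustration of how such a template expands, and not a copy of the substitution code in the HARMONIE scripts:

# Illustrative expansion of the @YYYY@/@MM@/@DD@/@HH@ placeholders for one cycle.
def expand_bddir(template, cycle):
    """cycle is a DTG string like 2011090612."""
    subst = {"@YYYY@": cycle[0:4], "@MM@": cycle[4:6],
             "@DD@": cycle[6:8], "@HH@": cycle[8:10]}
    for key, value in subst.items():
        template = template.replace(key, value)
    return template

template = "/scratch/snh/hm_home/alaro_37h1_trunk/ECMWF/archive/@YYYY@/@MM@/@DD@/@HH@"
print(expand_bddir(template, "2011090612"))   # -> .../ECMWF/archive/2011/09/06/12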

There are a few optional environment variables that can be used but are not visible in config_exp.h:

  • EXT_BDDIR : External location of boundary data. If not set, the rules depend on HOST_MODEL.
  • EXT_ACCESS : Method for accessing external data. If not set, the rules depend on HOST_MODEL.
  • BDCYCLE : Assimilation cycle interval of forcing data, default is 6h.

More about this can be found in the Boundary_strategy.pl script.

The bdstrategy file is parsed by the script ExtractBD.

  • scr/ExtractBD checks whether the data are already in BDDIR and otherwise copies them from EXT_BDDIR (see the sketch after this list). The operation performed can differ depending on HOST and HOST_MODEL. IFS data at ECMWF are extracted from MARS, RCR data are copied from ECFS.
    • Input parameters: Forecast hour
    • Executables: none.
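
As an illustration of what this amounts to, the sketch below parses one bdstrategy line and fetches the file only when it is missing from BDDIR. It is a simplified Python rendering of the idea, not the actual ExtractBD script, and the paths are dummies.

# Simplified illustration of the ExtractBD idea: for each bdstrategy line, use
# the file under BDDIR if it is already there, otherwise run the request method
# (MARS_umbrella, scp, ...) listed on the same line.
import os
import shlex
import subprocess

def fetch_boundary(bdstrategy_line, dry_run=True):
    _, _, rest = bdstrategy_line.partition(" ")       # drop the "NNN|YYYYMMDDHH" key
    int_bdfile, bdfile, *request = shlex.split(rest)
    if not os.path.isfile(bdfile) and request:
        if dry_run:
            print("would run:", " ".join(request))    # e.g. MARS_umbrella -d ... -t BDDIR
        else:
            subprocess.run(request, check=True)
    return bdfile                                     # input file for the boundary generation

line = ("000|2011090618 /dummy/ELSCFHARMALBC000 "
        "/dummy/archive/fc20110906_12+006 "
        "MARS_umbrella -d 20110906 -h 12 -l 6 -t /dummy/archive")
fetch_boundary(line)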

In case data should be retrieved from MARS there is also a staging step. When calling MARS with the stage command we ask MARS to make sure the data are on disk. In HARMONIE we ask for all data for one day of forecasts (normally four cycles) at a time.

Near real time aerosols

The use of near real time aerosols requires the presence of aerosol fields in the boundary files.

  • BDAERO : Origin of the aerosol fields
    • none : no aerosols (default configuration)
    • cams : aerosol from CAMS.

A bdstrategy_cams file is generated. After the data have been retrieved, the files are merged with the files from the HOST_MODEL to obtain the final boundary condition files.

Data extraction from MARS

The default setup when running on the ECMWF HPC Atos is to extract IFS data from the MARS archive. The data extraction is done in three steps:

  • Make sure the MARS data are on disk on the MARS server by running Prepare_MARS_stage_bd and MARS_stage_bd. This can be switched off by setting MARS_STAGE=no in ecf/config_exp.h.
  • Extract the required data to disk. This is done by running a number of parallel MARS requests in Prefetch_boundaries (see the sketch after this list). The MARS requests are prepared and optimized in prepare_MARS_prefetch. The number of parallel tasks is set to 32 by default and can be controlled by the environment variable MARS_MAX_TASKS.
  • The final generation of each file is done in ExtractBD.
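
A rough illustration of the prefetch step, assuming nothing about the actual MARS request syntax: the point is simply that the prepared requests are run with at most MARS_MAX_TASKS of them in flight at the same time.

# Illustration of running a set of prepared MARS request files with a bounded
# number of parallel workers, as controlled by MARS_MAX_TASKS. The request
# files themselves are built in prepare_MARS_prefetch; they are placeholders here.
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_request(request_file):
    # Hand one prepared request file to the MARS client.
    subprocess.run(["mars", request_file], check=True)

def prefetch(request_files):
    max_tasks = int(os.environ.get("MARS_MAX_TASKS", "32"))   # default mentioned above
    with ThreadPoolExecutor(max_workers=max_tasks) as pool:
        list(pool.map(run_request, request_files))            # propagate any failure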

The MARS request has two bottlenecks. The first is when data are on tape, which is why a MARS stage is performed before the parallel request execution. The second bottleneck is the spectral transforms performed when transforming to a regular grid.

The temporary files are finally removed by Clean_prefetch.

Initial and Boundary file generation

To be able to start the model we need the variables defining the model state.

  • T,U,V,PS in spectral space
  • Q in gridpoint or spectral space

Optional:

  • $Q_l$, $Q_i$, $Q_r$, $Q_g$, $Q_s$, $Q_h$
  • TKE

For the surface we need the different state variables for the different tiles. The scheme selected determines the variables.

Boundary files (coupling files) for HARMONIE are prepared in two different ways depending on the nesting procedure defined by HOST_MODEL.

Using gl

If you use data from HIRLAM or ECMWF, gl will be called to generate the boundaries. The generation can be summarized in the following steps:

  • Set up the geometry and determine which fields to read depending on HOST_MODEL
  • Read the necessary climate data from a climate file
  • Translate and interpolate the surface variables horizontally if the file is to be used as an initial file. All interpolation respects land-sea mask properties. The soil water is not interpolated directly but via the Soil Wetness Index, to preserve the properties of the soil between different models (see the formulation after this list). The treatment of the surface fields is only done for the initial file.
  • Horizontal interpolation of upper air fields as well as restaggering of winds.
  • Vertical interpolation using the same method (etaeta) as in HIRLAM
    • Conserve boundary layer structure
    • Conserve integrated quantities
  • Output to an FA file ( partly in spectral space )
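
The Soil Wetness Index rescaling mentioned above can be written as follows, assuming wilting point and field capacity are the limits used by gl. With host model soil water $w^{h}$, wilting point $w^{h}_{wilt}$ and field capacity $w^{h}_{fc}$, the index $SWI = (w^{h} - w^{h}_{wilt}) / (w^{h}_{fc} - w^{h}_{wilt})$ is computed, and the target soil water is recovered from the target model constants as $w^{t} = w^{t}_{wilt} + SWI\,(w^{t}_{fc} - w^{t}_{wilt})$. A soil that is, say, half way between wilting point and field capacity in the host model thus stays half way between those limits in HARMONIE, even if the absolute soil water contents differ.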

gl is called by the script scr/gl_bd, where different choices are made depending on PHYSICS and HOST_MODEL.

When starting a forecast there are options, through NREQIN and NCOUPLING, for whether e.g. cloud properties and TKE should be read from the initial/boundary file. At the moment these fields are read from the initial file but not coupled to. gl reads them if they are available in the input files and sets them to zero otherwise. For a non-hydrostatic run the non-hydrostatic pressure departure and the vertical divergence are demanded as initial fields. The pressure departure is by definition zero if you start from a hydrostatic model, and since the error made when disregarding the vertical divergence is small it is also set to zero in gl. There is also a choice in the forecast model to run with Q in gridpoint or in spectral space.

It's possible to use an input file without e.g. the uppermost levels. By setting LDEMAND_ALL_LEVELS=.FALSE. the missing levels will be ignored. This is used at some institutes to reduce the amount of data transferred for the operational runs.

Using fullpos

If you use data generated by HARMONIE, fullpos is used to generate boundaries and initial conditions. Here we describe how it is implemented in HARMONIE, but there is also good documentation on the gmapdoc site.

In HARMONIE it is done by the script scr/E927. It contains the following steps:

  • Fetch climate files. Fullpos needs a climate file and the geometry definition for both the input and output domains.

  • Set different moist variables in the namelists depending on whether you run AROME or ALADIN/ALARO.

  • Check if input data has Q in gridpoint or spectral space.

  • Demand NH variables if we run NH.

  • Determine the number of levels in the input file and extract the correct levels from the definition in scr/Vertical_level.pl

  • Run fullpos

E927 is also called from 4DVAR when the resolution is changed between the inner and outer loops.

Generation of initial data for SURFEX

For SURFEX we have to fill the different tiles with correct information from the input data. This is called the PREP step in the SURFEX context. scr/Prep_ini_surfex creates an initial SURFEX file from an FA file if you run with SURFACE=surfex.

Read more about SURFEX

Reading SST/SIC information

It is possible to update sea-surface temperature (SST) and sea-ice concentration (SIC) from the LBC/coupling files. Since June 2018 and Cycle 45r1, ECMWF's IFS has used interactive ocean and sea ice components. It has been shown that use of these components "... can significantly improve SST predictions in Europe, and as a result, predictions of near-surface air temperature". The use of SST and SIC as surface boundary conditions has the potential to improve the quality of LAM NWP forecasts. See the ECMWF Newsletter article https://www.ecmwf.int/en/newsletter/156/news/effects-ocean-coupling-weather-forecasts describing examples of how the coupling improved IFS forecasts in the seas near Europe.

The reading of these data is controlled by the SSTSIC_UPD switch in ecf/config_exp.h. With SSTSIC_UPD=no (default) SST/SIC are read at analysis time and not updated during the forecast. With SSTSIC_UPD=yes SST and SIC are read by the model from files created by the Interpol_sst_mll task in the Boundaries ecFlow family.

Data preparation

The ecf/Interpol_sst_mll.ecf task reads the bdstrategy file described above and calls the scr/Interpol_sst_mll script to "Interpolate SST/SIC from various sources to the model geometry for given MLL & INFILE". The script uses gl (with -sst3 option set) to carry out the interpolation.

The inputs to Interpol_sst_mll are (see the sketch after this list):

  • -h : Command-line option. Model forecast hour.
  • -i : Command-line option. Input file name.
  • SST_SOURCES : Environment variable. External SST source used to set the gl namelist.
  • EXT_SST_SIC_$LLL : Hard-coded. Output filename expected by the code (LLL is the forecast length).
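
To make these inputs concrete, the sketch below shows how one boundary hour could be handed to the script; the SST source name, the example input file and the zero-padding of LLL are illustrative assumptions, and the real driving, including the loop over the bdstrategy entries, is done by the ecf/Interpol_sst_mll.ecf task.

# Illustrative only: hand one boundary hour to scr/Interpol_sst_mll with the
# inputs listed above; the real task loops over the bdstrategy entries.
import os
import subprocess

def interpolate_sst_sic(hh, infile, sst_source):
    env = dict(os.environ, SST_SOURCES=sst_source)   # read by the script to set the gl namelist
    subprocess.run(["Interpol_sst_mll", "-h", f"{hh:03d}", "-i", infile],
                   env=env, check=True)
    return f"EXT_SST_SIC_{hh:03d}"                   # output name expected by the code (padding assumed)

# e.g. interpolate_sst_sic(3, "ELSCFHARMALBC001", "some_sst_source")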

The code

The reading of the SST/SIC input files (EXT_SST_SIC_$LLL) is controlled in the scripts by the SSTSIC_UPD environment variable. With it set to yes, the following NAMMCC namelist entries are set to .TRUE.:

&NAMMCC
  LMCC01_MSE=.TRUE.,
  LMCCECSST=.TRUE.,
/

From src/arpifs/module/yommcc.F90:

! LMCC01_MSE = .T.   ===> THE CLIM.FIELD(S) ARE READ IN LBC FILE AND USED IN SURFEX
 :
! LMCCECSST =.T. ===> SST FROM ECMWF (SST-ANA COMB with surf temp over seaice)
!           =.F. ===> SST FROM SURFTEMPERATURE