Running Harmonie under ecFlow
Introduction
This document describes how to run Harmonie under the ecFlow scheduler at ECMWF. ecFlow is the ECMWF workflow manager. It provides a Python API, which improves maintainability, allows easier modification and introduces object-oriented features compared with the old scheduler SMS. ecFlow can be used with any HARMONIE version from harmonie-40h1.1.beta.1 onwards.
New users
On the ECMWF Atos machine in Bologna, each user has a virtual machine on which ecFlow is running. If you don't have a VM yet, ask ECMWF to set it up for you. If you are starting ecFlow for the first time at ECMWF, you may have to add your ssh key to the authorized_keys file to allow passwordless access, as ssh is used to communicate between the servers:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Start your experiment supervised by ecFlow
Launch the experiment in the usual manner by giving the start time (DTG), the end time (DTGEND) and other optional arguments:
~hlam/Harmonie start DTG=YYYYMMDDHH
If successful, ecFlow will identify your experiment name, start building your binaries and run your forecast. If not, examine the ecFlow log file $HM_DATA/ECF.log. $HM_DATA is defined in your Env_system file. At ECMWF, $HM_DATA=$SCRATCH/hm_home/$EXP, where $EXP is your experiment name.
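A malformed DTG is an easy mistake to make at launch time. The following is a minimal sketch of a sanity check for the YYYYMMDDHH pattern; `check_dtg` is a hypothetical helper and not part of Harmonie, and it only checks field ranges, not calendar validity (for example, it accepts 31 February).

```shell
#!/bin/bash
# check_dtg: hypothetical helper, not part of Harmonie.
# Succeeds if the argument looks like YYYYMMDDHH with plausible field ranges.
check_dtg() {
    local dtg=$1
    # Exactly ten digits are required.
    [[ $dtg =~ ^[0-9]{10}$ ]] || return 1
    # The 10# prefix forces base 10 so that "08" and "09" are not read as octal.
    local mm=$((10#${dtg:4:2}))
    local dd=$((10#${dtg:6:2}))
    local hh=$((10#${dtg:8:2}))
    (( mm >= 1 && mm <= 12 && dd >= 1 && dd <= 31 && hh <= 23 ))
}

# Possible use before launching (values are examples only):
#   check_dtg 2024060100 && ~hlam/Harmonie start DTG=2024060100
```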
The ecFlow viewer starts automatically. To view any suite for your server or other servers, the server must be added to the ecFlow viewer (via Servers -> Manage servers, Add server) and selected in Servers. See below on how to find the port and server name.
- Two experiments with the same name cannot be monitored on the same server, so Harmonie will start the server and delete the previous non-active suite for you.
- To delete a suite manually, use
ecflow_client --port XXXX --host XXXX --delete force yes /suite
or use the GUI: right-click on the suite, then click "Remove" (if you don't see the Remove option, go to Tools -> Preferences -> Menus, and make yourself Administrator).
- If other manual intervention in the server or client is needed, you can use the ecflow_client commands described in the ecFlow documentation.
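Because a forced delete cannot be undone, it can help to print the command before running it. The sketch below assembles the delete command shown above from ECF_HOST and ECF_PORT; `delete_suite_cmd` is a hypothetical name and the function only echoes the command, it does not contact any server.

```shell
#!/bin/bash
# delete_suite_cmd: hypothetical dry-run helper, not part of Harmonie or ecFlow.
# Prints the delete command for the given suite name instead of executing it.
delete_suite_cmd() {
    local suite=$1
    # Abort with a message if the server coordinates are not set.
    : "${ECF_HOST:?set ECF_HOST first}" "${ECF_PORT:?set ECF_PORT first}"
    echo ecflow_client --port "$ECF_PORT" --host "$ECF_HOST" --delete force yes "/$suite"
}

# Inspect the output first, then run it deliberately, e.g.:
#   delete_suite_cmd my_exp
#   eval "$(delete_suite_cmd my_exp)"
```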
ecFlow control
Finding the port and host of the ecFlow server
The host on which the ecFlow server runs is defined by the variable ECF_HOST and the port by ECF_PORT; both are set in Env_system or derived. On the VMs on the ECMWF Atos machine in Bologna, ECF_PORT=3141 for all users, and ECF_HOST=ecflow-gen-${USER}-001, or ECF_HOST=ecfg-${USER}-1 for newer users. On ECMWF's Atos, Harmonie tries to select the appropriate ECF_HOST in Env_submit.
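If you are unsure which of the two host name patterns applies to you, one option is to ping both. The sketch below is an illustration only and is not how Harmonie itself selects the host; `find_ecf_host`, `ecf_probe` and `PROBE_CMD` are hypothetical names, and the default probe assumes `ecflow_client` is on your PATH.

```shell
#!/bin/bash
# Sketch only: these helper names are hypothetical, not Harmonie code.
ECF_PORT=${ECF_PORT:-3141}

# Default probe: ping an ecFlow server on the given host.
ecf_probe() {
    ecflow_client --port "$ECF_PORT" --host "$1" --ping
}
# PROBE_CMD can be pointed at another command or function, e.g. for testing.
PROBE_CMD=${PROBE_CMD:-ecf_probe}

# Print the first candidate host name that answers the probe.
find_ecf_host() {
    local user=${1:-$USER} host
    for host in "ecflow-gen-${user}-001" "ecfg-${user}-1"; do
        if "$PROBE_CMD" "$host" >/dev/null 2>&1; then
            echo "$host"
            return 0
        fi
    done
    return 1
}

# Possible use:
#   ECF_HOST=$(find_ecf_host) && export ECF_HOST
```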
Information about server variables can be found by running:
- On ECMWF's Atos, e.g.:
ssh ecflow-gen-${USER}-001 ecflow_server status
- Or if ecFlow is running on the machine you are logged into:
ecflow_server status
You can also find ECF_PORT/ECF_HOST by checking the files under $ECF_HOME, for example:
> ls -rlt ~/ecflow_server
total 12
-rw-r--r-- 1 hlam accord 2529 Jun 15 16:20 ecflow-gen-hlam-001.3141.ecf.check.b
-rw-r--r-- 1 hlam accord 2529 Jun 20 17:36 ecflow-gen-hlam-001.3141.ecf.check
-rw-r--r-- 1 hlam accord 3113 Jun 20 17:38 ecflow-gen-hlam-001.log
Check the status of your server
To check the status of your server you can use
ecflow_client --stats --port ECF_PORT --host ECF_HOST
or
ecflow_client --port ECF_PORT --host ECF_HOST --ping
or go to the "Info" tab in the ecFlow viewer.
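The host and port needed by these commands can also be read off the check-point file names under $ECF_HOME, which have the form host.port.ecf.check in the listing shown earlier. A minimal sketch of that parsing follows; `parse_check_file` is a hypothetical helper, and it assumes the file name follows exactly that pattern.

```shell
#!/bin/bash
# parse_check_file: hypothetical helper, not part of ecFlow.
# Prints "<host> <port>" for a file named <host>.<port>.ecf.check.
parse_check_file() {
    local base=${1##*/}        # drop any leading directory
    base=${base%.ecf.check}    # drop the fixed suffix
    # What remains is <host>.<port>; split at the last dot.
    printf '%s %s\n' "${base%.*}" "${base##*.}"
}

# Possible use:
#   read -r ECF_HOST ECF_PORT < <(parse_check_file ~/ecflow_server/*.ecf.check)
#   ecflow_client --port "$ECF_PORT" --host "$ECF_HOST" --ping
```

The glob in the usage line assumes there is a single check-point file in the directory; with several servers you would pass one file name explicitly.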
Open the viewer of a running ecFlow server
If you know that your ecFlow server is running but you have no viewer attached to it you can restart the viewer:
ecflow_ui &
Stop your ecFlow server
If you are sure you're running the server on the login node of your machine you can simply run
ecflow_stop.sh
A more complete and robust way is:
export ECF_PORT=<your port>
export ECF_HOST=<your server name>
ecflow_client --halt=yes
ecflow_client --check_pt
ecflow_client --terminate=yes
Restart your ecFlow server
The ecFlow servers on the virtual machines at ECMWF should be restarted automatically. If yours is not, you may need to restart it with:
ssh ecflow-gen-${USER}-001 sudo systemctl restart ecflow-server
On other systems, if the server is not running, you can start it again using the script:
ecflow_start.sh [-d $ECF_HOME]
If ecFlow is running on a different machine, you have to log in and start it on that machine:
ssh <your server name>
module load ecflow
ecflow_start.sh [-d $ECF_HOME]
As an alternative you can let Harmonie start the server for you when starting your next experiment, or type
~hlam/Harmonie mon
Keep your ecFlow server alive
If you are not using ecFlow on the ECMWF VMs, the ecFlow server will eventually die, causing an unexpected disruption in your experiments. To prevent this, you can add a cron job restarting the server, e.g. every fifth minute.
> crontab -l
*/5 * * * * /home/$USER/bin/cronrun.sh ecflow_start.sh -d $ECF_HOME > ~/ecflow_start.out 2>&1
where the small script cronrun.sh makes sure you get the right environment:
#!/bin/bash
source ~/.bash_profile
module unload ecflow
module load ecflow/5.7.0
"$@"
The ecFlow server version may change over time.
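If you prefer the cron job to act only when the server is actually down, the start command can be guarded by a ping. This is a sketch under assumptions rather than part of Harmonie: `ensure_server` is a hypothetical helper that takes the ping command and the start command as strings, and it relies on ECF_HOST and ECF_PORT being exported so that `ecflow_client` picks them up.

```shell
#!/bin/bash
# ensure_server: hypothetical helper, not part of Harmonie or ecFlow.
# $1 = command that succeeds when the server answers
# $2 = command that starts the server
ensure_server() {
    local ping_cmd=$1 start_cmd=$2
    # The commands are deliberately left unquoted so they split into words.
    if $ping_cmd >/dev/null 2>&1; then
        echo "running"
    else
        $start_cmd && echo "started"
    fi
}

# Possible use inside cronrun.sh or a similar wrapper:
#   ensure_server "ecflow_client --ping" "ecflow_start.sh -d $ECF_HOME"
```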
Add another user to your ecFlow viewer
Sometimes it's handy to be able to follow, and control, your colleagues' experiments. To do this, follow these steps:
- Find your colleague's host and port number as described above.
- In the ecFlow viewer choose Servers -> Manage servers, click on "Add server" and fill in the appropriate host and port and give it a useful name. Click on OK to save it.
- If you click on Servers in the viewer the name should appear and you can make it visible by clicking on it.
Changing the port
By default, the port is set by
export ECF_PORT=$((1500+usernumber))
in mSMS.job (40h1.1), Start_ecFlow.sh (up to #b6d58dd), or Main (currently).
For the VMs at ECMWF it is set to 3141 in Env_system. If you want to change this number (for example, if that port is in use already), you will also need to add a -p flag when calling ecflow_start.sh as follows:
ecflow_start.sh -p $ECF_PORT -d $JOBOUTDIR
Otherwise, ecflow_start.sh tries to open the default port.
Note: if you already have an ecFlow server running at your new port number before launching an experiment, this won't be an issue.
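The default port rule above can be reproduced outside the scripts, for instance to see which port a colleague's server would get. The sketch assumes that "usernumber" is the numeric user id as returned by `id -u`; that is an assumption, not something the scripts named above are quoted as doing, and `default_ecf_port` is a hypothetical helper.

```shell
#!/bin/bash
# default_ecf_port: hypothetical helper, not part of Harmonie.
# Assumption: "usernumber" in the rule 1500+usernumber is the numeric user id.
default_ecf_port() {
    local uid=${1:-$(id -u)}
    local port=$((1500 + uid))
    # TCP ports above 65535 do not exist, so reject such values.
    if (( port > 65535 )); then
        echo "port $port is out of range" >&2
        return 1
    fi
    echo "$port"
}

# Possible use when starting a server on a non-default port:
#   ECF_PORT=$(default_ecf_port) && export ECF_PORT
#   ecflow_start.sh -p "$ECF_PORT" -d "$JOBOUTDIR"
```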