7. Developers Guide for PION

7.1. Coding style and guides

PION is written in C++ and developed using git as the version control system. Development takes place in a DIAS gitlab repository, https://git.dias.ie/compastro/pion, on a branch named devel; all active development of new features happens on devel. Once a set of changes is complete and a new version of PION is to be released, devel is merged/rebased onto the master branch and tagged as a release. The new master branch is then pushed to the public PION repository at https://git.dias.ie/massive-stars-software/pion.

7.1.1. Developing a new feature

Feature branches can be created off devel as follows, using the gitlab web interface:

  1. Create an issue with a short title describing the feature you want to work on.

  2. Open the issue page; there is a dropdown button for creating a merge request. Click the down arrow, choose a name for the new branch (the number chosen automatically is the number of the issue you just created), and select devel as the source branch. Then click create merge request.

  3. This brings you to a page for the merge request, which is by default marked as Draft. You don’t need to do anything here; just note the name of the branch, e.g. 2-test-stuff.

  4. Pull the server updates to your local copy of the repo, and check out the new branch: $ git checkout 2-test-stuff.

  5. Then you can work on this branch until you are finished with the new feature and it is time to merge the branch back into devel. There are at least two checks to make before merging:

    1. Run $ ./tools/run_format.sh to autoformat your changes according to the PION conventions, then commit and push the changes.

    2. Pushing to a branch off devel should trigger the pipeline to run. This is a series of jobs that are sent to a server and run in a docker container. The first job checks that the source code is correctly formatted; the next four check that the PION executables compile without errors; and the last runs the Double Mach Reflection test and checks that the result has an L1 error norm of \(<10^{-10}\) compared with a reference solution (i.e. it checks for consistency with older code versions).

  6. Once these steps are completed successfully, navigate to the merge request page in your browser and click Mark as ready at the top right of the page. Beside the (hopefully green) Merge button you should select delete source branch and squash commits. You can add an informative squash-commit message that explains what you did on this branch across all the commits. Then click Merge.

  7. Check that the merged changes now on devel pass the pipeline checks (this can take up to 20 mins).

7.1.2. Coding conventions

  • Set your code editor to use spaces not tabs for indentation (or run $ ./tools/run_format.sh before each commit).

  • Each indentation level is 2 spaces.

  • Try to add comments to your code to explain what it does, including references to papers if possible.

  • Header files should have class and function descriptions in comment blocks denoted with /// so that automatic documentation generators (like doxygen) can parse the comments.
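
For example, a header declaration with doxygen-style comments might look like the following (the class and function names here are purely illustrative, not taken from the PION source):

  /// \brief Illustrative solver class, showing the comment style.
  ///
  /// Longer descriptions can reference papers, e.g. Falle et al. (1998).
  class example_solver {
  public:
    /// \brief Compute the flux across a cell interface.
    /// \param left  left state vector (primitive variables)
    /// \param right right state vector (primitive variables)
    /// \param flux  output flux vector
    /// \returns zero on success, non-zero on error.
    int interface_flux(
        const double *left,   ///< left state
        const double *right,  ///< right state
        double *flux          ///< output flux
        );
  };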

7.1.3. Generating documentation from source code

PION has plenty of in-source documentation that can be generated with doxygen. Here is how to compile the docs:

  • On debian/ubuntu systems you can get all the functionality by installing the packages doxygen, doxygen-latex and graphviz.

  • Add -DPION_BUILD_DOCUMENTATION=ON to the list of options in your cmake command (maybe in the build script).

  • When you build PION, this should generate documentation in the directory build/doc/doc, with the HTML docs in a subdirectory called html and the LaTeX docs in latex.

  • Navigate to this directory and run e.g. $ firefox html/index.html

  • Use the dropdown menus to explore the class hierarchy and the source code.

  • To generate a PDF you can navigate to the latex subdirectory of the generated docs and run $ make; this should generate a PDF called refman.pdf.

7.2. Computational Grid

The PION computational grid is a uniform rectilinear grid in 1, 2 or 3 spatial dimensions. PION can be run in “uniform-grid” mode or “nested-grid” mode, depending on whether static mesh-refinement is needed. The nested-grid mode sets up a hierarchy of grids with different levels of refinement, each refined grid fully enclosed by the coarser grid one level up in the hierarchy. The uniform-grid mode sets up a single grid and there is no mesh refinement. Grid cells are all the same diameter for a given refinement level, and they are squares (2D) or cubes (3D) in the space that is modelled.

Terminology used in this guide:
  • the terms cell and zone are used interchangeably and mean the same thing.

  • ghost cells and boundary cells have the same meaning.

  • static mesh-refinement and nested-grid are both used to denote the mesh-refinement strategy used in PION, sometimes interchangeably.

  • The six directions in 3D space are denoted \(\{\pm\hat{x},\pm\hat{y},\pm\hat{z}\}\)

7.2.1. Grid Geometry

There are 5 options for the combinations of coordinate system and dimensionality that have been implemented:

  1. Cartesian geometry, 1D simulations: this is a line in the \(\pm\hat{x}\) direction that represents an infinite slab (gradients of all quantities in the two perpendicular directions are set to zero).

  2. Spherical geometry, 1D simulations: a spherically symmetric configuration is modelled, again with gradients in the angular directions set to zero. In this case geometric source terms are added to the fluid equations, because each cell is a spherical shell and its volume increases with radius (see the example equations after this list).

  3. Cartesian geometry, 2D simulations: each point on the 2D plane represents an infinite line in the 3rd dimension (gradients perpendicular to the plane are zero), and the divergence of a quantity is linear with distance rather than quadratic. This is of limited use for modelling physical systems, but is useful for development work and exploring parameter space.

  4. Cylindrical geometry, so-called 2.5D simulations: the \((R,z)\) plane is simulated assuming rotational symmetry about the axis \(R=0\). Again for cylindrical coordinates a geometric source term arises in the radial direction because each cell is a ring (of square cross section) in 3D space and its volume increases with cylindrical radius. Rotation and toroidal magnetic fields are possible in this configuration, but gradients in the angular directions are obviously zero.

  5. Cartesian geometry, 3D simulations: the 3D domain is represented by a block of cubic cells.
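
As an illustration of where the geometric source terms in items 2 and 4 come from (this is the standard textbook form, not necessarily the exact finite-volume discretisation used in PION), the continuity equation in 1D spherical symmetry,

\[ \frac{\partial \rho}{\partial t} + \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2 \rho v_r\right) = 0 \;, \]

can be rewritten as

\[ \frac{\partial \rho}{\partial t} + \frac{\partial}{\partial r}\left(\rho v_r\right) = -\frac{2\rho v_r}{r} \;, \]

so a solver written for planar 1D geometry picks up the extra term \(-2\rho v_r/r\) on the right-hand side. In cylindrical \((R,z)\) geometry the corresponding radial term is \(-\rho v_R/R\).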

7.2.2. Mesh refinement

Static mesh-refinement (i.e., a multiply nested-grid structure) is implemented for the case where each refinement level has the same shape and number of cells as the coarser level above it, but the spatial resolution is a factor of 2 higher and so the refined grid covers 1/2 of the domain of the coarser grid in each dimension. The focus of the nested grid can be at the centre of the domain, the negative boundary or the positive boundary for each dimension. The number of refinement levels is in principle unlimited, but snapshots can only be written every coarse timestep and this imposes a practical limit: for 10 grid levels there are 512 finest-level timesteps per coarsest-level step, and one probably wants to save a snapshot at least this often. The coarsest level is denoted level 0, and each refined level has a higher level number.
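
Since each refined level takes two timesteps for every step of the level above it, the timestep on level \(\ell\) is \(\Delta t_\ell = \Delta t_0/2^{\ell}\), i.e. level \(\ell\) takes \(2^{\ell}\) steps per level-0 step; with 10 levels the finest level is \(\ell = 9\), giving the \(2^9 = 512\) steps quoted above.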

7.2.3. Serial and Parallel grid setup

PION can be run in serial mode, in which case there is a single process that calculates everything (i.e., no parallel execution). In this case, the grid corresponding to the full domain is set up as a single block of data for each refinement level. PION can also be run in parallel mode, so that \(N_p\) MPI processes are started and these calculate in parallel. On each refinement level the full domain is split into \(N_p\) sub-domains and each MPI process sets up a grid of cells corresponding only to its sub-domain (plus some boundary data for synchronisation). Each MPI process has exactly 1 sub-domain on each refinement level.

7.2.4. Location of source files

The source files controlling the grid are located in source/grid/, particularly uniform_grid.cpp (single-domain mode) and uniform_grid_pllel.cpp (MPI parallel mode with multiple sub-domains). Classes for setting up grids are as follows:

  • Uniform grid, single level, single domain: setup_fixed_grid.cpp

  • Uniform grid, single level, MPI-parallel, multiple domains: setup_fixed_grid_MPI.cpp

  • Static mesh-refinement, single domain: setup_NG_grid.cpp

  • Static mesh-refinement, MPI-parallel, multiple domains: setup_grid_NG_MPI.cpp


7.3. Control Flow

The different versions of PION have slightly different control flow, depending on whether PION is run in single-core or parallel mode, and nested-grid or uniform-grid mode. The time-integration scheme is based on that of Falle (1991) and Falle et al. (1998); the uniform-grid implementation is described in some detail in Mackey (2012), and the nested-grid implementation in Mackey et al. (2021) (section 2.4). You are encouraged to read these papers to gain some understanding of what each step does. Here the most complicated routine will be described: the parallel implementation of the nested-grid method.

7.3.1. Outer loop

The overall control of PION comes from the main() function, defined in the following files in the source directory of PION:

  • uniform grid, serial mode: main.cpp

  • uniform grid, parallel mode: main_MPI.cpp

  • nested grid, serial mode: main_NG.cpp

  • nested grid, parallel mode: main_NG_MPI.cpp

This code should rarely need to be updated, however, because it just checks that the command-line options are OK, then in turn calls Init(), Time_Int() and Finalise() functions from the relevant simulation-control class. Everything in Init() and Finalise() is only called once per simulation run and so efficiency is not crucial here. The function Time_Int() is responsible for integrating the initial conditions forward in time to a final state after a (usually large) number of timesteps.
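
Schematically, the outer control flow looks like the following sketch. The class and method signatures below are illustrative stand-ins; only the function names Init(), Time_Int() and Finalise() are taken from the PION source:

  // Heavily simplified sketch of the outer control flow; the class is an
  // illustrative stand-in, not the real PION simulation-control class.
  struct sim_control_sketch {
    int Init(int, char **) { return 0; }  // in PION: parse parameters, set up grids and initial conditions
    int Time_Int()         { return 0; }  // in PION: loop over timesteps, writing snapshots as required
    int Finalise()         { return 0; }  // in PION: write the final snapshot and free memory
  };

  int main(int argc, char **argv)
  {
    sim_control_sketch sim;
    int err = sim.Init(argc, argv);
    if (!err) err = sim.Time_Int();   // does almost all of the work
    if (!err) err = sim.Finalise();
    return err;
  }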

The versions of Time_Int() can be found here:

  • uniform grid, serial mode: sim_control/sim_control.cpp

  • uniform grid, parallel mode: sim_control/sim_control_MPI.cpp

  • nested grid, serial mode: sim_control/sim_control_NG.cpp

  • nested grid, parallel mode: sim_control/sim_control_NG_MPI.cpp

This routine is responsible for

  1. setting the timestep for the next step;

  2. saving snapshots periodically according to the chosen criteria;

  3. stepping forward in time repeatedly;

  4. stopping execution when the criterion for ending the simulation is reached.

Each timestep is taken by the function advance_time().

  • The uniform-grid version of this (serial and parallel) is found in sim_control/time_integrator.cpp, and is split into a 1st-order step and a 2nd-order step. This refers to the order of accuracy of the integration scheme: the 1st-order scheme is basically forward Euler integration with piecewise-constant data, whereas the 2nd-order scheme involves two sub-steps to achieve better accuracy. Initially a 1st-order step is taken for half of the requested timestep, to obtain a time-centred intermediate state. A piecewise-linear interpolation of the data is then applied to this state, and the fluxes across cell boundaries are calculated. These fluxes are used to step the initial state forward by the full timestep (i.e. the intermediate state is discarded once the fluxes have been calculated). This step is accurate to 2nd order in space and time (see Falle et al. (1998)). A schematic form of this two-step update is given after this list.

  • The nested-grid version (serial and parallel) is found in sim_control/sim_control_NG.cpp, and calls either advance_step_OA1() for a 1st-order step or advance_step_OA2() for a 2nd-order step.
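
Schematically (ignoring source terms, boundary updates and the details of the flux calculation), the 2nd-order update of the state \(U_i\) of cell \(i\) is a half step followed by a full step,

\[ U_i^{n+1/2} = U_i^{n} + \frac{\Delta t}{2}\, L\!\left(U^{n}\right)_i \;, \qquad U_i^{n+1} = U_i^{n} + \Delta t\, L\!\left(U^{n+1/2}\right)_i \;, \]

where \(L(U)_i\) denotes the net rate of change of cell \(i\) computed from the interface fluxes (piecewise-constant data for the half step, piecewise-linear data for the full step).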

Note that the 1st-order scheme is only used for testing and development purposes, and all scientific applications should use the 2nd-order scheme because it is always more accurate for a given computational expense.

7.3.2. 2nd-order timestep for nested-grid, parallel run

The most complicated (and most useful) timestepping routine in PION is advance_step_OA2() for the nested-grid, parallel execution, found in sim_control/sim_control_NG_MPI.cpp. This is a recursive function, first called on the coarsest grid level 0, which then calls itself repeatedly until the finest level is reached. In this way, calling the function a single time on level 0 will advance all levels: one step on level 0, 2 steps on level 1, 4 steps on level 2, and \(2^n\) steps on level \(n\). A simplified sketch of this recursive structure is given after the step-by-step list below.

  • The first task is to ensure that data on all levels are consistent:
    • Receive any external boundary data from a coarser level, if required: BC_update_COARSE_TO_FINE_RECV(). This can involve MPI communication if the required data is on another MPI process.

    • Update external boundary conditions, including boundary data between sub-domains on the grid level associated with different MPI processes: TimeUpdateExternalBCs(). This usually involves MPI communication.

    • Send any boundary data from this level to the outer boundary of the next finer level, if required: BC_update_COARSE_TO_FINE_SEND(). Also usually involves MPI communication, if the destination grid is on another MPI process.

  • Next take a timestep on the next finer level, if it exists (i.e. call the advance_step_OA2() function on the next finer level).

  • Then calculate the 1st-order half-step on this level (none of this involves MPI communication):
    • Calculate changes associated with radiative heating and cooling, and chemical kinetics: calc_microphysics_dU()

    • Calculate changes from (magneto-)hydrodynamical fluxes across cell boundaries: calc_dynamics_dU()

    • Calculate changes due to thermal conduction of internal energy (not really working): calc_thermal_conduction_dU()

    • Calculate the half-step intermediate state based on the above changes: grid_update_state_vector()

  • Now update various quantities to the intermediate half-step values:
    • update internal boundaries (i.e. stellar winds): TimeUpdateInternalBCs()

    • Receive data from finer levels to replace calculated data (assuming finer level is more accurate): BC_update_FINE_TO_COARSE_RECV(). This generally involves MPI communication.

    • Update external boundaries, including boundary data between sub-domains on the grid level associated with different MPI processes: TimeUpdateExternalBCs(). This generally involves MPI communication.

    • Update properties of radiation sources (if present) and do raytracing on this grid: update_evolving_RT_sources() and do_ongrid_raytracing(). This generally involves MPI communication.

  • Calculate the full-step 2nd-order update, again microphysics, dynamics, (thermal conduction), but this time make some corrections before finalising the update:
    • If this is not the coarsest level, save the fluxes crossing the level boundary: save_fine_fluxes()

    • If this is not the finest level, save the fluxes crossing cell boundaries that correspond to the level boundary of the next finer level: save_coarse_fluxes()

    • If this level has data for the outer boundary of a finer level, send the data: BC_update_COARSE_TO_FINE_SEND(). This generally involves MPI communication.

  • Calculate a second step on the finer level by recursively calling advance_step_OA2()

  • Receive flux values from level boundary of finer level, and correct fluxes on this level accordingly: recv_BC89_fluxes_F2C(). This generally involves MPI communication.

  • Update state on this level for the full step: grid_update_state_vector(), and advance the current time.

  • Update internal boundaries (i.e. stellar winds): TimeUpdateInternalBCs()

  • Replace data with value averaged from finer grids: BC_update_FINE_TO_COARSE_RECV(). This generally involves MPI communication.

  • Update properties of radiation sources (if present) to end of timestep and do raytracing on this grid: update_evolving_RT_sources() and do_ongrid_raytracing(). This generally involves MPI communication.

  • Send fluxes crossing level boundary to the next coarser level (for correcting fluxes at that level): send_BC89_fluxes_F2C(). This generally involves MPI communication.

  • Send level data to coarser level, to replace what was calculated there: BC_update_FINE_TO_COARSE_SEND(). This generally involves MPI communication.

  • Return so that the next step can take place.
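
A heavily simplified sketch of this recursive structure is shown below. The helper functions are hypothetical groupings of the calls listed above; all signatures, error handling and MPI calls are omitted:

  #include <vector>

  struct level_sketch { double time = 0.0, dt = 0.0; };
  static std::vector<level_sketch> levels(3);            // e.g. 3 refinement levels

  static void exchange_boundaries(int l) { (void)l; }    // in PION: COARSE_TO_FINE and inter-process BC updates
  static void half_step(int l)           { (void)l; }    // in PION: 1st-order dU (microphysics, dynamics)
  static void full_step(int l)           { (void)l; }    // in PION: 2nd-order fluxes, BC89 fluxes saved at level edges
  static void finish_step(int l)                         // in PION: BC89 flux correction, state update, sends to coarser level
  { levels[l].time += levels[l].dt; }

  // Advance level l by one of its own timesteps.  Each step on level l
  // drives two steps on level l+1, so a single call on level 0 advances
  // the whole hierarchy.
  static void advance_step_OA2_sketch(int l)
  {
    const int finest = static_cast<int>(levels.size()) - 1;
    exchange_boundaries(l);
    if (l < finest) advance_step_OA2_sketch(l + 1);      // first finer-level step
    half_step(l);
    full_step(l);
    if (l < finest) advance_step_OA2_sketch(l + 1);      // second finer-level step
    finish_step(l);
  }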

Without the nested grid levels, the steps are much simpler and the following function calls are omitted: BC_update_COARSE_TO_FINE_RECV(), BC_update_COARSE_TO_FINE_SEND(), BC_update_FINE_TO_COARSE_RECV(), BC_update_FINE_TO_COARSE_SEND(), save_fine_fluxes(), save_coarse_fluxes(), recv_BC89_fluxes_F2C() and send_BC89_fluxes_F2C(). The only functions requiring MPI communication are then TimeUpdateExternalBCs() and do_ongrid_raytracing().

7.4. Parallel communication

Parallel execution in PION is implemented using the Message Passing Interface (MPI) library with a multi-core and multiple domain model. This is based on domain splitting, where each level of the computational grid is recursively bisected \(i\) times along the longest dimension until there are \(N_P = 2^i\) domains, one for each of the \(N_P\) MPI processes. For nested-grid simulations the grid has the same shape and size (in terms of numbers of cells) on all levels, and so the same decomposition is performed at each level, and each MPI process has exactly 1 sub-domain (or patch) on each level.

Interprocess communication in PION is coded such that the implementation and the library used are completely encapsulated in a class, so that calls to MPI functions are not scattered throughout the code. The motivation for this is that a different communication library could be used in future without significant modification of the PION source code. There is an abstract base class, comms_base, defined in source/comms/comms.h, which contains the API for interprocess communication. This contains function declarations for sending and receiving data between processes, for the Silo parallel I/O interface, and for various other common tasks such as broadcasting and calculating the minimum or maximum across sub-domains. There are two comms libraries implemented:

  • The class comm_files in source/comms/comm_files.h/.cpp is purely for code testing and debugging parallel communication. It is a very simple communication library that uses disk I/O to send and receive data between processes (and hence it is very slow). The advantage is that the communicated data can be parsed outside PION if an error occurs, helping with debugging. This library is, however, poorly maintained, and may no longer work as intended.

  • The class comm_mpi in source/comms/comm_mpi.h/.cpp is used in all other cases. This uses the system MPI library to call various MPI functions, enabling interprocess communication. It also uses the Silo PMPIO library for parallel I/O. The rest of this section describes the MPI implementation in more detail.

Whether one uses the file-comms or MPI is a compile-time decision, and a (global) pointer to the communications class called COMM is initialised with the correct class at run-time.

7.4.1. Domain decomposition

The decomposition of the domain is controlled by the “Multi-Core, Multiple-Domain (MCMD)” class in decomposition/MCMD_control.cpp. The function decomposeDomain() is called by Init() from either sim_control/sim_control_MPI.cpp (uniform grid, called on a single level) or sim_control/sim_control_NG_MPI.cpp (nested grid, called once for each level). This is called before each process sets up the grid for its sub-domain, and the function sets the extents and numbers of cells for each sub-domain based on the MPI rank of the process. It also sets the ranks of the neighbouring MPI processes, if applicable.
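
A minimal sketch of this kind of recursive bisection is shown below; it is illustrative only (the real logic in decomposition/MCMD_control.cpp works from the MPI rank and also assigns neighbour ranks):

  #include <array>
  #include <cstdio>
  #include <vector>

  // A sub-domain described by its size in cells and the offset of its
  // first cell on the full level grid (illustrative data structure).
  struct subdomain_sketch {
    std::array<int, 3> ncell;
    std::array<int, 3> offset;
  };

  // Recursively bisect along the longest dimension until 2^nsplits pieces exist.
  static void bisect(const subdomain_sketch &d, int nsplits,
                     std::vector<subdomain_sketch> &out)
  {
    if (nsplits == 0) { out.push_back(d); return; }
    int longest = 0;
    for (int i = 1; i < 3; ++i)
      if (d.ncell[i] > d.ncell[longest]) longest = i;
    subdomain_sketch lo = d, hi = d;
    lo.ncell[longest] = d.ncell[longest] / 2;
    hi.ncell[longest] = d.ncell[longest] - lo.ncell[longest];
    hi.offset[longest] += lo.ncell[longest];
    bisect(lo, nsplits - 1, out);
    bisect(hi, nsplits - 1, out);
  }

  int main()
  {
    std::vector<subdomain_sketch> pieces;
    bisect({{256, 256, 256}, {0, 0, 0}}, 3, pieces);     // 2^3 = 8 sub-domains
    for (const auto &p : pieces)
      std::printf("offset (%d,%d,%d) size (%d,%d,%d)\n",
                  p.offset[0], p.offset[1], p.offset[2],
                  p.ncell[0], p.ncell[1], p.ncell[2]);
    return 0;
  }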

7.4.2. Calculating timestep

PION uses an explicit time-integration scheme for the ideal-MHD or Euler equations, which is stable as long as no signal can cross a full grid cell in a single timestep. The function calculate_timestep() in source/sim_control/calc_timestep.h/.cpp provides the single-process implementation, and this is re-implemented as sim_control_pllel::calculate_timestep() for multi-process simulations in source/sim_control/sim_control_MPI.h/.cpp. The only difference compared with the single-process version is that the global minimum timestep is calculated across all sub-domains, via calls to the COMM class pointer.
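
The underlying operation is a global minimum reduction. A minimal raw-MPI illustration is given below; PION itself goes through the COMM wrapper class rather than calling MPI directly at this point:

  #include <mpi.h>

  // Reduce the locally-allowed (CFL-limited) timestep to the global
  // minimum so that every MPI process takes the same, stable step.
  // Illustrative only: PION wraps this kind of call in its COMM class.
  double global_timestep(double local_dt)
  {
    double global_dt = local_dt;
    MPI_Allreduce(&local_dt, &global_dt, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
    return global_dt;
  }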

7.4.3. Exchange of boundary data on sub-domains

Layers of boundary (or ghost) cells are added to pad each sub-domain, constituting duplicated data that is calculated on the neighbouring sub-domains. The layers are 4 cells deep for boundaries between sub-domains, and 6 cells deep for the outer boundaries of each level’s domain.

For a 3D simulation the \(\pm\hat{x}\) boundaries have the smallest extent (extending to \([y_\mathrm{min},y_\mathrm{max}]\) and \([z_\mathrm{min},z_\mathrm{max}]\)) and the \(\pm\hat{z}\) boundaries are the largest, extending to include the edge and corner ghost cells in the \(\pm\hat{x}\) and \(\pm\hat{y}\) directions. The layout of the boundaries is plotted in Fig. 7.1, which shows that the sub-domain plus its boundary layers forms a cuboid of the same shape as the sub-domain itself, with extra layers of cells on all sides.

Fig. 7.1 Layout of boundary regions for a 3D simulation. A cross-section through a plane of constant \(z\) is shown on the left, with the layout of the \(x\) and \(y\) boundaries. The right-hand plot shows how the \(z\) boundaries cover both the \(x\) and \(y\) boundary regions, completing the cuboid shape.

There are two boundary exchanges that use the boundaries of the sub-domains:
  • Data are exchanged with neighbouring sub-domains each timestep for data on the same level, to ensure that the solution remains consistent across all sub-domains (and identical to a calculation run on a single process with a single domain). These boundaries are called “MPI” boundaries in PION, defined in boundaries/MCMD_boundaries.h/.cpp, called by TimeUpdateExternalBCs() in boundaries/assign_update_bcs_MPI.cpp at every timestep.

  • Raytracing must be performed to calculate photoionization and photoheating rates when radiation sources are present. This involves communication of attenuation factors across sub-domains, with send and receive functions defined in boundaries/RT_MPI_boundaries.h/.cpp, called by raytracing/raytracer_SC_pllel.cpp every time raytracing is required.

7.4.4. Exchange of data between levels of the nested grid

There are three further boundary communications needed between levels of a nested grid:

  1. Restriction: If a grid has a more refined grid covering part of its sub-domain, then at each timestep the refined grid averages the data from blocks of \(2^3\) cells and sends this to the coarser grid. The received data overwrites anything calculated by the coarse grid. This is called restriction, and is referred to in the PION source code as “FINE2COARSE”. A minimal sketch of this averaging is shown after this list.

    • It is implemented in the classes NG_fine_to_coarse_bc (serial) and NG_MPI_fine_to_coarse_bc (parallel) that are defined in the files boundaries/NG_fine_to_coarse_boundaries.h/.cpp and boundaries/NG_MPI_fine_to_coarse_boundaries.h/.cpp, respectively.

    • the function setup_grid_NG_MPI::setup_boundary_structs() in grid/setup_grid_NG_MPI.cpp adds FINE_TO_COARSE_RECV and/or FINE_TO_COARSE_SEND boundaries to the list of boundaries if this sub-domain needs to receive smoothed data from a finer-level sub-domain or needs to send smoothed data to a coarser-level sub-domain.

    • There are functions to assign the boundary data to be sent/received. These are called only once, in the function assign_update_bcs_NG_MPI::assign_boundary_data() defined in boundaries/assign_update_bcs_NG_MPI.cpp.

    • There are functions to update the boundary data, i.e. exchange data between levels, possibly on another MPI process. These are called every timestep by the functions Time_Int() and advance_step_OA1() or advance_step_OA2() in sim_control/sim_control_NG_MPI.cpp.

  2. Prolongation: All refined grids are enclosed by coarser grids, and the outer boundary of a refined grid therefore needs to be updated by interpolating the solution from the coarse grid onto the external ghost cells of the refined grid. This process is known as prolongation. It is trickier and more computationally intensive than restriction, because of the interpolation and the requirement to conserve mass, momentum and energy. PION calculates slopes in the primitive variables in each direction for each coarse cell, and sends the primitive vector and the 3 slope vectors to the refined grid. The refined grid then interpolates onto the cell-centres of its cells using these slopes, and applies a correction to ensure conservation is maintained. In PION source code this process is denoted “COARSE2FINE”.

    • It is implemented in the classes NG_coarse_to_fine_bc (serial) and NG_MPI_coarse_to_fine_bc (parallel), defined in the files boundaries/NG_MPI_coarse_to_fine_boundaries.h/.cpp.

    • The function setup_grid_NG_MPI::setup_boundary_structs() in grid/setup_grid_NG_MPI.cpp sets external sub-domain boundaries to be of COARSE_TO_FINE_RECV type if needed. It also adds COARSE_TO_FINE_SEND boundaries to the list of boundaries if finer-level sub-domains below this sub-domain have level boundaries.

    • Again there are functions to assign the boundary data to be sent/received, called only once, in the function assign_update_bcs_NG_MPI::assign_boundary_data() defined in boundaries/assign_update_bcs_NG_MPI.cpp.

    • There are functions to update the boundary data, i.e. exchange data between levels, possibly on another MPI process. These are called every timestep by the functions Time_Int() and advance_step_OA1() or advance_step_OA2() in sim_control/sim_control_NG_MPI.cpp.

  3. Flux Conservation: To maintain conservation, the flux crossing the boundaries of refined grids must be kept consistent with the flux crossing the same surface in the coarser grids. This process was pioneered by Berger & Colella (1989), hence in PION the methods are referred to as “BC89”. The cells on the fine and coarse grids that have a face on the outer boundary of the fine grid are identified and an extra flux vector F is initialised only on these cells. Corner cells may have up to three of these flux vectors, so they are put in a vector of arrays, but most face cells only have one. Time-averaged fluxes over two fine-grid timesteps are sent to the coarse grid and these are used to correct the coarse-grid fluxes to be consistent with the fine grid (it is assumed that the fine grid is more accurate).

    • This is implemented in the classes NG_BC89flux (serial) and NG_MPI_BC89flux (parallel), defined in boundaries/NG_BC89flux.cpp and boundaries/NG_MPI_BC89flux.h/.cpp, respectively.

    • The setup functions make a list of cells that are on a level boundary and need flux correction (both to send to the coarser grid and to receive from the finer grid). The function NG_BC89flux::setup_flux_vectors() in boundaries/NG_BC89flux.cpp (serial) and boundaries/NG_MPI_BC89flux.cpp (parallel) is called by setup_grid_NG_MPI::setup_grid() in grid/setup_grid_NG_MPI.cpp. The parallel implementation decides whether the grid with the BC89 boundary that we need to exchange data with is on the same MPI process (in which case the serial function is called) or on a different process (in which case the MPI-parallel function is called).

    • The update functions, recv_BC89_fluxes_F2C() and send_BC89_fluxes_F2C(), exchange data between levels, possibly on another MPI process. These are called every timestep by the functions Time_Int() and advance_step_OA1() or advance_step_OA2() in sim_control/sim_control_NG_MPI.cpp.
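
As an illustration of the restriction (FINE2COARSE) averaging described in item 1 above, a minimal sketch with a hypothetical array layout is:

  // Average a 2x2x2 block of fine-level values onto one coarse cell.
  // Volume-weighted averaging reduces to a plain mean when all fine
  // cells have equal volume, as on a 3D Cartesian level; curvilinear
  // geometries would need the individual cell volumes as weights.
  double restrict_block(const double fine[2][2][2])
  {
    double sum = 0.0;
    for (int k = 0; k < 2; ++k)
      for (int j = 0; j < 2; ++j)
        for (int i = 0; i < 2; ++i)
          sum += fine[k][j][i];
    return sum / 8.0;
  }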

7.4.5. Parallel input/output (read from/write to disk)

The only parallel I/O implementation that is optimised for speed uses the Silo library and its PMPIO interface. This takes a round-robin approach to reading and writing data, in which \(N\) MPI processes (typically 8 or 16) can work simultaneously. Each MPI process is assigned to one of \(N\) groups of processes, and only one process per group can read or write at any one time. When it is finished, it passes a baton to the next process in its group, which then reads/writes its data, and so on. The motivation is that most supercomputers support multiple simultaneous connections to the file system for parallel I/O, but the number of such connections is much smaller than the number of cores on the system. Each level of the nested grid is written to its own file, and all data on all levels are saved.
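
The baton-passing pattern itself can be illustrated with plain MPI calls (a conceptual sketch only; PION uses the PMPIO interface provided by the Silo library rather than hand-rolled sends and receives):

  #include <mpi.h>

  // Conceptual sketch of round-robin I/O in groups: the first rank of
  // each group writes immediately, then hands a baton to the next rank
  // in its group, so only one process per group touches the file system
  // at any one time.
  void write_round_robin(int rank, int nproc, int group_size)
  {
    const int first = (rank / group_size) * group_size;
    const int last  = (first + group_size - 1 < nproc) ? first + group_size - 1
                                                       : nproc - 1;
    int baton = 0;
    if (rank != first)   // wait until the previous rank has finished
      MPI_Recv(&baton, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // ... open this group's file, write this rank's sub-domain, close ...

    if (rank != last)    // pass the baton to the next rank in the group
      MPI_Send(&baton, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
  }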
