Troubleshooting Guide
This guide covers debugging tools, common issues, and troubleshooting workflows for MFC.
Quick Reference
Debugging Flags
| Flag | Command | Purpose |
| ---- | ------- | ------- |
| -v | build, run, test | Show the commands being executed |
| -vv | build, run, test | Full compiler/cmake output |
| -vvv | build | Adds cmake dependency debugging |
| -d | all | Write a debug log to file |
| --debug | build | Build with debug symbols |
| --gcov | build | Build with code-coverage instrumentation |
| --no-gpu | build | Disable GPU to isolate issues |
| --no-mpi | build | Disable MPI to isolate issues |
Profiling Flags
| Flag | Command | Purpose |
| ---- | ------- | ------- |
| --ncu | run | NVIDIA Nsight Compute (kernel profiling) |
| --nsys | run | NVIDIA Nsight Systems (timeline profiling) |
| --rcu | run | AMD rocprof-compute (kernel profiling) |
| --rsys | run | AMD rocprof (system profiling) |
Useful Commands
./mfc.sh validate case.py # Check case for errors before running
./mfc.sh build --debug # Build with debug symbols
./mfc.sh build -vv # Build with full compiler output
./mfc.sh run case.py -v # Run with verbose output
./mfc.sh test --only <UUID> # Run a specific test
./mfc.sh clean # Clean and start fresh
Debugging Workflow
Build Fails?
1. Run with -vv to see full error output
./mfc.sh build -vv
2. If it's a dependency issue, try -vvv for cmake debugging output
./mfc.sh build -vvv
3. Try disabling features to isolate:
./mfc.sh build --no-gpu # Rule out GPU issues
./mfc.sh build --no-mpi # Rule out MPI issues
4. Clean and rebuild:
./mfc.sh clean && ./mfc.sh build
Run Fails?
1. Validate your case first:
./mfc.sh validate case.py
2. Run with verbose output:
./mfc.sh run case.py -v
3. Check for NaN or constraint errors in output
4. Try with fewer MPI ranks:
./mfc.sh run case.py -n 1
5. Build in debug mode for better error messages:
./mfc.sh build --debug
./mfc.sh run case.py
Test Fails?
1. Run the specific failing test:
./mfc.sh test --only <UUID>
2. Run serially for clearer output:
./mfc.sh test --only <UUID> -j 1
3. Check test output in:
build/tests/<UUID>/
4. If changes are intentional, regenerate golden files:
./mfc.sh test --generate --only <UUID>
Build Debugging
Verbosity Levels
MFC supports three verbosity levels for builds:
-v (Level 1): Shows build progress with file counts
./mfc.sh build -v
# Shows: [1/42] Compiling m_global_parameters.fpp.f90
-vv (Level 2): Full compiler commands and cmake output
./mfc.sh build -vv
# Shows: Full compiler invocations with all flags
# Useful for: Seeing exact compiler errors, checking flags
-vvv (Level 3): Adds cmake dependency debugging
./mfc.sh build -vvv
# Shows: cmake --debug-find output
# Useful for: Finding why cmake can't locate MPI, HDF5, etc.
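When reporting a build problem, capture the full verbose output to a file; this uses only standard shell redirection, nothing MFC-specific:
./mfc.sh build -vvv 2>&1 | tee build.log  # save stdout+stderr while still printing to the terminal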
Debug Builds
Build with debug symbols for better error messages and easier debugging:
./mfc.sh build --debug
This sets CMAKE_BUILD_TYPE=Debug, which:
- Adds -g debug symbols
- Reduces optimization (-O0 or -O1)
- Enables runtime checks in some compilers
- Makes stack traces more readable
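To confirm the configuration took effect, you can inspect the CMake cache; the glob below is an assumption, since the exact location under build/ depends on your build layout:
grep CMAKE_BUILD_TYPE build/*/CMakeCache.txt  # expect CMAKE_BUILD_TYPE:STRING=Debug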
Code Coverage
Build with code-coverage instrumentation:
./mfc.sh build --gcov
This is useful for identifying which code paths are exercised by tests.
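One way to turn the instrumented build into a readable report is gcovr, a third-party tool assumed here (pip install gcovr), not something MFC bundles:
./mfc.sh test                        # exercise the instrumented build
gcovr -r . --html -o coverage.html   # aggregate coverage data into an HTML summary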
Isolating Issues
When builds fail, isolate the problem by disabling features:
# Rule out GPU compiler issues
./mfc.sh build --no-gpu
# Rule out MPI issues
./mfc.sh build --no-mpi
# Rule out precision issues
./mfc.sh build --single # Single precision
./mfc.sh build --mixed # Mixed precision
Clean Rebuild
Sometimes a clean rebuild fixes issues:
./mfc.sh clean
./mfc.sh build -j $(nproc)
Or manually:
rm -rf build/
./mfc.sh build -j $(nproc)
Runtime Debugging
Case Validation
Always validate your case before running:
./mfc.sh validate case.py
This checks for:
- Syntax errors in case.py
- Invalid parameter values
- Constraint violations (incompatible parameter combinations)
- Typos in parameter names (with "did you mean?" suggestions)
Example output:
Errors:
[red] weno_order=7 but must be one of: [1, 3, 5]
[red] Unknown parameter 'weno_ordr' - did you mean 'weno_order'?
Warnings:
[yellow] bubbles=True recommends setting: nb, polytropic
For more detailed validation output:
./mfc.sh validate case.py -d
Verbose Runs
Add verbosity to see what's happening:
# Show the exact command being run
./mfc.sh run case.py -v
# Show job script details
./mfc.sh run case.py -vv
Debug Log
Write detailed debug information to a log file:
./mfc.sh run case.py -d
This creates a debug log with detailed internal state, useful when reporting issues. The -d flag works with the other commands (build, test) as well.
Running with Fewer Ranks
To isolate parallel issues, try running with fewer MPI ranks:
./mfc.sh run case.py -n 1 # Single rank
./mfc.sh run case.py -n 2 # Two ranks
Test Debugging
Running Specific Tests
Run only a specific test by UUID:
./mfc.sh test --only 3D_sphbubcollapse
Run multiple specific tests:
./mfc.sh test --only 3D_sphbubcollapse 2D_shockbubble
Serial Test Execution
Run tests serially for clearer output:
./mfc.sh test -j 1
Test Output Location
Test outputs are written to:
build/tests/<UUID>/
This directory contains:
- Input files generated from case.py
- Output files from each stage (pre_process, simulation, post_process)
- Any error messages or logs
Regenerating Golden Files
If you intentionally changed physics/numerics, regenerate reference files:
# Regenerate for specific tests
./mfc.sh test --generate --only <UUID>
# Regenerate for a range of tests
./mfc.sh test --generate --from <UUID1> --to <UUID2>
Understanding Test Failures
"Golden file mismatch" means numerical results differ from reference values.
Possible causes:
- Intentional changes: Regenerate golden files with --generate
- Compiler differences: Different compilers produce slightly different results
- Precision settings: Single vs double precision
- Platform differences: CPU architecture, GPU differences
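To gauge how large a mismatch actually is, you can diff the generated output against the reference by hand; both paths below are illustrative, so check your checkout for the actual golden-file layout:
diff tests/<UUID>/golden.txt build/tests/<UUID>/golden.txt  # paths are assumptions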
Performance Profiling
NVIDIA GPU Profiling
Nsight Compute (kernel-level analysis):
./mfc.sh run case.py --ncu
# Or with additional ncu flags:
./mfc.sh run case.py --ncu --set full
Nsight Systems (timeline/system analysis):
./mfc.sh run case.py --nsys
# Or with additional nsys flags:
./mfc.sh run case.py --nsys -o profile_output
AMD GPU Profiling
rocprof-compute (kernel analysis):
./mfc.sh run case.py --rcu
rocprof (system profiling):
./mfc.sh run case.py --rsys
Performance vs Correctness
If results seem wrong, first verify correctness:
- Run ./mfc.sh validate case.py
- Compare against a known working case
- Try --debug build for better error detection
If results are correct but slow:
- Use profiling tools above to identify bottlenecks
- Try --case-optimization for production runs (often around a 10x speedup)
- Check GPU utilization with nvidia-smi or rocm-smi, as shown below
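For a live utilization check while a case runs, poll the vendor tool from a second terminal:
watch -n 1 nvidia-smi  # NVIDIA: utilization, memory, active processes
watch -n 1 rocm-smi    # AMD equivalent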
Common Build Errors
"CMake could not find MPI"
Cause: MPI is not installed or not in PATH.
Fix:
- Ubuntu/Debian: sudo apt install libopenmpi-dev openmpi-bin
- macOS (Homebrew): brew install open-mpi
- HPC systems: module load openmpi or use ./mfc.sh load
Verify: mpirun --version
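If MPI lives in a non-standard prefix, you can point CMake at it before building; /opt/openmpi below is a placeholder for your actual install path:
export CMAKE_PREFIX_PATH=/opt/openmpi:$CMAKE_PREFIX_PATH
./mfc.sh build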
"CMake could not find a Fortran compiler"
Cause: No Fortran compiler installed or not in PATH.
Fix:
- Ubuntu/Debian: sudo apt install gfortran
- macOS (Homebrew): brew install gcc
- HPC systems: module load gcc or module load nvhpc
"Fypp preprocessing failed"
Cause: Syntax error in .fpp files or missing Fypp.
Fix:
- Ensure Fypp is installed: pip install fypp
- Check the specific error line in the output
- Run ./mfc.sh format to check for formatting issues
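To pinpoint a preprocessing error, you can also run Fypp on the offending file directly; the file name is illustrative, and you may need -I flags for any shared macro directories MFC uses:
fypp src/simulation/m_example.fpp /tmp/m_example.f90  # prints the failing line and directive on error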
Build fails with GPU/OpenACC errors
Cause: GPU compiler not properly configured.
Fix:
- Verify GPU compiler is available:
- NVIDIA: nvfortran --version
- AMD: ftn --version or amdflang --version
- Load appropriate modules: module load nvhpc or module load rocm
- Set CUDA compute capability if needed: export MFC_CUDA_CC=80
- Try building without GPU first: ./mfc.sh build --no-gpu
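To pick the right MFC_CUDA_CC value, query the device's compute capability (the compute_cap field needs a reasonably recent NVIDIA driver):
nvidia-smi --query-gpu=name,compute_cap --format=csv
# e.g. compute capability 8.0 -> export MFC_CUDA_CC=80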
"Module not found" on HPC
Cause: Required modules not loaded.
Fix:
- Use MFC's module loader if available for your system:
source ./mfc.sh load -c <cluster> -m <mode>
- Or manually load modules:
module load gcc openmpi cmake python
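On clusters using environment modules, these standard commands show what is available and what is loaded:
module avail gcc  # list available versions matching "gcc"
module list       # show currently loaded modules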
Common Runtime Errors
"Case parameter constraint violations"
Cause: Invalid combination of simulation parameters.
Fix:
- Run ./mfc.sh validate case.py for detailed diagnostics
- Check the specific constraint mentioned
- Review similar examples in examples/
Common issues:
- Grid dimensions (m, n, p) not matching the case's dimensionality (e.g., a 2D case should have p = 0)
- weno_order too high for grid size
- Boundary conditions not matching domain setup
"NaN detected"
Cause: Numerical instability.
Fix:
- Reduce time step (dt)
- Check initial conditions for unphysical values
- Verify boundary conditions are appropriate
- Try a more diffusive scheme initially
- Check CFL condition
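As a rough 1D acoustic estimate (an illustration, not MFC's exact stability bound), the time step should satisfy dt <= CFL * dx / (|u| + c). For example, with dx = 1e-3 m, |u| = 100 m/s, c = 1500 m/s, and CFL = 0.5, dt should stay below roughly 3e-7 s.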
GPU not detected
Cause: GPU drivers or runtime not configured.
Fix:
- Verify GPU is visible:
- NVIDIA: nvidia-smi
- AMD: rocm-smi
- Check compiler supports GPU offloading
- Ensure correct modules are loaded
- Set OMP_TARGET_OFFLOAD=MANDATORY to force GPU (fails fast if unavailable)
MPI errors or hangs
Cause: MPI configuration issues or deadlocks.
Fix:
- Verify MPI works: mpirun -n 2 hostname
- Try with fewer ranks: ./mfc.sh run case.py -n 1
- Check for incompatible MPI/compiler combinations (a quick check follows below)
- On clusters, ensure you're using the correct MPI for the interconnect
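A quick way to check that the MPI wrappers and your compiler match is to ask the wrapper what it wraps (mpif90 is the common Fortran wrapper name; some stacks use mpifort or ftn instead):
which mpirun mpif90
mpif90 --version  # should report the same compiler family you built MFC with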
Getting Help
If you can't resolve an issue:
- Search existing issues: GitHub Issues
- Check documentation: MFC Documentation
- Open a new issue with:
- Your OS and compiler versions
- The exact command you ran
- Complete error output (use -vv for builds)
- Your case file (if applicable)
- Output of ./mfc.sh validate case.py
- Use the CLI help:
./mfc.sh help debugging
./mfc.sh help gpu
./mfc.sh <command> -h