Troubleshooting Guide
This guide covers debugging tools, common issues, and troubleshooting workflows for MFC.
Quick Reference
Debugging Flags
| Flag | Command | Purpose |
| ---- | ------- | ------- |
| -v | build, run, test | Show the commands being executed |
| -vv | build, run, test | Full compiler/cmake output |
| -vvv | build | Adds cmake dependency debugging |
| -d | all | Write a debug log to file |
| --debug | build | Build with debug symbols |
| --gcov | build | Build with code-coverage instrumentation |
| --no-gpu | build | Disable GPU to isolate issues |
| --no-mpi | build | Disable MPI to isolate issues |
Profiling Flags
| Flag | Command | Purpose |
| ---- | ------- | ------- |
| --ncu | run | NVIDIA Nsight Compute (kernel profiling) |
| --nsys | run | NVIDIA Nsight Systems (timeline profiling) |
| --rcu | run | AMD rocprof-compute (kernel profiling) |
| --rsys | run | AMD rocprof (system profiling) |
Useful Commands
./mfc.sh validate case.py # Check case for errors before running
./mfc.sh build --debug # Build with debug symbols
./mfc.sh build -vv # Build with full compiler output
./mfc.sh run case.py -v # Run with verbose output
./mfc.sh test --only <UUID> # Run a specific test
./mfc.sh clean # Clean and start fresh
Debugging Workflow
Build Fails?
1. Run with -vv to see full error output
./mfc.sh build -vv
2. If it's a dependency issue, try -vvv for cmake debugging output
./mfc.sh build -vvv
3. Try disabling features to isolate:
./mfc.sh build --no-gpu # Rule out GPU issues
./mfc.sh build --no-mpi # Rule out MPI issues
4. Clean and rebuild:
./mfc.sh clean && ./mfc.sh build
Run Fails?
1. Validate your case first:
./mfc.sh validate case.py
2. Run with verbose output:
./mfc.sh run case.py -v
3. Check for NaN or constraint errors in output
4. Try with fewer MPI ranks:
./mfc.sh run case.py -n 1
5. Build in debug mode for better error messages:
./mfc.sh build --debug
./mfc.sh run case.py
Test Fails?
1. Run the specific failing test:
./mfc.sh test --only <UUID>
2. Run serially for clearer output:
./mfc.sh test --only <UUID> -j 1
3. Check test output in:
build/tests/<UUID>/
4. If changes are intentional, regenerate golden files:
./mfc.sh test --generate --only <UUID>
Build Debugging
Verbosity Levels
MFC supports three verbosity levels for builds:
-v (Level 1): Shows build progress with file counts
./mfc.sh build -v
# Shows: [1/42] Compiling m_global_parameters.fpp.f90
-vv (Level 2): Full compiler commands and cmake output
./mfc.sh build -vv
# Shows: Full compiler invocations with all flags
# Useful for: Seeing exact compiler errors, checking flags
-vvv (Level 3): Adds cmake dependency debugging
./mfc.sh build -vvv
# Shows: cmake --debug-find output
# Useful for: Finding why cmake can't locate MPI, HDF5, etc.
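When reporting a build problem, capture the full verbose output to a file; this uses only standard shell redirection, nothing MFC-specific:
./mfc.sh build -vvv 2>&1 | tee build.log  # save stdout+stderr while still printing to the terminal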
Debug Builds
Build with debug symbols for better error messages and easier debugging:
./mfc.sh build --debug
This sets CMAKE_BUILD_TYPE=Debug, which:
- Adds -g debug symbols
- Reduces optimization (-O0 or -O1)
- Enables runtime checks in some compilers
- Makes stack traces more readable
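To confirm the configuration took effect, you can inspect the CMake cache; the glob below is an assumption, since the exact location under build/ depends on your build layout:
grep CMAKE_BUILD_TYPE build/*/CMakeCache.txt  # expect CMAKE_BUILD_TYPE:STRING=Debug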
Code Coverage
Build with code-coverage instrumentation:
./mfc.sh build --gcov
This is useful for identifying which code paths are exercised by tests.
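One way to turn the instrumented build into a readable report is gcovr, a third-party tool assumed here (pip install gcovr), not something MFC bundles:
./mfc.sh test                        # exercise the instrumented build
gcovr -r . --html -o coverage.html   # aggregate coverage data into an HTML summary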
Isolating Issues
When builds fail, isolate the problem by disabling features:
# Rule out GPU compiler issues
./mfc.sh build --no-gpu
# Rule out MPI issues
./mfc.sh build --no-mpi
# Rule out precision issues
./mfc.sh build --single # Single precision
./mfc.sh build --mixed # Mixed precision
Clean Rebuild
Sometimes a clean rebuild fixes issues:
./mfc.sh clean
./mfc.sh build -j $(nproc)
Or manually:
rm -rf build/
./mfc.sh build -j $(nproc)
Runtime Debugging
Case Validation
Always validate your case before running:
./mfc.sh validate case.py
This checks for:
- Syntax errors in case.py
- Invalid parameter values
- Constraint violations (incompatible parameter combinations)
- Typos in parameter names (with "did you mean?" suggestions)
Example output:
Errors:
[red] weno_order=7 but must be one of: [1, 3, 5]
[red] Unknown parameter 'weno_ordr' - did you mean 'weno_order'?
Warnings:
[yellow] bubbles=True recommends setting: nb, polytropic
For more detailed validation output:
./mfc.sh validate case.py -d
Verbose Runs
Add verbosity to see what's happening:
# Show the exact command being run
./mfc.sh run case.py -v
# Show job script details
./mfc.sh run case.py -vv
Debug Log
Write detailed debug information to a log file:
./mfc.sh run case.py -d
This creates a debug log with detailed internal state, useful when reporting issues. The -d flag works with the other commands (build, test) as well.
Running with Fewer Ranks
To isolate parallel issues, try running with fewer MPI ranks:
./mfc.sh run case.py -n 1 # Single rank
./mfc.sh run case.py -n 2 # Two ranks
Test Debugging
Running Specific Tests
Run only a specific test by UUID:
./mfc.sh test --only 3D_sphbubcollapse
Run multiple specific tests:
./mfc.sh test --only 3D_sphbubcollapse 2D_shockbubble
Serial Test Execution
Run tests serially for clearer output:
./mfc.sh test -j 1
Test Output Location
Test outputs are written to:
build/tests/<UUID>/
This directory contains:
- Input files generated from case.py
- Output files from each stage (pre_process, simulation, post_process)
- Any error messages or logs
Regenerating Golden Files
If you intentionally changed physics/numerics, regenerate reference files:
# Regenerate for specific tests
./mfc.sh test --generate --only <UUID>
# Regenerate for a range of tests
./mfc.sh test --generate --from <UUID1> --to <UUID2>
Understanding Test Failures
"Golden file mismatch" means numerical results differ from reference values.
Possible causes:
- Intentional changes: Regenerate golden files with --generate
- Compiler differences: Different compilers produce slightly different results
- Precision settings: Single vs double precision
- Platform differences: CPU architecture, GPU differences
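To gauge how large a mismatch actually is, you can diff the generated output against the reference by hand; both paths below are illustrative, so check your checkout for the actual golden-file layout:
diff tests/<UUID>/golden.txt build/tests/<UUID>/golden.txt  # paths are assumptions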
Performance Profiling
NVIDIA GPU Profiling
Nsight Compute (kernel-level analysis):
./mfc.sh run case.py --ncu
# Or with additional ncu flags:
./mfc.sh run case.py --ncu --set full
Nsight Systems (timeline/system analysis):
./mfc.sh run case.py --nsys
# Or with additional nsys flags:
./mfc.sh run case.py --nsys -o profile_output
AMD GPU Profiling
rocprof-compute (kernel analysis):
./mfc.sh run case.py --rcu
rocprof (system profiling):
./mfc.sh run case.py --rsys
Performance vs Correctness
If results seem wrong, first verify correctness:
- Run ./mfc.sh validate case.py
- Compare against a known working case
- Try --debug build for better error detection
If results are correct but slow:
- Use profiling tools above to identify bottlenecks
- Try --case-optimization for production runs (often around a 10x speedup)
- Check GPU utilization with nvidia-smi or rocm-smi, as shown below
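For a live utilization check while a case runs, poll the vendor tool from a second terminal:
watch -n 1 nvidia-smi  # NVIDIA: utilization, memory, active processes
watch -n 1 rocm-smi    # AMD equivalent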
Common Build Errors
"CMake could not find MPI"
Cause: MPI is not installed or not in PATH.
Fix:
- Ubuntu/Debian: sudo apt install libopenmpi-dev openmpi-bin
- macOS (Homebrew): brew install open-mpi
- HPC systems: module load openmpi or use ./mfc.sh load
Verify: mpirun --version
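If MPI lives in a non-standard prefix, you can point CMake at it before building; /opt/openmpi below is a placeholder for your actual install path:
export CMAKE_PREFIX_PATH=/opt/openmpi:$CMAKE_PREFIX_PATH
./mfc.sh build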
"CMake could not find a Fortran compiler"
Cause: No Fortran compiler installed or not in PATH.
Fix:
- Ubuntu/Debian: sudo apt install gfortran
- macOS (Homebrew): brew install gcc
- HPC systems: module load gcc or module load nvhpc
"Fypp preprocessing failed"
Cause: Syntax error in .fpp files or missing Fypp.
Fix:
- Ensure Fypp is installed: pip install fypp
- Check the specific error line in the output
- Run ./mfc.sh format to check for formatting issues
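To pinpoint a preprocessing error, you can also run Fypp on the offending file directly; the file name is illustrative, and you may need -I flags for any shared macro directories MFC uses:
fypp src/simulation/m_example.fpp /tmp/m_example.f90  # prints the failing line and directive on error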
Build fails with GPU/OpenACC errors
Cause: GPU compiler not properly configured.
Fix:
- Verify GPU compiler is available:
- NVIDIA: nvfortran --version
- AMD: ftn --version or amdflang --version
- Load appropriate modules: module load nvhpc or module load rocm
- Set CUDA compute capability if needed: export MFC_CUDA_CC=80
- Try building without GPU first: ./mfc.sh build --no-gpu
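To pick the right MFC_CUDA_CC value, query the device's compute capability (the compute_cap field needs a reasonably recent NVIDIA driver):
nvidia-smi --query-gpu=name,compute_cap --format=csv
# e.g. compute capability 8.0 -> export MFC_CUDA_CC=80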
"Module not found" on HPC
Cause: Required modules not loaded.
Fix:
- Use MFC's module loader if available for your system:
source ./mfc.sh load -c <cluster> -m <mode>
- Or manually load modules:
module load gcc openmpi cmake python
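On clusters using environment modules, these standard commands show what is available and what is loaded:
module avail gcc  # list available versions matching "gcc"
module list       # show currently loaded modules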
Common Runtime Errors
"Case parameter constraint violations"
Cause: Invalid combination of simulation parameters.
Fix:
- Run ./mfc.sh validate case.py for detailed diagnostics
- Check the specific constraint mentioned
- Review similar examples in examples/
Common issues:
- Grid dimensions (m, n, p) not matching the case's dimensionality (e.g., a 2D case should have p = 0)
- weno_order too high for grid size
- Boundary conditions not matching domain setup
"NaN detected"
Cause: Numerical instability.
Fix:
- Reduce time step (dt)
- Check initial conditions for unphysical values
- Verify boundary conditions are appropriate
- Try a more diffusive scheme initially
- Check CFL condition
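As a rough 1D acoustic estimate (an illustration, not MFC's exact stability bound), the time step should satisfy dt <= CFL * dx / (|u| + c). For example, with dx = 1e-3 m, |u| = 100 m/s, c = 1500 m/s, and CFL = 0.5, dt should stay below roughly 3e-7 s.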
GPU not detected
Cause: GPU drivers or runtime not configured.
Fix:
- Verify GPU is visible:
- NVIDIA: nvidia-smi
- AMD: rocm-smi
- Check compiler supports GPU offloading
- Ensure correct modules are loaded
- Set OMP_TARGET_OFFLOAD=MANDATORY to force GPU (fails fast if unavailable)
MPI errors or hangs
Cause: MPI configuration issues or deadlocks.
Fix:
- Verify MPI works: mpirun -n 2 hostname
- Try with fewer ranks: ./mfc.sh run case.py -n 1
- Check for incompatible MPI/compiler combinations (a quick check follows below)
- On clusters, ensure you're using the correct MPI for the interconnect
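A quick way to check that the MPI wrappers and your compiler match is to ask the wrapper what it wraps (mpif90 is the common Fortran wrapper name; some stacks use mpifort or ftn instead):
which mpirun mpif90
mpif90 --version  # should report the same compiler family you built MFC with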
Getting Help
If you can't resolve an issue:
- Search existing issues: GitHub Issues
- Check documentation: MFC Documentation
- Open a new issue with:
- Your OS and compiler versions
- The exact command you ran
- Complete error output (use -vv for builds)
- Your case file (if applicable)
- Output of ./mfc.sh validate case.py
- Use the CLI help:
./mfc.sh help debugging
./mfc.sh help gpu
./mfc.sh <command> -h