MFC
Exascale flow solver
m_mpi_common.fpp.f90
1# 1 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
2!>
3!! @file
4!! @brief Contains module m_mpi_common
5
6# 1 "/home/runner/work/MFC/MFC/src/common/include/case.fpp" 1
7! This file exists so that Fypp can be run without generating case.fpp files for
8! each target. This is useful when generating documentation, for example. This
9! should also let MFC be built with CMake directly, without invoking mfc.sh.
10
11! For pre-process.
12# 9 "/home/runner/work/MFC/MFC/src/common/include/case.fpp"
13
14! For moving immersed boundaries in simulation
15# 14 "/home/runner/work/MFC/MFC/src/common/include/case.fpp"
16# 6 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp" 2
17# 1 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp" 1
18# 1 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp" 1
19# 1 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp" 1
20# 2 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
21# 3 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
22# 4 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
23# 5 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
24# 6 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
25
26# 8 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
27# 9 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
28# 10 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
29
30# 17 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
31
32# 46 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
33
34# 58 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
35
36# 68 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
37
38# 98 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
39
40# 110 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
41
42# 120 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
43! New line at end of file is required for FYPP
44# 2 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp" 2
45# 1 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp" 1
46# 1 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp" 1
47# 2 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
48# 3 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
49# 4 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
50# 5 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
51# 6 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
52
53# 8 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
54# 9 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
55# 10 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
56
57# 17 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
58
59# 46 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
60
61# 58 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
62
63# 68 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
64
65# 98 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
66
67# 110 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
68
69# 120 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
70! New line at end of file is required for FYPP
71# 2 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp" 2
72
73# 4 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
74# 5 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
75# 6 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
76# 7 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
77# 8 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
78
79# 20 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
80
81# 43 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
82
83# 48 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
84
85# 53 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
86
87# 58 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
88
89# 63 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
90
91# 68 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
92
93# 76 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
94
95# 81 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
96
97# 86 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
98
99# 91 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
100
101# 96 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
102
103# 101 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
104
105# 106 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
106
107# 111 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
108
109# 116 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
110
111# 121 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
112
113# 151 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
114
115# 192 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
116
117# 206 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
118
119# 231 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
120
121# 242 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
122
123# 244 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
124# 255 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
125
126# 284 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
127
128# 294 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
129
130# 304 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
131
132# 313 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
133
134# 330 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
135
136# 340 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
137
138# 347 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
139
140# 353 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
141
142# 359 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
143
144# 365 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
145
146# 371 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
147
148# 377 "/home/runner/work/MFC/MFC/src/common/include/omp_macros.fpp"
149! New line at end of file is required for FYPP
150# 3 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp" 2
151# 1 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp" 1
152# 1 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp" 1
153# 2 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
154# 3 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
155# 4 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
156# 5 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
157# 6 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
158
159# 8 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
160# 9 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
161# 10 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
162
163# 17 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
164
165# 46 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
166
167# 58 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
168
169# 68 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
170
171# 98 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
172
173# 110 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
174
175# 120 "/home/runner/work/MFC/MFC/src/common/include/shared_parallel_macros.fpp"
176! New line at end of file is required for FYPP
177# 2 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp" 2
178
179# 7 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
180
181# 17 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
182
183# 22 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
184
185# 27 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
186
187# 32 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
188
189# 37 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
190
191# 42 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
192
193# 47 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
194
195# 52 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
196
197# 57 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
198
199# 62 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
200
201# 73 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
202
203# 78 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
204
205# 83 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
206
207# 88 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
208
209# 103 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
210
211# 131 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
212
213# 160 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
214
215# 175 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
216
217# 193 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
218
219# 215 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
220
221# 244 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
222
223# 259 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
224
225# 269 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
226
227# 278 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
228
229# 294 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
230
231# 304 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
232
233# 311 "/home/runner/work/MFC/MFC/src/common/include/acc_macros.fpp"
234! New line at end of file is required for FYPP
235# 4 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp" 2
236
237! GPU parallel region (scalar reductions, maxval/minval)
238# 23 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
239
240! GPU parallel loop over threads (most common GPU macro)
241# 43 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
242
243! Required closing for GPU_PARALLEL_LOOP
244# 55 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
245
246! Mark routine for device compilation
247# 112 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
248
249! Declare device-resident data
250# 130 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
251
252! Inner loop within a GPU parallel region
253# 145 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
254
255! Scoped GPU data region
256# 164 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
257
258! Host code with device pointers (for MPI with GPU buffers)
259# 193 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
260
261! Allocate device memory (unscoped)
262# 207 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
263
264! Free device memory
265# 219 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
266
267! Atomic operation on device
268# 231 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
269
270! End atomic capture block
271# 242 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
272
273! Copy data between host and device
274# 254 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
275
276! Synchronization barrier
277# 266 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
278
279! Import GPU library module (openacc or omp_lib)
280# 275 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
281
282! Emit code only for AMD compiler
283# 282 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
284
285! Emit code for non-Cray compilers
286# 289 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
287
288! Emit code only for Cray compiler
289# 296 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
290
291! Emit code for non-NVIDIA compilers
292# 303 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
293
294# 305 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
295# 306 "/home/runner/work/MFC/MFC/src/common/include/parallel_macros.fpp"
296! New line at end of file is required for FYPP
297# 2 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp" 2
298
299# 14 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp"
300
301! Caution: This macro requires the use of a binding script to set CUDA_VISIBLE_DEVICES, such that we have one GPU device per MPI
302! rank. That's because for both cudaMemAdvise (preferred location) and cudaMemPrefetchAsync we use location = device_id = 0. For an
303! example, see misc/nvidia_uvm/bind.sh.
! NVIDIA unified memory page placement hint
304# 57 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp"
305
306! Allocate and create GPU device memory
307# 77 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp"
308
309! Free GPU device memory and deallocate
310# 85 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp"
311
312! Cray-specific GPU pointer setup for vector fields
313# 109 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp"
314
315! Cray-specific GPU pointer setup for scalar fields
316# 125 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp"
317
318! Cray-specific GPU pointer setup for acoustic source spatials
319# 150 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp"
320
321# 156 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp"
322
323# 163 "/home/runner/work/MFC/MFC/src/common/include/macros.fpp"
324! New line at end of file is required for FYPP
325# 7 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp" 2
326
327!> @brief MPI communication layer: domain decomposition, halo exchange, reductions, and parallel I/O setup
328module m_mpi_common
329
330#ifdef MFC_MPI
331 use mpi !< message passing interface (mpi) module
332#endif
333
336 use m_helper
337 use ieee_arithmetic
338 use m_nvtx
339
340 implicit none
341
342 integer, private :: v_size
343
344# 24 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
345#if defined(MFC_OpenACC)
346# 24 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
347!$acc declare create(v_size)
348# 24 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
349#elif defined(MFC_OpenMP)
350# 24 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
351!$omp declare target (v_size)
352# 24 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
353#endif
354
355 real(wp), private, allocatable, dimension(:) :: buff_send !< Primitive variable send buffer for halo exchange
356 real(wp), private, allocatable, dimension(:) :: buff_recv !< Primitive variable receive buffer for halo exchange
357#ifndef __NVCOMPILER_GPU_UNIFIED_MEM
358
359# 29 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
360#if defined(MFC_OpenACC)
361# 29 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
362!$acc declare create(buff_send, buff_recv)
363# 29 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
364#elif defined(MFC_OpenMP)
365# 29 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
366!$omp declare target (buff_send, buff_recv)
367# 29 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
368#endif
369#endif
370
371 integer(kind=8) :: halo_size
372
373# 33 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
374#if defined(MFC_OpenACC)
375# 33 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
376!$acc declare create(halo_size)
377# 33 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
378#elif defined(MFC_OpenMP)
379# 33 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
380!$omp declare target (halo_size)
381# 33 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
382#endif
383
384contains
385
386 !> Initialize the module.
 387 impure subroutine s_initialize_mpi_common_module
 388
389#ifdef MFC_MPI
 390 ! Allocate buff_send and buff_recv. Please note that, for the sake of simplicity, both variables are provided sufficient storage
 391 ! to hold the largest buffer in the computational domain.
392
393 if (qbmm .and. .not. polytropic) then
394 v_size = sys_size + 2*nb*nnode
 395 else
 396 v_size = sys_size
 397 end if
398
399 if (n > 0) then
400 if (p > 0) then
401 halo_size = nint(-1._wp + 1._wp*buff_size*(v_size)*(m + 2*buff_size + 1)*(n + 2*buff_size + 1)*(p + 2*buff_size &
402 & + 1)/(cells_bounds%mnp_min + 2*buff_size + 1))
403 else
404 halo_size = -1 + buff_size*(v_size)*(cells_bounds%mn_max + 2*buff_size + 1)
405 end if
406 else
407 halo_size = -1 + buff_size*(v_size)
408 end if
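 ! Illustrative arithmetic (hypothetical values): with buff_size = 4, v_size = 10,
 ! m = n = p = 99 and cells_bounds%mnp_min = 99, each extended dimension is 99 + 2*4 + 1 = 108 and
 ! the 3D branch above yields halo_size = -1 + 4*10*108*108*108/108 = 466559, so
 ! buff_send(0:halo_size) holds 466560 = buff_size*v_size*108*108 values, the largest face halo of this rank.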
409
410
411# 61 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
412#if defined(MFC_OpenACC)
413# 61 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
414!$acc update device(halo_size, v_size)
415# 61 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
416#elif defined(MFC_OpenMP)
417# 61 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
418!$omp target update to(halo_size, v_size)
419# 61 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
420#endif
421
422#ifndef __NVCOMPILER_GPU_UNIFIED_MEM
423#ifdef MFC_DEBUG
424# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
425 block
426# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
427 use iso_fortran_env, only: output_unit
428# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
429
430# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
431 print *, 'm_mpi_common.fpp:64: ', '@:ALLOCATE(buff_send(0:halo_size), buff_recv(0:halo_size))'
432# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
433
434# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
435 call flush (output_unit)
436# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
437 end block
438# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
439#endif
440# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
441 allocate (buff_send(0:halo_size), buff_recv(0:halo_size))
442# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
443
444# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
445
446# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
447
448# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
449#if defined(MFC_OpenACC)
450# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
451!$acc enter data create(buff_send, buff_recv)
452# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
453#elif defined(MFC_OpenMP)
454# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
455!$omp target enter data map(always,alloc:buff_send, buff_recv)
456# 64 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
457#endif
458#else
459 allocate (buff_send(0:halo_size), buff_recv(0:halo_size))
460
461# 67 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
462#if defined(MFC_OpenACC)
463# 67 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
464!$acc enter data create(capture:buff_send)
465# 67 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
466#elif defined(MFC_OpenMP)
467# 67 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
468!$omp target enter data map(always,alloc:capture:buff_send)
469# 67 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
470#endif
471
472# 68 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
473#if defined(MFC_OpenACC)
474# 68 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
475!$acc enter data create(capture:buff_recv)
476# 68 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
477#elif defined(MFC_OpenMP)
478# 68 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
479!$omp target enter data map(always,alloc:capture:buff_recv)
480# 68 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
481#endif
482#endif
483#endif
484
485 end subroutine s_initialize_mpi_common_module
486
487 !> Initialize the MPI execution environment and query the number of processors and local rank.
488 impure subroutine s_mpi_initialize
489
490#ifdef MFC_MPI
491 integer :: ierr !< Generic flag used to identify and report MPI errors
492
493 call mpi_init(ierr)
494
495 if (ierr /= mpi_success) then
496 print '(A)', 'Unable to initialize MPI environment. Exiting.'
497 call mpi_abort(mpi_comm_world, 1, ierr)
498 end if
499
500 call mpi_comm_size(mpi_comm_world, num_procs, ierr)
501
502 call mpi_comm_rank(mpi_comm_world, proc_rank, ierr)
503#else
504 num_procs = 1
505 proc_rank = 0
506#endif
507
508 end subroutine s_mpi_initialize
509
510 !> Set up MPI I/O data views and variable pointers for parallel file output.
511 impure subroutine s_initialize_mpi_data(q_cons_vf, ib_markers, beta)
512
513 type(scalar_field), dimension(sys_size), intent(in) :: q_cons_vf
514 type(integer_field), optional, intent(in) :: ib_markers
515 type(scalar_field), intent(in), optional :: beta
516 integer, dimension(num_dims) :: sizes_glb, sizes_loc
517 integer, dimension(1) :: airfoil_glb, airfoil_loc, airfoil_start
518
519#ifdef MFC_MPI
520 integer :: i, j
521 integer :: ierr !< Generic flag used to identify and report MPI errors
522 integer :: alt_sys
523
524 if (present(beta)) then
525 alt_sys = sys_size + 1
526 else
527 alt_sys = sys_size
528 end if
529
530 do i = 1, sys_size
531 mpi_io_data%var(i)%sf => q_cons_vf(i)%sf(0:m,0:n,0:p)
532 end do
533
534 if (present(beta)) then
535 mpi_io_data%var(alt_sys)%sf => beta%sf(0:m,0:n,0:p)
536 end if
537
538 ! Additional variables pb and mv for non-polytropic qbmm
539 if (qbmm .and. .not. polytropic) then
540 do i = 1, nb
541 do j = 1, nnode
542#ifdef MFC_PRE_PROCESS
543 mpi_io_data%var(sys_size + (i - 1)*nnode + j)%sf => pb%sf(0:m,0:n,0:p,j, i)
544 mpi_io_data%var(sys_size + (i - 1)*nnode + j + nb*nnode)%sf => mv%sf(0:m,0:n,0:p,j, i)
545#elif defined (MFC_SIMULATION)
546 mpi_io_data%var(sys_size + (i - 1)*nnode + j)%sf => pb_ts(1)%sf(0:m,0:n,0:p,j, i)
547 mpi_io_data%var(sys_size + (i - 1)*nnode + j + nb*nnode)%sf => mv_ts(1)%sf(0:m,0:n,0:p,j, i)
548#endif
549 end do
550 end do
551 end if
552
553 ! Define global(g) and local(l) sizes for flow variables
554 sizes_glb(1) = m_glb + 1; sizes_loc(1) = m + 1
555 if (n > 0) then
556 sizes_glb(2) = n_glb + 1; sizes_loc(2) = n + 1
557 if (p > 0) then
558 sizes_glb(num_dims) = p_glb + 1; sizes_loc(num_dims) = p + 1
559 end if
560 end if
561
562 ! Define the view for each variable
563 do i = 1, alt_sys
564 call mpi_type_create_subarray(num_dims, sizes_glb, sizes_loc, start_idx, mpi_order_fortran, mpi_p, &
565 & mpi_io_data%view(i), ierr)
566 call mpi_type_commit(mpi_io_data%view(i), ierr)
567 end do
568
569#ifndef MFC_POST_PROCESS
570 if (qbmm .and. .not. polytropic) then
571 do i = sys_size + 1, sys_size + 2*nb*nnode
572 call mpi_type_create_subarray(num_dims, sizes_glb, sizes_loc, start_idx, mpi_order_fortran, mpi_p, &
573 & mpi_io_data%view(i), ierr)
574 call mpi_type_commit(mpi_io_data%view(i), ierr)
575 end do
576 end if
577#endif
578
579#ifndef MFC_PRE_PROCESS
580 if (present(ib_markers)) then
581 mpi_io_ib_data%var%sf => ib_markers%sf(0:m,0:n,0:p)
582
583 call mpi_type_create_subarray(num_dims, sizes_glb, sizes_loc, start_idx, mpi_order_fortran, mpi_integer, &
584 & mpi_io_ib_data%view, ierr)
585 call mpi_type_commit(mpi_io_ib_data%view, ierr)
586 end if
587#endif
588#endif
589
590 end subroutine s_initialize_mpi_data
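    !> Standalone sketch (illustrative; not part of MFC) of the subarray-view technique used above:
    !! each rank describes where its local block of cells sits inside the global array and commits
    !! the resulting datatype, which would then be handed to MPI_FILE_SET_VIEW for parallel I/O.
    !! All names suffixed with _ex and the use of mpi_double_precision are hypothetical placeholders.
    impure subroutine s_example_subarray_view(m_loc, n_loc, p_loc, m_glb, n_glb, p_glb, start_ex, view_ex)
        use mpi
        integer, intent(in) :: m_loc, n_loc, p_loc !< local cell-index upper bounds (0-based)
        integer, intent(in) :: m_glb, n_glb, p_glb !< global cell-index upper bounds (0-based)
        integer, intent(in) :: start_ex(3) !< global index of this rank's first cell in each direction
        integer, intent(out) :: view_ex !< committed subarray datatype describing this rank's block
        integer :: sizes_glb_ex(3), sizes_loc_ex(3), ierr
        sizes_glb_ex = (/m_glb + 1, n_glb + 1, p_glb + 1/)
        sizes_loc_ex = (/m_loc + 1, n_loc + 1, p_loc + 1/)
        call mpi_type_create_subarray(3, sizes_glb_ex, sizes_loc_ex, start_ex, &
                                      mpi_order_fortran, mpi_double_precision, view_ex, ierr)
        call mpi_type_commit(view_ex, ierr)
    end subroutine s_example_subarray_view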
591
592 !> Set up MPI I/O data views for downsampled (coarsened) parallel file output.
593 subroutine s_initialize_mpi_data_ds(q_cons_vf)
594
595 type(scalar_field), dimension(sys_size), intent(in) :: q_cons_vf
596 integer, dimension(num_dims) :: sizes_glb, sizes_loc
597 integer, dimension(3) :: sf_start_idx
598
599#ifdef MFC_MPI
600 integer :: i, j, q, k, l, m_ds, n_ds, p_ds, ierr
601
602 sf_start_idx = (/0, 0, 0/)
603
604#ifndef MFC_POST_PROCESS
605 m_ds = int((m + 1)/3) - 1
606 n_ds = int((n + 1)/3) - 1
607 p_ds = int((p + 1)/3) - 1
608#else
609 m_ds = m
610 n_ds = n
611 p_ds = p
612#endif
613
614#ifdef MFC_POST_PROCESS
615 do i = 1, sys_size
616 mpi_io_data%var(i)%sf => q_cons_vf(i)%sf(-1:m_ds + 1,-1:n_ds + 1,-1:p_ds + 1)
617 end do
618#endif
619 ! Define global(g) and local(l) sizes for flow variables
620 sizes_loc(1) = m_ds + 3
621 if (n > 0) then
622 sizes_loc(2) = n_ds + 3
623 if (p > 0) then
624 sizes_loc(num_dims) = p_ds + 3
625 end if
626 end if
627
628 ! Define the view for each variable
629 do i = 1, sys_size
630 call mpi_type_create_subarray(num_dims, sizes_loc, sizes_loc, sf_start_idx, mpi_order_fortran, mpi_p, &
631 & mpi_io_data%view(i), ierr)
632 call mpi_type_commit(mpi_io_data%view(i), ierr)
633 end do
634#endif
635
636 end subroutine s_initialize_mpi_data_ds
637
638 !> Gather variable-length real vectors from all MPI ranks onto the root process.
639 impure subroutine s_mpi_gather_data(my_vector, counts, gathered_vector, root)
640
 641 integer, intent(in) :: counts !< Length of this process's local vector
642 real(wp), intent(in), dimension(counts) :: my_vector !< Input vector on each process
643 integer, intent(in) :: root !< Rank of the root process
644 real(wp), allocatable, intent(out) :: gathered_vector(:) !< Gathered vector on the root process
645 integer :: i
646 integer :: ierr !< Generic flag used to identify and report MPI errors
647 integer, allocatable :: recounts(:), displs(:)
648
649#ifdef MFC_MPI
650 allocate (recounts(num_procs))
651
652 call mpi_gather(counts, 1, mpi_integer, recounts, 1, mpi_integer, root, mpi_comm_world, ierr)
653
654 allocate (displs(size(recounts)))
655
656 displs(1) = 0
657
658 do i = 2, size(recounts)
659 displs(i) = displs(i - 1) + recounts(i - 1)
660 end do
661
662 allocate (gathered_vector(sum(recounts)))
663 call mpi_gatherv(my_vector, counts, mpi_p, gathered_vector, recounts, displs, mpi_p, root, mpi_comm_world, ierr)
664#endif
665
666 end subroutine s_mpi_gather_data
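    !> Standalone sketch (illustrative; not part of MFC) of the gather pattern used above: per-rank
    !! lengths are gathered first, displacements are their running sum, and MPI_GATHERV assembles
    !! the variable-length pieces on the root. Names suffixed with _ex are hypothetical; wp and
    !! mpi_p are the working precision and its matching MPI datatype, as in the routine above.
    impure subroutine s_example_gatherv(count_ex, vals_ex, all_vals_ex, root_ex)
        use mpi
        integer, intent(in) :: count_ex, root_ex
        real(wp), intent(in) :: vals_ex(count_ex)
        real(wp), allocatable, intent(out) :: all_vals_ex(:)
        integer, allocatable :: counts_ex(:), displs_ex(:)
        integer :: i, rank_ex, nproc_ex, ierr
        call mpi_comm_rank(mpi_comm_world, rank_ex, ierr)
        call mpi_comm_size(mpi_comm_world, nproc_ex, ierr)
        allocate (counts_ex(nproc_ex), displs_ex(nproc_ex))
        call mpi_gather(count_ex, 1, mpi_integer, counts_ex, 1, mpi_integer, root_ex, mpi_comm_world, ierr)
        if (rank_ex == root_ex) then
            displs_ex(1) = 0
            do i = 2, nproc_ex
                displs_ex(i) = displs_ex(i - 1) + counts_ex(i - 1) ! start of rank (i-1)'s block
            end do
            allocate (all_vals_ex(sum(counts_ex)))
        else
            allocate (all_vals_ex(0)) ! receive buffer is only significant on the root
        end if
        call mpi_gatherv(vals_ex, count_ex, mpi_p, all_vals_ex, counts_ex, displs_ex, mpi_p, &
                         root_ex, mpi_comm_world, ierr)
    end subroutine s_example_gatherv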
667
668 !> Gather per-rank time step wall-clock times onto rank 0 for performance reporting.
669 impure subroutine mpi_bcast_time_step_values(proc_time, time_avg)
670
671 real(wp), dimension(0:num_procs - 1), intent(inout) :: proc_time
672 real(wp), intent(inout) :: time_avg
673
674#ifdef MFC_MPI
675 integer :: ierr !< Generic flag used to identify and report MPI errors
676
677 call mpi_gather(time_avg, 1, mpi_p, proc_time(0), 1, mpi_p, 0, mpi_comm_world, ierr)
678#endif
679
680 end subroutine mpi_bcast_time_step_values
681
682 !> Print a case file error with the prohibited condition and message, then abort execution.
683 impure subroutine s_prohibit_abort(condition, message)
684
685 character(len=*), intent(in) :: condition, message
686
687 print *, ""
688 print *, "CASE FILE ERROR"
689 print *, " - Prohibited condition: ", trim(condition)
690 if (len_trim(message) > 0) then
691 print *, " - Note: ", trim(message)
692 end if
693 print *, ""
694 call s_mpi_abort(code=case_file_error_code)
695
696 end subroutine s_prohibit_abort
697
698 !> The goal of this subroutine is to determine the global extrema of the stability criteria in the computational domain. This is
699 !! performed by sifting through the local extrema of each stability criterion. Note that each of the local extrema is from a
700 !! single process, within its assigned section of the computational domain. Finally, note that the global extrema values are
 701 !! only bookkept on the rank 0 processor.
 702 impure subroutine s_mpi_reduce_stability_criteria_extrema(icfl_max_loc, vcfl_max_loc, Rc_min_loc, icfl_max_glb, vcfl_max_glb, &
 704 & Rc_min_glb)
705
706 real(wp), intent(in) :: icfl_max_loc
707 real(wp), intent(in) :: vcfl_max_loc
708 real(wp), intent(in) :: rc_min_loc
709 real(wp), intent(out) :: icfl_max_glb
710 real(wp), intent(out) :: vcfl_max_glb
711 real(wp), intent(out) :: rc_min_glb
712
713#ifdef MFC_SIMULATION
714#ifdef MFC_MPI
715 integer :: ierr !< Generic flag used to identify and report MPI errors
716
717 call mpi_reduce(icfl_max_loc, icfl_max_glb, 1, mpi_p, mpi_max, 0, mpi_comm_world, ierr)
718
719 if (viscous) then
720 call mpi_reduce(vcfl_max_loc, vcfl_max_glb, 1, mpi_p, mpi_max, 0, mpi_comm_world, ierr)
721 call mpi_reduce(rc_min_loc, rc_min_glb, 1, mpi_p, mpi_min, 0, mpi_comm_world, ierr)
722 end if
723#else
724 icfl_max_glb = icfl_max_loc
725
726 if (viscous) then
727 vcfl_max_glb = vcfl_max_loc
728 rc_min_glb = rc_min_loc
729 end if
730#endif
731#endif
732
 733 end subroutine s_mpi_reduce_stability_criteria_extrema
 734
735 !> Reduce a local real value to its global sum across all MPI ranks.
736 impure subroutine s_mpi_allreduce_sum(var_loc, var_glb)
737
738 real(wp), intent(in) :: var_loc
739 real(wp), intent(out) :: var_glb
740
741#ifdef MFC_MPI
742 integer :: ierr !< Generic flag used to identify and report MPI errors
743
744 call mpi_allreduce(var_loc, var_glb, 1, mpi_p, mpi_sum, mpi_comm_world, ierr)
745#endif
746
747 end subroutine s_mpi_allreduce_sum
748
749 !> Reduce an array of vectors to their global sums across all MPI ranks.
750 impure subroutine s_mpi_allreduce_vectors_sum(var_loc, var_glb, num_vectors, vector_length)
751
752 integer, intent(in) :: num_vectors, vector_length
753 real(wp), dimension(:,:), intent(in) :: var_loc
754 real(wp), dimension(:,:), intent(out) :: var_glb
755
756#ifdef MFC_MPI
757 integer :: ierr !< Generic flag used to identify and report MPI errors
758
759 if (loc(var_loc) == loc(var_glb)) then
760 call mpi_allreduce(mpi_in_place, var_glb, num_vectors*vector_length, mpi_p, mpi_sum, mpi_comm_world, ierr)
761 else
762 call mpi_allreduce(var_loc, var_glb, num_vectors*vector_length, mpi_p, mpi_sum, mpi_comm_world, ierr)
763 end if
764#else
765 var_glb(1:num_vectors,1:vector_length) = var_loc(1:num_vectors,1:vector_length)
766#endif
767
768 end subroutine s_mpi_allreduce_vectors_sum
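    !> Standalone sketch (illustrative; not part of MFC) of the in-place branch used above: when the
    !! caller passes the same array as both input and output, the send buffer is replaced by
    !! MPI_IN_PLACE so the reduction is performed directly in the receive buffer, avoiding a copy.
    !! The name s_example_allreduce_in_place is hypothetical.
    impure subroutine s_example_allreduce_in_place(vec_ex)
        use mpi
        real(wp), dimension(:), intent(inout) :: vec_ex !< reduced in place across all ranks
        integer :: ierr
        call mpi_allreduce(MPI_IN_PLACE, vec_ex, size(vec_ex), mpi_p, mpi_sum, mpi_comm_world, ierr)
    end subroutine s_example_allreduce_in_place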
769
770 !> Reduce a local integer value to its global sum across all MPI ranks.
771 impure subroutine s_mpi_allreduce_integer_sum(var_loc, var_glb)
772
773 integer, intent(in) :: var_loc
774 integer, intent(out) :: var_glb
775
776#ifdef MFC_MPI
777 integer :: ierr !< Generic flag used to identify and report MPI errors
778
779 call mpi_allreduce(var_loc, var_glb, 1, mpi_integer, mpi_sum, mpi_comm_world, ierr)
780#else
781 var_glb = var_loc
782#endif
783
784 end subroutine s_mpi_allreduce_integer_sum
785
786 !> Reduce a local real value to its global minimum across all MPI ranks.
787 impure subroutine s_mpi_allreduce_min(var_loc, var_glb)
788
789 real(wp), intent(in) :: var_loc
790 real(wp), intent(out) :: var_glb
791
792#ifdef MFC_MPI
793 integer :: ierr !< Generic flag used to identify and report MPI errors
794
795 call mpi_allreduce(var_loc, var_glb, 1, mpi_p, mpi_min, mpi_comm_world, ierr)
796#endif
797
798 end subroutine s_mpi_allreduce_min
799
800 !> Reduce a local real value to its global maximum across all MPI ranks.
801 impure subroutine s_mpi_allreduce_max(var_loc, var_glb)
802
803 real(wp), intent(in) :: var_loc
804 real(wp), intent(out) :: var_glb
805
806#ifdef MFC_MPI
807 integer :: ierr !< Generic flag used to identify and report MPI errors
808
809 call mpi_allreduce(var_loc, var_glb, 1, mpi_p, mpi_max, mpi_comm_world, ierr)
810#endif
811
812 end subroutine s_mpi_allreduce_max
813
814 !> Reduce a local real value to its global minimum across all ranks
815 impure subroutine s_mpi_reduce_min(var_loc)
816
817 real(wp), intent(inout) :: var_loc
818
819#ifdef MFC_MPI
820 integer :: ierr !< Generic flag used to identify and report MPI errors
821 real(wp) :: var_glb
822
823 call mpi_reduce(var_loc, var_glb, 1, mpi_p, mpi_min, 0, mpi_comm_world, ierr)
824
825 call mpi_bcast(var_glb, 1, mpi_p, 0, mpi_comm_world, ierr)
826
827 var_loc = var_glb
828#endif
829
830 end subroutine s_mpi_reduce_min
831
 832 !> Reduce a 2-element (value, rank) variable to its global maximum value and the rank that owns it
 833 !! (MPI_MAXLOC), broadcasting the result to all ranks.
834 impure subroutine s_mpi_reduce_maxloc(var_loc)
835
836 real(wp), dimension(2), intent(inout) :: var_loc
837
838#ifdef MFC_MPI
839 integer :: ierr !< Generic flag used to identify and report MPI errors
840 real(wp), dimension(2) :: var_glb !< Reduced (max value, rank) pair
841 call mpi_reduce(var_loc, var_glb, 1, mpi_2p, mpi_maxloc, 0, mpi_comm_world, ierr)
842
843 call mpi_bcast(var_glb, 1, mpi_2p, 0, mpi_comm_world, ierr)
844
845 var_loc = var_glb
846#endif
847
848 end subroutine s_mpi_reduce_maxloc
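    !> Standalone sketch (illustrative; not part of MFC) of how the 2-element argument reduced above
    !! is typically built: element 1 carries the local value and element 2 the owning rank, so the
    !! MPI_MAXLOC operation returns the global maximum together with the rank that holds it
    !! (mpi_2p is the paired datatype, e.g. MPI_2DOUBLE_PRECISION). Names suffixed with _ex are hypothetical.
    impure subroutine s_example_reduce_maxloc(val_ex, rank_ex, pair_glb_ex)
        use mpi
        real(wp), intent(in) :: val_ex !< local candidate value
        integer, intent(in) :: rank_ex !< rank owning val_ex
        real(wp), dimension(2), intent(out) :: pair_glb_ex !< (global maximum, owning rank)
        real(wp), dimension(2) :: pair_loc_ex
        integer :: ierr
        pair_loc_ex(1) = val_ex
        pair_loc_ex(2) = real(rank_ex, wp) ! rank is stored as a real alongside the value
        call mpi_allreduce(pair_loc_ex, pair_glb_ex, 1, mpi_2p, mpi_maxloc, mpi_comm_world, ierr)
    end subroutine s_example_reduce_maxloc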
849
850 !> The subroutine terminates the MPI execution environment.
851 impure subroutine s_mpi_abort(prnt, code)
852
853 character(len=*), intent(in), optional :: prnt
854 integer, intent(in), optional :: code
855
856#ifdef MFC_MPI
857 integer :: ierr !< Generic flag used to identify and report MPI errors
858#endif
859
860 if (present(prnt)) then
861 print *, prnt
862 call flush (6)
863 end if
864
865#ifndef MFC_MPI
866 if (present(code)) then
867 stop code
868 else
869 stop 1
870 end if
871#else
872 if (present(code)) then
873 call mpi_abort(mpi_comm_world, code, ierr)
874 else
875 call mpi_abort(mpi_comm_world, 1, ierr)
876 end if
877#endif
878
879 end subroutine s_mpi_abort
880
 881 !> Halts all processes until all of them have reached the barrier.
882 impure subroutine s_mpi_barrier
883
884#ifdef MFC_MPI
885 integer :: ierr !< Generic flag used to identify and report MPI errors
886
887 call mpi_barrier(mpi_comm_world, ierr)
888#endif
889
890 end subroutine s_mpi_barrier
891
892 !> The subroutine finalizes the MPI execution environment.
893 impure subroutine s_mpi_finalize
894
895#ifdef MFC_MPI
896 integer :: ierr !< Generic flag used to identify and report MPI errors
897
898 call mpi_finalize(ierr)
899#endif
900
901 end subroutine s_mpi_finalize
902
903 !> The goal of this procedure is to populate the buffers of the cell-average conservative variables by communicating with the
904 !! neighboring processors.
905 subroutine s_mpi_sendrecv_variables_buffers(q_comm, mpi_dir, pbc_loc, nVar, pb_in, mv_in)
906
907 type(scalar_field), dimension(1:), intent(inout) :: q_comm
908 real(stp), optional, dimension(idwbuff(1)%beg:,idwbuff(2)%beg:,idwbuff(3)%beg:,1:,1:), intent(inout) :: pb_in, mv_in
909 integer, intent(in) :: mpi_dir, pbc_loc, nVar
910 integer :: i, j, k, l, r, q !< Generic loop iterators
911 integer :: buffer_counts(1:3), buffer_count
912 type(int_bounds_info) :: boundary_conditions(1:3)
913 integer :: beg_end(1:2), grid_dims(1:3)
914 integer :: dst_proc, src_proc, recv_tag, send_tag
915 logical :: beg_end_geq_0, qbmm_comm
916 integer :: pack_offset, unpack_offset
917
918#ifdef MFC_MPI
919 integer :: ierr !< Generic flag used to identify and report MPI errors
920
921 call nvtxstartrange("RHS-COMM-PACKBUF")
922
923 qbmm_comm = .false.
924
925 if (present(pb_in) .and. present(mv_in) .and. qbmm .and. .not. polytropic) then
926 qbmm_comm = .true.
927 v_size = nvar + 2*nb*nnode
928 buffer_counts = (/buff_size*v_size*(n + 1)*(p + 1), buff_size*v_size*(m + 2*buff_size + 1)*(p + 1), &
929 & buff_size*v_size*(m + 2*buff_size + 1)*(n + 2*buff_size + 1)/)
930 else
931 v_size = nvar
932 buffer_counts = (/buff_size*v_size*(n + 1)*(p + 1), buff_size*v_size*(m + 2*buff_size + 1)*(p + 1), &
933 & buff_size*v_size*(m + 2*buff_size + 1)*(n + 2*buff_size + 1)/)
934 end if
935
936
937# 523 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
938#if defined(MFC_OpenACC)
939# 523 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
940!$acc update device(v_size)
941# 523 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
942#elif defined(MFC_OpenMP)
943# 523 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
944!$omp target update to(v_size)
945# 523 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
946#endif
947
948 buffer_count = buffer_counts(mpi_dir)
949 boundary_conditions = (/bc_x, bc_y, bc_z/)
950 beg_end = (/boundary_conditions(mpi_dir)%beg, boundary_conditions(mpi_dir)%end/)
951 beg_end_geq_0 = beg_end(max(pbc_loc, 0) - pbc_loc + 1) >= 0
952
 953 ! Implements: (pbc_loc, bc >= 0) -> [send/recv]_tag [dst/src]_proc
 954 !   -1 (=0) 0 -> [1,0] [0,0] | 0 0 [1,0] [beg,beg]      -1 (=0) 1 -> [0,0] [1,0] | 0 1 [0,0] [end,beg]
 955 !   +1 (=1) 0 -> [0,1] [1,1] | 1 0 [0,1] [end,end]      +1 (=1) 1 -> [1,1] [0,1] | 1 1 [1,1] [beg,end]
956
957 send_tag = f_logical_to_int(.not. f_xor(beg_end_geq_0, pbc_loc == 1))
958 recv_tag = f_logical_to_int(pbc_loc == 1)
959
960 dst_proc = beg_end(1 + f_logical_to_int(f_xor(pbc_loc == 1, beg_end_geq_0)))
961 src_proc = beg_end(1 + f_logical_to_int(pbc_loc == 1))
962
963 grid_dims = (/m, n, p/)
964
965 pack_offset = 0
966 if (f_xor(pbc_loc == 1, beg_end_geq_0)) then
967 pack_offset = grid_dims(mpi_dir) - buff_size + 1
968 end if
969
970 unpack_offset = 0
971 if (pbc_loc == 1) then
972 unpack_offset = grid_dims(mpi_dir) + buff_size + 1
973 end if
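 ! Worked example (illustrative): for mpi_dir = 1 and pbc_loc = -1 with neighbors at both
 ! faces (bc_x%beg >= 0 and bc_x%end >= 0), beg_end_geq_0 is true, so send_tag = 0, recv_tag = 0,
 ! dst_proc = bc_x%end, src_proc = bc_x%beg and pack_offset = m - buff_size + 1: the cells nearest
 ! the end face are packed and sent to the end neighbor, while the beginning halo is filled with
 ! data received from the beginning neighbor.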
974
975 ! Pack Buffer to Send
976# 554 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
977 if (mpi_dir == 1) then
978# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
979
980# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
981
982# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
983#if defined(MFC_OpenACC)
984# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
985!$acc parallel loop collapse(4) gang vector default(present) private(r)
986# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
987#elif defined(MFC_OpenMP)
988# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
989
990# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
991
992# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
993
994# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
995!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(4) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
996# 556 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
997#endif
998 do l = 0, p
999 do k = 0, n
1000 do j = 0, buff_size - 1
1001 do i = 1, nvar
1002 r = (i - 1) + v_size*(j + buff_size*(k + (n + 1)*l))
1003 buff_send(r) = real(q_comm(i)%sf(j + pack_offset, k, l), kind=wp)
1004 end do
1005 end do
1006 end do
1007 end do
1008
1009# 567 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1010#if defined(MFC_OpenACC)
1011# 567 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1012!$acc end parallel loop
1013# 567 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1014#elif defined(MFC_OpenMP)
1015# 567 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1016
1017# 567 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1018!$omp end target teams loop
1019# 567 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1020#endif
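 ! Index arithmetic example (illustrative, hypothetical sizes): with v_size = nvar = 5,
 ! buff_size = 2 and n = 10, the entry (i, j, k, l) = (3, 1, 4, 0) packs into
 ! r = (3 - 1) + 5*(1 + 2*(4 + 11*0)) = 2 + 5*9 = 47; the variable index varies fastest,
 ! followed by the buffer depth j, then k and l.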
1021
1022 if (qbmm_comm) then
1023
1024# 570 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1025
1026# 570 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1027#if defined(MFC_OpenACC)
1028# 570 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1029!$acc parallel loop collapse(4) gang vector default(present) private(r)
1030# 570 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1031#elif defined(MFC_OpenMP)
1032# 570 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1033
1034# 570 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1035
1036# 570 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1037
1038# 570 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1039!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(4) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1040# 570 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1041#endif
1042 do l = 0, p
1043 do k = 0, n
1044 do j = 0, buff_size - 1
1045 do i = nvar + 1, nvar + nnode
1046 do q = 1, nb
1047 r = (i - 1) + (q - 1)*nnode + v_size*(j + buff_size*(k + (n + 1)*l))
1048 buff_send(r) = real(pb_in(j + pack_offset, k, l, i - nvar, q), kind=wp)
1049 end do
1050 end do
1051 end do
1052 end do
1053 end do
1054
1055# 583 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1056#if defined(MFC_OpenACC)
1057# 583 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1058!$acc end parallel loop
1059# 583 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1060#elif defined(MFC_OpenMP)
1061# 583 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1062
1063# 583 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1064!$omp end target teams loop
1065# 583 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1066#endif
1067
1068
1069# 585 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1070
1071# 585 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1072#if defined(MFC_OpenACC)
1073# 585 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1074!$acc parallel loop collapse(5) gang vector default(present) private(r)
1075# 585 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1076#elif defined(MFC_OpenMP)
1077# 585 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1078
1079# 585 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1080
1081# 585 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1082
1083# 585 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1084!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1085# 585 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1086#endif
1087 do l = 0, p
1088 do k = 0, n
1089 do j = 0, buff_size - 1
1090 do i = nvar + 1, nvar + nnode
1091 do q = 1, nb
1092 r = (i - 1) + (q - 1)*nnode + nb*nnode + v_size*(j + buff_size*(k + (n + 1)*l))
1093 buff_send(r) = real(mv_in(j + pack_offset, k, l, i - nvar, q), kind=wp)
1094 end do
1095 end do
1096 end do
1097 end do
1098 end do
1099
1100# 598 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1101#if defined(MFC_OpenACC)
1102# 598 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1103!$acc end parallel loop
1104# 598 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1105#elif defined(MFC_OpenMP)
1106# 598 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1107
1108# 598 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1109!$omp end target teams loop
1110# 598 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1111#endif
1112 end if
1113# 696 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1114 end if
1115# 554 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1116 if (mpi_dir == 2) then
1117# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1118
1119# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1120
1121# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1122#if defined(MFC_OpenACC)
1123# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1124!$acc parallel loop collapse(4) gang vector default(present) private(r)
1125# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1126#elif defined(MFC_OpenMP)
1127# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1128
1129# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1130
1131# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1132
1133# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1134!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(4) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1135# 601 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1136#endif
1137 do i = 1, nvar
1138 do l = 0, p
1139 do k = 0, buff_size - 1
1140 do j = -buff_size, m + buff_size
1141 r = (i - 1) + v_size*((j + buff_size) + (m + 2*buff_size + 1)*(k + buff_size*l))
1142 buff_send(r) = real(q_comm(i)%sf(j, k + pack_offset, l), kind=wp)
1143 end do
1144 end do
1145 end do
1146 end do
1147
1148# 612 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1149#if defined(MFC_OpenACC)
1150# 612 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1151!$acc end parallel loop
1152# 612 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1153#elif defined(MFC_OpenMP)
1154# 612 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1155
1156# 612 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1157!$omp end target teams loop
1158# 612 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1159#endif
1160
1161 if (qbmm_comm) then
1162
1163# 615 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1164
1165# 615 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1166#if defined(MFC_OpenACC)
1167# 615 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1168!$acc parallel loop collapse(5) gang vector default(present) private(r)
1169# 615 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1170#elif defined(MFC_OpenMP)
1171# 615 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1172
1173# 615 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1174
1175# 615 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1176
1177# 615 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1178!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1179# 615 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1180#endif
1181 do i = nvar + 1, nvar + nnode
1182 do l = 0, p
1183 do k = 0, buff_size - 1
1184 do j = -buff_size, m + buff_size
1185 do q = 1, nb
1186 r = (i - 1) + (q - 1)*nnode + v_size*((j + buff_size) + (m + 2*buff_size + 1)*(k &
1187 & + buff_size*l))
1188 buff_send(r) = real(pb_in(j, k + pack_offset, l, i - nvar, q), kind=wp)
1189 end do
1190 end do
1191 end do
1192 end do
1193 end do
1194
1195# 629 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1196#if defined(MFC_OpenACC)
1197# 629 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1198!$acc end parallel loop
1199# 629 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1200#elif defined(MFC_OpenMP)
1201# 629 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1202
1203# 629 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1204!$omp end target teams loop
1205# 629 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1206#endif
1207
1208
1209# 631 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1210
1211# 631 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1212#if defined(MFC_OpenACC)
1213# 631 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1214!$acc parallel loop collapse(5) gang vector default(present) private(r)
1215# 631 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1216#elif defined(MFC_OpenMP)
1217# 631 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1218
1219# 631 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1220
1221# 631 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1222
1223# 631 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1224!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1225# 631 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1226#endif
1227 do i = nvar + 1, nvar + nnode
1228 do l = 0, p
1229 do k = 0, buff_size - 1
1230 do j = -buff_size, m + buff_size
1231 do q = 1, nb
1232 r = (i - 1) + (q - 1)*nnode + nb*nnode + v_size*((j + buff_size) + (m + 2*buff_size &
1233 & + 1)*(k + buff_size*l))
1234 buff_send(r) = real(mv_in(j, k + pack_offset, l, i - nvar, q), kind=wp)
1235 end do
1236 end do
1237 end do
1238 end do
1239 end do
1240
1241# 645 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1242#if defined(MFC_OpenACC)
1243# 645 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1244!$acc end parallel loop
1245# 645 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1246#elif defined(MFC_OpenMP)
1247# 645 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1248
1249# 645 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1250!$omp end target teams loop
1251# 645 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1252#endif
1253 end if
1254# 696 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1255 end if
1256# 554 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1257 if (mpi_dir == 3) then
1258# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1259
1260# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1261
1262# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1263#if defined(MFC_OpenACC)
1264# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1265!$acc parallel loop collapse(4) gang vector default(present) private(r)
1266# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1267#elif defined(MFC_OpenMP)
1268# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1269
1270# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1271
1272# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1273
1274# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1275!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(4) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1276# 648 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1277#endif
1278 do i = 1, nvar
1279 do l = 0, buff_size - 1
1280 do k = -buff_size, n + buff_size
1281 do j = -buff_size, m + buff_size
1282 r = (i - 1) + v_size*((j + buff_size) + (m + 2*buff_size + 1)*((k + buff_size) + (n &
1283 & + 2*buff_size + 1)*l))
1284 buff_send(r) = real(q_comm(i)%sf(j, k, l + pack_offset), kind=wp)
1285 end do
1286 end do
1287 end do
1288 end do
1289
1290# 660 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1291#if defined(MFC_OpenACC)
1292# 660 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1293!$acc end parallel loop
1294# 660 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1295#elif defined(MFC_OpenMP)
1296# 660 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1297
1298# 660 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1299!$omp end target teams loop
1300# 660 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1301#endif
1302
1303 if (qbmm_comm) then
1304
1305# 663 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1306
1307# 663 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1308#if defined(MFC_OpenACC)
1309# 663 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1310!$acc parallel loop collapse(5) gang vector default(present) private(r)
1311# 663 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1312#elif defined(MFC_OpenMP)
1313# 663 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1314
1315# 663 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1316
1317# 663 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1318
1319# 663 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1320!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1321# 663 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1322#endif
1323 do i = nvar + 1, nvar + nnode
1324 do l = 0, buff_size - 1
1325 do k = -buff_size, n + buff_size
1326 do j = -buff_size, m + buff_size
1327 do q = 1, nb
1328 r = (i - 1) + (q - 1)*nnode + v_size*((j + buff_size) + (m + 2*buff_size + 1)*((k &
1329 & + buff_size) + (n + 2*buff_size + 1)*l))
1330 buff_send(r) = real(pb_in(j, k, l + pack_offset, i - nvar, q), kind=wp)
1331 end do
1332 end do
1333 end do
1334 end do
1335 end do
1336
1337# 677 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1338#if defined(MFC_OpenACC)
1339# 677 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1340!$acc end parallel loop
1341# 677 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1342#elif defined(MFC_OpenMP)
1343# 677 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1344
1345# 677 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1346!$omp end target teams loop
1347# 677 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1348#endif
1349
1350
1351# 679 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1352
1353# 679 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1354#if defined(MFC_OpenACC)
1355# 679 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1356!$acc parallel loop collapse(5) gang vector default(present) private(r)
1357# 679 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1358#elif defined(MFC_OpenMP)
1359# 679 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1360
1361# 679 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1362
1363# 679 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1364
1365# 679 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1366!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1367# 679 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1368#endif
1369 do i = nvar + 1, nvar + nnode
1370 do l = 0, buff_size - 1
1371 do k = -buff_size, n + buff_size
1372 do j = -buff_size, m + buff_size
1373 do q = 1, nb
1374 r = (i - 1) + (q - 1)*nnode + nb*nnode + v_size*((j + buff_size) + (m + 2*buff_size &
1375 & + 1)*((k + buff_size) + (n + 2*buff_size + 1)*l))
1376 buff_send(r) = real(mv_in(j, k, l + pack_offset, i - nvar, q), kind=wp)
1377 end do
1378 end do
1379 end do
1380 end do
1381 end do
1382
1383# 693 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1384#if defined(MFC_OpenACC)
1385# 693 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1386!$acc end parallel loop
1387# 693 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1388#elif defined(MFC_OpenMP)
1389# 693 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1390
1391# 693 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1392!$omp end target teams loop
1393# 693 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1394#endif
1395 end if
1396# 696 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1397 end if
1398# 698 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1399 call nvtxendrange ! Packbuf
1400
1401 ! Send/Recv
1402#ifdef MFC_SIMULATION
1403# 703 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1404 if (rdma_mpi .eqv. .false.) then
1405# 715 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1406 call nvtxstartrange("RHS-COMM-DEV2HOST")
1407
1408# 716 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1409#if defined(MFC_OpenACC)
1410# 716 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1411!$acc update host(buff_send)
1412# 716 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1413#elif defined(MFC_OpenMP)
1414# 716 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1415!$omp target update from(buff_send)
1416# 716 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1417#endif
1418 call nvtxendrange
 1419 call nvtxstartrange("RHS-COMM-SENDRECV-NO-RDMA")
1420
1421 call mpi_sendrecv(buff_send, buffer_count, mpi_p, dst_proc, send_tag, buff_recv, buffer_count, mpi_p, &
1422 & src_proc, recv_tag, mpi_comm_world, mpi_status_ignore, ierr)
1423
1424 call nvtxendrange ! RHS-MPI-SENDRECV-(NO)-RDMA
1425
1426 call nvtxstartrange("RHS-COMM-HOST2DEV")
1427
1428# 726 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1429#if defined(MFC_OpenACC)
1430# 726 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1431!$acc update device(buff_recv)
1432# 726 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1433#elif defined(MFC_OpenMP)
1434# 726 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1435!$omp target update to(buff_recv)
1436# 726 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1437#endif
1438 call nvtxendrange
1439# 729 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1440 end if
1441# 703 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1442 if (rdma_mpi .eqv. .true.) then
1443# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1444
1445# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1446#if defined(MFC_OpenACC)
1447# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1448!$acc host_data use_device(buff_send, buff_recv)
1449# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1450 call nvtxstartrange("RHS-COMM-SENDRECV-RDMA")
1451# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1452
1453# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1454 call mpi_sendrecv(buff_send, buffer_count, mpi_p, dst_proc, send_tag, buff_recv, buffer_count, mpi_p, &
1455# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1456 & src_proc, recv_tag, mpi_comm_world, mpi_status_ignore, ierr)
1457# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1458
1459# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1460 call nvtxendrange ! RHS-MPI-SENDRECV-(NO)-RDMA
1461# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1462!$acc end host_data
1463# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1464#elif defined(MFC_OpenMP)
1465# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1466!$omp target data use_device_addr(buff_send, buff_recv)
1467# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1468 call nvtxstartrange("RHS-COMM-SENDRECV-RDMA")
1469# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1470
1471# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1472 call mpi_sendrecv(buff_send, buffer_count, mpi_p, dst_proc, send_tag, buff_recv, buffer_count, mpi_p, &
1473# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1474 & src_proc, recv_tag, mpi_comm_world, mpi_status_ignore, ierr)
1475# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1476
1477# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1478 call nvtxendrange ! RHS-MPI-SENDRECV-(NO)-RDMA
1479# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1480!$omp end target data
1481# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1482#else
1483# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1484 call nvtxstartrange("RHS-COMM-SENDRECV-RDMA")
1485# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1486
1487# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1488 call mpi_sendrecv(buff_send, buffer_count, mpi_p, dst_proc, send_tag, buff_recv, buffer_count, mpi_p, &
1489# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1490 & src_proc, recv_tag, mpi_comm_world, mpi_status_ignore, ierr)
1491# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1492
1493# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1494 call nvtxendrange ! RHS-MPI-SENDRECV-(NO)-RDMA
1495# 705 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1496#endif
1497# 713 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1498
1499# 713 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1500#if defined(MFC_OpenACC)
1501# 713 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1502!$acc wait
1503# 713 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1504#elif defined(MFC_OpenMP)
1505# 713 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1506!$omp barrier
1507# 713 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1508#endif
1509# 729 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1510 end if
1511# 731 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1512#else
1513 call mpi_sendrecv(buff_send, buffer_count, mpi_p, dst_proc, send_tag, buff_recv, buffer_count, mpi_p, src_proc, recv_tag, &
1514 & mpi_comm_world, mpi_status_ignore, ierr)
1515#endif
1516
1517 ! Unpack Received Buffer
1518 call nvtxstartrange("RHS-COMM-UNPACKBUF")
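 ! Unpacking mirrors the packing step: the same flattened index r is
 ! recomputed per direction and the received values are written into the
 ! ghost-cell region selected by unpack_offset; with the Intel compiler an
 ! additional ieee_is_nan check aborts the run if corrupted data is received.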
1519# 739 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1520 if (mpi_dir == 1) then
1521# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1522
1523# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1524
1525# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1526#if defined(MFC_OpenACC)
1527# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1528!$acc parallel loop collapse(4) gang vector default(present) private(r)
1529# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1530#elif defined(MFC_OpenMP)
1531# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1532
1533# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1534
1535# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1536
1537# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1538!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(4) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1539# 741 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1540#endif
1541 do l = 0, p
1542 do k = 0, n
1543 do j = -buff_size, -1
1544 do i = 1, nvar
1545 r = (i - 1) + v_size*(j + buff_size*((k + 1) + (n + 1)*l))
1546 q_comm(i)%sf(j + unpack_offset, k, l) = real(buff_recv(r), kind=stp)
1547#if defined(__INTEL_COMPILER)
1548 if (ieee_is_nan(q_comm(i)%sf(j + unpack_offset, k, l))) then
1549 print *, "Error", j, k, l, i
1550 call s_mpi_abort("NaN(s) in recv")
1551 end if
1552#endif
1553 end do
1554 end do
1555 end do
1556 end do
1557
1558# 758 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1559#if defined(MFC_OpenACC)
1560# 758 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1561!$acc end parallel loop
1562# 758 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1563#elif defined(MFC_OpenMP)
1564# 758 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1565
1566# 758 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1567!$omp end target teams loop
1568# 758 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1569#endif
1570
1571 if (qbmm_comm) then
1572
1573# 761 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1574
1575# 761 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1576#if defined(MFC_OpenACC)
1577# 761 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1578!$acc parallel loop collapse(5) gang vector default(present) private(r)
1579# 761 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1580#elif defined(MFC_OpenMP)
1581# 761 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1582
1583# 761 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1584
1585# 761 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1586
1587# 761 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1588!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1589# 761 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1590#endif
1591 do l = 0, p
1592 do k = 0, n
1593 do j = -buff_size, -1
1594 do i = nvar + 1, nvar + nnode
1595 do q = 1, nb
1596 r = (i - 1) + (q - 1)*nnode + v_size*(j + buff_size*((k + 1) + (n + 1)*l))
1597 pb_in(j + unpack_offset, k, l, i - nvar, q) = real(buff_recv(r), kind=stp)
1598 end do
1599 end do
1600 end do
1601 end do
1602 end do
1603
1604# 774 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1605#if defined(MFC_OpenACC)
1606# 774 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1607!$acc end parallel loop
1608# 774 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1609#elif defined(MFC_OpenMP)
1610# 774 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1611
1612# 774 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1613!$omp end target teams loop
1614# 774 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1615#endif
1616
1617
1618# 776 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1619
1620# 776 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1621#if defined(MFC_OpenACC)
1622# 776 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1623!$acc parallel loop collapse(5) gang vector default(present) private(r)
1624# 776 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1625#elif defined(MFC_OpenMP)
1626# 776 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1627
1628# 776 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1629
1630# 776 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1631
1632# 776 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1633!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1634# 776 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1635#endif
1636 do l = 0, p
1637 do k = 0, n
1638 do j = -buff_size, -1
1639 do i = nvar + 1, nvar + nnode
1640 do q = 1, nb
1641 r = (i - 1) + (q - 1)*nnode + nb*nnode + v_size*(j + buff_size*((k + 1) + (n + 1)*l))
1642 mv_in(j + unpack_offset, k, l, i - nvar, q) = real(buff_recv(r), kind=stp)
1643 end do
1644 end do
1645 end do
1646 end do
1647 end do
1648
1649# 789 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1650#if defined(MFC_OpenACC)
1651# 789 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1652!$acc end parallel loop
1653# 789 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1654#elif defined(MFC_OpenMP)
1655# 789 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1656
1657# 789 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1658!$omp end target teams loop
1659# 789 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1660#endif
1661 end if
1662# 899 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1663 end if
1664# 739 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1665 if (mpi_dir == 2) then
1666# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1667
1668# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1669
1670# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1671#if defined(MFC_OpenACC)
1672# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1673!$acc parallel loop collapse(4) gang vector default(present) private(r)
1674# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1675#elif defined(MFC_OpenMP)
1676# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1677
1678# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1679
1680# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1681
1682# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1683!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(4) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1684# 792 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1685#endif
1686 do i = 1, nvar
1687 do l = 0, p
1688 do k = -buff_size, -1
1689 do j = -buff_size, m + buff_size
1690 r = (i - 1) + v_size*((j + buff_size) + (m + 2*buff_size + 1)*((k + buff_size) + buff_size*l))
1691 q_comm(i)%sf(j, k + unpack_offset, l) = real(buff_recv(r), kind=stp)
1692#if defined(__INTEL_COMPILER)
1693 if (ieee_is_nan(q_comm(i)%sf(j, k + unpack_offset, l))) then
1694 print *, "Error", j, k, l, i
1695 call s_mpi_abort("NaN(s) in recv")
1696 end if
1697#endif
1698 end do
1699 end do
1700 end do
1701 end do
1702
1703# 809 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1704#if defined(MFC_OpenACC)
1705# 809 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1706!$acc end parallel loop
1707# 809 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1708#elif defined(MFC_OpenMP)
1709# 809 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1710
1711# 809 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1712!$omp end target teams loop
1713# 809 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1714#endif
1715
1716 if (qbmm_comm) then
1717
1718# 812 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1719
1720# 812 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1721#if defined(MFC_OpenACC)
1722# 812 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1723!$acc parallel loop collapse(5) gang vector default(present) private(r)
1724# 812 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1725#elif defined(MFC_OpenMP)
1726# 812 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1727
1728# 812 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1729
1730# 812 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1731
1732# 812 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1733!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1734# 812 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1735#endif
1736 do i = nvar + 1, nvar + nnode
1737 do l = 0, p
1738 do k = -buff_size, -1
1739 do j = -buff_size, m + buff_size
1740 do q = 1, nb
1741 r = (i - 1) + (q - 1)*nnode + v_size*((j + buff_size) + (m + 2*buff_size + 1)*((k &
1742 & + buff_size) + buff_size*l))
1743 pb_in(j, k + unpack_offset, l, i - nvar, q) = real(buff_recv(r), kind=stp)
1744 end do
1745 end do
1746 end do
1747 end do
1748 end do
1749
1750# 826 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1751#if defined(MFC_OpenACC)
1752# 826 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1753!$acc end parallel loop
1754# 826 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1755#elif defined(MFC_OpenMP)
1756# 826 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1757
1758# 826 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1759!$omp end target teams loop
1760# 826 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1761#endif
1762
1763
1764# 828 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1765
1766# 828 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1767#if defined(MFC_OpenACC)
1768# 828 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1769!$acc parallel loop collapse(5) gang vector default(present) private(r)
1770# 828 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1771#elif defined(MFC_OpenMP)
1772# 828 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1773
1774# 828 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1775
1776# 828 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1777
1778# 828 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1779!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1780# 828 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1781#endif
1782 do i = nvar + 1, nvar + nnode
1783 do l = 0, p
1784 do k = -buff_size, -1
1785 do j = -buff_size, m + buff_size
1786 do q = 1, nb
1787 r = (i - 1) + (q - 1)*nnode + nb*nnode + v_size*((j + buff_size) + (m + 2*buff_size &
1788 & + 1)*((k + buff_size) + buff_size*l))
1789 mv_in(j, k + unpack_offset, l, i - nvar, q) = real(buff_recv(r), kind=stp)
1790 end do
1791 end do
1792 end do
1793 end do
1794 end do
1795
1796# 842 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1797#if defined(MFC_OpenACC)
1798# 842 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1799!$acc end parallel loop
1800# 842 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1801#elif defined(MFC_OpenMP)
1802# 842 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1803
1804# 842 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1805!$omp end target teams loop
1806# 842 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1807#endif
1808 end if
1809# 899 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1810 end if
1811# 739 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1812 if (mpi_dir == 3) then
1813# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1814
1815# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1816
1817# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1818#if defined(MFC_OpenACC)
1819# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1820!$acc parallel loop collapse(4) gang vector default(present) private(r)
1821# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1822#elif defined(MFC_OpenMP)
1823# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1824
1825# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1826
1827# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1828
1829# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1830!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(4) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1831# 845 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1832#endif
1833 do i = 1, nvar
1834 do l = -buff_size, -1
1835 do k = -buff_size, n + buff_size
1836 do j = -buff_size, m + buff_size
1837 r = (i - 1) + v_size*((j + buff_size) + (m + 2*buff_size + 1)*((k + buff_size) + (n &
1838 & + 2*buff_size + 1)*(l + buff_size)))
1839 q_comm(i)%sf(j, k, l + unpack_offset) = real(buff_recv(r), kind=stp)
1840#if defined(__INTEL_COMPILER)
1841 if (ieee_is_nan(q_comm(i)%sf(j, k, l + unpack_offset))) then
1842 print *, "Error", j, k, l, i
1843 call s_mpi_abort("NaN(s) in recv")
1844 end if
1845#endif
1846 end do
1847 end do
1848 end do
1849 end do
1850
1851# 863 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1852#if defined(MFC_OpenACC)
1853# 863 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1854!$acc end parallel loop
1855# 863 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1856#elif defined(MFC_OpenMP)
1857# 863 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1858
1859# 863 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1860!$omp end target teams loop
1861# 863 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1862#endif
1863
1864 if (qbmm_comm) then
1865
1866# 866 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1867
1868# 866 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1869#if defined(MFC_OpenACC)
1870# 866 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1871!$acc parallel loop collapse(5) gang vector default(present) private(r)
1872# 866 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1873#elif defined(MFC_OpenMP)
1874# 866 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1875
1876# 866 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1877
1878# 866 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1879
1880# 866 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1881!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1882# 866 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1883#endif
1884 do i = nvar + 1, nvar + nnode
1885 do l = -buff_size, -1
1886 do k = -buff_size, n + buff_size
1887 do j = -buff_size, m + buff_size
1888 do q = 1, nb
1889 r = (i - 1) + (q - 1)*nnode + v_size*((j + buff_size) + (m + 2*buff_size + 1)*((k &
1890 & + buff_size) + (n + 2*buff_size + 1)*(l + buff_size)))
1891 pb_in(j, k, l + unpack_offset, i - nvar, q) = real(buff_recv(r), kind=stp)
1892 end do
1893 end do
1894 end do
1895 end do
1896 end do
1897
1898# 880 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1899#if defined(MFC_OpenACC)
1900# 880 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1901!$acc end parallel loop
1902# 880 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1903#elif defined(MFC_OpenMP)
1904# 880 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1905
1906# 880 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1907!$omp end target teams loop
1908# 880 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1909#endif
1910
1911
1912# 882 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1913
1914# 882 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1915#if defined(MFC_OpenACC)
1916# 882 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1917!$acc parallel loop collapse(5) gang vector default(present) private(r)
1918# 882 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1919#elif defined(MFC_OpenMP)
1920# 882 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1921
1922# 882 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1923
1924# 882 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1925
1926# 882 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1927!$omp target teams loop defaultmap(firstprivate:scalar) bind(teams,parallel) collapse(5) defaultmap(tofrom:aggregate) defaultmap(tofrom:allocatable) defaultmap(tofrom:pointer) private(r)
1928# 882 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1929#endif
1930 do i = nvar + 1, nvar + nnode
1931 do l = -buff_size, -1
1932 do k = -buff_size, n + buff_size
1933 do j = -buff_size, m + buff_size
1934 do q = 1, nb
1935 r = (i - 1) + (q - 1)*nnode + nb*nnode + v_size*((j + buff_size) + (m + 2*buff_size &
1936 & + 1)*((k + buff_size) + (n + 2*buff_size + 1)*(l + buff_size)))
1937 mv_in(j, k, l + unpack_offset, i - nvar, q) = real(buff_recv(r), kind=stp)
1938 end do
1939 end do
1940 end do
1941 end do
1942 end do
1943
1944# 896 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1945#if defined(MFC_OpenACC)
1946# 896 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1947!$acc end parallel loop
1948# 896 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1949#elif defined(MFC_OpenMP)
1950# 896 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1951
1952# 896 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1953!$omp end target teams loop
1954# 896 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1955#endif
1956 end if
1957# 899 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1958 end if
1959# 901 "/home/runner/work/MFC/MFC/src/common/m_mpi_common.fpp"
1960 call nvtxendrange
1961#endif
1962
1963 end subroutine s_mpi_sendrecv_variables_buffers
1964
1965 !> Decompose the computational domain among processors by balancing cells per rank in each coordinate direction.
1966 subroutine s_mpi_decompose_computational_domain
1967
1968#ifdef MFC_MPI
1969 integer :: num_procs_x, num_procs_y, num_procs_z !< Optimal number of processors in the x-, y- and z-directions
1970 !> Non-optimal number of processors in the x-, y- and z-directions
1971 real(wp) :: tmp_num_procs_x, tmp_num_procs_y, tmp_num_procs_z
1972 real(wp) :: fct_min !< Processor factorization (fct) minimization parameter
1973 integer :: MPI_COMM_CART !< Cartesian processor topology communicator
1974 integer :: rem_cells !< Remaining cells after distribution among processors
1975 integer :: recon_order !< WENO or MUSCL reconstruction order
1976 integer :: i, j !< Generic loop iterators
1977 integer :: ierr !< Generic flag used to identify and report MPI errors
1978
1979 if (recon_type == weno_type) then
1980 recon_order = weno_order
1981 else
1982 recon_order = muscl_order
1983 end if
1984
1985 if (num_procs == 1 .and. parallel_io) then
1986 do i = 1, num_dims
1987 start_idx(i) = 0
1988 end do
1989 return
1990 end if
1991
1992 if (igr) then
1993 recon_order = igr_order
1994 end if
1995
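 ! Decomposition objective: the searches below loop over the divisors of
 ! num_procs and pick the topology (num_procs_x, num_procs_y, num_procs_z)
 ! minimizing the load imbalance, e.g.
 ! abs((m + 1)/px - (n + 1)/py) + abs((n + 1)/py - (p + 1)/pz)
 ! in the general 3D case, subject to each direction keeping at least
 ! num_stcls_min*recon_order cells per rank; ierr remains -1 if no admissible
 ! factorization is found.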
1996 ! 3D Cartesian Processor Topology
1997 if (n > 0) then
1998 if (p > 0) then
1999 if (fft_wrt) then
2000 ! Initial estimate of optimal processor topology
2001 num_procs_x = 1
2002 num_procs_y = 1
2003 num_procs_z = num_procs
2004 ierr = -1
2005
2006 ! Benchmarking the quality of this initial guess
2007 tmp_num_procs_y = num_procs_y
2008 tmp_num_procs_z = num_procs_z
2009 fct_min = 10._wp*abs((n + 1)/tmp_num_procs_y - (p + 1)/tmp_num_procs_z)
2010
2011 ! Optimization of the initial processor topology
2012 do i = 1, num_procs
2013 if (mod(num_procs, i) == 0 .and. (n + 1)/i >= num_stcls_min*recon_order) then
2014 tmp_num_procs_y = i
2015 tmp_num_procs_z = num_procs/i
2016
2017 if (fct_min >= abs((n + 1)/tmp_num_procs_y - (p + 1)/tmp_num_procs_z) .and. (p + 1) &
2018 & /tmp_num_procs_z >= num_stcls_min*recon_order) then
2019 num_procs_y = i
2020 num_procs_z = num_procs/i
2021 fct_min = abs((n + 1)/tmp_num_procs_y - (p + 1)/tmp_num_procs_z)
2022 ierr = 0
2023 end if
2024 end if
2025 end do
2026 else
2027 if (cyl_coord .and. p > 0) then
2028 ! Pencil blocking for cylindrical coordinates (Fourier filter near axis)
2029
2030 ! Initial values of the processor factorization optimization
2031 num_procs_x = 1
2032 num_procs_y = num_procs
2033 num_procs_z = 1
2034 ierr = -1
2035
2036 ! Computing minimization variable for these initial values
2037 tmp_num_procs_x = num_procs_x
2038 tmp_num_procs_y = num_procs_y
2039 tmp_num_procs_z = num_procs_z
2040 fct_min = 10._wp*abs((m + 1)/tmp_num_procs_x - (n + 1)/tmp_num_procs_y)
2041
2042 ! Searching for optimal computational domain distribution
2043 do i = 1, num_procs
2044 if (mod(num_procs, i) == 0 .and. (m + 1)/i >= num_stcls_min*recon_order) then
2045 tmp_num_procs_x = i
2046 tmp_num_procs_y = num_procs/i
2047
2048 if (fct_min >= abs((m + 1)/tmp_num_procs_x - (n + 1)/tmp_num_procs_y) .and. (n + 1) &
2049 & /tmp_num_procs_y >= num_stcls_min*recon_order) then
2050 num_procs_x = i
2051 num_procs_y = num_procs/i
2052 fct_min = abs((m + 1)/tmp_num_procs_x - (n + 1)/tmp_num_procs_y)
2053 ierr = 0
2054 end if
2055 end if
2056 end do
2057 else
2058 ! Initial estimate of optimal processor topology
2059 num_procs_x = 1
2060 num_procs_y = 1
2061 num_procs_z = num_procs
2062 ierr = -1
2063
2064 ! Benchmarking the quality of this initial guess
2065 tmp_num_procs_x = num_procs_x
2066 tmp_num_procs_y = num_procs_y
2067 tmp_num_procs_z = num_procs_z
2068 fct_min = 10._wp*abs((m + 1)/tmp_num_procs_x - (n + 1)/tmp_num_procs_y) + 10._wp*abs((n + 1) &
2069 & /tmp_num_procs_y - (p + 1)/tmp_num_procs_z)
2070
2071 ! Optimization of the initial processor topology
2072 do i = 1, num_procs
2073 if (mod(num_procs, i) == 0 .and. (m + 1)/i >= num_stcls_min*recon_order) then
2074 do j = 1, num_procs/i
2075 if (mod(num_procs/i, j) == 0 .and. (n + 1)/j >= num_stcls_min*recon_order) then
2076 tmp_num_procs_x = i
2077 tmp_num_procs_y = j
2078 tmp_num_procs_z = num_procs/(i*j)
2079
2080 if (fct_min >= abs((m + 1)/tmp_num_procs_x - (n + 1)/tmp_num_procs_y) + abs((n + 1) &
2081 & /tmp_num_procs_y - (p + 1)/tmp_num_procs_z) .and. (p + 1) &
2082 & /tmp_num_procs_z >= num_stcls_min*recon_order) then
2083 num_procs_x = i
2084 num_procs_y = j
2085 num_procs_z = num_procs/(i*j)
2086 fct_min = abs((m + 1)/tmp_num_procs_x - (n + 1)/tmp_num_procs_y) + abs((n + 1) &
2087 & /tmp_num_procs_y - (p + 1)/tmp_num_procs_z)
2088 ierr = 0
2089 end if
2090 end if
2091 end do
2092 end if
2093 end do
2094 end if
2095 end if
2096
2097 ! Verifying that a valid decomposition of the computational domain has been established. If not, the simulation
2098 ! exits.
2099 if (proc_rank == 0 .and. ierr == -1) then
2100 call s_mpi_abort('Unsupported combination of values ' // 'of num_procs, m, n, p and ' &
2101 & // 'weno/muscl/igr_order. Exiting.')
2102 end if
2103
2104 ! Creating new communicator using the Cartesian topology
2105 call mpi_cart_create(mpi_comm_world, 3, (/num_procs_x, num_procs_y, num_procs_z/), (/.true., .true., .true./), &
2106 & .false., mpi_comm_cart, ierr)
2107
2108 ! Finding the Cartesian coordinates of the local process
2109 call mpi_cart_coords(mpi_comm_cart, proc_rank, 3, proc_coords, ierr)
2110
2111 ! Global Parameters for z-direction
2112
2113 ! Number of remaining cells
2114 rem_cells = mod(p + 1, num_procs_z)
2115
2116 ! Optimal number of cells per processor
2117 p = (p + 1)/num_procs_z - 1
2118
2119 ! Distributing the remaining cells
2120 do i = 1, rem_cells
2121 if (proc_coords(3) == i - 1) then
2122 p = p + 1; exit
2123 end if
2124 end do
2125
2126 ! Boundary condition at the beginning
2127 if (proc_coords(3) > 0 .or. (bc_z%beg == bc_periodic .and. num_procs_z > 1)) then
2128 proc_coords(3) = proc_coords(3) - 1
2129 call mpi_cart_rank(mpi_comm_cart, proc_coords, bc_z%beg, ierr)
2130 proc_coords(3) = proc_coords(3) + 1
2131 end if
2132
2133 ! Boundary condition at the end
2134 if (proc_coords(3) < num_procs_z - 1 .or. (bc_z%end == bc_periodic .and. num_procs_z > 1)) then
2135 proc_coords(3) = proc_coords(3) + 1
2136 call mpi_cart_rank(mpi_comm_cart, proc_coords, bc_z%end, ierr)
2137 proc_coords(3) = proc_coords(3) - 1
2138 end if
2139
2140#ifdef MFC_POST_PROCESS
2141 ! Ghost zone at the beginning
2142 if (proc_coords(3) > 0 .and. format == 1) then
2143 offset_z%beg = 2
2144 else
2145 offset_z%beg = 0
2146 end if
2147
2148 ! Ghost zone at the end
2149 if (proc_coords(3) < num_procs_z - 1 .and. format == 1) then
2150 offset_z%end = 2
2151 else
2152 offset_z%end = 0
2153 end if
2154#endif
2155
2156 ! Beginning and end sub-domain boundary locations
2157 if (parallel_io) then
2158 if (proc_coords(3) < rem_cells) then
2159 start_idx(3) = (p + 1)*proc_coords(3)
2160 else
2161 start_idx(3) = (p + 1)*proc_coords(3) + rem_cells
2162 end if
2163 else
2164#ifdef MFC_PRE_PROCESS
2165 if (old_grid .neqv. .true.) then
2166 dz = (z_domain%end - z_domain%beg)/real(p_glb + 1, wp)
2167
2168 if (proc_coords(3) < rem_cells) then
2169 z_domain%beg = z_domain%beg + dz*real((p + 1)*proc_coords(3))
2170 z_domain%end = z_domain%end - dz*real((p + 1)*(num_procs_z - proc_coords(3) - 1) - (num_procs_z &
2171 & - rem_cells))
2172 else
2173 z_domain%beg = z_domain%beg + dz*real((p + 1)*proc_coords(3) + rem_cells)
2174 z_domain%end = z_domain%end - dz*real((p + 1)*(num_procs_z - proc_coords(3) - 1))
2175 end if
2176 end if
2177#endif
2178 end if
2179
2180 ! 2D Cartesian Processor Topology
2181 else
2182 ! Initial estimate of optimal processor topology
2183 num_procs_x = 1
2184 num_procs_y = num_procs
2185 ierr = -1
2186
2187 ! Benchmarking the quality of this initial guess
2188 tmp_num_procs_x = num_procs_x
2189 tmp_num_procs_y = num_procs_y
2190 fct_min = 10._wp*abs((m + 1)/tmp_num_procs_x - (n + 1)/tmp_num_procs_y)
2191
2192 ! Optimization of the initial processor topology
2193 do i = 1, num_procs
2194 if (mod(num_procs, i) == 0 .and. (m + 1)/i >= num_stcls_min*recon_order) then
2195 tmp_num_procs_x = i
2196 tmp_num_procs_y = num_procs/i
2197
2198 if (fct_min >= abs((m + 1)/tmp_num_procs_x - (n + 1)/tmp_num_procs_y) .and. (n + 1) &
2199 & /tmp_num_procs_y >= num_stcls_min*recon_order) then
2200 num_procs_x = i
2201 num_procs_y = num_procs/i
2202 fct_min = abs((m + 1)/tmp_num_procs_x - (n + 1)/tmp_num_procs_y)
2203 ierr = 0
2204 end if
2205 end if
2206 end do
2207
2208 ! Verifying that a valid decomposition of the computational domain has been established. If not, the simulation
2209 ! exits.
2210 if (proc_rank == 0 .and. ierr == -1) then
2211 call s_mpi_abort('Unsupported combination of values ' // 'of num_procs, m, n and ' &
2212 & // 'weno/muscl/igr_order. Exiting.')
2213 end if
2214
2215 ! Creating new communicator using the Cartesian topology
2216 call mpi_cart_create(mpi_comm_world, 2, (/num_procs_x, num_procs_y/), (/.true., .true./), .false., mpi_comm_cart, &
2217 & ierr)
2218
2219 ! Finding the Cartesian coordinates of the local process
2220 call mpi_cart_coords(mpi_comm_cart, proc_rank, 2, proc_coords, ierr)
2221 end if
2222
2223 ! Global Parameters for y-direction
2224
2225 ! Number of remaining cells
2226 rem_cells = mod(n + 1, num_procs_y)
2227
2228 ! Optimal number of cells per processor
2229 n = (n + 1)/num_procs_y - 1
2230
2231 ! Distributing the remaining cells
2232 do i = 1, rem_cells
2233 if (proc_coords(2) == i - 1) then
2234 n = n + 1; exit
2235 end if
2236 end do
2237
2238 ! Boundary condition at the beginning
2239 if (proc_coords(2) > 0 .or. (bc_y%beg == bc_periodic .and. num_procs_y > 1)) then
2240 proc_coords(2) = proc_coords(2) - 1
2241 call mpi_cart_rank(mpi_comm_cart, proc_coords, bc_y%beg, ierr)
2242 proc_coords(2) = proc_coords(2) + 1
2243 end if
2244
2245 ! Boundary condition at the end
2246 if (proc_coords(2) < num_procs_y - 1 .or. (bc_y%end == bc_periodic .and. num_procs_y > 1)) then
2247 proc_coords(2) = proc_coords(2) + 1
2248 call mpi_cart_rank(mpi_comm_cart, proc_coords, bc_y%end, ierr)
2249 proc_coords(2) = proc_coords(2) - 1
2250 end if
2251
2252#ifdef MFC_POST_PROCESS
2253 ! Ghost zone at the beginning
2254 if (proc_coords(2) > 0 .and. format == 1) then
2255 offset_y%beg = 2
2256 else
2257 offset_y%beg = 0
2258 end if
2259
2260 ! Ghost zone at the end
2261 if (proc_coords(2) < num_procs_y - 1 .and. format == 1) then
2262 offset_y%end = 2
2263 else
2264 offset_y%end = 0
2265 end if
2266#endif
2267
2268 ! Beginning and end sub-domain boundary locations
2269 if (parallel_io) then
2270 if (proc_coords(2) < rem_cells) then
2271 start_idx(2) = (n + 1)*proc_coords(2)
2272 else
2273 start_idx(2) = (n + 1)*proc_coords(2) + rem_cells
2274 end if
2275 else
2276#ifdef MFC_PRE_PROCESS
2277 if (old_grid .neqv. .true.) then
2278 dy = (y_domain%end - y_domain%beg)/real(n_glb + 1, wp)
2279
2280 if (proc_coords(2) < rem_cells) then
2281 y_domain%beg = y_domain%beg + dy*real((n + 1)*proc_coords(2))
2282 y_domain%end = y_domain%end - dy*real((n + 1)*(num_procs_y - proc_coords(2) - 1) - (num_procs_y &
2283 & - rem_cells))
2284 else
2285 y_domain%beg = y_domain%beg + dy*real((n + 1)*proc_coords(2) + rem_cells)
2286 y_domain%end = y_domain%end - dy*real((n + 1)*(num_procs_y - proc_coords(2) - 1))
2287 end if
2288 end if
2289#endif
2290 end if
2291
2292 ! 1D Cartesian Processor Topology
2293 else
2294 ! Optimal processor topology
2295 num_procs_x = num_procs
2296
2297 ! Creating new communicator using the Cartesian topology
2298 call mpi_cart_create(mpi_comm_world, 1, (/num_procs_x/), (/.true./), .false., mpi_comm_cart, ierr)
2299
2300 ! Finding the Cartesian coordinates of the local process
2301 call mpi_cart_coords(mpi_comm_cart, proc_rank, 1, proc_coords, ierr)
2302 end if
2303
2304 ! Global Parameters for x-direction
2305
2306 ! Number of remaining cells
2307 rem_cells = mod(m + 1, num_procs_x)
2308
2309 ! Optimal number of cells per processor
2310 m = (m + 1)/num_procs_x - 1
2311
2312 ! Distributing the remaining cells
2313 do i = 1, rem_cells
2314 if (proc_coords(1) == i - 1) then
2315 m = m + 1; exit
2316 end if
2317 end do
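 ! Worked example (hypothetical sizes): with m_glb + 1 = 10 global cells and
 ! num_procs_x = 3, m becomes 10/3 - 1 = 2 on every rank (3 local cells),
 ! rem_cells = mod(10, 3) = 1, and the loop above assigns the extra cell to
 ! the rank with proc_coords(1) == 0, giving local counts of 4, 3 and 3.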
2318
2319 call s_update_cell_bounds(cells_bounds, m, n, p)
2320
2321 ! Boundary condition at the beginning
2322 if (proc_coords(1) > 0 .or. (bc_x%beg == bc_periodic .and. num_procs_x > 1)) then
2323 proc_coords(1) = proc_coords(1) - 1
2324 call mpi_cart_rank(mpi_comm_cart, proc_coords, bc_x%beg, ierr)
2325 proc_coords(1) = proc_coords(1) + 1
2326 end if
2327
2328 ! Boundary condition at the end
2329 if (proc_coords(1) < num_procs_x - 1 .or. (bc_x%end == bc_periodic .and. num_procs_x > 1)) then
2330 proc_coords(1) = proc_coords(1) + 1
2331 call mpi_cart_rank(mpi_comm_cart, proc_coords, bc_x%end, ierr)
2332 proc_coords(1) = proc_coords(1) - 1
2333 end if
2334
2335#ifdef MFC_POST_PROCESS
2336 ! Ghost zone at the beginning
2337 if (proc_coords(1) > 0 .and. format == 1) then
2338 offset_x%beg = 2
2339 else
2340 offset_x%beg = 0
2341 end if
2342
2343 ! Ghost zone at the end
2344 if (proc_coords(1) < num_procs_x - 1 .and. format == 1) then
2345 offset_x%end = 2
2346 else
2347 offset_x%end = 0
2348 end if
2349#endif
2350
2351 ! Beginning and end sub-domain boundary locations
2352 if (parallel_io) then
2353 if (proc_coords(1) < rem_cells) then
2354 start_idx(1) = (m + 1)*proc_coords(1)
2355 else
2356 start_idx(1) = (m + 1)*proc_coords(1) + rem_cells
2357 end if
2358 else
2359#ifdef MFC_PRE_PROCESS
2360 if (old_grid .neqv. .true.) then
2361 dx = (x_domain%end - x_domain%beg)/real(m_glb + 1, wp)
2362
2363 if (proc_coords(1) < rem_cells) then
2364 x_domain%beg = x_domain%beg + dx*real((m + 1)*proc_coords(1))
2365 x_domain%end = x_domain%end - dx*real((m + 1)*(num_procs_x - proc_coords(1) - 1) - (num_procs_x - rem_cells))
2366 else
2367 x_domain%beg = x_domain%beg + dx*real((m + 1)*proc_coords(1) + rem_cells)
2368 x_domain%end = x_domain%end - dx*real((m + 1)*(num_procs_x - proc_coords(1) - 1))
2369 end if
2370 end if
2371#endif
2372 end if
2373#endif
2374
2375 end subroutine s_mpi_decompose_computational_domain
2376
2377 !> The goal of this procedure is to populate the buffers of the grid variables by communicating with the neighboring processors.
2378 !! Note that only the buffers of the cell-width distributions are handled in such a way. This is because the buffers of
2379 !! cell-boundary locations may be calculated directly from those of the cell-width distributions.
2380#ifndef MFC_PRE_PROCESS
2381 subroutine s_mpi_sendrecv_grid_variables_buffers(mpi_dir, pbc_loc)
2382
2383 integer, intent(in) :: mpi_dir
2384 integer, intent(in) :: pbc_loc
2385
2386#ifdef MFC_MPI
2387 integer :: ierr !< Generic flag used to identify and report MPI errors
2388
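 ! Halo pattern for the cell-width arrays: per direction, tag 0 carries the
 ! last buff_size interior widths toward the end-side neighbour's beginning
 ! ghost zone, while tag 1 carries the first buff_size widths toward the
 ! beginning-side neighbour's end ghost zone; when only one side has a
 ! neighbouring rank, both the send and the receive legs use that same
 ! neighbour.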
2389 if (mpi_dir == 1) then
2390 if (pbc_loc == -1) then ! PBC at the beginning
2391
2392 if (bc_x%end >= 0) then ! PBC at the beginning and end
2393 call mpi_sendrecv(dx(m - buff_size + 1), buff_size, mpi_p, bc_x%end, 0, dx(-buff_size), buff_size, mpi_p, &
2394 & bc_x%beg, 0, mpi_comm_world, mpi_status_ignore, ierr)
2395 else ! PBC at the beginning only
2396 call mpi_sendrecv(dx(0), buff_size, mpi_p, bc_x%beg, 1, dx(-buff_size), buff_size, mpi_p, bc_x%beg, 0, &
2397 & mpi_comm_world, mpi_status_ignore, ierr)
2398 end if
2399 else ! PBC at the end
2400 if (bc_x%beg >= 0) then ! PBC at the end and beginning
2401 call mpi_sendrecv(dx(0), buff_size, mpi_p, bc_x%beg, 1, dx(m + 1), buff_size, mpi_p, bc_x%end, 1, &
2402 & mpi_comm_world, mpi_status_ignore, ierr)
2403 else ! PBC at the end only
2404 call mpi_sendrecv(dx(m - buff_size + 1), buff_size, mpi_p, bc_x%end, 0, dx(m + 1), buff_size, mpi_p, &
2405 & bc_x%end, 1, mpi_comm_world, mpi_status_ignore, ierr)
2406 end if
2407 end if
2408 else if (mpi_dir == 2) then
2409 if (pbc_loc == -1) then ! PBC at the beginning
2410
2411 if (bc_y%end >= 0) then ! PBC at the beginning and end
2412 call mpi_sendrecv(dy(n - buff_size + 1), buff_size, mpi_p, bc_y%end, 0, dy(-buff_size), buff_size, mpi_p, &
2413 & bc_y%beg, 0, mpi_comm_world, mpi_status_ignore, ierr)
2414 else ! PBC at the beginning only
2415 call mpi_sendrecv(dy(0), buff_size, mpi_p, bc_y%beg, 1, dy(-buff_size), buff_size, mpi_p, bc_y%beg, 0, &
2416 & mpi_comm_world, mpi_status_ignore, ierr)
2417 end if
2418 else ! PBC at the end
2419 if (bc_y%beg >= 0) then ! PBC at the end and beginning
2420 call mpi_sendrecv(dy(0), buff_size, mpi_p, bc_y%beg, 1, dy(n + 1), buff_size, mpi_p, bc_y%end, 1, &
2421 & mpi_comm_world, mpi_status_ignore, ierr)
2422 else ! PBC at the end only
2423 call mpi_sendrecv(dy(n - buff_size + 1), buff_size, mpi_p, bc_y%end, 0, dy(n + 1), buff_size, mpi_p, &
2424 & bc_y%end, 1, mpi_comm_world, mpi_status_ignore, ierr)
2425 end if
2426 end if
2427 else
2428 if (pbc_loc == -1) then ! PBC at the beginning
2429
2430 if (bc_z%end >= 0) then ! PBC at the beginning and end
2431 call mpi_sendrecv(dz(p - buff_size + 1), buff_size, mpi_p, bc_z%end, 0, dz(-buff_size), buff_size, mpi_p, &
2432 & bc_z%beg, 0, mpi_comm_world, mpi_status_ignore, ierr)
2433 else ! PBC at the beginning only
2434 call mpi_sendrecv(dz(0), buff_size, mpi_p, bc_z%beg, 1, dz(-buff_size), buff_size, mpi_p, bc_z%beg, 0, &
2435 & mpi_comm_world, mpi_status_ignore, ierr)
2436 end if
2437 else ! PBC at the end
2438 if (bc_z%beg >= 0) then ! PBC at the end and beginning
2439 call mpi_sendrecv(dz(0), buff_size, mpi_p, bc_z%beg, 1, dz(p + 1), buff_size, mpi_p, bc_z%end, 1, &
2440 & mpi_comm_world, mpi_status_ignore, ierr)
2441 else ! PBC at the end only
2442 call mpi_sendrecv(dz(p - buff_size + 1), buff_size, mpi_p, bc_z%end, 0, dz(p + 1), buff_size, mpi_p, &
2443 & bc_z%end, 1, mpi_comm_world, mpi_status_ignore, ierr)
2444 end if
2445 end if
2446 end if
2447#endif
2448
2449 end subroutine s_mpi_sendrecv_grid_variables_buffers
2450#endif
2451
2452 !> Module deallocation and/or disassociation procedures
2453 impure subroutine s_finalize_mpi_common_module
2454
2455#ifdef MFC_MPI
2456 deallocate (buff_send, buff_recv)
2457#endif
2458
2459 end subroutine s_finalize_mpi_common_module
2460
2461end module m_mpi_common