merge branch GPU-dev #403

Merged 143 commits on Dec 24, 2023
Commits
3f5e136
dfMatrixDataBase
maorz1998 Aug 1, 2023
a59a190
add unittest of fvm::div(phi, U)
STwangyingrui Aug 2, 2023
1b25756
simplify unittest
STwangyingrui Aug 3, 2023
cbd7b49
add initial version of GPU new UEqn
STwangyingrui Aug 3, 2023
be72eb0
small fix of fvm_div_boundary
STwangyingrui Aug 4, 2023
7be1190
modify fvm_div_scalar to fvm_div_vector
STwangyingrui Aug 4, 2023
196f22a
implement fvm::ddt(rho, U) and add unittest for it
STwangyingrui Aug 4, 2023
4606238
implement fvm::laplacian(gamma, U) and add unittest for it; fix sever…
STwangyingrui Aug 7, 2023
cc7223d
implement fvc::ddt(rho, K) and add unittest for it; fix several old b…
STwangyingrui Aug 7, 2023
15e30b4
workaround to fix a bug of floating-point numerical error for fvc_ddt
STwangyingrui Aug 7, 2023
a8c68c6
fix occasional errors of fvm fvm::ddt and fvc::ddt: caused by re-usin…
STwangyingrui Aug 7, 2023
2453037
workaround way two (use volatile) to avoid floating-point numerical e…
STwangyingrui Aug 7, 2023
ae31072
workaround way three (use nvcc option -fmad=false) to avoid floating-…
STwangyingrui Aug 7, 2023
b16100f
use template to simplify unittest
STwangyingrui Aug 7, 2023
8b3805f
modify getTypeInfo to support tensor type
STwangyingrui Aug 7, 2023
776aa69
first commit for debugging
maorz1998 Aug 9, 2023
4ee65cc
fvc op & CPU op
maorz1998 Aug 11, 2023
7e29a10
add the comparison with the original method
maorz1998 Aug 12, 2023
db5a689
run pass basic ueqn_gpu
STwangyingrui Aug 15, 2023
04d5512
fvc/fvm ops support sign
STwangyingrui Aug 15, 2023
cbb74cd
use cuda graph in ueqn
STwangyingrui Aug 15, 2023
50eee68
fix bugs in turbulence term
maorz1998 Aug 16, 2023
f691478
primiry opt & add time monitor
maorz1998 Aug 17, 2023
7a7accf
add solve part, fix some bugs
maorz1998 Aug 21, 2023
d684488
modify app
maorz1998 Aug 21, 2023
3f5c40a
Merge pull request #325 from STwangyingrui/mrz/0817
maorz1998 Aug 21, 2023
03744a4
fix bugs in UEqn
maorz1998 Aug 22, 2023
5c7daa5
opt ldu2csr
maorz1998 Aug 22, 2023
920fc84
fvMatrix.A(), fvMatrix.H(), opt ldu2csr
maorz1998 Aug 27, 2023
9a873b6
Merge pull request #327 from maorz1998/GPU-dev
maorz1998 Aug 27, 2023
a2a93fb
init YEqn on GPU(so solve yet)
STwangyingrui Aug 28, 2023
3c6eb86
merge GPU-Dev
STwangyingrui Aug 28, 2023
5710d72
YEqn use ldu instead of lower/upper/diag, and fix bug of fvm_laplacia…
STwangyingrui Aug 28, 2023
529b847
UEqn use DEBUG_ mode
STwangyingrui Aug 28, 2023
2aec825
fix comment of fvm::laplacian
STwangyingrui Aug 28, 2023
71a0cb2
yeqn: reopen graph and recover DEff
STwangyingrui Aug 28, 2023
14aa610
add dfRhoEqn, opt UEqn.A()
maorz1998 Aug 29, 2023
e35084c
Merge pull request #330 from maorz1998/GPU-dev
maorz1998 Aug 29, 2023
ab2a123
merge GPU-dev and resolve conflicts; refactor common fvc ops; adjust …
STwangyingrui Aug 29, 2023
2d33fe3
move thermo fields to database
STwangyingrui Aug 29, 2023
23330b4
move some fields (used by eeqn) from yeqn to database; and fix severa…
STwangyingrui Aug 29, 2023
1f2b398
Merge pull request #328 from STwangyingrui/yr/230815
maorz1998 Aug 30, 2023
9a7182a
fix bugs in dfUEqn, initial version of dfEEqn
maorz1998 Aug 31, 2023
80a42bc
Merge pull request #335 from maorz1998/GPU-dev
maorz1998 Aug 31, 2023
73a37fb
modify data structure of face vector, modify time monitor and macro d…
maorz1998 Sep 1, 2023
d89a865
Merge branch 'GPU-dev' of github.com:maorz1998/deepflame-dev into GPU…
maorz1998 Sep 1, 2023
aa096ea
Merge pull request #338 from maorz1998/GPU-dev
maorz1998 Sep 1, 2023
5543815
add support to open eqn one by one
STwangyingrui Sep 1, 2023
a04c639
1. Implemented fixedValue boundary condition, 2. Resolved some bugs, …
maorz1998 Sep 2, 2023
cdb5358
Merge pull request #339 from maorz1998/GPU-dev
maorz1998 Sep 2, 2023
d5a7c4c
Merge branch 'GPU-dev' into yr/230901_v2
STwangyingrui Sep 2, 2023
0562727
run pass stream ordered allocator
STwangyingrui Sep 2, 2023
3d577f6
run pass stream ordered allocator within graph
STwangyingrui Sep 2, 2023
57f6cd3
use macro to open stream ordered allocator; run pass tests of on-off …
STwangyingrui Sep 4, 2023
869df41
Merge pull request #340 from STwangyingrui/yr/230901_v2
maorz1998 Sep 4, 2023
51bb21f
calculate HbyA, rAU in dfUEqn
maorz1998 Sep 5, 2023
fed389a
Merge branch 'GPU-dev' into GPU-dev
maorz1998 Sep 5, 2023
b7c636b
Merge pull request #341 from maorz1998/GPU-dev
maorz1998 Sep 5, 2023
d429406
fix bugs in git merge
maorz1998 Sep 5, 2023
6ae0780
init nccl and set to dfMatrixDataBase
STwangyingrui Sep 14, 2023
57f79f7
first commit of dfpEqn
maorz1998 Sep 14, 2023
266cbb5
Merge pull request #343 from STwangyingrui/yr/init_nccl
maorz1998 Sep 15, 2023
aee2425
Merge pull request #344 from maorz1998/GPU-dev
maorz1998 Sep 15, 2023
55db157
add neighbour ranks
maorz1998 Sep 16, 2023
9b0bd4e
fixed some bugs, debugging the parallel construction of rhoEqn
maorz1998 Sep 16, 2023
5f91bf3
a parallel example
maorz1998 Sep 16, 2023
e154233
Merge pull request #345 from maorz1998/GPU-dev
maorz1998 Sep 16, 2023
d0af72e
Fixed some bugs and successfully ran the Sydney flame case
maorz1998 Sep 17, 2023
fe15ecd
Merge pull request #346 from maorz1998/GPU-dev
maorz1998 Sep 17, 2023
a55f4c0
fix cudaMemsetAsync: invalid cudaStream created before cudaSetDevice
STwangyingrui Sep 18, 2023
d03c244
fix cudaErrorCudartUnloading when cudaStreamDestroy: clean cuda resou…
STwangyingrui Sep 18, 2023
eb9fbe0
Merge pull request #347 from STwangyingrui/yr/GPU_par_230918
maorz1998 Sep 20, 2023
b4f5790
correct scalar processor boundary
maorz1998 Sep 22, 2023
80b922a
Merge pull request #351 from maorz1998/GPU-dev
maorz1998 Sep 22, 2023
643f5ac
construct rho & U matrix in parallel
maorz1998 Sep 26, 2023
af6b647
Merge pull request #355 from maorz1998/GPU-dev
maorz1998 Sep 26, 2023
a583f6c
construct matrix p in parallel
maorz1998 Oct 2, 2023
bbc0527
Merge pull request #357 from maorz1998/GPU-dev
maorz1998 Oct 2, 2023
6c8b875
construct matrix y in parallel
STwangyingrui Oct 8, 2023
cdba4df
Merge pull request #358 from STwangyingrui/yr/parallel_y
maorz1998 Oct 9, 2023
45ce77f
parallel linear solver
maorz1998 Oct 9, 2023
25ae2e3
Merge branch 'GPU-dev' of github.com:maorz1998/deepflame-dev into GPU…
maorz1998 Oct 9, 2023
860ba31
Merge branch 'GPU-dev' into GPU-dev
maorz1998 Oct 9, 2023
d21542e
Merge pull request #359 from maorz1998/GPU-dev
maorz1998 Oct 9, 2023
226fd5e
initial commit of GPUThermo, EEqn boundary
maorz1998 Oct 10, 2023
c5f5cfe
Merge pull request #360 from maorz1998/GPU-dev
maorz1998 Oct 10, 2023
c9ed65b
pass ldu checking of parallel e; printinfo supports rank;
STwangyingrui Oct 11, 2023
c1c2f66
Adaptation of GPUThermo to deepflame, completion of UEqn and YEqn, wo…
maorz1998 Oct 12, 2023
6036caf
Merge pull request #361 from STwangyingrui/yr/parallel_e
maorz1998 Oct 12, 2023
084f33d
Merge branch 'GPU-dev' into GPU-dev
maorz1998 Oct 12, 2023
3c1746c
Merge pull request #362 from maorz1998/GPU-dev
maorz1998 Oct 12, 2023
b85246b
thermal file for GPUThermo class
maorz1998 Oct 13, 2023
b155e04
Merge pull request #365 from maorz1998/GPU-dev
maorz1998 Oct 13, 2023
17bf0b5
fix bugs, Enable GPU computation across time steps for the entire pro…
maorz1998 Oct 15, 2023
46f8d02
Merge pull request #366 from maorz1998/GPU-dev
maorz1998 Oct 15, 2023
14d3e84
fix a bug in dpdt
maorz1998 Oct 15, 2023
6683bab
Merge pull request #367 from maorz1998/GPU-dev
maorz1998 Oct 15, 2023
8a39382
fix the crash of graph destroy
STwangyingrui Oct 16, 2023
c07a79a
preliminary code organization
maorz1998 Oct 16, 2023
95ef107
Merge pull request #369 from STwangyingrui/yr/fix_graph_destroy
maorz1998 Oct 16, 2023
3e4e8ff
Merge branch 'GPU-dev' into GPU-dev
maorz1998 Oct 16, 2023
659f21f
Merge pull request #371 from maorz1998/GPU-dev
maorz1998 Oct 16, 2023
dada674
single card test pass
maorz1998 Oct 17, 2023
472b704
Merge pull request #372 from maorz1998/GPU-dev
maorz1998 Oct 17, 2023
1e6ed1b
run parallel TGV case
maorz1998 Oct 18, 2023
77e8747
run pass BC fixedEnergy
maorz1998 Oct 18, 2023
d33f091
Merge pull request #374 from maorz1998/GPU-dev
maorz1998 Oct 18, 2023
a175fc8
fix a bug in solving YEqn
maorz1998 Oct 19, 2023
29c6059
refactor for timing
STwangyingrui Oct 19, 2023
7a89595
run pass cyclic on one card
maorz1998 Oct 21, 2023
5b80713
refine time statistic
STwangyingrui Oct 23, 2023
8cd3cc4
run pass cyclic BC in parallel
maorz1998 Oct 23, 2023
5d7e0ef
Merge pull request #377 from maorz1998/GPU-dev
maorz1998 Oct 24, 2023
74dcf0c
refactor macros; fix invalid config for empty patches;
STwangyingrui Oct 24, 2023
4496dd5
use stream allocator in pEqn and thermo
STwangyingrui Oct 24, 2023
7ef4ce7
reuse amgx solvers
STwangyingrui Oct 24, 2023
02c5070
merge GPU-dev and fix conflicts; known issue: illegal handle of stream
STwangyingrui Oct 25, 2023
01ac9fe
fix sereral bugs and align multi-card multi-step result
STwangyingrui Oct 25, 2023
80d8f0e
Merge pull request #378 from STwangyingrui/yr/to_merge_1024
maorz1998 Oct 25, 2023
96f462f
smagrinsky model
maorz1998 Oct 25, 2023
0fd37b8
fix conflicts
maorz1998 Oct 25, 2023
0c20745
add dfChemistrySolver, use libtorch do inference
maorz1998 Oct 25, 2023
15c2576
Merge pull request #380 from maorz1998/GPU-dev
maorz1998 Oct 26, 2023
084bccb
pass libtorch version
maorz1998 Oct 27, 2023
00d3974
add batch size
maorz1998 Oct 28, 2023
e923fd5
Merge pull request #381 from maorz1998/GPU-dev
maorz1998 Oct 28, 2023
f78f459
run pass turbulence k & epsilon
maorz1998 Oct 30, 2023
4355f4d
reset amgx solver to avoid memleak
STwangyingrui Oct 31, 2023
ed15f81
checking memory usage info after each timestep
STwangyingrui Oct 31, 2023
03c7338
support dynimic NN input size
maorz1998 Nov 7, 2023
24d15bb
mixture averaged
maorz1998 Nov 7, 2023
9375fb1
fix bugs in pEqn & UEqn
maorz1998 Nov 7, 2023
7a2c924
limitedLinear debuging
maorz1998 Nov 7, 2023
d916ce7
Merge pull request #382 from STwangyingrui/yr/see_amgx
maorz1998 Nov 7, 2023
94c9f4b
Merge pull request #385 from maorz1998/GPU-dev
maorz1998 Nov 7, 2023
d9cab14
optimize calculate_viscosity_kernel
STwangyingrui Nov 14, 2023
342ddb0
Merge pull request #387 from STwangyingrui/yr/opt_thermo
maorz1998 Nov 16, 2023
c831fb2
fix bugs in 1223
maorz1998 Dec 23, 2023
f43d51f
Merge pull request #401 from maorz1998/GPU-dev
maorz1998 Dec 23, 2023
fb8c229
remove files
maorz1998 Dec 23, 2023
362d74d
Merge pull request #402 from maorz1998/GPU-dev
maorz1998 Dec 23, 2023
4ec0c1e
Merge remote-tracking branch 'origin/GPU-dev'
maorz1998 Dec 23, 2023
f320780
fix bugs in createfields
maorz1998 Dec 23, 2023
137 changes: 119 additions & 18 deletions applications/solvers/dfLowMachFoam/EEqn.H
@@ -1,36 +1,137 @@
{
volScalarField& he = thermo.he();

#if defined GPUSolverNew_
double *h_he = dfDataBase.getFieldPointer("he", location::cpu, position::internal);
double *h_boundary_he = dfDataBase.getFieldPointer("he", location::cpu, position::boundary);
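// Assemble and solve the energy equation on the GPU; copying the solved field back to
// the CPU-side he (postProcess / memcpy below) is currently disabled.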

EEqn_GPU.process();
EEqn_GPU.sync();
// EEqn_GPU.postProcess(h_he, h_boundary_he);

// copy h_he to he(cpu)
// memcpy(&he[0], h_he, dfDataBase.cell_value_bytes);

//DEBUG_TRACE;
//he.correctBoundaryConditions();
//DEBUG_TRACE;

#if defined DEBUG_
fvScalarMatrix EEqn
(

fvm::ddt(rho, he) + mvConvection->fvmDiv(phi, he)
+ fvc::ddt(rho, K) + fvc::div(phi, K)
- dpdt
==
(
turbName == "laminar"
?
(
fvm::laplacian(turbulence->alpha(), he)
- diffAlphaD
+ fvc::div(hDiffCorrFlux)
)
:
(
fvm::laplacian(turbulence->alphaEff(), he)
)
)
);
// EEqn.relax();
EEqn.solve("ha");
// checkResult
// TODO: for temp, now we compare ldu, finally we compare csr
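// The coupling coefficients are flattened patch by patch: processor and processorCyclic
// patches occupy 2*patchSize slots (coefficients followed by zero padding), all other
// patch types occupy patchSize slots.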
std::vector<double> h_internal_coeffs(dfDataBase.num_boundary_surfaces);
std::vector<double> h_boundary_coeffs(dfDataBase.num_boundary_surfaces);

offset = 0;
forAll(he.boundaryField(), patchi)
{
const fvPatchScalarField& patchHe = he.boundaryField()[patchi];
int patchSize = patchHe.size();
const double* internal_coeff_ptr = &EEqn.internalCoeffs()[patchi][0];
const double* boundary_coeff_ptr = &EEqn.boundaryCoeffs()[patchi][0];
if (patchHe.type() == "processor"
|| patchHe.type() == "processorCyclic") {
memcpy(h_internal_coeffs.data() + offset, internal_coeff_ptr, patchSize * sizeof(double));
memset(h_internal_coeffs.data() + offset + patchSize, 0, patchSize * sizeof(double));
memcpy(h_boundary_coeffs.data() + offset, boundary_coeff_ptr, patchSize * sizeof(double));
memset(h_boundary_coeffs.data() + offset + patchSize, 0, patchSize * sizeof(double));
offset += patchSize * 2;
} else {
memcpy(h_internal_coeffs.data() + offset, internal_coeff_ptr, patchSize * sizeof(double));
memcpy(h_boundary_coeffs.data() + offset, boundary_coeff_ptr, patchSize * sizeof(double));
offset += patchSize;
}
}

double *h_boundary_he_tmp = new double[dfDataBase.num_boundary_surfaces];
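// Gather the boundary values of he: processor patches store both the patch values and
// the patch-internal field (2*patchSize entries), other patches only the patch values.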
offset = 0;
forAll(he.boundaryField(), patchi)
{
const fvPatchScalarField& patchHe = he.boundaryField()[patchi];
int patchSize = patchHe.size();
if (patchHe.type() == "processor"
|| patchHe.type() == "processorCyclic") {
const scalarField& patchHeInternal = dynamic_cast<const processorFvPatchField<scalar>&>(patchHe).patchInternalField()();
memcpy(h_boundary_he_tmp + offset, &patchHe[0], patchSize * sizeof(double));
memcpy(h_boundary_he_tmp + offset + patchSize, &patchHeInternal[0], patchSize * sizeof(double));
offset += patchSize * 2;
} else {
memcpy(h_boundary_he_tmp + offset, &patchHe[0], patchSize * sizeof(double));
offset += patchSize;
}
}

bool printFlag = false;
int rank = -1;
if (mpi_init_flag) {
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
}
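// Only rank 0 (or a serial run) would compare the CPU and GPU results; the compare
// calls are currently commented out.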
if (!mpi_init_flag || rank == 0) {
// DEBUG_TRACE;
// EEqn_GPU.compareResult(&EEqn.lower()[0], &EEqn.upper()[0], &EEqn.diag()[0], &EEqn.source()[0],
// h_internal_coeffs.data(), h_boundary_coeffs.data(), printFlag);
// DEBUG_TRACE;
// EEqn_GPU.compareHe(&he[0], h_boundary_he_tmp, printFlag);
}

delete[] h_boundary_he_tmp;

#endif

#else
start1 = std::clock();
fvScalarMatrix EEqn
(
fvm::ddt(rho, he) + mvConvection->fvmDiv(phi, he)
+ fvc::ddt(rho, K) + fvc::div(phi, K)
- dpdt
==
(
turbName == "laminar"
?
(
fvm::laplacian(turbulence->alpha(), he)
- diffAlphaD
+ fvc::div(hDiffCorrFlux)
)
:
(
fvm::laplacian(turbulence->alphaEff(), he)
)
)
);
end1 = std::clock();
time_monitor_EEqn += double(end1 - start1) / double(CLOCKS_PER_SEC);
time_monitor_EEqn_mtxAssembly += double(end1 - start1) / double(CLOCKS_PER_SEC);

// EEqn.relax();
start1 = std::clock();
EEqn.solve("ha");
end1 = std::clock();
time_monitor_EEqn += double(end1 - start1) / double(CLOCKS_PER_SEC);
time_monitor_EEqn_solve += double(end1 - start1) / double(CLOCKS_PER_SEC);
#endif
}
8 changes: 4 additions & 4 deletions applications/solvers/dfLowMachFoam/Make/options
@@ -9,7 +9,6 @@ EXE_INC = -std=c++14 \
$(PFLAGS) $(PINC) \
$(if $(LIBTORCH_ROOT),-DUSE_LIBTORCH,) \
$(if $(PYTHON_INC_DIR),-DUSE_PYTORCH,) \
-I$(LIB_SRC)/transportModels/compressible/lnInclude \
-I$(LIB_SRC)/thermophysicalModels/basic/lnInclude \
-I$(LIB_SRC)/TurbulenceModels/turbulenceModels/lnInclude \
@@ -28,7 +27,7 @@ EXE_INC = -std=c++14 \
$(if $(LIBTORCH_ROOT),-I$(LIBTORCH_ROOT)/include/torch/csrc/api/include,) \
$(PYTHON_INC_DIR) \
$(if $(AMGX_DIR), -I$(DF_ROOT)/src_gpu,) \
$(if $(AMGX_DIR), -I/usr/local/cuda/include,) \
$(if $(AMGX_DIR), -I$(AMGX_DIR)/include,)

EXE_LIBS = \
@@ -50,6 +49,7 @@ EXE_LIBS = \
$(if $(LIBTORCH_ROOT),-lpthread,) \
$(if $(LIBTORCH_ROOT),$(DF_SRC)/dfChemistryModel/DNNInferencer/build/libDNNInferencer.so,) \
$(if $(PYTHON_LIB_DIR),$(PYTHON_LIB_DIR),) \
$(if $(AMGX_DIR), /usr/local/cuda/lib64/libcudart.so,) \
$(if $(AMGX_DIR), /usr/local/cuda/lib64/libnccl.so,) \
$(if $(AMGX_DIR), $(DF_ROOT)/src_gpu/build/libdfMatrix.so,) \
$(if $(AMGX_DIR), $(AMGX_DIR)/build/libamgxsh.so,)
166 changes: 145 additions & 21 deletions applications/solvers/dfLowMachFoam/UEqn.H
@@ -1,24 +1,148 @@
// Solve the Momentum equation
#ifdef GPUSolverNew_

#if defined DEBUG_
// run CPU, for temp
TICK_START;
tmp<fvVectorMatrix> tUEqn
(
fvm::ddt(rho, U)
+
fvm::div(phi, U)
+
turbulence->divDevRhoReff(U)
);
fvVectorMatrix& UEqn = tUEqn.ref();
TICK_STOP(CPU assembly time);

volTensorField gradU = fvc::grad(U);

double *h_boundary_gradU = new double[dfDataBase.num_boundary_surfaces * 9];
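// Gather the boundary values of gradU (9 tensor components per face): processor patches
// also store the patch-internal field, so they occupy 2*patchsize slots.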
offset = 0;
forAll(U.boundaryField(), patchi)
{
const fvPatchTensorField& patchGradU = gradU.boundaryField()[patchi];
int patchsize = patchGradU.size();
if (patchGradU.type() == "processor"
|| patchGradU.type() == "processorCyclic") {
// print info
if (dynamic_cast<const processorFvPatchField<tensor>&>(patchGradU).doTransform()) {
Info << "gradU transform = true" << endl;
} else {
Info << "gradU transform = false" << endl;
}
Info << "rank = " << dynamic_cast<const processorFvPatchField<tensor>&>(patchGradU).rank() << endl;

memcpy(h_boundary_gradU + 9*offset, &patchGradU[0][0], patchsize * 9 * sizeof(double));
tensorField patchGradUInternal =
dynamic_cast<const processorFvPatchField<tensor>&>(patchGradU).patchInternalField()();
memcpy(h_boundary_gradU + 9*offset + patchsize * 9, &patchGradUInternal[0][0], patchsize * 9 * sizeof(double));
offset += patchsize * 2;
} else {
memcpy(h_boundary_gradU + 9*offset, &patchGradU[0][0], patchsize * 9 * sizeof(double));
offset += patchsize;
}
}
#endif

// process
TICK_START;
UEqn_GPU.process();
UEqn_GPU.sync();
TICK_STOP(GPU process time);

// postProcess
// TICK_START;
// UEqn_GPU.postProcess(h_u);
// memcpy(&U[0][0], h_u, dfDataBase.cell_value_vec_bytes);
// U.correctBoundaryConditions();
// K = 0.5*magSqr(U);
// DEBUG_TRACE;
// TICK_STOP(post process time);

#if defined DEBUG_
// UEqn.relax();
TICK_START;
solve(UEqn == -fvc::grad(p));
K.oldTime();
K = 0.5*magSqr(U);
TICK_STOP(CPU solve time);
// checkResult
// TODO: for temp, now we compare ldu, finally we compare csr
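// The vector coupling coefficients (3 components per face) are flattened patch by patch;
// processor patches advance the offset by 2*patchsize, leaving the second half of their
// slots untouched here.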
std::vector<double> h_internal_coeffs(dfDataBase.num_boundary_surfaces * 3);
std::vector<double> h_boundary_coeffs(dfDataBase.num_boundary_surfaces * 3);

offset = 0;
for (int patchi = 0; patchi < dfDataBase.num_patches; patchi++)
{
const fvPatchVectorField& patchU = U.boundaryField()[patchi];
int patchsize = dfDataBase.patch_size[patchi];
const double* internal_coeff_ptr = &UEqn.internalCoeffs()[patchi][0][0];
const double* boundary_coeff_ptr = &UEqn.boundaryCoeffs()[patchi][0][0];
memcpy(h_internal_coeffs.data() + offset * 3, internal_coeff_ptr, patchsize * 3 * sizeof(double));
memcpy(h_boundary_coeffs.data() + offset * 3, boundary_coeff_ptr, patchsize * 3 * sizeof(double));
if (patchU.type() == "processor" || patchU.type() == "processorCyclic") offset += 2 * patchsize;
else offset += patchsize;
}

double *h_boundary_u_tmp = new double[dfDataBase.num_boundary_surfaces * 3];
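// Gather the boundary values of U (3 components per face): processor patches also store
// the patch-internal field, doubling their slot count.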
offset = 0;
forAll(U.boundaryField(), patchi)
{
const fvPatchVectorField& patchU = U.boundaryField()[patchi];
int patchsize = dfDataBase.patch_size[patchi];

if (patchU.type() == "processor"
|| patchU.type() == "processorCyclic") {
memcpy(h_boundary_u_tmp + 3*offset, &patchU[0][0], 3*patchsize * sizeof(double));
vectorField patchUInternal =
dynamic_cast<const processorFvPatchField<vector>&>(patchU).patchInternalField()();
memcpy(h_boundary_u_tmp + 3*offset + 3*patchsize, &patchUInternal[0][0], 3*patchsize * sizeof(double));
offset += 2 * patchsize;
} else {
memcpy(h_boundary_u_tmp + 3*offset, &patchU[0][0], 3*patchsize * sizeof(double));
offset += patchsize;
}
}

bool printFlag = false;

int rank = -1;
if (mpi_init_flag) {
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
}
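// Only rank 0 (or a serial run) would run the CPU/GPU comparison; the compare calls
// are currently commented out.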

if (!mpi_init_flag || rank == 0) {
// UEqn_GPU.compareResult(&UEqn.lower()[0], &UEqn.upper()[0], &UEqn.diag()[0], &UEqn.source()[0][0],
// h_internal_coeffs.data(), h_boundary_coeffs.data(),
// // &gradU[0][0], h_boundary_gradU,
// printFlag);
// UEqn_GPU.compareU(&U[0][0], h_boundary_u_tmp, printFlag);
}
DEBUG_TRACE;
#endif

#else
start1 = std::clock();
tmp<fvVectorMatrix> tUEqn
(
fvm::ddt(rho, U) + fvm::div(phi, U)
+ turbulence->divDevRhoReff(U)
);
fvVectorMatrix& UEqn = tUEqn.ref();
end1 = std::clock();
time_monitor_UEqn += double(end1 - start1) / double(CLOCKS_PER_SEC);
time_monitor_UEqn_mtxAssembly += double(end1 - start1) / double(CLOCKS_PER_SEC);

// UEqn.relax();
start1 = std::clock();
if (pimple.momentumPredictor())
{
solve(UEqn == -fvc::grad(p));
K.oldTime();
K = 0.5*magSqr(U);
}
end1 = std::clock();
time_monitor_UEqn += double(end1 - start1) / double(CLOCKS_PER_SEC);
time_monitor_UEqn_solve += double(end1 - start1) / double(CLOCKS_PER_SEC);
#endif