forked from dealias/fftwpp
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
251 lines (183 loc) · 8.88 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
FFTW++: Fast Fourier Transform C++ Header/MPI Transpose for FFTW3 Library
Copyright (C) 2004-17 John C. Bowman and Malcolm Roberts, University of Alberta
http://fftwpp.sourceforge.net
FFTW++ is a C++ header/MPI transpose for Version 3 of the highly optimized FFTW
(http://www.fftw.org) Fourier Transform library. It provides a simple
interface for 1D, 2D, and 3D complex-to-complex, real-to-complex, and
complex-to-real Fast Fourier Transforms and convolutions. It takes
care of the technical aspects of memory allocation, alignment, planning,
wisdom, and communication on both serial and parallel (OpenMP/MPI)
architectures. Wrappers for multiple 1D transforms are also provided. As
with the FFTW3 library itself, both in-place and out-of-place transforms of
arbitrary size are supported.
Implicit dealiasing of standard and centered Hermitian convolutions is
also implemented; in 2D and 3D implicit zero-padding substantially
reduces memory usage and computation time. For more information, see
"Efficient Dealiased Convolutions without Padding," by
John C. Bowman and Malcolm Roberts, SIAM Journal on Scientific
Computing, 33:1, 386-406 (2011).
http://www.math.ualberta.ca/~bowman/publications/dealias.pdf
"Parallel Implicitly Dealiased Convolutions on Shared Memory Architectures,"
by Malcolm Roberts and John C. Bowman, submitted to Journal of Scientific
Computing, (2017).
http://www.math.ualberta.ca/~bowman/publications/dealias2.pdf
Convenient optional shift routines that place the Fourier origin in the logical
center of the domain are provided for centered complex-to-real transforms
in 2D and 3D; see fftw++.h for details.
FFTW++ supports multithreaded transforms and convolutions.
The global variable fftw::maxthreads specifies the maximum number of threads
to use. The constructors invoke a short timing test to check that using
multiple threads is actually beneficial for the given problem size.
Multithreading requires linking with a multithreaded FFTW implementation
and can be disabled by adding -DFFTWPP_SINGLE_THREAD to CFLAGS.
FFTW++ can also exploit the high-performance Array class available at
http://www.math.ualberta.ca/~bowman/Array (version 1.49 or higher),
designed for scientific computing. The arrays in that package do
memory bounds checking in debugging mode, but can be optimized by
specifying the -DNDEBUG compilation option (1D arrays optimize
completely to pointer operations).
Detailed documentation is provided before each class in the fftw++.h
header file. The included examples illustrate how easy it is to use
FFTW in C++ with the FFTW++ header class. Use of the Array class is
optional, but encouraged. If for some reason the Array class is not
used, memory should be allocated with ComplexAlign (or doubleAlign) to
ensure that the data is optimally aligned to sizeof(Complex), to
enable the SIMD extensions. The optional alignment check in fftw++.h
can be disabled with the -DNO_CHECK_ALIGN compiler option.
########################## MPI ##########################
Hybrid OpenMP/MPI versions of the convolution routines in 2 and 3
dimensions are available in the mpi/ directory. Parallelization is
accomplished using the adaptive hybrid OpenMP/MPI transpose routine
described in "Adaptive Matrix Transpose Algorithms for Distributed
Multicore Processors", John C. Bowman and Malcolm Roberts,
Interdisciplinary Topics in Applied Mathematics, Modeling and Computational
Science, Springer Proceedings in Mathematics & Statistics 117,
97-103 (2015):
http://www.math.ualberta.ca/~bowman/publications/transpose.pdf
Either a 1D ("slab") and 2D ("pencil") data decomposition is used
for the three-dimensional convolutions, depending on the number of processors.
mpi/fftw/ contains comparison code using FFTW's parallel MPI transform
and explicit padding.
########################## Examples ##########################
The following programs are provided in the examples directory:
1D examples using ComplexAlign allocator:
example0.cc
example0r.cc
1D examples using Array class:
example1.cc
example1r.cc
2D examples using Array class:
example2.cc
example2r.cc
3D examples using Array class:
example3.cc
example3r.cc
Examples of implicitly dealiased convolutions on complex non-centered
data in 1, 2, and 3 dimensions:
examplecconv.cc, examplecconv2.cc, examplecconv3.cc
Examples of implicitly dealiased convolutions on complex
Hermitian-symmetric centered data in 1, 2, and 3 dimensions:
exampleconv.cc, exampleconv2.cc, exampleconv3.cc
Local transpose (in-place or out-of-place):
exampletranspose.cc
More general types of convolutions (for example, autoconvolutions)
can be performed by defining a custom multiplier or realmultiplier
function pointer.
########################## Wrappers ##########################
Wrappers for the convolution routines are available for C, Fortran,
and Python. Examples are given in the wrappers/ directory. The C
wrapper may be found in cfftw++.h and cfftw++.cc, the Fortran wrapper
in fftwpp.f90, and the Python wrapper in fftwpp.py. A unit-testing
script, test.py, is also available. Results for the given input data
are checked with a simple hash.
Compilation uses the environment variables CPLUS_INCLUDE_PATH to tell
the compiler where to find fftw3.h, and FORTRAN_INCLUDE_PATH to
indicate to the compiler the location of fftw3.f03 from FFTW.
The following programs are available in the wrappers directory:
Using C to call multi-threaded 1D, 2D, and 3D binary convolutions and
1D and 2D ternary convolutions, with and without passing work arrays,
where the operation in physical space may correspond to either a
scalar multiplication (M=1) or a dot product (M > 1):
cexample.c
Using Fortran to call multi-threaded 1D, 2D, and 3D binary
convolutions, with and without passing work arrays, where the
operation in physical space may correspond to either a scalar
multiplication (mm=1) or a dot product (mm > 1):
fexample.f90
Using Python to call multi-threaded 1D, 2D, and 3D binary convolutions
(for scalar multiplication (M=1) and with work arrays created by the
constructor):
pexample.py
########################## MPI ##########################
Hybrid OpenMP/MPI versions of the convolution routines in 2 and 3
dimensions are available in the mpi directory.
cconv2.cc and cconv3.cc are examples of two- and three-dimensional
complex non-centered convolutions.
conv2.cc and conv3.cc are examples of two- and three-dimensional
Hermitian-symmetric complex centered convolutions.
fft2.cc and fft2r are examples of two-dimensional hybrid MPI/OpenMP FFTs
using a 1D data decomposition, for complex and real data, respectively.
fft3.cc and fft3r are examples of three-dimensional hybrid MPI/OpenMP FFTs
using a 1D (slab) or 2D (pencil) data decomposition (depending on the
number of MPI processes), for complex and real data, respectively.
timing.py is a script which performs timing tests for mpi-based
convolutions.
The directory mpi/explicit contains comparison code using FFTW's parallel
MPI transform and explicit padding.
########################## Test Programs ##########################
The following programs are provided in tests/, along with various
timing and error analysis scripts. Asymptote
(http://asymptote.sourceforge.net/) scripts are provided for
visualizing the output.
1D complex convolution test:
cconv.cc
1D Hermitian convolution test:
conv.cc
1D Hermitian ternary convolution test:
tconv.cc
2D complex convolution test:
cconv2.cc
2D Hermitian convolution test:
conv2.cc
2D Hermitian ternary convolution test:
tconv2.cc
3D complex convolution test:
cconv3.cc
3D Hermitian convolution test:
conv3.cc
1D FFT:
fft1.cc
1D real FFT:
fft1r.cc
1D multiple FFT:
mfft1.cc
1D multiple real FFT:
mfft1r.cc
2D FFT:
fft2.cc
2D real FFT:
fft2r.cc
3D FFT:
fft3.cc
3D real FFT:
fft3r.cc
######################## Availability and License ########################
To compile from Git developmental source code:
git clone https://github.com/dealias/fftwpp
All source files in the FFTW++ project, unless explicitly noted otherwise,
are released under version 3 (or later) of the GNU Lesser General Public
License (see the files LICENSE.LESSER and LICENSE in the top-level source
directory).
========================================================================
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
========================================================================