diff --git a/doc/pub/week9/html/._week9-bs000.html b/doc/pub/week9/html/._week9-bs000.html index 0ee9f022..eb84da37 100644 --- a/doc/pub/week9/html/._week9-bs000.html +++ b/doc/pub/week9/html/._week9-bs000.html @@ -79,11 +79,388 @@ 2, None, 'blocking-transformations-final-expressions'), + ('More on the blocking method', + 2, + None, + 'more-on-the-blocking-method'), ('Example code form last week', 2, None, 'example-code-form-last-week'), - ('Resampling analysis', 2, None, 'resampling-analysis')]} + ('Resampling analysis', 2, None, 'resampling-analysis'), + ('Content', 2, None, 'content'), + ('Optimization and profiling', + 2, + None, + 'optimization-and-profiling'), + ('More on optimization', 2, None, 'more-on-optimization'), + ('Optimization and profiling', + 2, + None, + 'optimization-and-profiling'), + ('Optimization and debugging', + 2, + None, + 'optimization-and-debugging'), + ('Other hints', 2, None, 'other-hints'), + ('Vectorization and the basic idea behind parallel computing', + 2, + None, + 'vectorization-and-the-basic-idea-behind-parallel-computing'), + ('A rough classification of hardware models', + 2, + None, + 'a-rough-classification-of-hardware-models'), + ('Shared memory and distributed memory', + 2, + None, + 'shared-memory-and-distributed-memory'), + ('Different parallel programming paradigms', + 2, + None, + 'different-parallel-programming-paradigms'), + ('Different parallel programming paradigms', + 2, + None, + 'different-parallel-programming-paradigms'), + ('What is vectorization?', 2, None, 'what-is-vectorization'), + ('Number of elements that can acted upon', + 2, + None, + 'number-of-elements-that-can-acted-upon'), + ('Number of elements that can acted upon, examples', + 2, + None, + 'number-of-elements-that-can-acted-upon-examples'), + ('Operation counts for scalar operation', + 2, + None, + 'operation-counts-for-scalar-operation'), + ('Number of elements that can acted upon, examples', + 2, + None, + 'number-of-elements-that-can-acted-upon-examples'), + ('Number of operations when vectorized', + 2, + None, + 'number-of-operations-when-vectorized'), + ('"A simple test case with and without ' + 'vectorization":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/LecturePrograms/programs/Classes/cpp/program7.cpp"', + 2, + None, + 'a-simple-test-case-with-and-without-vectorization-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-lectureprograms-programs-classes-cpp-program7-cpp'), + ('Compiling with and without vectorization', + 2, + None, + 'compiling-with-and-without-vectorization'), + ('Compiling with and without vectorization using clang', + 2, + None, + 'compiling-with-and-without-vectorization-using-clang'), + ('Automatic vectorization and vectorization inhibitors, criteria', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-criteria'), + ('Automatic vectorization and vectorization inhibitors, exit ' + 'criteria', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-exit-criteria'), + ('Automatic vectorization and vectorization inhibitors, ' + 'straight-line code', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-straight-line-code'), + ('Automatic vectorization and vectorization inhibitors, nested ' + 'loops', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-nested-loops'), + ('Automatic vectorization and vectorization inhibitors, function ' + 'calls', + 2, + None, + 
'automatic-vectorization-and-vectorization-inhibitors-function-calls'), + ('Automatic vectorization and vectorization inhibitors, data ' + 'dependencies', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-data-dependencies'), + ('Automatic vectorization and vectorization inhibitors, more ' + 'data dependencies', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-more-data-dependencies'), + ('Automatic vectorization and vectorization inhibitors, memory ' + 'stride', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-memory-stride'), + ('Memory management', 2, None, 'memory-management'), + ('Memory and communication', 2, None, 'memory-and-communication'), + ('Measuring performance', 2, None, 'measuring-performance'), + ('Problems with measuring time', + 2, + None, + 'problems-with-measuring-time'), + ('Problems with cold start', 2, None, 'problems-with-cold-start'), + ('Problems with smart compilers', + 2, + None, + 'problems-with-smart-compilers'), + ('Problems with interference', + 2, + None, + 'problems-with-interference'), + ('Problems with measuring performance', + 2, + None, + 'problems-with-measuring-performance'), + ('Thomas algorithm for tridiagonal linear algebra equations', + 2, + None, + 'thomas-algorithm-for-tridiagonal-linear-algebra-equations'), + ('Thomas algorithm, forward substitution', + 2, + None, + 'thomas-algorithm-forward-substitution'), + ('Thomas algorithm, backward substitution', + 2, + None, + 'thomas-algorithm-backward-substitution'), + ('Thomas algorithm and counting of operations (floating point ' + 'and memory)', + 2, + None, + 'thomas-algorithm-and-counting-of-operations-floating-point-and-memory'), + ('"Example: Transpose of a ' + 'matrix":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/LecturePrograms/programs/Classes/cpp/program8.cpp"', + 2, + None, + 'example-transpose-of-a-matrix-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-lectureprograms-programs-classes-cpp-program8-cpp'), + ('"Matrix-matrix ' + 'multiplication":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/LecturePrograms/programs/Classes/cpp/program9.cpp"', + 2, + None, + 'matrix-matrix-multiplication-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-lectureprograms-programs-classes-cpp-program9-cpp'), + ('How do we define speedup? Simplest form', + 2, + None, + 'how-do-we-define-speedup-simplest-form'), + ('How do we define speedup? Correct baseline', + 2, + None, + 'how-do-we-define-speedup-correct-baseline'), + ('Parallel speedup', 2, None, 'parallel-speedup'), + ('Speedup and memory', 2, None, 'speedup-and-memory'), + ('Upper bounds on speedup', 2, None, 'upper-bounds-on-speedup'), + ("Amdahl's law", 2, None, 'amdahl-s-law'), + ('How much is parallelizable', + 2, + None, + 'how-much-is-parallelizable'), + ("Today's situation of parallel computing", + 2, + None, + 'today-s-situation-of-parallel-computing'), + ('Overhead present in parallel computing', + 2, + None, + 'overhead-present-in-parallel-computing'), + ('Parallelizing a sequential algorithm', + 2, + None, + 'parallelizing-a-sequential-algorithm'), + ('Strategies', 2, None, 'strategies'), + ('How do I run MPI on a PC/Laptop? MPI', + 2, + None, + 'how-do-i-run-mpi-on-a-pc-laptop-mpi'), + ('Can I do it on my own PC/laptop? 
OpenMP installation', + 2, + None, + 'can-i-do-it-on-my-own-pc-laptop-openmp-installation'), + ('Installing MPI', 2, None, 'installing-mpi'), + ('Installing MPI and using Qt', + 2, + None, + 'installing-mpi-and-using-qt'), + ('What is Message Passing Interface (MPI)?', + 2, + None, + 'what-is-message-passing-interface-mpi'), + ('Going Parallel with MPI', 2, None, 'going-parallel-with-mpi'), + ('MPI is a library', 2, None, 'mpi-is-a-library'), + ('Bindings to MPI routines', 2, None, 'bindings-to-mpi-routines'), + ('Communicator', 2, None, 'communicator'), + ('Some of the most important MPI functions', + 2, + None, + 'some-of-the-most-important-mpi-functions'), + ('"The first MPI C/C++ ' + 'program":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program2.cpp"', + 2, + None, + 'the-first-mpi-c-c-program-https-github-com-compphysics-computationalphysics2-blob-gh-pages-doc-programs-lectureprograms-programs-mpi-chapter07-program2-cpp'), + ('The Fortran program', 2, None, 'the-fortran-program'), + ('Note 1', 2, None, 'note-1'), + ('"Ordered output with ' + 'MPIBarrier":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program3.cpp"', + 2, + None, + 'ordered-output-with-mpibarrier-https-github-com-compphysics-computationalphysics2-blob-gh-pages-doc-programs-lectureprograms-programs-mpi-chapter07-program3-cpp'), + ('Note 2', 2, None, 'note-2'), + ('"Ordered ' + 'output":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program4.cpp"', + 2, + None, + 'ordered-output-https-github-com-compphysics-computationalphysics2-blob-gh-pages-doc-programs-lectureprograms-programs-mpi-chapter07-program4-cpp'), + ('Note 3', 2, None, 'note-3'), + ('Note 4', 2, None, 'note-4'), + ('"Numerical integration in ' + 'parallel":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program6.cpp"', + 2, + None, + 'numerical-integration-in-parallel-https-github-com-compphysics-computationalphysics2-blob-gh-pages-doc-programs-lectureprograms-programs-mpi-chapter07-program6-cpp'), + ('Dissection of trapezoidal rule with $MPI\\_reduce$', + 2, + None, + 'dissection-of-trapezoidal-rule-with-mpi-reduce'), + ('Dissection of trapezoidal rule', + 2, + None, + 'dissection-of-trapezoidal-rule'), + ('Integrating with _MPI_', 2, None, 'integrating-with-mpi'), + ('How do I use $MPI\\_reduce$?', + 2, + None, + 'how-do-i-use-mpi-reduce'), + ('More on $MPI\\_Reduce$', 2, None, 'more-on-mpi-reduce'), + ('Dissection of trapezoidal rule', + 2, + None, + 'dissection-of-trapezoidal-rule'), + ('Dissection of trapezoidal rule', + 2, + None, + 'dissection-of-trapezoidal-rule'), + ('"The quantum dot program for two ' + 'electrons":"https://github.com/CompPhysics/ComputationalPhysics2/blob/master/doc/Programs/ParallelizationMPI/MPIvmcqdot.cpp"', + 2, + None, + 'the-quantum-dot-program-for-two-electrons-https-github-com-compphysics-computationalphysics2-blob-master-doc-programs-parallelizationmpi-mpivmcqdot-cpp'), + ('What is OpenMP', 2, None, 'what-is-openmp'), + ('Getting started, things to remember', + 2, + None, + 'getting-started-things-to-remember'), + ('OpenMP syntax', 2, None, 'openmp-syntax'), + ('Different OpenMP styles of parallelism', + 2, + None, + 'different-openmp-styles-of-parallelism'), + ('General code structure', 2, None, 'general-code-structure'), + 
('Parallel region', 2, None, 'parallel-region'), + ('Hello world, not again, please!', + 2, + None, + 'hello-world-not-again-please'), + ('Hello world, yet another variant', + 2, + None, + 'hello-world-yet-another-variant'), + ('Important OpenMP library routines', + 2, + None, + 'important-openmp-library-routines'), + ('Private variables', 2, None, 'private-variables'), + ('Master region', 2, None, 'master-region'), + ('Parallel for loop', 2, None, 'parallel-for-loop'), + ('Parallel computations and loops', + 2, + None, + 'parallel-computations-and-loops'), + ('Scheduling of loop computations', + 2, + None, + 'scheduling-of-loop-computations'), + ('Example code for loop scheduling', + 2, + None, + 'example-code-for-loop-scheduling'), + ('Example code for loop scheduling, guided instead of dynamic', + 2, + None, + 'example-code-for-loop-scheduling-guided-instead-of-dynamic'), + ('More on Parallel for loop', + 2, + None, + 'more-on-parallel-for-loop'), + ('What can happen with this loop?', + 2, + None, + 'what-can-happen-with-this-loop'), + ('Inner product', 2, None, 'inner-product'), + ('Different threads do different tasks', + 2, + None, + 'different-threads-do-different-tasks'), + ('Single execution', 2, None, 'single-execution'), + ('Coordination and synchronization', + 2, + None, + 'coordination-and-synchronization'), + ('Data scope', 2, None, 'data-scope'), + ('Some remarks', 2, None, 'some-remarks'), + ('Parallelizing nested for-loops', + 2, + None, + 'parallelizing-nested-for-loops'), + ('Nested parallelism', 2, None, 'nested-parallelism'), + ('Parallel tasks', 2, None, 'parallel-tasks'), + ('Common mistakes', 2, None, 'common-mistakes'), + ('Not all computations are simple', + 2, + None, + 'not-all-computations-are-simple'), + ('Not all computations are simple, competing threads', + 2, + None, + 'not-all-computations-are-simple-competing-threads'), + ('How to find the max value using OpenMP', + 2, + None, + 'how-to-find-the-max-value-using-openmp'), + ('Then deal with the race conditions', + 2, + None, + 'then-deal-with-the-race-conditions'), + ('What can slow down OpenMP performance?', + 2, + None, + 'what-can-slow-down-openmp-performance'), + ('What can slow down OpenMP performance?', + 2, + None, + 'what-can-slow-down-openmp-performance'), + ('Find the max location for each thread', + 2, + None, + 'find-the-max-location-for-each-thread'), + ('Combine the values from each thread', + 2, + None, + 'combine-the-values-from-each-thread'), + ('"Matrix-matrix ' + 'multiplication":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/ParallelizationOpenMP/OpenMPvectornorm.cpp"', + 2, + None, + 'matrix-matrix-multiplication-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-parallelizationopenmp-openmpvectornorm-cpp'), + ('"Matrix-matrix ' + 'multiplication":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/ParallelizationOpenMP/OpenMPmatrixmatrixmult.cpp"', + 2, + None, + 'matrix-matrix-multiplication-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-parallelizationopenmp-openmpmatrixmatrixmult-cpp')]} end of tocinfo -->
@@ -137,8 +514,127 @@ Note: these notes contain additional material on optimization and parallelization. Parts of this material will be discussed this week.
@@ -193,7 +680,7 @@
The blocking method was made popular by Flyvbjerg and Petersen (1989) -and has become one of the standard ways to estimate -\( V(\widehat{\theta}) \) for exactly one \( \widehat{\theta} \), namely -\( \widehat{\theta} = \overline{X} \). +and has become one of the standard ways to estimate the variance +\( \mathrm{var}(\widehat{\theta}) \) for exactly one estimator \( \widehat{\theta} \), namely +\( \widehat{\theta} = \overline{X} \), the mean value.
Assume \( n = 2^d \) for some integer \( d>1 \) and \( X_1,X_2,\cdots, X_n \) is a stationary time series to begin with. @@ -191,6 +687,9 @@
Flyvbjerg and Petersen demonstrated that the sequence -\( \{e_k\}_{k=0}^{d-1} \) is decreasing, and conjecture that the term -\( e_k \) can be made as small as we would like by making \( k \) (and hence -\( d \)) sufficiently large. The sequence is decreasing (Master of Science thesis by Marius Jonsson, UiO 2018). -It means we can apply blocking transformations until -\( e_k \) is sufficiently small, and then estimate \( \mathrm{var}(\overline{X}) \) by -\( \widehat{\sigma}^2_k/n_k \). -
- -For an elegant solution and proof of the blocking method, see the recent article of Marius Jonsson (former MSc student of the Computational Physics group).
@@ -196,6 +682,15 @@
-
# 2-electron VMC code for 2dim quantum dot with importance sampling
-# Using gaussian rng for new positions and Metropolis- Hastings
-# Added energy minimization
-from math import exp, sqrt
-from random import random, seed, normalvariate
-import numpy as np
-import matplotlib.pyplot as plt
-from mpl_toolkits.mplot3d import Axes3D
-from matplotlib import cm
-from matplotlib.ticker import LinearLocator, FormatStrFormatter
-from scipy.optimize import minimize
-import sys
-import os
-
-# Where to save data files
-PROJECT_ROOT_DIR = "Results"
-DATA_ID = "Results/EnergyMin"
+More on the blocking method
-if not os.path.exists(PROJECT_ROOT_DIR):
- os.mkdir(PROJECT_ROOT_DIR)
-
-if not os.path.exists(DATA_ID):
- os.makedirs(DATA_ID)
-
-def data_path(dat_id):
- return os.path.join(DATA_ID, dat_id)
-
-outfile = open(data_path("Energies.dat"),'w')
-
-
-# Trial wave function for the 2-electron quantum dot in two dims
-def WaveFunction(r,alpha,beta):
- r1 = r[0,0]**2 + r[0,1]**2
- r2 = r[1,0]**2 + r[1,1]**2
- r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
- deno = r12/(1+beta*r12)
- return exp(-0.5*alpha*(r1+r2)+deno)
-
-# Local energy for the 2-electron quantum dot in two dims, using analytical local energy
-def LocalEnergy(r,alpha,beta):
-
- r1 = (r[0,0]**2 + r[0,1]**2)
- r2 = (r[1,0]**2 + r[1,1]**2)
- r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
- deno = 1.0/(1+beta*r12)
- deno2 = deno*deno
- return 0.5*(1-alpha*alpha)*(r1 + r2) +2.0*alpha + 1.0/r12+deno2*(alpha*r12-deno2+2*beta*deno-1.0/r12)
-
-# Derivate of wave function ansatz as function of variational parameters
-def DerivativeWFansatz(r,alpha,beta):
-
- WfDer = np.zeros((2), np.double)
- r1 = (r[0,0]**2 + r[0,1]**2)
- r2 = (r[1,0]**2 + r[1,1]**2)
- r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
- deno = 1.0/(1+beta*r12)
- deno2 = deno*deno
- WfDer[0] = -0.5*(r1+r2)
- WfDer[1] = -r12*r12*deno2
- return WfDer
-
-# Setting up the quantum force for the two-electron quantum dot, recall that it is a vector
-def QuantumForce(r,alpha,beta):
-
- qforce = np.zeros((NumberParticles,Dimension), np.double)
- r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
- deno = 1.0/(1+beta*r12)
- qforce[0,:] = -2*r[0,:]*alpha*(r[0,:]-r[1,:])*deno*deno/r12
- qforce[1,:] = -2*r[1,:]*alpha*(r[1,:]-r[0,:])*deno*deno/r12
- return qforce
-
-
-# Computing the derivative of the energy and the energy
-def EnergyDerivative(x0):
-
-
- # Parameters in the Fokker-Planck simulation of the quantum force
- D = 0.5
- TimeStep = 0.05
- # positions
- PositionOld = np.zeros((NumberParticles,Dimension), np.double)
- PositionNew = np.zeros((NumberParticles,Dimension), np.double)
- # Quantum force
- QuantumForceOld = np.zeros((NumberParticles,Dimension), np.double)
- QuantumForceNew = np.zeros((NumberParticles,Dimension), np.double)
-
- energy = 0.0
- DeltaE = 0.0
- alpha = x0[0]
- beta = x0[1]
- EnergyDer = 0.0
- DeltaPsi = 0.0
- DerivativePsiE = 0.0
- #Initial position
- for i in range(NumberParticles):
- for j in range(Dimension):
- PositionOld[i,j] = normalvariate(0.0,1.0)*sqrt(TimeStep)
- wfold = WaveFunction(PositionOld,alpha,beta)
- QuantumForceOld = QuantumForce(PositionOld,alpha, beta)
-
- #Loop over MC MCcycles
- for MCcycle in range(NumberMCcycles):
- #Trial position moving one particle at the time
- for i in range(NumberParticles):
- for j in range(Dimension):
- PositionNew[i,j] = PositionOld[i,j]+normalvariate(0.0,1.0)*sqrt(TimeStep)+\
- QuantumForceOld[i,j]*TimeStep*D
- wfnew = WaveFunction(PositionNew,alpha,beta)
- QuantumForceNew = QuantumForce(PositionNew,alpha, beta)
- GreensFunction = 0.0
- for j in range(Dimension):
- GreensFunction += 0.5*(QuantumForceOld[i,j]+QuantumForceNew[i,j])*\
- (D*TimeStep*0.5*(QuantumForceOld[i,j]-QuantumForceNew[i,j])-\
- PositionNew[i,j]+PositionOld[i,j])
-
- GreensFunction = exp(GreensFunction)
- ProbabilityRatio = GreensFunction*wfnew**2/wfold**2
- #Metropolis-Hastings test to see whether we accept the move
- if random() <= ProbabilityRatio:
- for j in range(Dimension):
- PositionOld[i,j] = PositionNew[i,j]
- QuantumForceOld[i,j] = QuantumForceNew[i,j]
- wfold = wfnew
- DeltaE = LocalEnergy(PositionOld,alpha,beta)
- DerPsi = DerivativeWFansatz(PositionOld,alpha,beta)
- DeltaPsi += DerPsi
- energy += DeltaE
- DerivativePsiE += DerPsi*DeltaE
-
- # We calculate mean values
- energy /= NumberMCcycles
- DerivativePsiE /= NumberMCcycles
- DeltaPsi /= NumberMCcycles
- EnergyDer = 2*(DerivativePsiE-DeltaPsi*energy)
- return EnergyDer
-
-
-# Computing the expectation value of the local energy
-def Energy(x0):
- # Parameters in the Fokker-Planck simulation of the quantum force
- D = 0.5
- TimeStep = 0.05
- # positions
- PositionOld = np.zeros((NumberParticles,Dimension), np.double)
- PositionNew = np.zeros((NumberParticles,Dimension), np.double)
- # Quantum force
- QuantumForceOld = np.zeros((NumberParticles,Dimension), np.double)
- QuantumForceNew = np.zeros((NumberParticles,Dimension), np.double)
-
- energy = 0.0
- DeltaE = 0.0
- alpha = x0[0]
- beta = x0[1]
- #Initial position
- for i in range(NumberParticles):
- for j in range(Dimension):
- PositionOld[i,j] = normalvariate(0.0,1.0)*sqrt(TimeStep)
- wfold = WaveFunction(PositionOld,alpha,beta)
- QuantumForceOld = QuantumForce(PositionOld,alpha, beta)
-
- #Loop over MC MCcycles
- for MCcycle in range(NumberMCcycles):
- #Trial position moving one particle at the time
- for i in range(NumberParticles):
- for j in range(Dimension):
- PositionNew[i,j] = PositionOld[i,j]+normalvariate(0.0,1.0)*sqrt(TimeStep)+\
- QuantumForceOld[i,j]*TimeStep*D
- wfnew = WaveFunction(PositionNew,alpha,beta)
- QuantumForceNew = QuantumForce(PositionNew,alpha, beta)
- GreensFunction = 0.0
- for j in range(Dimension):
- GreensFunction += 0.5*(QuantumForceOld[i,j]+QuantumForceNew[i,j])*\
- (D*TimeStep*0.5*(QuantumForceOld[i,j]-QuantumForceNew[i,j])-\
- PositionNew[i,j]+PositionOld[i,j])
-
- GreensFunction = exp(GreensFunction)
- ProbabilityRatio = GreensFunction*wfnew**2/wfold**2
- #Metropolis-Hastings test to see whether we accept the move
- if random() <= ProbabilityRatio:
- for j in range(Dimension):
- PositionOld[i,j] = PositionNew[i,j]
- QuantumForceOld[i,j] = QuantumForceNew[i,j]
- wfold = wfnew
- DeltaE = LocalEnergy(PositionOld,alpha,beta)
- energy += DeltaE
- if Printout:
- outfile.write('%f\n' %(energy/(MCcycle+1.0)))
- # We calculate mean values
- energy /= NumberMCcycles
- return energy
-
-#Here starts the main program with variable declarations
-NumberParticles = 2
-Dimension = 2
-# seed for rng generator
-seed()
-# Monte Carlo cycles for parameter optimization
-Printout = False
-NumberMCcycles= 10000
-# guess for variational parameters
-x0 = np.array([0.9,0.2])
-# Using Broydens method to find optimal parameters
-res = minimize(Energy, x0, method='BFGS', jac=EnergyDerivative, options={'gtol': 1e-4,'disp': True})
-x0 = res.x
-# Compute the energy again with the optimal parameters and increased number of Monte Cycles
-NumberMCcycles= 2**19
-Printout = True
-FinalEnergy = Energy(x0)
-EResult = np.array([FinalEnergy,FinalEnergy])
-outfile.close()
-#nice printout with Pandas
-import pandas as pd
-from pandas import DataFrame
-data ={'Optimal Parameters':x0, 'Final Energy':EResult}
-frame = pd.DataFrame(data)
-print(frame)
-
-Flyvbjerg and Petersen demonstrated that the sequence +\( \{e_k\}_{k=0}^{d-1} \) is decreasing, and conjectured that the term +\( e_k \) can be made as small as we would like by making \( k \) (and hence +\( d \)) sufficiently large. Since the sequence is decreasing, +we can apply blocking transformations until +\( e_k \) is sufficiently small, and then estimate \( \mathrm{var}(\overline{X}) \) by +\( \widehat{\sigma}^2_k/n_k \). +
+For an elegant solution and proof of the blocking method, see the recent article of Marius Jonsson (former MSc student of the Computational Physics group).
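+A single blocking transformation simply averages neighbouring pairs of the
+series, halving its length; a minimal numpy sketch of one transformation is
+
+import numpy as np
+x = np.random.randn(2**10)       # a series of n = 2^d samples, here d = 10
+x1 = 0.5*(x[0::2] + x[1::2])     # one blocking transformation: n -> n/2
+print(len(x), len(x1))           # prints: 1024 512
+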
@@ -406,6 +675,16 @@
-
The next step is then to use the above data sets and perform a -resampling analysis using the blocking method -The blocking code, based on the article of Marius Jonsson is given here -
- +# Common imports
+ # 2-electron VMC code for 2dim quantum dot with importance sampling
+# Using gaussian rng for new positions and Metropolis- Hastings
+# Added energy minimization
+from math import exp, sqrt
+from random import random, seed, normalvariate
+import numpy as np
+import matplotlib.pyplot as plt
+from mpl_toolkits.mplot3d import Axes3D
+from matplotlib import cm
+from matplotlib.ticker import LinearLocator, FormatStrFormatter
+from scipy.optimize import minimize
+import sys
import os
-# Where to save the figures and data files
+# Where to save data files
+PROJECT_ROOT_DIR = "Results"
DATA_ID = "Results/EnergyMin"
+if not os.path.exists(PROJECT_ROOT_DIR):
+ os.mkdir(PROJECT_ROOT_DIR)
+
+if not os.path.exists(DATA_ID):
+ os.makedirs(DATA_ID)
+
def data_path(dat_id):
return os.path.join(DATA_ID, dat_id)
-infile = open(data_path("Energies.dat"),'r')
-
-from numpy import log2, zeros, mean, var, sum, loadtxt, arange, array, cumsum, dot, transpose, diagonal, sqrt
-from numpy.linalg import inv
-
-def block(x):
- # preliminaries
- n = len(x)
- d = int(log2(n))
- s, gamma = zeros(d), zeros(d)
- mu = mean(x)
-
- # estimate the auto-covariance and variances
- # for each blocking transformation
- for i in arange(0,d):
- n = len(x)
- # estimate autocovariance of x
- gamma[i] = (n)**(-1)*sum( (x[0:(n-1)]-mu)*(x[1:n]-mu) )
- # estimate variance of x
- s[i] = var(x)
- # perform blocking transformation
- x = 0.5*(x[0::2] + x[1::2])
-
- # generate the test observator M_k from the theorem
- M = (cumsum( ((gamma/s)**2*2**arange(1,d+1)[::-1])[::-1] ) )[::-1]
-
- # we need a list of magic numbers
- q =array([6.634897,9.210340, 11.344867, 13.276704, 15.086272, 16.811894, 18.475307, 20.090235, 21.665994, 23.209251, 24.724970, 26.216967, 27.688250, 29.141238, 30.577914, 31.999927, 33.408664, 34.805306, 36.190869, 37.566235, 38.932173, 40.289360, 41.638398, 42.979820, 44.314105, 45.641683, 46.962942, 48.278236, 49.587884, 50.892181])
-
- # use magic to determine when we should have stopped blocking
- for k in arange(0,d):
- if(M[k] < q[k]):
- break
- if (k >= d-1):
- print("Warning: Use more data")
- return mu, s[k]/2**(d-k)
-
-
-x = loadtxt(infile)
-(mean, var) = block(x)
-std = sqrt(var)
+outfile = open(data_path("Energies.dat"),'w')
+
+
+# Trial wave function for the 2-electron quantum dot in two dims
+def WaveFunction(r,alpha,beta):
+ r1 = r[0,0]**2 + r[0,1]**2
+ r2 = r[1,0]**2 + r[1,1]**2
+ r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
+ deno = r12/(1+beta*r12)
+ return exp(-0.5*alpha*(r1+r2)+deno)
+
+# Local energy for the 2-electron quantum dot in two dims, using analytical local energy
+def LocalEnergy(r,alpha,beta):
+
+ r1 = (r[0,0]**2 + r[0,1]**2)
+ r2 = (r[1,0]**2 + r[1,1]**2)
+ r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
+ deno = 1.0/(1+beta*r12)
+ deno2 = deno*deno
+ return 0.5*(1-alpha*alpha)*(r1 + r2) +2.0*alpha + 1.0/r12+deno2*(alpha*r12-deno2+2*beta*deno-1.0/r12)
+
+# Derivatives of ln(Psi) with respect to the variational parameters alpha and beta
+def DerivativeWFansatz(r,alpha,beta):
+
+ WfDer = np.zeros((2), np.double)
+ r1 = (r[0,0]**2 + r[0,1]**2)
+ r2 = (r[1,0]**2 + r[1,1]**2)
+ r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
+ deno = 1.0/(1+beta*r12)
+ deno2 = deno*deno
+ WfDer[0] = -0.5*(r1+r2)
+ WfDer[1] = -r12*r12*deno2
+ return WfDer
+
+# Setting up the quantum force for the two-electron quantum dot, recall that it is a vector
+def QuantumForce(r,alpha,beta):
+
+ qforce = np.zeros((NumberParticles,Dimension), np.double)
+ r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
+ deno = 1.0/(1+beta*r12)
+    # quantum force F = 2*grad(ln Psi): harmonic-oscillator part plus Jastrow (correlation) part
+    qforce[0,:] = -2*alpha*r[0,:] + 2*(r[0,:]-r[1,:])*deno*deno/r12
+    qforce[1,:] = -2*alpha*r[1,:] + 2*(r[1,:]-r[0,:])*deno*deno/r12
+ return qforce
+
+
+# Computing the derivative of the energy and the energy
+def EnergyDerivative(x0):
+
+
+ # Parameters in the Fokker-Planck simulation of the quantum force
+ D = 0.5
+ TimeStep = 0.05
+ # positions
+ PositionOld = np.zeros((NumberParticles,Dimension), np.double)
+ PositionNew = np.zeros((NumberParticles,Dimension), np.double)
+ # Quantum force
+ QuantumForceOld = np.zeros((NumberParticles,Dimension), np.double)
+ QuantumForceNew = np.zeros((NumberParticles,Dimension), np.double)
+
+ energy = 0.0
+ DeltaE = 0.0
+ alpha = x0[0]
+ beta = x0[1]
+ EnergyDer = 0.0
+ DeltaPsi = 0.0
+ DerivativePsiE = 0.0
+ #Initial position
+ for i in range(NumberParticles):
+ for j in range(Dimension):
+ PositionOld[i,j] = normalvariate(0.0,1.0)*sqrt(TimeStep)
+ wfold = WaveFunction(PositionOld,alpha,beta)
+ QuantumForceOld = QuantumForce(PositionOld,alpha, beta)
+
+    # Loop over Monte Carlo cycles
+    for MCcycle in range(NumberMCcycles):
+        # Trial position, moving one particle at a time
+ for i in range(NumberParticles):
+ for j in range(Dimension):
+ PositionNew[i,j] = PositionOld[i,j]+normalvariate(0.0,1.0)*sqrt(TimeStep)+\
+ QuantumForceOld[i,j]*TimeStep*D
+ wfnew = WaveFunction(PositionNew,alpha,beta)
+ QuantumForceNew = QuantumForce(PositionNew,alpha, beta)
+ GreensFunction = 0.0
+ for j in range(Dimension):
+ GreensFunction += 0.5*(QuantumForceOld[i,j]+QuantumForceNew[i,j])*\
+ (D*TimeStep*0.5*(QuantumForceOld[i,j]-QuantumForceNew[i,j])-\
+ PositionNew[i,j]+PositionOld[i,j])
+
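+            # the accumulated sum is the exponent of the ratio of backward and
+            # forward Fokker-Planck Green's functions for the proposed move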
+ GreensFunction = exp(GreensFunction)
+ ProbabilityRatio = GreensFunction*wfnew**2/wfold**2
+ #Metropolis-Hastings test to see whether we accept the move
+ if random() <= ProbabilityRatio:
+ for j in range(Dimension):
+ PositionOld[i,j] = PositionNew[i,j]
+ QuantumForceOld[i,j] = QuantumForceNew[i,j]
+ wfold = wfnew
+ DeltaE = LocalEnergy(PositionOld,alpha,beta)
+ DerPsi = DerivativeWFansatz(PositionOld,alpha,beta)
+ DeltaPsi += DerPsi
+ energy += DeltaE
+ DerivativePsiE += DerPsi*DeltaE
+
+ # We calculate mean values
+ energy /= NumberMCcycles
+ DerivativePsiE /= NumberMCcycles
+ DeltaPsi /= NumberMCcycles
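+    # gradient of the energy: 2*( <E_L dlnPsi> - <E_L><dlnPsi> ) for each variational parameter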
+ EnergyDer = 2*(DerivativePsiE-DeltaPsi*energy)
+ return EnergyDer
+
+
+# Computing the expectation value of the local energy
+def Energy(x0):
+ # Parameters in the Fokker-Planck simulation of the quantum force
+ D = 0.5
+ TimeStep = 0.05
+ # positions
+ PositionOld = np.zeros((NumberParticles,Dimension), np.double)
+ PositionNew = np.zeros((NumberParticles,Dimension), np.double)
+ # Quantum force
+ QuantumForceOld = np.zeros((NumberParticles,Dimension), np.double)
+ QuantumForceNew = np.zeros((NumberParticles,Dimension), np.double)
+
+ energy = 0.0
+ DeltaE = 0.0
+ alpha = x0[0]
+ beta = x0[1]
+ #Initial position
+ for i in range(NumberParticles):
+ for j in range(Dimension):
+ PositionOld[i,j] = normalvariate(0.0,1.0)*sqrt(TimeStep)
+ wfold = WaveFunction(PositionOld,alpha,beta)
+ QuantumForceOld = QuantumForce(PositionOld,alpha, beta)
+
+    # Loop over Monte Carlo cycles
+    for MCcycle in range(NumberMCcycles):
+        # Trial position, moving one particle at a time
+ for i in range(NumberParticles):
+ for j in range(Dimension):
+ PositionNew[i,j] = PositionOld[i,j]+normalvariate(0.0,1.0)*sqrt(TimeStep)+\
+ QuantumForceOld[i,j]*TimeStep*D
+ wfnew = WaveFunction(PositionNew,alpha,beta)
+ QuantumForceNew = QuantumForce(PositionNew,alpha, beta)
+ GreensFunction = 0.0
+ for j in range(Dimension):
+ GreensFunction += 0.5*(QuantumForceOld[i,j]+QuantumForceNew[i,j])*\
+ (D*TimeStep*0.5*(QuantumForceOld[i,j]-QuantumForceNew[i,j])-\
+ PositionNew[i,j]+PositionOld[i,j])
+
+ GreensFunction = exp(GreensFunction)
+ ProbabilityRatio = GreensFunction*wfnew**2/wfold**2
+ #Metropolis-Hastings test to see whether we accept the move
+ if random() <= ProbabilityRatio:
+ for j in range(Dimension):
+ PositionOld[i,j] = PositionNew[i,j]
+ QuantumForceOld[i,j] = QuantumForceNew[i,j]
+ wfold = wfnew
+ DeltaE = LocalEnergy(PositionOld,alpha,beta)
+ energy += DeltaE
+ if Printout:
+ outfile.write('%f\n' %(energy/(MCcycle+1.0)))
+ # We calculate mean values
+ energy /= NumberMCcycles
+ return energy
+
+#Here starts the main program with variable declarations
+NumberParticles = 2
+Dimension = 2
+# seed for rng generator
+seed()
+# Monte Carlo cycles for parameter optimization
+Printout = False
+NumberMCcycles= 10000
+# guess for variational parameters
+x0 = np.array([0.9,0.2])
+# Use the BFGS (Broyden-Fletcher-Goldfarb-Shanno) method to find the optimal parameters
+res = minimize(Energy, x0, method='BFGS', jac=EnergyDerivative, options={'gtol': 1e-4,'disp': True})
+x0 = res.x
+# Compute the energy again with the optimal parameters and an increased number of Monte Carlo cycles
+NumberMCcycles= 2**19
+Printout = True
+FinalEnergy = Energy(x0)
+EResult = np.array([FinalEnergy,FinalEnergy])
+outfile.close()
+#nice printout with Pandas
import pandas as pd
from pandas import DataFrame
-data ={'Mean':[mean], 'STDev':[std]}
-frame = pd.DataFrame(data,index=['Values'])
+data ={'Optimal Parameters':x0, 'Final Energy':EResult}
+frame = pd.DataFrame(data)
print(frame)
-
The blocking method was made popular by Flyvbjerg and Pedersen (1989) -and has become one of the standard ways to estimate -\( V(\widehat{\theta}) \) for exactly one \( \widehat{\theta} \), namely -\( \widehat{\theta} = \overline{X} \). +
The next step is then to use the above data sets and perform a +resampling analysis using the blocking method. +The blocking code, based on the article of Marius Jonsson, is given here
-Assume \( n = 2^d \) for some integer \( d>1 \) and \( X_1,X_2,\cdots, X_n \) is a stationary time series to begin with. -Moreover, assume that the time series is asymptotically uncorrelated. We switch to vector notation by arranging \( X_1,X_2,\cdots,X_n \) in an \( n \)-tuple. Define: -
-$$ -\begin{align*} -\hat{X} = (X_1,X_2,\cdots,X_n). -\end{align*} -$$ - -The strength of the blocking method is when the number of -observations, \( n \) is large. For large \( n \), the complexity of dependent -bootstrapping scales poorly, but the blocking method does not, -moreover, it becomes more accurate the larger \( n \) is. -
+ + +# Common imports
+import os
+
+# Where to save the figures and data files
+DATA_ID = "Results/EnergyMin"
+
+def data_path(dat_id):
+ return os.path.join(DATA_ID, dat_id)
+
+infile = open(data_path("Energies.dat"),'r')
+
+from numpy import log2, zeros, mean, var, sum, loadtxt, arange, array, cumsum, sqrt
+
+def block(x):
+ # preliminaries
+ n = len(x)
+ d = int(log2(n))
+ s, gamma = zeros(d), zeros(d)
+ mu = mean(x)
+
+ # estimate the auto-covariance and variances
+ # for each blocking transformation
+ for i in arange(0,d):
+ n = len(x)
+ # estimate autocovariance of x
+ gamma[i] = (n)**(-1)*sum( (x[0:(n-1)]-mu)*(x[1:n]-mu) )
+ # estimate variance of x
+ s[i] = var(x)
+ # perform blocking transformation
+ x = 0.5*(x[0::2] + x[1::2])
+
+    # generate the test statistic M_k from the theorem
+ M = (cumsum( ((gamma/s)**2*2**arange(1,d+1)[::-1])[::-1] ) )[::-1]
+
+    # q[k] are the 99 percent quantiles of the chi-squared distribution with k+1 degrees of freedom
+ q =array([6.634897,9.210340, 11.344867, 13.276704, 15.086272, 16.811894, 18.475307, 20.090235, 21.665994, 23.209251, 24.724970, 26.216967, 27.688250, 29.141238, 30.577914, 31.999927, 33.408664, 34.805306, 36.190869, 37.566235, 38.932173, 40.289360, 41.638398, 42.979820, 44.314105, 45.641683, 46.962942, 48.278236, 49.587884, 50.892181])
+
+    # use the quantiles to determine when we should have stopped blocking
+ for k in arange(0,d):
+ if(M[k] < q[k]):
+ break
+ if (k >= d-1):
+ print("Warning: Use more data")
+ return mu, s[k]/2**(d-k)
+
+
+x = loadtxt(infile)
+(mean, var) = block(x)
+std = sqrt(var)
+import pandas as pd
+from pandas import DataFrame
+data ={'Mean':[mean], 'STDev':[std]}
+frame = pd.DataFrame(data,index=['Values'])
+print(frame)
+
+@@ -235,6 +752,11 @@
-
We now define -blocking transformations. The idea is to take the mean of subsequent -pair of elements from \( \vec{X} \) and form a new vector -\( \vec{X}_1 \). Continuing in the same way by taking the mean of -subsequent pairs of elements of \( \vec{X}_1 \) we obtain \( \vec{X}_2 \), and -so on. -Define \( \vec{X}_i \) recursively by: -
- -$$ -\begin{align} -(\vec{X}_0)_k &\equiv (\vec{X})_k \nonumber \\ -(\vec{X}_{i+1})_k &\equiv \frac{1}{2}\Big( (\vec{X}_i)_{2k-1} + -(\vec{X}_i)_{2k} \Big) \qquad \text{for all} \qquad 1 \leq i \leq d-1 -\tag{5} -\end{align} -$$ - -The quantity \( \vec{X}_k \) is -subject to \( k \) blocking transformations. We now have \( d \) vectors -\( \vec{X}_0, \vec{X}_1,\cdots,\vec X_{d-1} \) containing the subsequent -averages of observations. It turns out that if the components of -\( \vec{X} \) is a stationary time series, then the components of -\( \vec{X}_i \) is a stationary time series for all \( 0 \leq i \leq d-1 \) -
- -We can then compute the autocovariance, the variance, sample mean, and -number of observations for each \( i \). -Let \( \gamma_i, \sigma_i^2, -\overline{X}_i \) denote the autocovariance, variance and average of the -elements of \( \vec{X}_i \) and let \( n_i \) be the number of elements of -\( \vec{X}_i \). It follows by induction that \( n_i = n/2^i \). -
- +diff --git a/doc/pub/week9/html/._week9-bs024.html b/doc/pub/week9/html/._week9-bs024.html index 4381840f..2d43eb0d 100644 --- a/doc/pub/week9/html/._week9-bs024.html +++ b/doc/pub/week9/html/._week9-bs024.html @@ -47,6 +47,7 @@ None, 'and-why-do-we-use-such-methods'), ('Central limit theorem', 2, None, 'central-limit-theorem'), + ('Further remarks', 2, None, 'further-remarks'), ('Running many measurements', 2, None, @@ -62,62 +63,404 @@ 2, None, 'introducing-the-correlation-function'), - ('Statistics, wrapping up from last week', + ('Resampling methods: Blocking', 2, None, - 'statistics-wrapping-up-from-last-week'), - ('Statistics, final expression', + 'resampling-methods-blocking'), + ('Why blocking?', 2, None, 'why-blocking'), + ('Blocking Transformations', 2, None, 'blocking-transformations'), + ('Blocking transformations', 2, None, 'blocking-transformations'), + ('Blocking Transformations', 2, None, 'blocking-transformations'), + ('Blocking Transformations, getting there', 2, None, - 'statistics-final-expression'), - ('Statistics, effective number of correlations', + 'blocking-transformations-getting-there'), + ('Blocking Transformations, final expressions', 2, None, - 'statistics-effective-number-of-correlations'), - ('Can we understand this? Time Auto-correlation Function', + 'blocking-transformations-final-expressions'), + ('More on the blocking method', 2, None, - 'can-we-understand-this-time-auto-correlation-function'), - ('Time Auto-correlation Function', + 'more-on-the-blocking-method'), + ('Example code form last week', 2, None, - 'time-auto-correlation-function'), - ('Time Auto-correlation Function', + 'example-code-form-last-week'), + ('Resampling analysis', 2, None, 'resampling-analysis'), + ('Content', 2, None, 'content'), + ('Optimization and profiling', 2, None, - 'time-auto-correlation-function'), - ('Time Auto-correlation Function', + 'optimization-and-profiling'), + ('More on optimization', 2, None, 'more-on-optimization'), + ('Optimization and profiling', 2, None, - 'time-auto-correlation-function'), - ('Time Auto-correlation Function', + 'optimization-and-profiling'), + ('Optimization and debugging', 2, None, - 'time-auto-correlation-function'), - ('Time Auto-correlation Function', + 'optimization-and-debugging'), + ('Other hints', 2, None, 'other-hints'), + ('Vectorization and the basic idea behind parallel computing', 2, None, - 'time-auto-correlation-function'), - ('Correlation Time', 2, None, 'correlation-time'), - ('Resampling methods: Blocking', + 'vectorization-and-the-basic-idea-behind-parallel-computing'), + ('A rough classification of hardware models', 2, None, - 'resampling-methods-blocking'), - ('Blocking Transformations', 2, None, 'blocking-transformations'), - ('Blocking Transformations', 2, None, 'blocking-transformations'), - ('Blocking Transformations, getting there', + 'a-rough-classification-of-hardware-models'), + ('Shared memory and distributed memory', 2, None, - 'blocking-transformations-getting-there'), - ('Blocking Transformations, final expressions', + 'shared-memory-and-distributed-memory'), + ('Different parallel programming paradigms', 2, None, - 'blocking-transformations-final-expressions'), - ('Example code form last week', + 'different-parallel-programming-paradigms'), + ('Different parallel programming paradigms', 2, None, - 'example-code-form-last-week'), - ('Resampling analysis', 2, None, 'resampling-analysis')]} + 'different-parallel-programming-paradigms'), + ('What is vectorization?', 2, None, 'what-is-vectorization'), + 
('Number of elements that can acted upon', + 2, + None, + 'number-of-elements-that-can-acted-upon'), + ('Number of elements that can acted upon, examples', + 2, + None, + 'number-of-elements-that-can-acted-upon-examples'), + ('Operation counts for scalar operation', + 2, + None, + 'operation-counts-for-scalar-operation'), + ('Number of elements that can acted upon, examples', + 2, + None, + 'number-of-elements-that-can-acted-upon-examples'), + ('Number of operations when vectorized', + 2, + None, + 'number-of-operations-when-vectorized'), + ('"A simple test case with and without ' + 'vectorization":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/LecturePrograms/programs/Classes/cpp/program7.cpp"', + 2, + None, + 'a-simple-test-case-with-and-without-vectorization-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-lectureprograms-programs-classes-cpp-program7-cpp'), + ('Compiling with and without vectorization', + 2, + None, + 'compiling-with-and-without-vectorization'), + ('Compiling with and without vectorization using clang', + 2, + None, + 'compiling-with-and-without-vectorization-using-clang'), + ('Automatic vectorization and vectorization inhibitors, criteria', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-criteria'), + ('Automatic vectorization and vectorization inhibitors, exit ' + 'criteria', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-exit-criteria'), + ('Automatic vectorization and vectorization inhibitors, ' + 'straight-line code', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-straight-line-code'), + ('Automatic vectorization and vectorization inhibitors, nested ' + 'loops', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-nested-loops'), + ('Automatic vectorization and vectorization inhibitors, function ' + 'calls', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-function-calls'), + ('Automatic vectorization and vectorization inhibitors, data ' + 'dependencies', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-data-dependencies'), + ('Automatic vectorization and vectorization inhibitors, more ' + 'data dependencies', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-more-data-dependencies'), + ('Automatic vectorization and vectorization inhibitors, memory ' + 'stride', + 2, + None, + 'automatic-vectorization-and-vectorization-inhibitors-memory-stride'), + ('Memory management', 2, None, 'memory-management'), + ('Memory and communication', 2, None, 'memory-and-communication'), + ('Measuring performance', 2, None, 'measuring-performance'), + ('Problems with measuring time', + 2, + None, + 'problems-with-measuring-time'), + ('Problems with cold start', 2, None, 'problems-with-cold-start'), + ('Problems with smart compilers', + 2, + None, + 'problems-with-smart-compilers'), + ('Problems with interference', + 2, + None, + 'problems-with-interference'), + ('Problems with measuring performance', + 2, + None, + 'problems-with-measuring-performance'), + ('Thomas algorithm for tridiagonal linear algebra equations', + 2, + None, + 'thomas-algorithm-for-tridiagonal-linear-algebra-equations'), + ('Thomas algorithm, forward substitution', + 2, + None, + 'thomas-algorithm-forward-substitution'), + ('Thomas algorithm, backward substitution', + 2, + None, + 'thomas-algorithm-backward-substitution'), + ('Thomas algorithm and counting of operations (floating point ' + 'and memory)', + 2, + 
None, + 'thomas-algorithm-and-counting-of-operations-floating-point-and-memory'), + ('"Example: Transpose of a ' + 'matrix":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/LecturePrograms/programs/Classes/cpp/program8.cpp"', + 2, + None, + 'example-transpose-of-a-matrix-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-lectureprograms-programs-classes-cpp-program8-cpp'), + ('"Matrix-matrix ' + 'multiplication":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/LecturePrograms/programs/Classes/cpp/program9.cpp"', + 2, + None, + 'matrix-matrix-multiplication-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-lectureprograms-programs-classes-cpp-program9-cpp'), + ('How do we define speedup? Simplest form', + 2, + None, + 'how-do-we-define-speedup-simplest-form'), + ('How do we define speedup? Correct baseline', + 2, + None, + 'how-do-we-define-speedup-correct-baseline'), + ('Parallel speedup', 2, None, 'parallel-speedup'), + ('Speedup and memory', 2, None, 'speedup-and-memory'), + ('Upper bounds on speedup', 2, None, 'upper-bounds-on-speedup'), + ("Amdahl's law", 2, None, 'amdahl-s-law'), + ('How much is parallelizable', + 2, + None, + 'how-much-is-parallelizable'), + ("Today's situation of parallel computing", + 2, + None, + 'today-s-situation-of-parallel-computing'), + ('Overhead present in parallel computing', + 2, + None, + 'overhead-present-in-parallel-computing'), + ('Parallelizing a sequential algorithm', + 2, + None, + 'parallelizing-a-sequential-algorithm'), + ('Strategies', 2, None, 'strategies'), + ('How do I run MPI on a PC/Laptop? MPI', + 2, + None, + 'how-do-i-run-mpi-on-a-pc-laptop-mpi'), + ('Can I do it on my own PC/laptop? 
OpenMP installation', + 2, + None, + 'can-i-do-it-on-my-own-pc-laptop-openmp-installation'), + ('Installing MPI', 2, None, 'installing-mpi'), + ('Installing MPI and using Qt', + 2, + None, + 'installing-mpi-and-using-qt'), + ('What is Message Passing Interface (MPI)?', + 2, + None, + 'what-is-message-passing-interface-mpi'), + ('Going Parallel with MPI', 2, None, 'going-parallel-with-mpi'), + ('MPI is a library', 2, None, 'mpi-is-a-library'), + ('Bindings to MPI routines', 2, None, 'bindings-to-mpi-routines'), + ('Communicator', 2, None, 'communicator'), + ('Some of the most important MPI functions', + 2, + None, + 'some-of-the-most-important-mpi-functions'), + ('"The first MPI C/C++ ' + 'program":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program2.cpp"', + 2, + None, + 'the-first-mpi-c-c-program-https-github-com-compphysics-computationalphysics2-blob-gh-pages-doc-programs-lectureprograms-programs-mpi-chapter07-program2-cpp'), + ('The Fortran program', 2, None, 'the-fortran-program'), + ('Note 1', 2, None, 'note-1'), + ('"Ordered output with ' + 'MPIBarrier":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program3.cpp"', + 2, + None, + 'ordered-output-with-mpibarrier-https-github-com-compphysics-computationalphysics2-blob-gh-pages-doc-programs-lectureprograms-programs-mpi-chapter07-program3-cpp'), + ('Note 2', 2, None, 'note-2'), + ('"Ordered ' + 'output":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program4.cpp"', + 2, + None, + 'ordered-output-https-github-com-compphysics-computationalphysics2-blob-gh-pages-doc-programs-lectureprograms-programs-mpi-chapter07-program4-cpp'), + ('Note 3', 2, None, 'note-3'), + ('Note 4', 2, None, 'note-4'), + ('"Numerical integration in ' + 'parallel":"https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program6.cpp"', + 2, + None, + 'numerical-integration-in-parallel-https-github-com-compphysics-computationalphysics2-blob-gh-pages-doc-programs-lectureprograms-programs-mpi-chapter07-program6-cpp'), + ('Dissection of trapezoidal rule with $MPI\\_reduce$', + 2, + None, + 'dissection-of-trapezoidal-rule-with-mpi-reduce'), + ('Dissection of trapezoidal rule', + 2, + None, + 'dissection-of-trapezoidal-rule'), + ('Integrating with _MPI_', 2, None, 'integrating-with-mpi'), + ('How do I use $MPI\\_reduce$?', + 2, + None, + 'how-do-i-use-mpi-reduce'), + ('More on $MPI\\_Reduce$', 2, None, 'more-on-mpi-reduce'), + ('Dissection of trapezoidal rule', + 2, + None, + 'dissection-of-trapezoidal-rule'), + ('Dissection of trapezoidal rule', + 2, + None, + 'dissection-of-trapezoidal-rule'), + ('"The quantum dot program for two ' + 'electrons":"https://github.com/CompPhysics/ComputationalPhysics2/blob/master/doc/Programs/ParallelizationMPI/MPIvmcqdot.cpp"', + 2, + None, + 'the-quantum-dot-program-for-two-electrons-https-github-com-compphysics-computationalphysics2-blob-master-doc-programs-parallelizationmpi-mpivmcqdot-cpp'), + ('What is OpenMP', 2, None, 'what-is-openmp'), + ('Getting started, things to remember', + 2, + None, + 'getting-started-things-to-remember'), + ('OpenMP syntax', 2, None, 'openmp-syntax'), + ('Different OpenMP styles of parallelism', + 2, + None, + 'different-openmp-styles-of-parallelism'), + ('General code structure', 2, None, 'general-code-structure'), + 
('Parallel region', 2, None, 'parallel-region'), + ('Hello world, not again, please!', + 2, + None, + 'hello-world-not-again-please'), + ('Hello world, yet another variant', + 2, + None, + 'hello-world-yet-another-variant'), + ('Important OpenMP library routines', + 2, + None, + 'important-openmp-library-routines'), + ('Private variables', 2, None, 'private-variables'), + ('Master region', 2, None, 'master-region'), + ('Parallel for loop', 2, None, 'parallel-for-loop'), + ('Parallel computations and loops', + 2, + None, + 'parallel-computations-and-loops'), + ('Scheduling of loop computations', + 2, + None, + 'scheduling-of-loop-computations'), + ('Example code for loop scheduling', + 2, + None, + 'example-code-for-loop-scheduling'), + ('Example code for loop scheduling, guided instead of dynamic', + 2, + None, + 'example-code-for-loop-scheduling-guided-instead-of-dynamic'), + ('More on Parallel for loop', + 2, + None, + 'more-on-parallel-for-loop'), + ('What can happen with this loop?', + 2, + None, + 'what-can-happen-with-this-loop'), + ('Inner product', 2, None, 'inner-product'), + ('Different threads do different tasks', + 2, + None, + 'different-threads-do-different-tasks'), + ('Single execution', 2, None, 'single-execution'), + ('Coordination and synchronization', + 2, + None, + 'coordination-and-synchronization'), + ('Data scope', 2, None, 'data-scope'), + ('Some remarks', 2, None, 'some-remarks'), + ('Parallelizing nested for-loops', + 2, + None, + 'parallelizing-nested-for-loops'), + ('Nested parallelism', 2, None, 'nested-parallelism'), + ('Parallel tasks', 2, None, 'parallel-tasks'), + ('Common mistakes', 2, None, 'common-mistakes'), + ('Not all computations are simple', + 2, + None, + 'not-all-computations-are-simple'), + ('Not all computations are simple, competing threads', + 2, + None, + 'not-all-computations-are-simple-competing-threads'), + ('How to find the max value using OpenMP', + 2, + None, + 'how-to-find-the-max-value-using-openmp'), + ('Then deal with the race conditions', + 2, + None, + 'then-deal-with-the-race-conditions'), + ('What can slow down OpenMP performance?', + 2, + None, + 'what-can-slow-down-openmp-performance'), + ('What can slow down OpenMP performance?', + 2, + None, + 'what-can-slow-down-openmp-performance'), + ('Find the max location for each thread', + 2, + None, + 'find-the-max-location-for-each-thread'), + ('Combine the values from each thread', + 2, + None, + 'combine-the-values-from-each-thread'), + ('"Matrix-matrix ' + 'multiplication":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/ParallelizationOpenMP/OpenMPvectornorm.cpp"', + 2, + None, + 'matrix-matrix-multiplication-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-parallelizationopenmp-openmpvectornorm-cpp'), + ('"Matrix-matrix ' + 'multiplication":"https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/ParallelizationOpenMP/OpenMPmatrixmatrixmult.cpp"', + 2, + None, + 'matrix-matrix-multiplication-https-github-com-compphysics-computationalphysicsmsu-blob-master-doc-programs-parallelizationopenmp-openmpmatrixmatrixmult-cpp')]} end of tocinfo --> @@ -157,29 +500,141 @@
-
Until now we have not paid much attention to speed and to the optimization possibilities +inherent in the various compilers. We have compiled and linked as +
+ + +c++ -c mycode.cpp
+c++ -o mycode.exe mycode.o
+
+Using the -definition of the blocking transformation and the distributive -property of the covariance, it is clear that since \( h =|i-j| \) -we can define +
For Fortran, replace c++ with for example gfortran or ifort. +This is what we call a flat compiler option, and it should be used while we develop the code. +It normally produces large and slow code when translated to machine instructions. +We use this option for debugging and for establishing the correct program output, because +every operation is done precisely as the user specified it.
-$$ -\begin{align} -\gamma_{k+1}(h) &= cov\left( ({X}_{k+1})_{i}, ({X}_{k+1})_{j} \right) \nonumber \\ -&= \frac{1}{4}cov\left( ({X}_{k})_{2i-1} + ({X}_{k})_{2i}, ({X}_{k})_{2j-1} + ({X}_{k})_{2j} \right) \nonumber \\ -&= \frac{1}{2}\gamma_{k}(2h) + \frac{1}{2}\gamma_k(2h+1) \quad \text{for} \quad h = 0 -\tag{6}\\ -&=\frac{1}{4}\gamma_k(2h-1) + \frac{1}{2}\gamma_k(2h) + \frac{1}{4}\gamma_k(2h+1) \quad \text{else} -\tag{7} -\end{align} -$$ -Since the quantity \( \hat{X} \) is asymptotically uncorrelated by assumption, \( \hat{X}_k \) is also asymptotically uncorrelated. Let us turn our attention to the variance of the sample mean \( V(\overline{X}) \).
+It is instructive to look up the compiler manual for further instructions by writing
+ + +man c++
+
+@@ -230,6 +730,13 @@
-
We have
-$$ -\begin{align} -V(\overline{X}_k) = \frac{\sigma_k^2}{n_k} + \underbrace{\frac{2}{n_k} \sum_{h=1}^{n_k-1}\left( 1 - \frac{h}{n_k} \right)\gamma_k(h)}_{\equiv e_k} = \frac{\sigma^2_k}{n_k} + e_k \quad \text{if} \quad \gamma_k(0) = \sigma_k^2. -\tag{8} -\end{align} -$$ +We have additional compiler options for optimization. These may include procedure inlining, +which can improve performance, moving loop-invariant expressions outside the loop, +identifying potential parallelism, automatic vectorization, or replacing a division with a reciprocal +and a multiplication if this speeds up the code. +
-The term \( e_k \) is called the truncation error:
-$$ -\begin{equation} -e_k = \frac{2}{n_k} \sum_{h=1}^{n_k-1}\left( 1 - \frac{h}{n_k} \right)\gamma_k(h). -\tag{9} -\end{equation} -$$ + +c++ -O3 -c mycode.cpp
+c++ -O3 -o mycode.exe mycode.o
+
+We can show that \( V(\overline{X}_i) = V(\overline{X}_j) \) for all \( 0 \leq i \leq d-1 \) and \( 0 \leq j \leq d-1 \).
+This is the recommended option (alternatives are -O2 or -Ofast).
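+To check which loops the compiler actually vectorized at -O3, we can ask for a
+report; a sketch with GCC and clang flags (consult your compiler's manual) is
+
+c++ -O3 -fopt-info-vec -c mycode.cpp
+clang++ -O3 -Rpass=loop-vectorize -c mycode.cpp
+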
+@@ -228,6 +701,14 @@
-
It is also useful to profile your program during the development stage. +You would then compile with +
-We can then wrap up
-$$ -\begin{align} -n_{j+1} \overline{X}_{j+1} &= \sum_{i=1}^{n_{j+1}} (\hat{X}_{j+1})_i = \frac{1}{2}\sum_{i=1}^{n_{j}/2} (\hat{X}_{j})_{2i-1} + (\hat{X}_{j})_{2i} \nonumber \\ -&= \frac{1}{2}\left[ (\hat{X}_j)_1 + (\hat{X}_j)_2 + \cdots + (\hat{X}_j)_{n_j} \right] = \underbrace{\frac{n_j}{2}}_{=n_{j+1}} \overline{X}_j = n_{j+1}\overline{X}_j. -\tag{10} -\end{align} -$$ + +c++ -pg -O3 -c mycode.cpp
+c++ -pg -O3 -o mycode.exe mycode.o
+
+After you have run the code, you can obtain the profiling information via
-By repeated use of this equation we get \( V(\overline{X}_i) = V(\overline{X}_0) = V(\overline{X}) \) for all \( 0 \leq i \leq d-1 \). This has the consequence that
-$$ -\begin{align} -V(\overline{X}) = \frac{\sigma_k^2}{n_k} + e_k \qquad \text{for all} \qquad 0 \leq k \leq d-1. \tag{11} -\end{align} -$$ + +gprof mycode.exe > ProfileOutput
+
+Flyvbjerg and Petersen demonstrated that the sequence -\( \{e_k\}_{k=0}^{d-1} \) is decreasing, and conjecture that the term -\( e_k \) can be made as small as we would like by making \( k \) (and hence -\( d \)) sufficiently large. The sequence is decreasing (Master of Science thesis by Marius Jonsson, UiO 2018). -It means we can apply blocking transformations until -\( e_k \) is sufficiently small, and then estimate \( V(\overline{X}) \) by -\( \widehat{\sigma}^2_k/n_k \). +
When you have properly profiled your code, you must remove this option, since it +slows down performance. +For memory tests, use valgrind. An excellent environment for all these aspects, and much more, is Qt Creator.
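+A minimal valgrind run on the executable above would look like
+
+valgrind --leak-check=full ./mycode.exe
+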
+For an elegant solution and proof of the blocking method, see the recent article of Marius Jonsson (former MSc student of the Computational Physics group).
@@ -237,6 +726,15 @@
-
Adding debugging options is a very useful alternative during the development stage of a program. +You would then compile with +
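+A typical choice of flags (standard for g++/clang and gfortran; check your
+compiler's manual) is
+
+c++ -g -O0 -c mycode.cpp
+c++ -g -O0 -o mycode.exe mycode.o
+
+Here -g adds debugging symbols and -O0 turns off optimization, so that a
+debugger like gdb can map every instruction back to the source.
+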
+ +# 2-electron VMC code for 2dim quantum dot with importance sampling
-# Using gaussian rng for new positions and Metropolis- Hastings
-# Added energy minimization
-from math import exp, sqrt
-from random import random, seed, normalvariate
-import numpy as np
-import matplotlib.pyplot as plt
-from mpl_toolkits.mplot3d import Axes3D
-from matplotlib import cm
-from matplotlib.ticker import LinearLocator, FormatStrFormatter
-from scipy.optimize import minimize
-import sys
-import os
-
-# Where to save data files
-PROJECT_ROOT_DIR = "Results"
-DATA_ID = "Results/EnergyMin"
-
-if not os.path.exists(PROJECT_ROOT_DIR):
- os.mkdir(PROJECT_ROOT_DIR)
-
-if not os.path.exists(DATA_ID):
- os.makedirs(DATA_ID)
-
-def data_path(dat_id):
- return os.path.join(DATA_ID, dat_id)
-
-outfile = open(data_path("Energies.dat"),'w')
-
-
-# Trial wave function for the 2-electron quantum dot in two dims
-def WaveFunction(r,alpha,beta):
- r1 = r[0,0]**2 + r[0,1]**2
- r2 = r[1,0]**2 + r[1,1]**2
- r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
- deno = r12/(1+beta*r12)
- return exp(-0.5*alpha*(r1+r2)+deno)
-
-# Local energy for the 2-electron quantum dot in two dims, using analytical local energy
-def LocalEnergy(r,alpha,beta):
-
- r1 = (r[0,0]**2 + r[0,1]**2)
- r2 = (r[1,0]**2 + r[1,1]**2)
- r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
- deno = 1.0/(1+beta*r12)
- deno2 = deno*deno
- return 0.5*(1-alpha*alpha)*(r1 + r2) +2.0*alpha + 1.0/r12+deno2*(alpha*r12-deno2+2*beta*deno-1.0/r12)
-
-# Derivate of wave function ansatz as function of variational parameters
-def DerivativeWFansatz(r,alpha,beta):
-
- WfDer = np.zeros((2), np.double)
- r1 = (r[0,0]**2 + r[0,1]**2)
- r2 = (r[1,0]**2 + r[1,1]**2)
- r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
- deno = 1.0/(1+beta*r12)
- deno2 = deno*deno
- WfDer[0] = -0.5*(r1+r2)
- WfDer[1] = -r12*r12*deno2
- return WfDer
-
-# Setting up the quantum force for the two-electron quantum dot, recall that it is a vector
-def QuantumForce(r,alpha,beta):
-
- qforce = np.zeros((NumberParticles,Dimension), np.double)
- r12 = sqrt((r[0,0]-r[1,0])**2 + (r[0,1]-r[1,1])**2)
- deno = 1.0/(1+beta*r12)
-    # quantum force F = 2*grad(ln Psi): a harmonic term plus a Jastrow term
-    qforce[0,:] = -2*r[0,:]*alpha + 2*(r[0,:]-r[1,:])*deno*deno/r12
-    qforce[1,:] = -2*r[1,:]*alpha + 2*(r[1,:]-r[0,:])*deno*deno/r12
- return qforce
-
-
-# Computing the derivative of the energy and the energy
-def EnergyDerivative(x0):
-
-
- # Parameters in the Fokker-Planck simulation of the quantum force
- D = 0.5
- TimeStep = 0.05
- # positions
- PositionOld = np.zeros((NumberParticles,Dimension), np.double)
- PositionNew = np.zeros((NumberParticles,Dimension), np.double)
- # Quantum force
- QuantumForceOld = np.zeros((NumberParticles,Dimension), np.double)
- QuantumForceNew = np.zeros((NumberParticles,Dimension), np.double)
-
- energy = 0.0
- DeltaE = 0.0
- alpha = x0[0]
- beta = x0[1]
- EnergyDer = 0.0
- DeltaPsi = 0.0
- DerivativePsiE = 0.0
- #Initial position
- for i in range(NumberParticles):
- for j in range(Dimension):
- PositionOld[i,j] = normalvariate(0.0,1.0)*sqrt(TimeStep)
- wfold = WaveFunction(PositionOld,alpha,beta)
- QuantumForceOld = QuantumForce(PositionOld,alpha, beta)
-
- #Loop over MC MCcycles
- for MCcycle in range(NumberMCcycles):
- #Trial position moving one particle at the time
- for i in range(NumberParticles):
- for j in range(Dimension):
- PositionNew[i,j] = PositionOld[i,j]+normalvariate(0.0,1.0)*sqrt(TimeStep)+\
- QuantumForceOld[i,j]*TimeStep*D
- wfnew = WaveFunction(PositionNew,alpha,beta)
- QuantumForceNew = QuantumForce(PositionNew,alpha, beta)
- GreensFunction = 0.0
- for j in range(Dimension):
- GreensFunction += 0.5*(QuantumForceOld[i,j]+QuantumForceNew[i,j])*\
- (D*TimeStep*0.5*(QuantumForceOld[i,j]-QuantumForceNew[i,j])-\
- PositionNew[i,j]+PositionOld[i,j])
-
- GreensFunction = exp(GreensFunction)
- ProbabilityRatio = GreensFunction*wfnew**2/wfold**2
- #Metropolis-Hastings test to see whether we accept the move
- if random() <= ProbabilityRatio:
- for j in range(Dimension):
- PositionOld[i,j] = PositionNew[i,j]
- QuantumForceOld[i,j] = QuantumForceNew[i,j]
- wfold = wfnew
- DeltaE = LocalEnergy(PositionOld,alpha,beta)
- DerPsi = DerivativeWFansatz(PositionOld,alpha,beta)
- DeltaPsi += DerPsi
- energy += DeltaE
- DerivativePsiE += DerPsi*DeltaE
-
- # We calculate mean values
- energy /= NumberMCcycles
- DerivativePsiE /= NumberMCcycles
- DeltaPsi /= NumberMCcycles
- EnergyDer = 2*(DerivativePsiE-DeltaPsi*energy)
- return EnergyDer
-
-
-# Computing the expectation value of the local energy
-def Energy(x0):
- # Parameters in the Fokker-Planck simulation of the quantum force
- D = 0.5
- TimeStep = 0.05
- # positions
- PositionOld = np.zeros((NumberParticles,Dimension), np.double)
- PositionNew = np.zeros((NumberParticles,Dimension), np.double)
- # Quantum force
- QuantumForceOld = np.zeros((NumberParticles,Dimension), np.double)
- QuantumForceNew = np.zeros((NumberParticles,Dimension), np.double)
-
- energy = 0.0
- DeltaE = 0.0
- alpha = x0[0]
- beta = x0[1]
- #Initial position
- for i in range(NumberParticles):
- for j in range(Dimension):
- PositionOld[i,j] = normalvariate(0.0,1.0)*sqrt(TimeStep)
- wfold = WaveFunction(PositionOld,alpha,beta)
- QuantumForceOld = QuantumForce(PositionOld,alpha, beta)
-
- #Loop over MC MCcycles
- for MCcycle in range(NumberMCcycles):
- #Trial position moving one particle at the time
- for i in range(NumberParticles):
- for j in range(Dimension):
- PositionNew[i,j] = PositionOld[i,j]+normalvariate(0.0,1.0)*sqrt(TimeStep)+\
- QuantumForceOld[i,j]*TimeStep*D
- wfnew = WaveFunction(PositionNew,alpha,beta)
- QuantumForceNew = QuantumForce(PositionNew,alpha, beta)
- GreensFunction = 0.0
- for j in range(Dimension):
- GreensFunction += 0.5*(QuantumForceOld[i,j]+QuantumForceNew[i,j])*\
- (D*TimeStep*0.5*(QuantumForceOld[i,j]-QuantumForceNew[i,j])-\
- PositionNew[i,j]+PositionOld[i,j])
-
- GreensFunction = exp(GreensFunction)
- ProbabilityRatio = GreensFunction*wfnew**2/wfold**2
- #Metropolis-Hastings test to see whether we accept the move
- if random() <= ProbabilityRatio:
- for j in range(Dimension):
- PositionOld[i,j] = PositionNew[i,j]
- QuantumForceOld[i,j] = QuantumForceNew[i,j]
- wfold = wfnew
- DeltaE = LocalEnergy(PositionOld,alpha,beta)
- energy += DeltaE
- if Printout:
- outfile.write('%f\n' %(energy/(MCcycle+1.0)))
- # We calculate mean values
- energy /= NumberMCcycles
- return energy
-
-#Here starts the main program with variable declarations
-NumberParticles = 2
-Dimension = 2
-# seed for rng generator
-seed()
-# Monte Carlo cycles for parameter optimization
-Printout = False
-NumberMCcycles= 10000
-# guess for variational parameters
-x0 = np.array([0.9,0.2])
-# Using the BFGS quasi-Newton method (of the Broyden family) to find optimal parameters
-res = minimize(Energy, x0, method='BFGS', jac=EnergyDerivative, options={'gtol': 1e-4,'disp': True})
-x0 = res.x
-# Compute the energy again with the optimal parameters and an increased number of Monte Carlo cycles
-NumberMCcycles= 2**19
-Printout = True
-FinalEnergy = Energy(x0)
-EResult = np.array([FinalEnergy,FinalEnergy])
-outfile.close()
-#nice printout with Pandas
-import pandas as pd
-from pandas import DataFrame
-data ={'Optimal Parameters':x0, 'Final Energy':EResult}
-frame = pd.DataFrame(data)
-print(frame)
+ c++ -g -O0 -c mycode.cpp
+c++ -g -O0 -o mycode.exe mycode.o
This option generates debugging information, allowing you to trace, for example, whether an array is properly allocated. Some compilers work best with the no-optimization option -O0.
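+As a minimal sketch (added for illustration, not from the original notes), you could then step through the program in a debugger such as gdb
+
+gdb ./mycode.exe
+(gdb) break main
+(gdb) run
+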
+Depending on the compiler, one can add flags which generate code that catches integer overflow errors.
+The flag -ftrapv does this for the clang compiler on OS X operating systems.
+
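+As a sketch (assuming a compiler that supports -ftrapv), this can be combined with the debugging flags above as
+
+c++ -g -O0 -ftrapv -c mycode.cpp
+c++ -g -O0 -ftrapv -o mycode.exe mycode.o
+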
@@ -447,6 +707,16 @@
-
The next step is then to use the above data sets and perform a
-resampling analysis using the blocking method.
-The blocking code, based on the article of Marius Jonsson, is given here.
-
+In general, irrespective of compiler options, it is useful to
+avoid unnecessary operations inside loops.
+Here is an example of a part of a program where specific operations lead to slower code
+
-# Common imports
-import os
-
-# Where to save the figures and data files
-DATA_ID = "Results/EnergyMin"
-
-def data_path(dat_id):
- return os.path.join(DATA_ID, dat_id)
-
-infile = open(data_path("Energies.dat"),'r')
-
-from numpy import log2, zeros, mean, var, sum, loadtxt, arange, array, cumsum, dot, transpose, diagonal, sqrt
-from numpy.linalg import inv
-
-def block(x):
- # preliminaries
- n = len(x)
- d = int(log2(n))
- s, gamma = zeros(d), zeros(d)
- mu = mean(x)
-
- # estimate the auto-covariance and variances
- # for each blocking transformation
- for i in arange(0,d):
- n = len(x)
- # estimate autocovariance of x
- gamma[i] = (n)**(-1)*sum( (x[0:(n-1)]-mu)*(x[1:n]-mu) )
- # estimate variance of x
- s[i] = var(x)
- # perform blocking transformation
- x = 0.5*(x[0::2] + x[1::2])
-
- # generate the test observator M_k from the theorem
- M = (cumsum( ((gamma/s)**2*2**arange(1,d+1)[::-1])[::-1] ) )[::-1]
-
- # we need a list of magic numbers
- q =array([6.634897,9.210340, 11.344867, 13.276704, 15.086272, 16.811894, 18.475307, 20.090235, 21.665994, 23.209251, 24.724970, 26.216967, 27.688250, 29.141238, 30.577914, 31.999927, 33.408664, 34.805306, 36.190869, 37.566235, 38.932173, 40.289360, 41.638398, 42.979820, 44.314105, 45.641683, 46.962942, 48.278236, 49.587884, 50.892181])
-
- # use magic to determine when we should have stopped blocking
- for k in arange(0,d):
- if(M[k] < q[k]):
- break
- if (k >= d-1):
- print("Warning: Use more data")
- return mu, s[k]/2**(d-k)
+ k = n-1;
+for (i = 0; i < n; i++){
+ a[i] = b[i] +c*d;
+ e = g[k];
+}
+
+
A better code is
-x = loadtxt(infile)
-(mean, var) = block(x)
-std = sqrt(var)
-import pandas as pd
-from pandas import DataFrame
-data ={'Mean':[mean], 'STDev':[std]}
-frame = pd.DataFrame(data,index=['Values'])
-print(frame)
+
+temp = c*d;
+for (i = 0; i < n; i++){
+ a[i] = b[i] + temp;
+}
+e = g[n-1];
Here we avoid a repeated multiplication inside a loop.
+Most compilers, depending on compiler flags, identify and optimize such bottlenecks on their own, without requiring any particular action by the programmer. However, it is always useful to single out and avoid code examples like the first one discussed here.
+
@@ -291,6 +733,18 @@
-
Present CPUs are highly parallel processors with varying levels of parallelism. The typical situation can be described via the following three statements.
+Before we proceed with a more detailed discussion of topics like vectorization and parallelization, we need to remind ourselves about some basic features of different hardware models.
+
-For C++ programmers it is also worth keeping in mind that array notation is preferred to the more compact use of pointers to access array elements. The compiler can often not tell if it is safe to vectorize the code.
-
-When dealing with arrays, you should also avoid memory stride, since this considerably slows down vectorization. When you access an array element, write for example the inner loop to vectorize using unit stride, that is, access successive array elements in memory, as shown here.
-
for (int i = 0; i < n; i++) {
- for (int j = 0; j < n; j++) {
- a[i][j] += b[i][j];
- }
- }
-
-
-
One way of categorizing modern parallel computers is to look at the memory configuration.
The CPUs are connected by some network and may exchange messages.
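In a distributed-memory setting this message exchange is typically done with a library such as MPI. As a minimal sketch (added for illustration and assuming a working MPI installation; it is not part of the original notes):

#include <mpi.h>
#include <cstdio>
int main (int argc, char* argv[])
{
  // each process learns its rank and prints it
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Hello from process %d\n", rank);
  MPI_Finalize();
  return 0;
}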
+
-
-
-How do we measure performance? What is wrong with this code to time a loop?
-
+
clock_t start, finish;
- start = clock();
- for (int j = 0; j < i; j++) {
- a[j] = b[j]+b[j]*c[j];
- }
- finish = clock();
- double timeused = (double) (finish - start)/(CLOCKS_PER_SEC );
-
-
@@ -624,29 +684,22 @@
+
Vectorization is a special case of Single Instruction, Multiple Data (SIMD) processing, denoting a single
+instruction stream capable of operating on multiple data elements in
+parallel.
+We can think of vectorization as the unrolling of loops accompanied by SIMD instructions.
+
+Vectorization is the process of converting an algorithm that performs scalar operations
+(typically one operation at a time) to vector operations, where a single operation can refer to many simultaneous operations.
+Consider the following example
+
-What happens when the code is executed? The assumption is that the code is ready to
-execute. But
+
for (i = 0; i < n; i++){
+ a[i] = b[i] + c[i];
+}
+
+If the code is not vectorized, the compiler will simply start with the first element and
+then perform subsequent additions, operating on one memory address at a time.
+
@@ -627,29 +712,22 @@
+
A SIMD instruction can operate on multiple data elements in one single instruction.
+It uses the so-called 128-bit SIMD floating-point registers.
+In this sense, vectorization adds some form of parallelism, since one instruction is applied
+to many parts of, say, a vector.
+
+The number of elements which can be operated on in parallel
+ranges from four single-precision floating-point data elements in so-called
+Streaming SIMD Extensions and two double-precision floating-point data
+elements in Streaming SIMD Extensions 2, to sixteen byte operations in
+a 128-bit register in Streaming SIMD Extensions 2. Thus, the vector length
+ranges from 2 to 16, depending on the instruction extensions used and
+on the data type.
+In summary, our instructions operate on 128-bit (16 byte) operands.
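+As a quick sanity check (an illustration added here, not part of the original notes), we can print how many elements fit in such a 16-byte register:
+
+#include <iostream>
+int main () {
+  // 16 bytes per 128-bit register divided by the size of each data type
+  std::cout << "floats per 128-bit register:  " << 16/sizeof(float)  << std::endl; // 4
+  std::cout << "doubles per 128-bit register: " << 16/sizeof(double) << std::endl; // 2
+  return 0;
+}
+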
We start with the simple scalar operations given by
+for (i = 0; i < n; i++){
+ a[i] = b[i] + c[i];
+}
+
+If the code is not vectorized and we have a 128-bit register to store a 32-bit floating point number,
+it means that we have \( 3\times 32 \) bits that are not used.
+
-We have thus unused space in our SIMD registers. These registers could hold three additional integers.
@@ -633,29 +704,22 @@
The code
+for (i = 0; i < n; i++){
+ a[i] = b[i] + c[i];
+}
+
+has for \( n \) repeats two load instructions (for the elements of \( b \) and \( c \)), one addition and one store (for the element of \( a \)) in each iteration.
-
If we vectorize the code, we can with a 128-bit register perform four simultaneous operations, that is
+we have
+
-$$
-\left( \begin{array}{ccccc}
-    b_0 & c_0 &         &         &         \\
-    a_0 & b_1 & c_1     &         &         \\
-        &     & \ddots  &         &         \\
-        &     & a_{m-3} & b_{m-2} & c_{m-2} \\
-        &     &         & a_{m-2} & b_{m-1}
- \end{array} \right)
-\left( \begin{array}{c}
-    x_0 \\ x_1 \\ \vdots \\ x_{m-2} \\ x_{m-1}
- \end{array} \right) = \left( \begin{array}{c}
-    f_0 \\ f_1 \\ \vdots \\ f_{m-2} \\ f_{m-1}
- \end{array} \right)
-$$
+
for (i = 0; i < n; i+=4){
+ a[i] = b[i] + c[i];
+ a[i+1] = b[i+1] + c[i+1];
+ a[i+2] = b[i+2] + c[i+2];
+ a[i+3] = b[i+3] + c[i+3];
+}
+
Four additions are now done in a single step.
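The same four additions can be written explicitly with SSE intrinsics. This is a minimal sketch added for illustration (it assumes an x86 CPU with SSE support and is not part of the original notes):

#include <xmmintrin.h>
// add four floats in one 128-bit SIMD instruction
void add4(const float *b, const float *c, float *a) {
  __m128 vb = _mm_loadu_ps(b);    // load b[0..3] into a 128-bit register
  __m128 vc = _mm_loadu_ps(c);    // load c[0..3]
  __m128 va = _mm_add_ps(vb, vc); // one instruction, four additions
  _mm_storeu_ps(a, va);           // store the four results
}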
-
-
-
-The first step is to multiply the first row by \( a_0/b_0 \) and subtract it from the second row. This is known as the forward substitution step. We obtain then
-$$
-    a_i = 0,
-$$
-
-$$
-    b_i = b_i - \frac{a_{i-1}}{b_{i-1}}c_{i-1},
-$$
-
-and
-$$
-    f_i = f_i - \frac{a_{i-1}}{b_{i-1}}f_{i-1}.
-$$
-
-At this point the simplified equation, with only an upper triangular matrix, takes the form
-$$
-\left( \begin{array}{ccccc}
-    b_0 & c_0 &        &         &         \\
-        & b_1 & c_1    &         &         \\
-        &     & \ddots &         &         \\
-        &     &        & b_{m-2} & c_{m-2} \\
-        &     &        &         & b_{m-1}
- \end{array} \right)
-\left( \begin{array}{c}
-    x_0 \\ x_1 \\ \vdots \\ x_{m-2} \\ x_{m-1}
- \end{array} \right) = \left( \begin{array}{c}
-    f_0 \\ f_1 \\ \vdots \\ f_{m-2} \\ f_{m-1}
- \end{array} \right)
-$$
+
For \( n/4 \) repeats (assuming 32-bit floats or integers), each iteration performs the same loads, addition and store, but every instruction now operates on four elements at once.
-The next step is the backward substitution step. The last row is multiplied by \( c_{N-3}/b_{N-2} \) and subtracted from the second to last row, thus eliminating \( c_{N-3} \) from the last row. The general backward substitution procedure is
-$$
-    c_i = 0,
-$$
-
-and
-$$
-    f_{i-1} = f_{i-1} - \frac{c_{i-1}}{b_i}f_i.
-$$
-
-All that remains to be computed is the solution, which is the very straightforward process of
-$$
-x_i = \frac{f_i}{b_i}.
-$$
+
We implement these operations in a simple C++ program that at the end computes the norm of a vector.
+
+#include <cstdlib>
+#include <iostream>
+#include <cmath>
+#include <iomanip>
+#include "time.h"
+
+using namespace std; // note use of namespace
+int main (int argc, char* argv[])
+{
+ // read in dimension of square matrix
+ int n = atoi(argv[1]);
+ double s = 1.0/sqrt( (double) n);
+ double *a, *b, *c;
+ // Start timing
+ clock_t start, finish;
+ start = clock();
+// Allocate space for the vectors to be used
+ a = new double [n]; b = new double [n]; c = new double [n];
+ // Define parallel region
+ // Set up values for vectors a and b
+ for (int i = 0; i < n; i++){
+ double angle = 2.0*M_PI*i/ (( double ) n);
+ a[i] = s*(sin(angle) + cos(angle));
+ b[i] = s*sin(2.0*angle);
+ c[i] = 0.0;
+ }
+ // Then perform the vector addition
+ for (int i = 0; i < n; i++){
+ c[i] += a[i]+b[i];
+ }
+ // Compute now the norm-2
+ double Norm2 = 0.0;
+ for (int i = 0; i < n; i++){
+ Norm2 += c[i]*c[i];
+ }
+ finish = clock();
+ double timeused = (double) (finish - start)/(CLOCKS_PER_SEC );
+ cout << setiosflags(ios::showpoint | ios::uppercase);
+ cout << setprecision(10) << setw(20) << "Time used for norm computation=" << timeused << endl;
+ cout << " Norm-2 = " << Norm2 << endl;
+ // Free up space
+ delete[] a;
+ delete[] b;
+ delete[] c;
+ return 0;
+}
+
-
+
We can compile and link without vectorization using the clang C++ compiler
+
clang++ -o novec.x vecexample.cpp
+
+
and with vectorization (and additional optimizations)
-Operation       | Floating Point
-Memory Reads    | \( 14(N-2) \)
-Memory Writes   | \( 4(N-2) \)
-Subtractions    | \( 3(N-2) \)
-Multiplications | \( 3(N-2) \)
-Divisions       | \( 4(N-2) \)
+
clang++ -O3 -Rpass=loop-vectorize -o vec.x vecexample.cpp
+
The speedup depends on the size of the vectors. In the example here we have run with \( 10^7 \) elements.
+The example was run on an iMac17,1 with OS X El Capitan (10.11.4) as operating system and an Intel i5 3.3 GHz CPU.
+
-
+
Compphys:~ hjensen$ ./vec.x 10000000
+Time used for norm computation=0.03311700000
+Compphys:~ hjensen$ ./novec.x 10000000
+Time used for norm computation=0.04720500000
+
+This particular C++ compiler speeds up the above loop operations by a factor of roughly 1.5.
+Performing the same operations for \( 10^9 \) elements results in a smaller speedup, since reading from main memory is now required; there the non-vectorized code is seemingly faster.
+
-// Forward substitution
-// Note that we can simplify by precalculating a[i-1]/b[i-1]
- for (int i=1; i < n; i++) {
- b[i] = b[i] - (a[i-1]*c[i-1])/b[i-1];
- f[i] = g[i] - (a[i-1]*f[i-1])/b[i-1];
- }
- x[n-1] = f[n-1] / b[n-1];
- // Backwards substitution
- for (int i = n-2; i >= 0; i--) {
- f[i] = f[i] - c[i]*f[i+1]/b[i+1];
- x[i] = f[i]/b[i];
- }
-
+
Compphys:~ hjensen$ ./vec.x 1000000000
+Time used for norm computation=58.41391100
+Compphys:~ hjensen$ ./novec.x 1000000000
+Time used for norm computation=46.51295300
+
We will discuss these issues further in the next slides.
-
-
We can compile and link without vectorization with the clang compiler
+
+clang++ -O3 -fno-vectorize -o novec.x vecexample.cpp
+
+and with vectorization
+
clang++ -O3 -Rpass=loop-vectorize -o vec.x vecexample.cpp
+
-#include <cstdlib>
-#include <iostream>
-#include <cmath>
-#include <iomanip>
-#include "time.h"
+We can also add vectorization analysis, see for example
-using namespace std; // note use of namespace
-int main (int argc, char* argv[])
-{
- // read in dimension of square matrix
- int n = atoi(argv[1]);
- double **A, **B;
- // Allocate space for the two matrices
- A = new double*[n]; B = new double*[n];
- for (int i = 0; i < n; i++){
- A[i] = new double[n];
- B[i] = new double[n];
- }
- // Set up values for matrix A
- for (int i = 0; i < n; i++){
- for (int j = 0; j < n; j++) {
- A[i][j] = cos(i*1.0)*sin(j*3.0);
- }
- }
- clock_t start, finish;
- start = clock();
- // Then compute the transpose
- for (int i = 0; i < n; i++){
- for (int j = 0; j < n; j++) {
- B[i][j]= A[j][i];
- }
- }
+
+ clang++ -O3 -Rpass-analysis=loop-vectorize -o vec.x vecexample.cpp
+
+or figure out if vectorization was missed
+
+ clang++ -O3 -Rpass-missed=loop-vectorize -o vec.x vecexample.cpp
+
- finish = clock();
- double timeused = (double) (finish - start)/(CLOCKS_PER_SEC );
- cout << setiosflags(ios::showpoint | ios::uppercase);
- cout << setprecision(10) << setw(20) << "Time used for setting up transpose of matrix=" << timeused << endl;
- // Free up space
- for (int i = 0; i < n; i++){
- delete[] A[i];
- delete[] B[i];
- }
- delete[] A;
- delete[] B;
- return 0;
-}
-
-
Not all loops can be vectorized, as discussed in Intel's guide to vectorization
+
An important criterion is that the loop counter \( n \) is known at the entry of the loop.
-#include <cstdlib>
-#include <iostream>
-#include <cmath>
-#include <iomanip>
-#include "time.h"
+
+ for (int j = 0; j < n; j++) {
+ a[j] = cos(j*1.0);
+ }
+
+The variable \( n \) does not need to be known at compile time. However, it must stay the same for the entire duration of the loop. This implies that an exit statement inside the loop cannot be data-dependent.
-using namespace std; // note use of namespace
-int main (int argc, char* argv[])
-{
- // read in dimension of square matrix
- int n = atoi(argv[1]);
- double s = 1.0/sqrt( (double) n);
- double **A, **B, **C;
- // Start timing
- clock_t start, finish;
- start = clock();
- // Allocate space for the two matrices
- A = new double*[n]; B = new double*[n]; C = new double*[n];
- for (int i = 0; i < n; i++){
- A[i] = new double[n];
- B[i] = new double[n];
- C[i] = new double[n];
- }
- // Set up values for matrix A and B and zero matrix C
- for (int i = 0; i < n; i++){
- for (int j = 0; j < n; j++) {
- double angle = 2.0*M_PI*i*j/ (( double ) n);
- A[i][j] = s * ( sin ( angle ) + cos ( angle ) );
- B[j][i] = A[i][j];
- }
- }
- // Then perform the matrix-matrix multiplication
- for (int i = 0; i < n; i++){
- for (int j = 0; j < n; j++) {
- double sum = 0.0;
- for (int k = 0; k < n; k++) {
- sum += B[i][k]*A[k][j];
- }
- C[i][j] = sum;
- }
- }
- // Compute now the Frobenius norm
- double Fsum = 0.0;
- for (int i = 0; i < n; i++){
- for (int j = 0; j < n; j++) {
- Fsum += C[i][j]*C[i][j];
- }
- }
- Fsum = sqrt(Fsum);
- finish = clock();
- double timeused = (double) (finish - start)/(CLOCKS_PER_SEC );
- cout << setiosflags(ios::showpoint | ios::uppercase);
- cout << setprecision(10) << setw(20) << "Time used for matrix-matrix multiplication=" << timeused << endl;
- cout << " Frobenius norm = " << Fsum << endl;
- // Free up space
- for (int i = 0; i < n; i++){
- delete[] A[i];
- delete[] B[i];
- delete[] C[i];
- }
- delete[] A;
- delete[] B;
- delete[] C;
- return 0;
-}
-
-
+
An exit statement should in general be avoided.
+If the exit statement contains data-dependent conditions, the loop cannot be vectorized.
+The following is an example of a non-vectorizable loop
+
+ for (int j = 0; j < n; j++) {
+ a[j] = cos(j*1.0);
+ if (a[j] < 0 ) break;
+ }
+
Avoid data-dependent loop termination conditions and opt for a single loop control variable \( n \). The lower and upper bounds have to be kept fixed within the loop.
-
-
-
-The key is choosing the correct baseline for comparison.
-
SIMD instructions perform the same type of operation multiple times.
+A switch statement thus leads to a non-vectorizable loop, since different iterations cannot follow different branches.
+The following code can however be vectorized, since the if statement is implemented as a masked assignment.
+
+ for (int j = 0; j < n; j++) {
+ double x = cos(j*1.0);
+ if (x > 0 ) {
+ a[j] = x*sin(j*2.0);
+ }
+ else {
+ a[j] = 0.0;
+ }
+ }
+
These operations can be performed for all data elements, but only those elements for which the mask evaluates as true are stored. In general, one should avoid branches such as switch, goto, or return statements, as well as if constructs that cannot be treated as masked assignments.
-
-
-For parallel applications, speedup is typically defined as
-
Only the innermost loop of the following example is vectorized
+ for (int i = 0; i < n; i++) {
+ for (int j = 0; j < n; j++) {
+ a[i][j] += b[i][j];
+ }
+ }
+
The exception is if an original outer loop is transformed into an inner loop as the result of compiler optimizations.
-
-
-The speedup on \( p \) processors can
-be greater than \( p \) if memory usage is optimal!
-Consider the case of a memory-bound computation with \( M \) words of memory.
+
Calls to programmer-defined functions ruin vectorization. However, calls to intrinsic functions like
+\( \sin{x} \), \( \cos{x} \), \( \exp{x} \) etc. are allowed, since they are normally efficiently vectorized.
+The following example is fully vectorizable
+
+ for (int i = 0; i < n; i++) {
+ a[i] = log10(i)*cos(i);
+ }
+
Similarly, inline functions defined by the programmer allow for vectorization, since the function statements are glued into the actual place where the function is called.
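As a small illustration (added here; the helper name f is hypothetical and not from the notes):

// an inline helper the compiler can glue into the loop body
inline double f(double x) { return 4.0*x*x; }

void apply(const double *b, double *a, int n) {
  for (int i = 0; i < n; i++) {
    a[i] = f(b[i]);  // vectorizable: f is inlined at the call site
  }
}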
-
-
One has to keep in mind that vectorization changes the order of operations inside a loop. A so-called
+read-after-write statement with an explicit flow dependency cannot be vectorized. The following code
+
+ double b = 15.;
+ for (int i = 1; i < n; i++) {
+ a[i] = a[i-1] + b;
+ }
+
+
-Assume that almost all parts of a code are perfectly
-parallelizable (fraction \( f \)). The remainder,
-fraction \( (1-f) \), cannot be parallelized at all.
-
-That is, there is work that takes time \( W \) on one process; a fraction \( f \) of that work will take
-time \( Wf/p \) on \( p \) processors.
-
is an example of flow dependency and results in wrong numerical results if vectorized. For a scalar operation, the value \( a[i-1] \) computed during the previous iteration is loaded into the right-hand side and the results are correct. In vector mode however, with a vector length of four, the values \( a[0] \), \( a[1] \), \( a[2] \) and \( a[3] \) from the previous loop will be loaded into the right-hand side and produce wrong results. That is, we have
+ a[1] = a[0] + b;
+ a[2] = a[1] + b;
+ a[3] = a[2] + b;
+ a[4] = a[3] + b;
+
and if the first two iterations are executed at the same time by the SIMD instruction, the value of, say, \( a[1] \) could be used by the second iteration before it has been calculated by the first iteration, thereby leading to wrong results.
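One way to make such a recurrence vectorizable (a sketch added for illustration, not from the notes) is to rewrite it in closed form whenever that is possible:

// a[i] = a[i-1] + b implies a[i] = a[0] + i*b, which has no loop-carried dependency
for (int i = 1; i < n; i++) {
  a[i] = a[0] + i*b;
}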
-
-
-
-On one processor we have
-$$
-T_1 = (1-f)W + fW = W.
-$$
-
-On \( p \) processors we have
-$$
-T_p = (1-f)W + \frac{fW}{p},
-$$
-
-resulting in a speedup of
-$$
-\frac{T_1}{T_p} = \frac{W}{(1-f)W+fW/p}.
-$$
-
-As \( p \) goes to infinity, \( fW/p \) goes to zero, and the maximum speedup is
-$$
-\frac{1}{1-f},
-$$
-
-meaning that if \( f = 0.99 \) (all but \( 1\% \) parallelizable), the maximum speedup
-is \( 1/(1-0.99)=100 \)!
+
On the other hand, a so-called
+write-after-read statement can be vectorized. The following code
+
+ double b = 15.;
+ for (int i = 1; i < n; i++) {
+ a[i-1] = a[i] + b;
+ }
+
is an example of a write-after-read dependency that can be vectorized, since no iteration with a higher value of \( i \)
+can complete before an iteration with a lower value of \( i \). However, such code leads to problems with parallelization.
+
-
-
-If any non-parallel code slips into the
-application, the parallel
-performance is limited.
-
-In many simulations, however, the fraction of non-parallelizable work
-is \( 10^{-6} \) or less, due to large arrays or objects that are perfectly parallelizable.
-
+
For C++ programmers it is also worth keeping in mind that an array notation is preferred to the more compact use of pointers to access array elements. The compiler can often not tell if it is safe to vectorize the code.
+
+When dealing with arrays, you should also avoid memory stride, since this considerably slows down vectorization. When you access an array element, write for example the inner loop to vectorize using unit stride, that is, access successive array elements in memory, as shown here
+ for (int i = 0; i < n; i++) {
+ for (int j = 0; j < n; j++) {
+ a[i][j] += b[i][j];
+ }
+ }
+
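+For contrast, here is a sketch (added for illustration, not from the notes) of the same update with the loops interchanged; the inner loop now jumps through memory with stride \( n \), which inhibits vectorization:
+
+  for (int j = 0; j < n; j++) {
+    for (int i = 0; i < n; i++) {
+      a[i][j] += b[i][j];  // successive inner-loop accesses are n elements apart
+    }
+  }
+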
-
The main memory contains the program data
+Loads and stores to memory can be as important as floating point operations when we measure performance.
-Our lectures will focus on both MPI and OpenMP.
-
-
Many of these performance features are not captured in most programming languages.
-
-
+
How do we measure performance? What is wrong with this code to time a loop?
+ clock_t start, finish;
+ start = clock();
+ for (int j = 0; j < i; j++) {
+ a[j] = b[j]+b[j]*c[j];
+ }
+ finish = clock();
+ double timeused = (double) (finish - start)/(CLOCKS_PER_SEC );
+
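+As a sketch of a more careful measurement (added for illustration; it assumes the arrays a, b, c and the sizes n and nreps are declared elsewhere), one can repeat the kernel and use a high-resolution clock:
+
+#include <chrono>
+auto t0 = std::chrono::high_resolution_clock::now();
+for (int rep = 0; rep < nreps; rep++) {  // repeat to average out noise and cold-start effects
+  for (int j = 0; j < n; j++) {
+    a[j] = b[j] + b[j]*c[j];
+  }
+}
+auto t1 = std::chrono::high_resolution_clock::now();
+double timeused = std::chrono::duration<double>(t1 - t0).count()/nreps;
+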
-
+
-
-To install MPI is rather easy on hardware running unix/linux as operating systems; simply follow the instructions from the OpenMPI website. See also subsequent slides.
-When you have made sure you have installed MPI on your PC/laptop,
-
# Compile and link
- mpic++ -O3 -o nameofprog.x nameofprog.cpp
- # run code with for example 8 processes using mpirun/mpiexec
- mpiexec -n 8 ./nameofprog.x
-
-
+
What happens when the code is executed? The assumption is that the code is ready to
+execute. But
+
+
-If you wish to install MPI and OpenMP
-on your laptop/PC, we recommend the following:
-
brew install libomp
-
-and compile and link as
-
c++ -o <name executable> <name program.cpp> -lomp
-
-
+
-
-For linux/ubuntu users, you need to install two packages (alternatively use the synaptic package manager)
-
sudo apt-get install libopenmpi-dev
- sudo apt-get install openmpi-bin
-
-For OS X users, install brew (after having installed xcode and gcc, needed for the
-gfortran compiler of openmpi) and then install with brew
-
brew install openmpi
-
-When running an executable (code.x), run as
-
mpirun -n 10 ./code.x
-
-where we indicate that we want the number of processes to be 10.
-
-
+
-
-With openmpi installed, when using Qt, add to your .pro file the instructions here.
-
-You may need to tell Qt where openmpi is stored.
-
-
+
-