Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If the number of threads is greater than the number of sites, bad behavior ensues #17

Open
bjoo opened this issue Sep 3, 2016 · 2 comments
Assignees
Labels
Milestone

Comments

@bjoo
Copy link
Contributor

bjoo commented Sep 3, 2016

Running the t_mesplq with the default local volume (4x2x2x4) produces non-constant results when run repeatedly. The problem goes away if the number if threads is less than the number of sites.

Several fixes are possible:
Simplest: fix qdp_dispatch: only threads with ID < numSites will dispatch
Longer Term: Since OpenMP is now the norm, the dispatch function could be recoded:
instead of low/high site indices, the loops cold be hoisted out of the dispatched functions
into the dispatcher with an OMP for

@bjoo bjoo added the bug label Sep 3, 2016
@bjoo bjoo added this to the Fix in devel milestone Sep 3, 2016
@bjoo bjoo self-assigned this Sep 3, 2016
@bjoo
Copy link
Contributor Author

bjoo commented Sep 3, 2016

The issue appears to be more subtle than just qdp_dispatch. I have changed the code to not use dispatch and the issue persists.

@bjoo
Copy link
Contributor Author

bjoo commented Sep 3, 2016

Tracked it. The issue was threading in random()
there was an

#pragma omp parallel {

     #pragma omp for 
     for(...) 

    #pragma omp critical 
    {
        update seed 
    }

}
construct. This if there were more threads than sites, some would have tried to update the seed with their original wrong value. I've now guarded this with

   int myId = omp_get_thread_num()
   if ( myId < nodeSites ) { 
         // only active threads update seed
         #pragma omp critical 
         { 
             // update seed
         }
    }

I have replicated this pattern throughout qdp_parscalar_specific.h wherever critical occurs in a similar situation (mostly on sums/inner products).

NB: In case the question comes: why not use OMP reductions for sums, the answer is that we are summing into a complex type. I can experiment with using OMP reductions for that later.

SInce I am doing this in a devel branch I won't mark it closed just yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant