
Mpi base implementation #31

Closed

Conversation

johnathanchann
Collaborator

Not tested with an actual 8-GPU setup. Feel free to PR changes.

Contributor

@rozukke rozukke left a comment

Not a good work-division strategy; there are some weird remnants of C++ and an inconsistent naming style.

Comment on lines +17 to +22
n_gpus=$(shell nvidia-smi --query-gpu=name --format=csv,noheader | wc -l); \
mpirun --oversubscribe -np $$n_gpus ./speed_gpu ./weights_and_biases.txt ./tensors 100000

test: build
./speed_gpu ./weights_and_biases.txt ./tensors 1000000
n_gpus=$(shell nvidia-smi --query-gpu=name --format=csv,noheader | wc -l); \
mpirun --oversubscribe -np $$n_gpus ./speed_gpu ./weights_and_biases.txt ./tensors 1000000
Contributor

We do not need to oversubscribe on the final build.

@@ -1,6 +1,7 @@
#include "matrix.cuh"
#include <dirent.h>
#include <iostream>
Contributor

Why is there C++ in here?

@@ -156,11 +157,23 @@ __global__ void infer(float* d_inputs, int* d_results, matrix** d_weights, matri
}

int main(int argc, char* argv[]) {
MPI_Init(&argc, &argv);
int TotalProcess, ProcessId;
Contributor

Capitalisation does not match the code style.

Comment on lines +223 to +224
int local_input_count = input_count / TotalProcess + (ProcessId < (input_count % TotalProcess) ? 1 : 0);
int start_idx = ProcessId * (input_count / TotalProcess) + std::min(ProcessId, input_count % TotalProcess);
Contributor

Why are there weird capitalised names and C++ functions in here?

Comment on lines +233 to +245
int counter = 0;
while ((entry = readdir(dir)) != NULL) {
if (entry->d_type == DT_REG) {
strcpy(file_num_str, entry->d_name);
file_num_str[strlen(entry->d_name) - 7] = '\0';
file_num = atoi(entry->d_name);
strcpy(file_name, directory_path);
strcat(file_name, "/");
strcat(file_name, entry->d_name);
read_tensor((float*)&inputs[(file_num - 1) * 225], file_name);
if (file_num >= start_idx + 1 && file_num <= start_idx + local_input_count) {
strcpy(file_name, directory_path);
strcat(file_name, "/");
strcat(file_name, entry->d_name);
read_tensor(&inputs[counter * 225], file_name);
counter++;
}
Contributor

Do not divide the inputs between GPUs; divide the inferences.

Comment on lines -275 to +262
for (int i = 0; i < input_count; i++) {
for (int i = 0; i < local_input_count; i++) {
infer<<<BLOCKS, THREADS_PER_BLOCK>>>(d_inputs, d_results, d_weights, d_biases, it_num, i);
err = cudaGetLastError();
if (err != cudaSuccess) {
printf("CUDA error: %s\n", cudaGetErrorString(err));
}
CUDA_CHECK(cudaGetLastError());
Contributor

Rearrange this loop to divide the work more evenly.

Comment on lines +291 to +292
printf("Process %d - Total: %lu us\n", ProcessId,
(stop.tv_sec - start.tv_sec) * 1000000 + stop.tv_usec - start.tv_usec);
Contributor

We also want timing from the root process for how long the whole run takes.
