A minimal test case to show the performance difference.
openmp_101.c:
#include <inttypes.h>
#include <limits.h>
#include <omp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    // omp_get_max_threads() returns int, so print it with %d
    printf("omp_get_max_threads(): %d\n", omp_get_max_threads());
    fflush(stdout);
    double start = omp_get_wtime();
    const uint64_t loops = INT_MAX;
    uint64_t sum = 0, rand_sum = 0;
#pragma omp parallel for reduction(+ : sum) reduction(+ : rand_sum)
    for (uint64_t i = 0; i < loops; i++) {
        sum += 1;
        rand_sum += rand(); // or: rand_sum += (i * i);
    }
    double end = omp_get_wtime();
    // Print the time taken and the sums calculated
    printf("Time taken (wtime): %.3f seconds\n", end - start);
    printf("Loops: %" PRIu64 "\n", loops);
    printf("Sum: %" PRIu64 "\n", sum);
    printf("Rand_sum: %" PRIu64 "\n", rand_sum);
}
openmp-101.sh:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=28
#SBATCH --qos=high
cc -fopenmp -o openmp_101 ./openmp_101.c
export OMP_NUM_THREADS=56
srun ./openmp_101
Run with sbatch:
sbatch openmp-101.sh
Run with srun:
(
  cc -fopenmp -o openmp_102 ./openmp_101.c
  export OMP_NUM_THREADS=56
  srun --nodes=1 --ntasks=1 --cpus-per-task=28 --qos=high ./openmp_102
)
The performance differs depending on whether the loop calls rand() or not.
- With rand(): the sbatch job finishes quickly, while the srun job seems to run forever (I killed the program after waiting long enough).
- Without rand(): sbatch is 10x slower than srun.
Note that rand() has implementation-defined thread safety (https://en.cppreference.com/w/c/numeric/random/srand), so I am not sure whether this has anything to do with the issue.
Either way, sbatch can be slower than srun, and this is a problem.