5:53pm Fri 6th Sep, ANONYMOUS

A minimal test case to show the performance difference.


#include <inttypes.h>
#include <limits.h>
#include <omp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // omp_get_max_threads() returns an int, so print it with %d
    printf("omp_get_max_threads(): %d\n", omp_get_max_threads());

    double start = omp_get_wtime();

    const uint64_t loops = INT_MAX;
    uint64_t sum = 0, rand_sum = 0;

    // Each thread accumulates private copies of both sums; they are combined after the loop
#pragma omp parallel for reduction(+ : sum) reduction(+ : rand_sum)
    for (uint64_t i = 0; i < loops; i++) {
        sum += 1;
        rand_sum += rand(); // or: rand_sum += (i * i);
    }

    double end = omp_get_wtime();

    // Print the time taken and the sums calculated
    printf("Time taken (wtime): %.3f seconds\n", end - start);
    printf("Loops: %" PRIu64 "\n", loops);
    printf("Sum: %" PRIu64 "\n", sum);
    printf("Rand_sum: %" PRIu64 "\n", rand_sum);

    return 0;
}


Run with sbatch:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=28
    #SBATCH --qos=high

    cc -fopenmp -o openmp_101 ./openmp_101.c
    srun ./openmp_101

Run with srun:

    cc -fopenmp -o openmp_102 ./openmp_101.c
    export OMP_NUM_THREADS=56
    srun --nodes=1 --ntasks=1 --cpus-per-task=28 --qos=high  ./openmp_102

Performance differs depending on whether the loop calls rand():

  • With rand(): the sbatch job finishes quickly, while the srun job seems to run forever (I killed it after waiting long enough).
  • Without rand(): the sbatch job is 10x slower than the srun job.
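
One way to narrow this down is to have the program report how many threads OpenMP will start and how many processors it sees under each launch method. This is only a minimal sketch, assuming the gap comes from the two launches giving the program different thread or CPU counts (the srun commands above export OMP_NUM_THREADS=56 while requesting 28 CPUs). report_threads_vs_procs is a hypothetical helper name, and the _OPENMP guard is only there so the file also builds without -fopenmp. Whether omp_get_num_procs() reflects the CPUs Slurm actually bound to the task may depend on the OpenMP runtime.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: prints the OpenMP thread and processor counts and
 * returns 1 if more threads are requested than processors are reported. */
int report_threads_vs_procs(void) {
    int threads = 1; /* fallback values when built without -fopenmp */
    int procs = 1;
#ifdef _OPENMP
    threads = omp_get_max_threads(); /* honours OMP_NUM_THREADS */
    procs = omp_get_num_procs();     /* processors the runtime reports as available */
#endif
    assert(threads >= 1 && procs >= 1);

    printf("max threads: %d, processors: %d\n", threads, procs);
    return threads > procs;
}
```

Calling it at the top of main() in both the sbatch and the srun run would show whether one of them ends up with more threads than processors.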

Note that rand() has implementation-defined thread-safety; I'm not sure whether this has anything to do with the issue.
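
To take rand() out of the picture, the loop can draw from a generator whose state is private to each thread. The following is a minimal sketch, assuming the slowdown comes from all threads going through rand()'s shared internal state. It swaps in a small xorshift generator instead of rand(), so the values differ from what rand() would produce. xorshift64 and private_rand_sum are hypothetical names, and the per-thread seed is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: one xorshift64 step on caller-owned state.
 * The state must be nonzero, otherwise the generator stays at zero. */
static uint64_t xorshift64(uint64_t *state) {
    assert(*state != 0);
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Sums `loops` pseudo-random values in [0, 32767]; every thread advances
 * its own generator state, so no state is shared between threads. */
uint64_t private_rand_sum(uint64_t loops) {
    uint64_t rand_sum = 0;

#pragma omp parallel reduction(+ : rand_sum)
    {
        int tid = 0; /* stays 0 when built without -fopenmp */
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Per-thread seed: an odd constant times (tid + 1), so it is
         * nonzero and different for each thread. */
        uint64_t state = 0x9E3779B97F4A7C15ULL * (uint64_t)(tid + 1);

#pragma omp for
        for (uint64_t i = 0; i < loops; i++) {
            rand_sum += xorshift64(&state) & 0x7FFF; /* keep the low 15 bits */
        }
    }
    return rand_sum;
}
```

If the srun run finishes promptly with this loop in place of the rand() call, that would point at rand() itself rather than at the scheduler.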

Still, sbatch can indeed be slower than srun, and that is a problem.

