help5507

This forum is provided to promote discussion amongst students enrolled in CITS5507 High Performance Computing.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!" follow-up message.

5:53pm Fri 6th Sep, ANONYMOUS

A minimal test case to show the performance difference.

openmp_101.c:

#include <inttypes.h>
#include <limits.h>
#include <omp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    // omp_get_max_threads() returns int, so print it with %d
    printf("omp_get_max_threads(): %d\n", omp_get_max_threads());
    fflush(stdout);

    double start = omp_get_wtime();

    const uint64_t loops = INT_MAX;
    uint64_t sum = 0, rand_sum = 0;

#pragma omp parallel for reduction(+ : sum) reduction(+ : rand_sum)
    for (uint64_t i = 0; i < loops; i++) {
        sum += 1;
        rand_sum += rand(); // or: rand_sum += (i * i);
    }

    double end = omp_get_wtime();

    // Print the time taken and the sums calculated
    printf("Time taken (wtime): %.3f seconds\n", end - start);
    printf("Loops: %" PRIu64 "\n", loops);
    printf("Sum: %" PRIu64 "\n", sum);
    printf("Rand_sum: %" PRIu64 "\n", rand_sum);
    return 0;
}

openmp-101.sh:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=28
#SBATCH --qos=high

cc -fopenmp -o openmp_101 ./openmp_101.c
export OMP_NUM_THREADS=56
srun ./openmp_101

Run with sbatch:

sbatch openmp-101.sh

Run with srun:

(
    cc -fopenmp -o openmp_102 ./openmp_101.c
    export OMP_NUM_THREADS=56
    srun --nodes=1 --ntasks=1 --cpus-per-task=28 --qos=high  ./openmp_102
)

The performance differs depending on whether the loop calls rand().

  • With rand(): the sbatch job finishes quickly, while the srun job appears to run forever (I killed it after waiting long enough).
  • Without rand(): the sbatch job is about 10x slower than the srun job.

Note that the thread-safety of rand() is implementation-defined (https://en.cppreference.com/w/c/numeric/random/srand); I'm not sure whether this has anything to do with the issue.

Either way, sbatch can indeed be slower than srun, and this is a problem.
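One thing worth checking, which I cannot confirm from the runs above, is whether the program gets the same number of CPUs in both launch modes. The scripts request 28 CPUs but start 56 threads, and if one launch mode ends up with fewer CPUs than requested, the threads will compete for them. The sketch below is Linux/glibc only and queries the affinity mask of the calling process; allowed_cpu_count, threads_per_cpu and report_cpu_allocation are hypothetical helper names, not part of the test case.

```c
#define _GNU_SOURCE /* for sched_getaffinity and CPU_COUNT (glibc, Linux only) */
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Number of CPUs this process may run on, or -1 if the query fails. */
int allowed_cpu_count(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    return CPU_COUNT(&set);
}

/* Average number of threads sharing each allowed CPU. */
double threads_per_cpu(int threads, int cpus) {
    assert(threads > 0 && cpus > 0);
    return (double)threads / (double)cpus;
}

/* Prints a one-line report of threads versus allowed CPUs. */
void report_cpu_allocation(int threads) {
    int cpus = allowed_cpu_count();
    if (cpus < 1) {
        printf("could not query the CPU affinity mask\n");
        return;
    }
    printf("threads: %d, allowed CPUs: %d, threads per CPU: %.1f\n",
           threads, cpus, threads_per_cpu(threads, cpus));
}
```

Calling report_cpu_allocation(omp_get_max_threads()) at the start of main() in openmp_101.c, once under sbatch and once under srun, shows whether the two runs see the same allocation. If the sbatch run reports fewer CPUs, note that the Slurm documentation says srun no longer inherits --cpus-per-task from sbatch as of version 22.05, so passing --cpus-per-task=28 to the srun line inside the script may be worth trying. Whether that applies to this cluster is an assumption on my part.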

The University of Western Australia

Computer Science and Software Engineering