Problem Statement
When I run my OpenMP program (openmp_basics) on my local machine, the run time drops noticeably as the number of threads increases.
However, on Setonix the run time is the same whether I specify 1 or 128 threads, and it is slow in both cases. The job output shows that the requested number of threads is spawned, yet the timing does not change.
Local output
% ./openmp_basics 1 100000 10000 10
Using 1 threads..
Time taken: 9.465 seconds.
% ./openmp_basics 8 100000 10000 10
Using 8 threads..
Time taken: 2.030 seconds.
Setonix output
% tail slurm-15112799.out
Using 128 threads..
Time taken: 12.112 seconds.
Here is my batch file:
#!/bin/bash
#SBATCH --account=courses0101
#SBATCH --partition=work
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=128
#SBATCH --mem=200G
#SBATCH --time=00:30:00
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
cc -fopenmp -o openmp_basics ./openmp_basics.c
srun ./openmp_basics $OMP_NUM_THREADS 100000 10000 10
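One way to narrow this down is to print which CPUs the launched step is actually allowed to run on, since 128 threads confined to a handful of cores would give flat timings like the ones above. The script below is only a sketch and has not been run on Setonix: it assumes a Linux node where /proc/self/status exposes Cpus_allowed_list, and the helper name count_cpus is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original job): count the CPUs named in
# a Linux affinity list such as "0-3,8,10-11".
count_cpus() {
    local list=$1 total=0 part lo hi parts
    IFS=',' read -ra parts <<< "$list"
    for part in "${parts[@]}"; do
        lo=${part%-*}   # text before the dash (the whole item if there is no dash)
        hi=${part#*-}   # text after the dash (the whole item if there is no dash)
        total=$(( total + hi - lo + 1 ))
    done
    echo "$total"
}

# Cpus_allowed_list is the set of CPUs this process may be scheduled on.
# Assumption: /proc is available (Linux only).
allowed=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status 2>/dev/null)
echo "Allowed CPUs: ${allowed:-unknown} ($(count_cpus "$allowed") total), OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
```

Saved under some name such as check_affinity.sh (also made up), it could be launched from the batch file the same way as the program, with `srun ./check_affinity.sh`, and then again with `srun --cpus-per-task=${SLURM_CPUS_PER_TASK} ./check_affinity.sh`. If the first run reports far fewer CPUs than OMP_NUM_THREADS and the second reports 128, the threads may be sharing cores. That is a guess to test, not a confirmed cause.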
I have tried many things but am still stuck. Any ideas?