When I try to allocate memory on Setonix with this script:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=28
#SBATCH --partition=work
#SBATCH --account=courses0101
#SBATCH --mem=200G
export OMP_SCHEDULE="static,3125"
gcc -m64 -fopenmp -o project ./project.c
srun ./project
I have tried --mem=200G, 120G, and 100G.
My program just calls one function:
int* generate_sparse_matrix(int* maxNonZero, double probability){
    int* matrix = alloc_matrix(ROW_NUM, COLUMN_NUM);
    int* maxValues = malloc(sizeof(int) * ROW_NUM);
    return matrix;
}
If I run this program with the script, it is fine. However, if I add an OpenMP parallel region, like:
int* generate_sparse_matrix(int* maxNonZero, double probability){
    int* matrix = alloc_matrix(ROW_NUM, COLUMN_NUM);
    int* maxValues = malloc(sizeof(int) * ROW_NUM);
    omp_set_num_threads(GENERATE_THREAD_NUM);
    #pragma omp parallel
    {
        // operations that do not allocate memory
    }
    return matrix;
}
then it fails with:
slurmstepd: error: Detected 1 oom_kill event in StepId=15608085.0. Some of the step tasks have been OOM Killed.
srun: error: nid002280: task 0: Out Of Memory
srun: Terminating StepId=15608085.0
Note that GENERATE_THREAD_NUM = 1.
Is there any clue about this issue?
By the way, I always feel Setonix doesn't really give us enough compute or memory resources.
You can use squeue -u $USER to list all of your currently running jobs, and then use sstat -j <job-id> to show the details of a job, including an entry for its memory usage.
It's important to note that simply allocating memory via malloc() doesn't mean the system has actually committed that memory. Most modern operating systems implement virtual memory with demand paging: a call to malloc() mostly just updates page tables, and physical memory is not committed until the pages are first accessed.
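This lazy commitment is easy to observe on Linux by watching the process's resident set size before and after the allocated pages are written. A minimal sketch (Linux-only; the helper names are my own, and it reads /proc/self/statm):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Current resident set size in kB, read from /proc/self/statm (Linux). */
long resident_kb(void) {
    FILE *f = fopen("/proc/self/statm", "r");
    long size_pages = 0, resident_pages = 0;
    if (!f) return -1;
    if (fscanf(f, "%ld %ld", &size_pages, &resident_pages) != 2) resident_pages = -1;
    fclose(f);
    return resident_pages < 0 ? -1 : resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}

/* How much resident memory grows when we only malloc() `len` bytes. */
long growth_after_malloc_kb(size_t len) {
    long before = resident_kb();
    char *p = malloc(len);        /* reserves address space, commits almost nothing */
    long after = resident_kb();
    free(p);
    return after - before;
}

/* How much resident memory grows once every byte is actually written. */
long growth_after_touch_kb(size_t len) {
    long before = resident_kb();
    char *p = malloc(len);
    if (!p) return -1;
    memset(p, 1, len);            /* touching the pages commits them */
    long after = resident_kb();
    free(p);
    return after - before;
}
```

On a typical Linux machine, growth_after_malloc_kb for a large buffer stays near zero while growth_after_touch_kb is close to the buffer size, which is also why sstat only reflects memory your code has actually written to.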
Before assuming there’s an issue with the system, I recommend carefully reviewing your code to ensure that memory is being used as expected.
And a reminder: if anything is not working as expected, you can use scancel <job-id> to kill your job so you will not have many jobs stuck in the queue.
I understand that, but I don't think virtual memory management should affect the use of memory; that is something the operating system and the C library should guarantee.
If malloc() returns successfully, I should be able to use the memory freely. In fact, I have run my program on my laptop, which has less memory, and it works fine.
That is why I suspect Setonix doesn't give us enough resources, and I know some other students have had a similar issue.
Anyway, I'll run another test, but I don't expect it to work, since I have already tested many times.
Probably the only solution is not to create the original matrix, but I am not sure whether that fits the project guidelines.
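If avoiding the full matrix is allowed, one common way is compressed sparse row (CSR) storage, where memory scales with the number of non-zeros rather than with ROW_NUM * COLUMN_NUM. A minimal sketch (the type and function names below are illustrative, not part of the project):

```c
/* Hypothetical CSR container: only the non-zero entries are stored. */
typedef struct {
    int rows;
    int *row_start;   /* length rows+1: row i occupies [row_start[i], row_start[i+1]) */
    int *col_index;   /* column of each non-zero */
    int *value;       /* value of each non-zero */
} csr_t;

/* Build a tiny CSR matrix by hand for illustration:
 *   [ 5 0 0 ]
 *   [ 0 0 7 ]  */
csr_t make_example(void) {
    static int rs[] = {0, 1, 2};  /* row 0 has 1 entry, row 1 has 1 entry */
    static int ci[] = {0, 2};     /* their column indices */
    static int v[]  = {5, 7};     /* their values */
    csr_t m = { .rows = 2 };
    m.row_start = rs; m.col_index = ci; m.value = v;
    return m;
}

/* Look up entry (r, c); entries not stored are zero. */
int csr_get(const csr_t *m, int r, int c) {
    for (int k = m->row_start[r]; k < m->row_start[r + 1]; k++)
        if (m->col_index[k] == c)
            return m->value[k];
    return 0;
}
```

With this layout, a generator can append non-zeros row by row and never needs the dense ROW_NUM x COLUMN_NUM allocation at all.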
By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available. In case it turns out that the system is out of memory, one or more processes will be killed by the OOM killer.
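This behavior is governed by the kernel's vm.overcommit_memory setting. A minimal Linux-only sketch to check which mode a node is running (on a shared system like Setonix you normally cannot change it, but it explains why the failure shows up at page-touch time instead of at malloc()):

```c
#include <stdio.h>

/* Linux overcommit policy (vm.overcommit_memory):
 *   0 = heuristic overcommit (the optimistic default described above)
 *   1 = always overcommit
 *   2 = strict accounting: malloc() fails up front instead of the
 *       OOM killer striking later
 * Returns -1 if the setting cannot be read. */
int overcommit_policy(void) {
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;
    if (f) {
        if (fscanf(f, "%d", &mode) != 1) mode = -1;
        fclose(f);
    }
    return mode;
}
```

Under modes 0 and 1, a huge malloc() can succeed and the process is only killed later, when the touched pages exceed what the cgroup or node can supply, which matches the oom_kill event in the slurmstepd message.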
Sorry, the first line of the previous reply was incomplete:
Your computer may also have a pagefile.sys or swap file enabled, so you may not encounter an OOM (Out of Memory) error because the system swaps pages between physical memory and disk. However, it is never safe to assume that malloc() will reliably return usable memory. If you turn off disk swapping, you will likely observe similar behavior.
Thanks, this is probably the reason why it gets killed.
However, the tricky part is that if I run it without the script, just ./project, it sometimes runs successfully.