It seems the Project 2 marks have been made available on csmarks. May I know where inquiries about the project should go? And what is the average score for the project?
Additionally, what should we do if N is so large that we cannot gather the results into a single process? When N is large, even if the original matrix has a low non-zero rate, the resulting matrix is likely to have a non-zero rate close to 1. So for ...
Given that the project involves working with large values of N, there's a risk of integer overflow if the program doesn't handle numerical limits carefully. I would suggest that fellow students take care to avoid this issue, and the marker should b...
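To make the overflow point concrete: at N = 100,000 even the element count N*N no longer fits in a 32-bit int, so any index arithmetic needs a 64-bit type. A tiny sketch (the variable names are mine, not from the spec):

```c
#include <stdio.h>

int main(void)
{
    int n = 100000;

    /* n * n would be 10^10, which overflows a 32-bit int (undefined
       behaviour); cast to a 64-bit type BEFORE multiplying.          */
    long long elements = (long long)n * n;

    printf("N*N as a 64-bit value: %lld\n", elements);
    return 0;
}
```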
Thank you for your response; I appreciate it. I didn't expect to see staff available on the weekend.
I believe I have a third option: fully generating the operands for all processes while ensuring that the results are consistent across them. This...
In Project 1, the process was straightforward: you generated the operand matrices, and they were shared across all threads. However, in Project 2, we may need to broadcast the operands to different processes, at least partially.
Is this what we're ex...
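If broadcasting is indeed what is expected (I am not sure it is), the usual pattern looks roughly like the sketch below; the size `N` and the flattened int buffer are placeholders I made up for this example.

```c
#include <mpi.h>
#include <stdlib.h>

#define N 1000              /* placeholder size for this sketch */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank allocates the buffer; only rank 0 fills it in. */
    int *a = malloc((size_t)N * N * sizeof *a);
    if (rank == 0)
        for (int i = 0; i < N * N; i++)
            a[i] = rand() % 10;

    /* One collective call copies rank 0's operand to every rank. */
    MPI_Bcast(a, N * N, MPI_INT, 0, MPI_COMM_WORLD);

    free(a);
    MPI_Finalize();
    return 0;
}
```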
I mentioned in another thread that a multiplication runtime of around 600 seconds is quite reasonable.
BTW, I noticed that you are using `sbatch`, which is supposed to be 10 times slower than `srun`.
I have posted about this before, but there is no offi...
Sorry, the first line of the previous reply is incomplete.
Your computer may also have pagefile.sys or a swap file enabled, so it is possible that you will not encounter an OOM (Out of Memory) error, as it will swap memory between physical memory and disk...
You can use `squeue -u USER` to list all of your currently running jobs, and then use `sstat -j <job-id>` to show the details of your job, including an entry representing memory usage.
It's important to note that simply allocating memory via malloc() ...
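My guess at where that sentence was going: on Linux, malloc() usually just reserves address space, and physical pages are only committed once they are written, so the usage reported by sstat can be much smaller than what was malloc()'d. A rough sketch of that effect (purely illustrative):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t bytes = (size_t)4 << 30;          /* 4 GiB of address space   */
    char *buf = malloc(bytes);
    if (!buf) { perror("malloc"); return 1; }

    /* Resident memory is still tiny here: the pages are reserved but
       have not been touched yet.                                        */
    getchar();                               /* pause: check usage now   */

    /* Writing every page forces the kernel to commit physical memory;
       this is usually where an OOM kill would actually happen.          */
    memset(buf, 1, bytes);
    getchar();                               /* pause: check usage again */

    free(buf);
    return 0;
}
```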
I believe a proper implementation should be able to perform the multiplication for p = 0.05 and N = 100,000 in under ten minutes. In fact, ten minutes is a generous estimate, and your program will likely complete the task in less time.
I agree. It might work in later versions, but earlier versions of OpenMP could exhibit undefined behavior.
In this case, I think we might just need to choose the option that makes the most sense.
I wrote two separate functions: one generates the compressed form directly, without relying on an existing X; the other generates the compressed form from an existing X.
I hope this covers what they're asking for, as at least one of these approaches s...
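In case it helps anyone, here is a rough sketch of the "generate the compressed form directly" idea; the value/column arrays, the value range, and the per-row layout are my own assumptions, not necessarily what the spec wants.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch only: produce one row of a compressed form directly, without
   building the dense row first.  Each non-zero stores its value and
   its column index; returns how many non-zeros were generated.        */
static int generate_row(int n, double p, int *values, int *cols)
{
    int nnz = 0;
    for (int j = 0; j < n; j++) {
        if ((double)rand() / RAND_MAX < p) {   /* entry is non-zero */
            values[nnz] = rand() % 10 + 1;     /* arbitrary value   */
            cols[nnz]   = j;                   /* remember column   */
            nnz++;
        }
    }
    return nnz;
}

int main(void)
{
    enum { N = 8 };
    double p = 0.25;
    int values[N], cols[N];

    srand(12345);                              /* fixed seed: reproducible runs */
    for (int i = 0; i < N; i++) {
        int nnz = generate_row(N, p, values, cols);
        printf("row %d: %d non-zeros\n", i, nnz);
    }
    return 0;
}
```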
OpenMP offers five scheduling schemes for the `#pragma omp parallel for` directive.
However, out of these, only `static`, `dynamic`, and `guided` are the real deal when it comes to scheduling.
The other two, `runtime` and `auto`, aren't true scheduli...
A minimal test case to show the performance difference:

`openmp_101.c`

```c
#include <inttypes.h>
#include <limits.h>
#include <omp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("omp_get_max_threads() = %d\n", omp_get_max_threads());
    /* ... (rest of the test case truncated) ... */
    return 0;
}
```
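Since the test case above got cut off, here is what the schedule clause itself looks like; this is a separate sketch with a made-up workload, not the original file:

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    long sum = 0;

    /* schedule(static), schedule(dynamic, chunk) and schedule(guided)
       control how iterations are handed out to threads; swap the
       clause below to compare them on the same loop.                  */
    #pragma omp parallel for schedule(dynamic, 64) reduction(+:sum)
    for (long i = 0; i < 100000000L; i++)
        sum += i % 7;

    printf("threads = %d, sum = %ld\n", omp_get_max_threads(), sum);
    return 0;
}
```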
Moreover, I have never tried to use `sbatch` before, because I thought running a program with `srun` should have identical, or at least similar, performance.
For example, if I were you I would run `openmp_101-b.sh` as

```bash
cc -fopenmp -o openmp_101 ./openmp_101.c
```
First, I would like to point out that your code may have undefined behavior: `int i` and `long loops` have different data types, and `loops` has the value 10000000000, which is greater than INT_MAX, so the loop should never finish. But since the beh...
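To make the issue concrete, here is a hypothetical reduction of that pattern (not the student's actual code):

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    long loops = 10000000000L;   /* 10^10, larger than INT_MAX */

    /* BROKEN: with "for (int i = 0; i < loops; i++)" the comparison
       promotes i to long, so i can never reach loops; instead i
       eventually overflows, which is undefined behaviour for a
       signed int.                                                   */

    /* FIX: give the counter a type as wide as the bound.
       (10^10 iterations takes a little while to run.)               */
    long sum = 0;
    for (long i = 0; i < loops; i++)
        sum += 1;

    printf("INT_MAX = %d, iterations = %ld\n", INT_MAX, sum);
    return 0;
}
```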
In addition to my previous response, I realize that it may be impossible to perfectly time a multi-threaded function.
Although `clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t)` excludes scheduling delays, we will always call the function in the master thre...
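In practice I would lean on wall-clock time for reporting. A sketch of the three different notions of time (the dummy workload is made up):

```c
#include <omp.h>
#include <stdio.h>
#include <time.h>

static double secs(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec t0, t1, p0, p1;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t0);   /* calling thread only  */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &p0);  /* all threads combined */
    double w0 = omp_get_wtime();                   /* wall clock           */

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 1; i <= 200000000L; i++)
        sum += 1.0 / i;

    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t1);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &p1);
    double w1 = omp_get_wtime();

    printf("master-thread CPU time: %.3f s\n", secs(t0, t1));
    printf("process CPU time:       %.3f s (sum over every thread)\n", secs(p0, p1));
    printf("wall clock:             %.3f s (sum = %f)\n", w1 - w0, sum);
    return 0;
}
```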
Using `--cpus-per-task 128` is the correct way to specify the thread count. Since we do not have control over how Setonix operates, we must work with its scheduling behavior as best we can.
I still cannot see the C code, so I am not sure how you meas...
I just assumed that all matrices are square and defined a constant N for both rows and columns. I hope our UC does not impose a strict limit on this. After all, these are just minor details; the main focus of the project should be on the parallel alg...
I don't expect this to be answered within the week. The funny thing is, the deadline for our lab report is next Friday, which falls in the same week as numerous mid-semester tests. Alas, best wishes to everyone.
SYSTEM WARNING YOUR STUDENT NUMBER HAS BEEN RECORDED FOR NOT BEING RESPECTFUL TO TEACHING STAFF. YOUR INTERNET BROWSER HISTORY IS NOW BEING REVIEWED FOR FURTHER EVIDENCE OF INAPPROPRIATE BEHAVIOR.
Just kidding. I agree with you mate.
I only check the result if N = 10,000.
100,000^3 is an enormous number, even for Setonix. The computation for the compressed matrix is feasible only because we reduce it by a factor of p^2, where p is 0.01, 0.02, or 0.05. I doubt any...
I would say that for N = 100,000, it is more efficient to store the result in an uncompressed form. When N is large, even if the original matrix has a low non-zero rate, the resulting matrix is likely to have a non-zero rate close to 1.
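A back-of-the-envelope estimate, assuming the non-zero entries are placed independently (the generator may or may not guarantee this): an entry of the product is zero only if all N term pairs contribute zero, so the expected density of the result is roughly 1 - (1 - p^2)^N. With p = 0.01 and N = 100,000 that is about 1 - e^(-10) ≈ 0.99995, i.e. the result is essentially fully dense.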
Hi Professor,
If we have a matrix A with N rows and N columns, we can compress it into two new matrices, `AX` and `AY`, each with N rows and M columns. The problem is, we don't know what M is when allocating memory for ...