help5507

This forum is provided to promote discussion amongst students enrolled in CITS5507 High Performance Computing.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!" follow-up message.


UWA week 36 (2nd semester, mid-semester break)
9:27am Fri 6th Sep, ANONYMOUS

My sparse multiplication algorithm takes about 1 second for size 10000, but beyond that the time seems to grow exponentially, and it would very likely take about an hour for size 100000. I'm guessing I might not be using the optimal algorithm. Are there any hints?


11:17am Fri 6th Sep, ANONYMOUS

That's not far off the times I'm getting as well. I'm working on the assumption that it is supposed to take quite a long time. I might not be using the optimal algorithm either, but regardless, I expect it to take a long time.


6:33pm Fri 6th Sep, ANONYMOUS

Likewise: 10000 took around a second, and for me doubling the size led to a 10x time increase, so mine will take somewhere around 3 hours. That feels quite long, and it will make repeated tests and comparisons difficult.


UWA week 37 (2nd semester, week 7)
10:17pm Mon 9th Sep, ANONYMOUS

Hi, just wondering how many threads you are using to get 1 second. Currently I am using one thread and getting a time of around 1 second. Thanks.


8:27am Tue 10th Sep, Benjamin W.

What does your sbatch script contain? My Setonix jobs sit in a queue for a full day before executing, and OpenMP has been doing nothing for me. What values do you have for --nodes and --ntasks, and are you using the --cpus-per-task directive?


3:08pm Tue 10th Sep, Jiandong W.

Hi Benjamin, at the moment I am just running my program from the command line with the total thread count set to 1. I will write the batch script later so I can add nodes or cores and get more test results. Thanks, Joey


12:36pm Thu 12th Sep, Daniel L.

For me, I've currently been using:
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task={I change this for the thread count}
#SBATCH --partition=work
#SBATCH --account=courses0101
#SBATCH --mem=220G (definitely overkill, but works)
#SBATCH --time=01:30:00
From my understanding, you don't need to change the nodes, since OpenMP is used (in this project) to maximise the performance on a single node, and you don't need to change the ntasks either (I could be wrong about the ntasks).
Hope this helps regarding the Setonix variables.


UWA week 38 (2nd semester, week 8)
👍x1
helpful
10:30am Mon 16th Sep, ANONYMOUS

Isn't that changing the number of "cores"? It seems --cpus-per-task can be up to 128, but I assume we should set it to 28, since the task says it should use 28 cores. Then we can change the number of threads used with omp_set_num_threads while keeping the cores the same? I'd assume that there would be no optimal value (U-shaped graph) for --cpus-per-task; rather, it would keep getting faster as more cores are used.


12:48pm Mon 16th Sep, ANONYMOUS

Yeah, I agree with the post above. I fixed the core count at 28 and got a perfect U shape by varying the number of threads.


12:51pm Mon 16th Sep, ANONYMOUS

I believe a proper implementation should be able to perform the multiplication for p = 0.05 and N = 100,000 in under ten minutes. In fact, ten minutes is a generous estimate, and your program will likely complete the task in less time.


👍x1
helpful
1:18pm Mon 16th Sep, Jinqiang L.

I don't think it makes much sense to use a thread count larger than the core count, since the threads block each other and could never be faster...


👍x1
helpful
1:31pm Tue 17th Sep, ANONYMOUS

Well, actually, it would usually make sense. Pretend we have 1 thread and 1 core. Every time an item in a row or column of the matrix needs to be accessed, that data has to be pulled from RAM or a cache into the CPU, which can take hundreds of CPU cycles. During these cycles the CPU is typically idle, waiting for the data to arrive. If there were two threads, however, the second could already have preloaded its data into the cache, so that when thread 1 finishes executing and needs to fetch its next piece of data, the CPU can immediately switch to thread 2 and consume that thread's preloaded data while thread 1 waits for more data from RAM.


12:03am Wed 18th Sep, ANONYMOUS

The Pawsey documentation describes --cpus-per-task as the maximum number of threads in an OpenMP application: "--cpus-per-task=<c> (or -c <c>): specifies the number of cores assigned to each process (or task related to the --ntasks option). For OpenMP jobs, and multithreaded programs in general, this implies each task may have up to c threads running in parallel. The value for the --cpus-per-task should correspond to the one associated with the OMP_NUM_THREADS variable for OpenMP applications."

The University of Western Australia

Computer Science and Software Engineering

CRICOS Code: 00126G
Written by [email protected]
Last modified  8:08AM Aug 25 2024