help5507

It's UWAweek 47

help5507

This forum is provided to promote discussion amongst students enrolled in CITS5507 High Performance Computing.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!" followup message.

How do I ask a good question?

Displaying selected article

Showing 1 of 148 articles.
Currently 3 other people reading this forum.

UWA week 36 (2nd semester, mid-semester break) ↓

Need help with getting performance on Setonix (all 11)

8:18pm Thu 5th Sep, ANONYMOUS

In addition to my previous response, I realize that it may be impossible to perfectly time a multi-threaded function.

Although clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t) excludes scheduling delays, we will always call the function in the master thread. If the master thread finishes before other threads, the recorded CPU time for the master thread may be shorter than the CPU time consumed by the longest-running thread in the execution. Ideally, as shown in the figure, we would want to capture the CPU time consumed by the longest-running thread, which is the child thread in this case, but it is just impossible.

   Child Start - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Child Complete
  /                                                                                        \
Clock() - Master Start - Master Complete - (Wait for Child, does not consume CPU time) - Clock()

#include <bits/stdc++.h>
#include <omp.h>
using namespace std;

double myclock() {
  struct timespec t;
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t);
  return t.tv_sec + t.tv_nsec / 1e9;
}

int main() {
  cout << omp_get_max_threads() << endl;
  double a = myclock();

#pragma omp parallel for
  for (int i = 0; i < 2; ++i) {
    if (omp_get_thread_num() != 0) {
      double c = myclock();
      // simulate some work
      for (uint j = 0; j < 1000000000; ++j)
        ;
      double d = myclock();
      cout << omp_get_thread_num() << endl;
      cout << (d - c) << endl;
    }
  }

  double b = myclock();
  cout << (b - a) << endl;
}

$ g++ 1.cpp -fopenmp; ./a.out # run without optimization so the long loop will not be optimized out
256
1
0.653028
0.0163417

help5507 articles all ANNOUNCEMENTS recent articles login to post a new article help5507 topics from today and yesterday from the past week from the past two weeks ALL topics, most recent first	login chapter0 unitinfo research projects csforum csse-feedback csbreakdown cssubmit csmarks 2nd semester help fora Algebra Advanced Algorithms Cloud Computing High Performance Computing Intelligent Agents Professional Computing Relational Database Management Systems Systems Programming