
This forum is provided to promote discussion amongst students enrolled in Open Source Tools and Scripting.

Please consider offering answers and suggestions to help other students! And if you fix a problem by following a suggestion here, it would be great if other interested students could see a short "Great, fixed it!"  followup message.

12:11pm Thu 12th May, ANONYMOUS

Hi Michael, The code-efficiency requirement is pretty vague to me. What is considered inefficient? Reading the same file multiple times surely is. But if you read the file once and then perform similar operations on it multiple times (like grepping for a string), is that also considered inefficient? Cheers


1:21pm Thu 12th May, Michael W.

Hi, Grepping multiple times, especially over a file, can be very slow. Clearly the text of the input files needs to be transformed, and doing everything in one script is a recipe for disaster. Better a separation of concerns, as suggested by the Tips and Tricks section. Then, if each stage discards the data it doesn't need, downstream computations will have less to do. Using Awk arrays can also be helpful. Bottom line: we have not covered efficiency in class, and for many people in the class it's their first experience of programming, so I'm really only concerned here about gross inefficiency, which will be pretty evident because the code will be very slow. Cheers MichaelW
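
A minimal sketch of that contrast, with a hypothetical file and patterns: the first version trawls words.txt once per pattern, while the second makes a single pass, tallying matches in an Awk array.

```sh
# Three passes over the file: one grep per pattern.
grep -c 'cat'  words.txt
grep -c 'dog'  words.txt
grep -c 'fish' words.txt

# One pass: a single Awk script counts all the patterns as each line streams by.
awk '/cat/  { count["cat"]++ }
     /dog/  { count["dog"]++ }
     /fish/ { count["fish"]++ }
     END    { for (p in count) print p, count[p] }' words.txt
```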


1:37pm Thu 12th May, ANONYMOUS

May I just add: the focus in this course is on putting pre-built programs together. We haven't studied their underlying implementations, so it would be difficult to ascertain the big-O complexity of our programs.


2:34pm Thu 12th May, Michael W.

Hi, Sorry for being unclear. I am only referring to the programs you write. For example, using cut multiple times, as you may have had to do in the last assignment because we'd not covered awk, is very inefficient (which is why efficiency was not mentioned there). Another example: you can use multiple greps, each searching for one thing at a time, or you can use a single Awk (or perhaps Sed) script, which trawls the data once. That said, given that we've not really touched on this, I'm only focused on big-picture stuff. Cheers MichaelW
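
For instance (hypothetical CSV file and field numbers), each cut below re-reads the file, whereas the single Awk invocation extracts both fields in one pass:

```sh
# Two passes: each cut re-reads the file to extract one field.
cut -d',' -f1 data.csv
cut -d',' -f3 data.csv

# One pass: Awk pulls out both fields as each line streams by.
awk -F',' '{ print $1, $3 }' data.csv
```

(For this particular job `cut -d',' -f1,3 data.csv` would also manage a single pass; Awk wins once you need per-field processing beyond extraction.)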


2:56pm Thu 12th May, ANONYMOUS

Michael, thank you for clarifying. Can I ask: when using a Sed script and running several commands, does it trawl through the data just once, or once for each command? For instance, if converting text to tokens I do `sed -e (command that tokenises commas) -e (command that tokenises question marks)`, will this be considered inefficient?


3:03pm Thu 12th May, Michael W.

Hi, The big cost is slurping the lines out of the files, which is slower than having what you need in memory already. What Sed and Awk do is grab a line at a time. Your program can then do whatever it needs with that line before the next one is obtained. Another trick is to get rid of data you don't need early, so there is less for downstream analyses to trawl through. Cheers MichaelW
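
To illustrate discarding early (hypothetical log file): both pipelines produce the same counts, but in the second one sort and uniq have far less data to trawl through.

```sh
# Wasteful: sort has to handle every line of the file.
sort big.log | grep 'ERROR' | uniq -c

# Better: grep discards the irrelevant lines first.
grep 'ERROR' big.log | sort | uniq -c
```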


4:41pm Thu 12th May, Daniel S.

As Michael has said, both sed and awk go through a file one line at a time. So `sed -e (something) -e (something else)` is one pass through the file, but `sed -e (something) | sed -e (something else)` is two passes over the data.
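
Concretely, with the comma/question-mark tokenising example from above (file name hypothetical):

```sh
# One sed process: both substitutions are applied to each line as it streams past.
sed -e 's/,/ /g' -e 's/?/ /g' input.txt

# Two sed processes: every line is scanned twice, once by each.
sed -e 's/,/ /g' input.txt | sed -e 's/?/ /g'
```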


8:29am Fri 13th May, ANONYMOUS

I had a think about this (as I'm doing the same thing), and I realized that in terms of efficiency it's not a big difference, UNLESS you're cutting data down on each sed call to reduce the number of steps. If you loop over 50 lines and perform 2 sed operations on each, that's 100 steps. If you loop over the 50 lines for one operation, and loop over them again for a second operation, that's still 100 steps. In both cases the big-O complexity is O(n). If I've misunderstood, please correct me! As I understand it, we won't be tested on efficiency this way anyway (Michael said we'll just be tested on how fast our programs run).
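
Timing both forms makes the constant-factor difference visible even when the big-O is the same (file name hypothetical):

```sh
# Same O(n) work either way; time exposes the constant-factor difference.
time sed -e 's/,/ /g' -e 's/?/ /g' big.txt > /dev/null
time sh -c "sed -e 's/,/ /g' big.txt | sed -e 's/?/ /g' > /dev/null"
```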


8:47am Fri 13th May, Michael W.

No, you have understood it 99% correctly; the other 1% is the land of the Gotcha. Yes, the computational complexity is exactly the same. The gotcha is that working in the computer's memory is faster than going out to long-term storage. That was particularly the case when long-term storage was spinning disc, but even in the era of RAM disc it's still significantly slower. That's why large data-science applications require HUGE amounts of working memory. Cheers MichaelW
