Faculty of Engineering and Mathematical Sciences 

help4407


This forum is provided to promote discussion amongst students enrolled in Open Source Tools and Scripting (CITS4407).

Assignment 2 and Clarifications.
 

Output for Question 2 in Exercise 3

From: Rohit A.
Date: Thu 26th Mar, 9:54pm
Summary: sorcerer
Hi, 

Q2 of exercise 3 asks us to print the unique words from the text file 
unix-1969-1971.txt, where a word is defined as 'a sequence of one-or-more 
alphabetic characters', i.e. no numbers and no punctuation marks or symbols. 

I've used the following command to print them, but my word list starts with 
the following words: 

cmd: cat unix-1969-1971.txt | tr " " "\n" | tr -d '[:punct:] [:digit:]' | sort | uniq

Output: 

a
about
actual
adapted
almost
also
an
and
another
as
ASR
at
barely
batch
be
beasts

This is different from the output shown in the question. 
According to the question, the output should begin with the following words: 

ASR
Bell
But
Computer
He
John
Ken
Laboratories

Can anyone shed some light on where I went wrong? 

Thank you 

From: David M.
Date: Thu 26th Mar, 10:17pm
The order of words in your output looks unusual: e.g. I would expect uppercase "A" to precede lowercase "a".

Could this warning from the man page for SORT(1) be relevant to your problem?

*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
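
A small illustration of that warning, using a handful of words from the outputs quoted in this thread (the variable names are only for illustration). With LC_ALL=C set for the sort command alone, the comparison uses byte values, so every uppercase letter sorts before every lowercase one:

```shell
# Sample words taken from the outputs quoted in this thread.
words='about
ASR
Bell
a
Ken
also'

# Prefixing the command sets LC_ALL for this one sort only.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)

# Prints: ASR, Bell, Ken, a, about, also (one word per line).
printf '%s\n' "$c_sorted"
```

Without the LC_ALL=C prefix, the order depends on whatever locale your login environment happens to set.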

From: Khushboo S.
Date: Mon 30th Mar, 2:00pm
I'm using the same command and the output is totally incorrect. I am also getting 
duplicate words. Not sure what is going wrong.

$ cat unix-1969-1971.txt | tr -d "[:digit:][:punct:]" | sort -u | tr -s " " "\n"

Output is:

Computer
hardware
was
at
that
time
more
primitive
than
even
people
who
Laboratories
Ken
Thompson
Thompson
had
been
a
researcher
on
the
Multics
Thompson
was
left
with
some
Multicsinspired

From: Christopher M.
Date: Mon 30th Mar, 2:15pm
"Khushboo Soni"                               wrote:

> I'm using the same command and the output is totally incorrect. I am also getting 
> duplicate words. Not sure what is going wrong.
> 
> $ cat unix-1969-1971.txt | tr -d "[:digit:][:punct:]" | sort -u | tr -s " " "\n"

Hello,

When you have a problem like this - unexpected output from a long pipeline of commands - 
perform each of the subparts of the command "by hand" to ensure that what you *hope* is 
passing through each pipe is really what's happening.

For example, with your command, try the subparts:

shell>  cat unix-1969-1971.txt | tr -d "[:digit:][:punct:]"

shell>  cat unix-1969-1971.txt | tr -d "[:digit:][:punct:]" | sort -u

and so on  (in fact, that very first subpart is not doing what you want it to do).
I hope that that helps.
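
A minimal sketch of this stage-by-stage approach, assuming a made-up one-line sample in place of the real file. The tr -cs variant is just one possible way to split on non-alphabetic characters, not necessarily the solution the exercise intends:

```shell
# Hypothetical one-line sample; the real input would be unix-1969-1971.txt.
sample='Multics-inspired ideas in 1969. Ken, Ken!'

# Stage 1 as in the quoted command: delete digits and punctuation.
# Note how the hyphenated pair collapses into a single "word".
stage1=$(printf '%s\n' "$sample" | tr -d '[:digit:][:punct:]')
printf 'stage 1: %s\n' "$stage1"

# One alternative for stage 1: replace each run of non-alphabetic
# characters with a single newline, giving one word per line.
words=$(printf '%s\n' "$sample" | tr -cs '[:alpha:]' '\n')
printf '%s\n' "$words"

# Stage 2: sort and de-duplicate only once there is one word per line.
unique=$(printf '%s\n' "$words" | LC_ALL=C sort -u)
printf '%s\n' "$unique"
```

Printing each intermediate result makes it clear which stage first departs from what you expected; here sort -u can only remove duplicates once each word sits on its own line.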

________

Incidentally, we never need to run:

shell>  cat filename | command ...

as we can run (the quicker)

shell>  command ... < filename
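
To see that the two forms give the same result, here is a short sketch that assumes a throwaway temporary file (created with mktemp) rather than the exercise file:

```shell
# Create a small temporary file to read from.
tmpfile=$(mktemp)
printf 'Ken Thompson\nBell Laboratories\n' > "$tmpfile"

# Form 1: an extra cat process feeds the pipe.
with_cat=$(cat "$tmpfile" | tr ' ' '\n')

# Form 2: the shell connects the file directly to tr's standard input.
with_redirect=$(tr ' ' '\n' < "$tmpfile")

# Both print: Ken, Thompson, Bell, Laboratories (one word per line).
printf '%s\n' "$with_redirect"

rm -f "$tmpfile"
```

The redirect form avoids starting one extra process and one extra pipe; the output is identical.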

From: Rohit A.  O.P.
Date: Thu 2nd Apr, 2:11pm
Summary: beer
Hi David, 

Yes, this worked. 
Thank you David. 

I believe it's because Ubuntu is not using a POSIX locale, and hence sort was producing output in a different order. 

Thanks 

From: Khushboo S.
Date: Thu 2nd Apr, 5:49pm
How did you set the sort locale?

Thanks.

"Rohit Atri"                               wrote:

> Hi David, 
> 
> Yes, this worked. 
> Thank you David. 
> 
> I believe it's because Ubuntu is not using a POSIX locale, and hence sort was producing output in a different order. 
> 
> Thanks 

From: David M.
Date: Thu 2nd Apr, 6:27pm
Locales are a complicated topic.

Basically, if your computing environment uses a locale other than the C (POSIX) locale, you may need to run a command such as

$ export LC_ALL=C

for locale-aware programs such as sort to fall back to the traditional byte-value ordering.
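
As a rough sketch of the two scopes this setting can have (the sample letters and variable names are made up for illustration): export changes the setting for every later command in the shell, while a prefix on a single command affects only that command. The export is done in a subshell here so it does not leak into your own session:

```shell
# Scope 1: export inside a subshell; every later command in that
# subshell (here, sort) sees LC_ALL=C.
exported_sorted=$(
    export LC_ALL=C
    printf 'b\nA\na\nB\n' | sort
)

# Scope 2: set LC_ALL for one command only, leaving the shell untouched.
single_sorted=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)

# Both print: A, B, a, b (one letter per line).
printf '%s\n' "$exported_sorted"
printf '%s\n' "$single_sorted"
```

The per-command form is usually the safer choice in scripts, since it leaves the locale of everything else unchanged.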

From: Christopher M.
Date: Thu 2nd Apr, 6:56pm
"David May"                               wrote:

> Locales are a complicated topic.
> 
> Basically, if your computing environment uses a locale other than the C (POSIX) locale, you may need to run a command such as
> 
> $ export LC_ALL=C

Thanks David;
when working in the US I usually also set:

  LANG=en_AU.US-ASCII

but I don't think that that's required for this exercise.

The command  man environ  also provides a little more explanation of these variables.