ANONYMOUS wrote:
> Hi Arran,
> In lab 3.1 Binary Search-question b, you said the input domain consists of a set of all possible chars s (there are 256 of them). I am curious why there are 256 possible chars.
That's a simplifying assumption – I'll make a note of that in the lab sheet.
> Is it because each character is 8 bits long and thus has 256 possible characters? But I look it up in java doc(https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html). It says Java characters now use Unicode Standard, and "The range of legal code points is now U+0000 to U+10FFFF". Does it mean there are 2^16 (or 65,536) possible chars now?
Actually, Java chars are 2 bytes (16 bits) in size. They can therefore take on 65,536 distinct values. (Also, note that you linked to version 8 of the Java standard library; the current version is actually Java 22, which has different documentation for the Character class.)
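If you'd like to check those figures yourself, the constants on java.lang.Character give them directly. The following is just a minimal sketch; the class name CharRange and the method distinctCharValues are names I've made up for illustration.

```java
// Illustrative sketch only: CharRange and distinctCharValues are hypothetical names.
class CharRange {
    // Number of distinct values a Java char can hold.
    // The char operands are promoted to int, so the arithmetic does not overflow.
    static int distinctCharValues() {
        return Character.MAX_VALUE - Character.MIN_VALUE + 1;
    }

    public static void main(String[] args) {
        System.out.println("bits per char: " + Character.SIZE);              // 16
        System.out.println("distinct char values: " + distinctCharValues()); // 65536
    }
}
```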
But that's not the same as the range of "legal code points" (U+0000 to U+10FFFF). There are 1,114,112 distinct code points, but not
all of them have been assigned to actual characters – e.g. in Unicode version 16.0, only 154,998 characters have been assigned so far.
In the version of Java you linked to (Java 8), version 6.2 of the Unicode standard was being used, which would have had fewer
assigned characters.
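To see the difference between "legal" and "assigned" code points on your own machine, something like the sketch below should work. The class and method names are hypothetical. Note also that Character.isDefined is only an approximation of what Unicode counts as an "assigned character" (as far as I know, it also reports private-use and surrogate code points as defined), so the second number will depend on your Java version and won't match the published Unicode totals exactly.

```java
// Illustrative sketch only: CodePointRange and its methods are hypothetical names.
class CodePointRange {
    // Size of the legal code point range, U+0000 to U+10FFFF.
    static int legalCodePoints() {
        return Character.MAX_CODE_POINT - Character.MIN_CODE_POINT + 1;
    }

    // Counts the code points that the running Java version reports as defined.
    // The result varies with the Unicode version that Java release supports.
    static int definedCodePoints() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isDefined(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("legal code points: " + legalCodePoints()); // 1114112
        System.out.println("defined in this Java version: " + definedCodePoints());
    }
}
```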
In Java, when a method needs to take an argument representing a "character", it can do so in two ways.
- It can take a char argument. In that case, it will be limited to the 65,536 values a char can represent. The binary search method in the lab is an example of a method like this.
- It can take an int argument, and in that case, the full range of Unicode characters can be used (from the version of Unicode being used by that particular Java standard). However, a Java int is 32 bits (4 bytes) in size, so many possible int values don't represent code points at all, and of the values that do represent possible code points, not all will have actually been assigned. To find out how the method handles invalid values, you need to read the documentation for the method. An example of a method like this is the indexOf method of the java.lang.String class.
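Here is a small sketch of why the int form matters. It builds a string containing U+1F600, a code point that doesn't fit in a single 16-bit char, and then searches for it with indexOf. The class name CharVsInt and the helper sample are hypothetical, and the choice of U+1F600 is arbitrary.

```java
// Illustrative sketch only: CharVsInt and sample are hypothetical names.
class CharVsInt {
    // Builds 'a', then U+1F600 (stored as two chars, a surrogate pair), then 'b'.
    static String sample() {
        return "a" + new String(Character.toChars(0x1F600)) + "b";
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());                      // 4: length counts chars, not code points
        System.out.println(s.codePointCount(0, s.length())); // 3: three code points
        System.out.println(s.indexOf(0x1F600));              // 1: found via the int argument
        System.out.println(s.indexOf('b'));                  // 3: the char 'b' is widened to int
    }
}
```

A method taking a char argument could not be asked to look for U+1F600 at all, since that value can't be stored in a single char.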
Now, that method says what the behaviour of the function will be if an int is passed that falls in the U+0000 to U+10FFFF range (1,114,112 values), but doesn't say what the behaviour will be if an int outside that range is passed; so we must take the behaviour to be undefined.
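If you were writing your own code and didn't want to rely on behaviour for out-of-range values, one option is to check the argument first. This is only a sketch of one possible approach: the wrapper indexOfCodePoint and the class CodePointGuard are hypothetical, and throwing IllegalArgumentException is my own design choice, not something String.indexOf does.

```java
// Illustrative sketch only: CodePointGuard and indexOfCodePoint are hypothetical names.
class CodePointGuard {
    // Rejects ints outside U+0000..U+10FFFF before delegating to String.indexOf.
    static int indexOfCodePoint(String s, int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a legal code point: " + codePoint);
        }
        return s.indexOf(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(indexOfCodePoint("abc", 'c')); // 2
        try {
            indexOfCodePoint("abc", 0x110000); // one past U+10FFFF
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```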
When answering questions about ISP in tests or exams, you're welcome to make simplifying assumptions, if needed. It would probably be a good idea to do so, if you have a question involving character or String types, since you might not remember the exact size of a Java char (16 bits) and may not have access to the Unicode standard telling you how many assigned characters there are.
So you could make the simplifying assumption that we're only considering single-byte (8-bit) characters, such as extended ASCII. (You should clearly state that you're making this assumption, however, and that the actual number of possible characters is larger.)
In this case, you might say "If we assume for simplicity that a char can represent 256 values ...", and then answer the question on that basis. This is fine, because the markers are generally more interested in your reasoning than in whether you can accurately recall exactly how many valid characters there are in the current Java standard.
Does that help at all?
Cheers
Arran