I am not Amitava or a lab facilitator, so they may have better answers than mine, but:
For examples of referential locality, I would give the activation stack and executable code.
Executable code is usually executed sequentially (so the next instruction needed is stored directly after the current one) or in loops, where a small section of code (likely stored contiguously) is executed repeatedly. Either way, there is a very high chance that the next instruction we want is close to the current one.
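To make that concrete, here is a rough sketch (not from the lecture material) that builds the instruction fetch addresses for a loop and counts how many pages they land on. The page size, instruction width, start address and the helper names `loop_fetch_trace` and `pages_touched` are all my own assumptions for illustration.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
INSTR_SIZE = 4     # assumed fixed instruction width in bytes


def loop_fetch_trace(start, body_instrs, iterations):
    """Addresses fetched when a loop body of `body_instrs` sequential
    instructions, starting at `start`, runs `iterations` times."""
    trace = []
    for _ in range(iterations):
        for i in range(body_instrs):
            trace.append(start + i * INSTR_SIZE)
    return trace


def pages_touched(trace):
    """Set of page numbers covered by the addresses in `trace`."""
    return {addr // PAGE_SIZE for addr in trace}


# Hypothetical loop: 50 instructions starting at 0x4000, run 1000 times.
trace = loop_fetch_trace(start=0x4000, body_instrs=50, iterations=1000)
print(len(trace), "fetches touched", len(pages_touched(trace)), "page(s)")
# prints: 50000 fetches touched 1 page(s)
```

So 50,000 fetches all hit the same page, which is the kind of clustering I mean.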
As for the stack, we only need to access the top section of the stack (the frame for the function call we are currently in). Hence the next bit of data we want from the stack will be 'clustered' in that top section, unless the function returns, in which case the relevant section becomes the caller's frame, which sits right next to it on the stack.
Both of these relate to virtual memory and paging: they mean we will typically access a small number of pages very frequently and the rest quite infrequently. By bringing data in as chunks (pages), we reduce the amount of swapping in a virtual memory system and so maximise efficiency. Depending on the replacement algorithm, you can also take this into account when choosing which pages to remove (for example, LRU evicts the page that has gone unused the longest), which reduces the chance that a page will need to be swapped straight back in.
(You could possibly also argue that by evicting pages outside the cluster we can minimise the amount of memory each process needs to keep resident.)
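If you want to see the effect on swapping, here is a small simulation sketch, assuming LRU replacement (the course may have a different algorithm in mind). It counts page faults for a clustered reference string versus a scattered one. The function `lru_faults`, the 95% "hot page" ratio, the 100-page address space and the 8 frames are all made-up values for illustration.

```python
import random
from collections import OrderedDict


def lru_faults(page_refs, frames):
    """Count page faults for a page reference string under LRU
    replacement with `frames` physical frames."""
    resident = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for page in page_refs:
        if page in resident:
            resident.move_to_end(page)        # now the most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict least recently used
            resident[page] = True
    return faults


rng = random.Random(0)   # fixed seed so runs are repeatable
TOTAL_PAGES = 100        # assumed size of the process's address space in pages
HOT_PAGES = [0, 1, 2, 3] # assumed pages holding the loop code and top of stack

# Clustered: about 95% of references go to the hot pages.
clustered = [
    rng.choice(HOT_PAGES) if rng.random() < 0.95 else rng.randrange(TOTAL_PAGES)
    for _ in range(10_000)
]
# Scattered: every page is equally likely, so there is no locality.
scattered = [rng.randrange(TOTAL_PAGES) for _ in range(10_000)]

print("clustered faults:", lru_faults(clustered, frames=8))
print("scattered faults:", lru_faults(scattered, frames=8))
```

With the same number of frames, the clustered trace should fault far less often than the scattered one, because the hot pages are almost always the most recently used and so stay resident.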
Hopefully that helps in place of a Josh or Amitava response for now.