I recently came across an article by Dijkstra explaining something that always looked obvious to me: why we should start indexing arrays from 0. I was amused to find out that there is some clever thinking behind it.
All popular languages, like C/C++, Java or Perl, index arrays from 0, with the last index being the array length minus 1. While this is the norm for most developers, it is not a universal rule across programming languages. For example, in Fortran, when you declare an array with 'integer a(10)', i.e. an int array holding 10 elements, the index starts at 1 and ends at 10 (you can override this behavior with a declaration like 'integer a(0:9)', which gives indices from 0 to 9, but 1-based is the default).
The discussion over why the array index should start at zero is not a trivial one and relates to interesting concepts from computer science. First of all, it has a strong relation to language design. In C, for example, the name of an array is essentially a pointer, a reference to a memory location, so the expression array[n] refers to the memory location n elements away from the starting one. The index is thus used as an offset. The first element of the array sits exactly at the memory location that array refers to (0 elements away), so it is denoted array[0]. Most programming languages have been designed this way, so indexing from 0 is pretty much inherent to the language.
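To see the offset idea concretely, here is a minimal C sketch (the array name and values are made up for illustration) showing that array[n] is, by definition, the same thing as *(array + n):

#include <stdio.h>

int main(void)
{
    int a[3] = {10, 20, 30};

    /* a[n] and *(a + n) are the same expression in C. */
    printf("%d %d\n", a[0], *(a + 0));   /* prints: 10 10 */
    printf("%d %d\n", a[2], *(a + 2));   /* prints: 30 30 */

    /* The array's address and its first element's address coincide,
       which is exactly why the first element is a[0]. */
    printf("%p %p\n", (void *)a, (void *)&a[0]);

    return 0;
}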
However, Dijkstra explains why we should index from 0 on mathematical grounds as well. He frames it as the problem of how to denote a subsequence of natural numbers, say 1, 2, 3, ..., 10. We have four notations available:
a. 0 < i < 11
b. 1 <= i < 11
c. 0 < i <= 10
d. 1 <= i <= 10
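As a quick sanity check, here is a throwaway C snippet (purely illustrative) confirming that all four conditions select exactly the numbers 1 through 10:

#include <stdio.h>

int main(void)
{
    int i;
    /* For each i, print 1 if the condition holds and 0 otherwise;
       all four columns agree and mark exactly 1..10. */
    for (i = 0; i <= 11; i++)
        printf("%2d: a=%d b=%d c=%d d=%d\n", i,
               0 < i && i < 11,
               1 <= i && i < 11,
               0 < i && i <= 10,
               1 <= i && i <= 10);
    return 0;
}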
Dijkstra argues that the proper notation should be able to denote the following two cases naturally:
1. The subsequence includes the smallest natural number, 0.
2. The subsequence is empty.
Requirement 1 leaves out a. and c., since for a subsequence starting at 0 their lower bound would have to be written -1 < i, which uses a number not lying in the natural number set (Dijkstra calls this ugly). So we are left with b. and d. Requirement 2 now leaves out d., since for a subsequence starting at 0 that is shrunk to the empty one, d. takes the form 0 <= i <= -1, which is a little... well, messed up! Moreover, subtracting the bounds in b. gives us the length of the subsequence, which is another plus. Hence we are left with b., which is by far the most widely used convention in programming today.
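Convention b. is exactly the half-open interval that a C for loop expresses. As a small sketch (the function name is made up), with bounds lo <= i < hi the length is simply hi - lo, and lo == hi naturally denotes the empty sequence:

#include <stdio.h>

/* Print the half-open range [lo, hi); its length is hi - lo. */
static void print_range(int lo, int hi)
{
    int i;
    printf("[%d, %d) has %d element(s):", lo, hi, hi - lo);
    for (i = lo; i < hi; i++)
        printf(" %d", i);
    printf("\n");
}

int main(void)
{
    print_range(1, 11);  /* the sequence 1, 2, ..., 10 */
    print_range(5, 5);   /* the empty sequence, no -1 in sight */
    return 0;
}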
Now you know. So, remember and take pride in the fact that each time you write something like
for (i = 0; i < N; i++)
{
    sum += a[i];
}
you are not just following the rules of language notation. You are also promoting mathematical beauty!