Tuesday, April 15, 2008

"Time Is What Prevents Everything From Happening At Once.." - John Wheeler (1911-2008)

The quote continues: "Space is what prevents everything from happening to me!"
John Wheeler, has for a long time, been studying advanced concepts of theoretical physics, which for the most people are somewhat taken for granted, space,time, gravity and so on. He passed away in Sunday, 13th of April 2008, at the age of 97.

John Wheeler was one among the last of the 'big school' of theoretical physics. He has worked with legendary figures such as Niels Bohr, Oppenheimer and Albert Einstein(related photo) and he is the inventor of the popular term 'black hole', this strange 'creature' that is so big that even its own image cannot evade from its infinite gravity (visible light cannot reflect or be emitted) At some point in 1941 Wheeler joined the famous (or infamous) Manhattan project. Among his graduate students as a professor in Princeton, were Richard Feynman and Kip Thorne.

I suppose, everybody from a CS background has developed at some stage of his life a certain interest for astronomy and physics. I, my self, at a young age was a fan of such fields along with science fiction. I never actually understood what was going on with these strange quantum equations but I think I find my self equally astonished when confronted with concepts like 'black holes' or 'time bending'. Astonished at least.

You can find lots more information in the official press release from the Princeton University site.

Sunday, April 6, 2008

Why The Array Index Should Start From 0

I recently came across an article from Dijkstra(photo) explaining a fact that looked obvious to me before: Why should start indexing an array from 0. I was amused to find out that there is some clever thinking behind this.

All popular languages, like C/C++, Java or Perl start indexing an array from 0 while the last index is the array length minus 1. While this is usual to most developers, it is not a de facto situation for all programming languages. For example in Fortran, when you declare an array with 'integer a(10)', aka an int array having 10 elements, the index starts from 1 and ends at 10 (however you can override this behavior using a statement like, integer a(0:9), declaring an array with indices from 0 to 9, but anyway).

The discussion over why the array index should start at zero is not a trivial one and relates to interesting concepts from computer science. First of all, it has strong relation to language design. For example in C, the name of an array is essentially a pointer, a reference to a memory location, and so the expression array[n] refers to a memory location n-elements away from the starting element. This means that the index is used as an offset. The first element of the array is exactly contained in the memory location that array refers (0 elements away), so it should be denoted as array[0]. Most programming languages have been designed this way, so indexing from 0 is pretty much inherent to the language.

However, Dijkstra explains why we should index from 0. This is a problem on how to denote a subsequence of natural numbers, say for example 1,2,3,...,10. We have four solutions available:
a. 0<i<11
b. 1
<=i<11
c. 0
<i<=10
d. 1
<=i<=10

Dijkstra argues that the proper notation should be able to denote naturally the two following cases:
1. The subsequence includes the smallest natural number, 0
2. The subsequence is empty


Requirement 1. leaves out a. and c. since they would have the form -1<i which uses a number not lying in the natural number set (Dijkstra says this is ugly). So we are left with b. and d. Now requirement 2. leaves out d. since for a set including 0 that is shrunk to the empty one, d. takes the form 0<=i<=-1, which is a little...well, messed up! Subtracting the ranges in b. we also get the sequence length, which is another plus. Hence we are left with b. which is by far the most widely used notation in programming now.

Now you know. So, remember and take pride in the fact that each time you write something like

for(i=0;i
<N;i++)
{
sum+= a[i];
}
you are not just following the rules of language notation. You are also promoting mathematical beauty!

Thursday, April 3, 2008

Perl by Example: English Dictionary In 22 Lines

[Update #1] This post has been cited from Princeton's WordNet Project as a nice Perl extension. You can find lots of more of information about WordNet here.

[Update #2] Many thanks to my brother Costas, for buying me the ultimate Perl book: "Programming Perl, 3rd edition" by Larry Wall. Thanx bro! :)

WordNet is an online English dictionary from the Princeton University. It has some hundreds of thousands of words and it keeps growing. You have absolute access to the resource and there are interfaces in practically all programming languages. It is not tough to build one from your own, but I ceize the opportunity to make an introduction to a really special programming language: Perl. Of course a crappy blog post cannot introduce a such extensive language as Perl, but additional Perl articles are sure to follow!

For the history, Perl was designed as a so-called 'glue' language for Unix, simply for doing things that with other tools were somewhat difficult. Created in the mid-80's by Larry Wall, in a one-man project as a fast reporting tool, it is now spread to all machines and applications. I have never used Perl for large GUI-demanding projects, but it has been a killer for simple but heavy tasks.

Whatever the objections, you may find Perl great for one simple but rare thing: Perl has culture. Its creator and all people actively involved in Perl's development share a number of values: Freedom, innovation and a unique sense of humor. To explain, I have two Perl books in my library: "Learning Perl" and "Programming Perl", affectionately called by the Perl community as the "Alpaca Book" and the "Camel Book" for the animals on the cover (Camel is the Perl mascot as shown in the figure). For me they are the most easy-read programming books that I have read. Anything else I use as a reference because I find programming books too hard to follow for long. But these differ. The style of writing and the humor that they transmit make you read them in literally one breath.

Anyway, the best way to introduce a language is by example, so here is mine for Perl: To build an English dictionary based on Princeton's WordNet. Believe it or not, all in all it takes 22 lines to do it! You may see the interface with the program in the following video. When you are done you might want to check the source code with some comments for better understanding.


The Perl source code follows:

my $word; #declares a scalar variable.Perl has two basic types: scalars and plurals(lists,hashes)
our $message = "
\nPlease enter a word to look for:\n>";
print $message;
#prints our message
while($word = <>) #reads the word the user searches for
{
chomp($word);
#remove last character (the '\n', new line)
if($word eq "<") { system("cls"); #if user types "<" clear screen
print $message; }
else {
print "Connecting..";
my $url = "http://wordnet.princeton.edu/perl/webwn"."?"."s=".$word;
#WordNet URL
`wget -q -Oindex.html $url`; #download the results
open FILE,"<","index.html"; #open the result web page locally
print "\rOk.Results:";
while(<FILE>)
#for every line in file
{
if(/<\s*b>$word<\/b>[^\(]+\(([^\)]*)\)/)
#match a pattern, don't be scared!
{ print "\n#def#$1"; } #..and print the word definition !
}
close FILE;
#close file
print "\n>";
}
}



What might scare you off is the pattern matching line, especially if you have never come across regular expressions before. The logic is this: In WordNet result page, each word definition comes with original word in bold and the explanation in parentheses. For example if we search for 'developer' it may have the following format:
...developer...(a man who develops)...
...developer...(sometimes referring people creating software)...and so on
What the pattern matching does it to match if the line has the word we search for in bold and then extracts the sentence in parentheses.
When it comes to pattern matching, Perl is the king.

That's all. If you already have a Perl distribution, you can test it yourself. If you don't and you would like to give it try, for Windows there mainly two distributions. The one is from ActiveState and you can download a Windows Perl distribution named ActivePerl here (just click the download button). The other is called Strawberry Perl, it recently came out of beta and you can find it here. I haven't tried but it may well be good.

Needless to say, Perl is free and always will be! So ride the camel and enjoy!