Thursday, March 22, 2007

Error, Will Robinson

After my post on the mental challenge of juggling multiple programming languages, I realized another reason I like to stick to a few: grokking the error messages.

In an ideal world, the messages kicked out by ill-formed or ill-performing code would always pinpoint the exact problem instantly, in which case the programming environment could simply go fix it. Some programming languages do try hard to assist you. For example, Perl often guesses that you didn't really mean for a quoted string to run over many lines, and shows you where the long quote starts. Similarly, it will often suggest that adding a semicolon might cure the problem.
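To see what Perl's compiler actually says about a given slip, a fragment can be compiled with a string `eval` and the message read back from `$@`. This is a minimal sketch. The fragment is made up, and the exact wording varies between Perl versions, including whether a hint such as "(Missing semicolon on previous line?)" appears, so only the plain "syntax error" part should be relied on.

```perl
use strict;
use warnings;

# Hypothetical fragment with a missing semicolon at the end of its first line.
my $snippet = "my \$x = 1\nmy \$y = 2;\n";

# A string eval compiles the fragment at run time. A compile error does
# not stop this script; Perl's diagnostic is left in $@ instead.
eval $snippet;
my $diagnostic = $@;

print "Perl said:\n$diagnostic" if $diagnostic;
```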

But the error messages are not always on the mark: sometimes the offending quote really sits a bit before the spot Perl points to, or a missing semicolon is not the problem at all. Worse, when the Perl interpreter chokes on a running program, it generally spits out one of two error messages: Out of Memory or Segmentation Fault. In either case one goes looking for an inadvertent infinite loop or endless recursion (the Perl debugger quite often catches these with a more informative message). One common Perl trap is the one-letter deletion that turns well-behaved code such as:

while (/([A-Z])/g) { push(@array,$1); }

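into
while (/([A-Z])/) { push(@array,$1); }

Without the /g modifier the match never moves forward: each test rescans $_ from the beginning, finds the same capital letter, and succeeds again, so the loop never exits and @array grows until memory runs out.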
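The /g trap can also be sidestepped entirely. A minimal sketch, using a made-up input string: the first form is the correct scalar-context loop, which resumes each match where the previous one stopped (at `pos`); the second uses `/g` in list context, which returns all the captures at once and leaves no loop condition to break.

```perl
use strict;
use warnings;

my $text = "Error Will Robinson";   # made-up input

# Form 1: /g in scalar context remembers pos($text), so each test finds
# the next capital and the loop ends when no capitals remain.
my @caps;
while ($text =~ /([A-Z])/g) {
    push @caps, $1;
}

# Form 2: /g in list context returns every capture in one go,
# so there is no loop condition to get wrong.
my @caps_list = ($text =~ /([A-Z])/g);

print "@caps\n";        # E W R
print "@caps_list\n";   # E W R
```

The list-context form is usually the less fragile choice when all matches are wanted at once; the `while` form is handy when each match needs processing as it is found.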


Another gotcha (or should I say, gotme) is a loop whose step runs away from its end condition:

for ($i=10; $i>0; $i++) { print "$i\n"; }

Again, the symptom is a trivial task that never finishes: $i climbs away from 0 instead of counting down to it.
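A minimal sketch of two fixes: the C-style loop with its step corrected to `$i--`, and a Perl-style loop over an explicit range, which has no end condition or step to get wrong. The values are collected into arrays rather than printed so they are easy to compare.

```perl
use strict;
use warnings;

# Corrected C-style countdown: the step now moves $i toward the exit test.
my @c_style;
for (my $i = 10; $i > 0; $i--) {
    push @c_style, $i;
}

# Perl-style alternative: iterate over an explicit, reversed range.
my @ranged;
for my $i (reverse 1 .. 10) {
    push @ranged, $i;
}

print "@c_style\n";   # 10 9 8 7 6 5 4 3 2 1
```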

My problems don't tend to call for much recursion, so when I find myself 100 levels deep, it must be a mistake. Most commonly it is due to a botched lazy initialization, a scheme by which some complex object doesn't set up its internal state until that state is asked for. I do this a lot right now, as I have objects representing complex data collections stored in a relational database, and it doesn't make sense, time-wise, to slurp every last piece of data from the database when you only want a few. However, one must be careful:


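sub getId
{
    my ($this)=@_;
    # lazily set up foo on first use...
    unless (defined $this->{'foo'})
    {
        $this->createFoo();
    }
}

sub createFoo
{
    my ($this)=@_;
    # ...but building foo needs the id, and getting the id needs foo
    my $id=$this->getId(); # round-and-round we go!
}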
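One way out of the round-and-round trap, sketched under assumptions: keep the id as a plain stored field that never triggers a load, put the lazy work behind a separate accessor, and add a re-entrancy guard so a loader that asks for its own result dies immediately with a clear message. The class `Collection`, the methods `getFoo` and `_loadFoo`, and the fake loader are all hypothetical; a real loader would query the database. The sketch also uses `use warnings FATAL => 'recursion'`, which promotes Perl's "Deep recursion on subroutine" warning (issued when a subroutine has called itself 100 times more than it has returned) to a fatal error, as a second line of defense.

```perl
use strict;
use warnings;
# Promote the "Deep recursion" warning to a fatal error, so runaway
# recursion dies with a message instead of eating memory.
use warnings FATAL => 'recursion';

package Collection;   # hypothetical class standing in for a DB-backed object

sub new {
    my ($class, %args) = @_;
    # The id is known up front; the expensive 'foo' data is loaded lazily.
    return bless { id => $args{id}, foo => undef, loads => 0 }, $class;
}

# getId reads a stored field and never triggers a load,
# so loaders are free to call it.
sub getId {
    my ($this) = @_;
    return $this->{'id'};
}

# Lazy accessor: build 'foo' on first request, then reuse the cached value.
sub getFoo {
    my ($this) = @_;
    unless (defined $this->{'foo'}) {
        # Re-entrancy guard: if loading foo asks for foo again,
        # die at once with a clear message rather than recursing.
        die "getFoo re-entered while foo is still loading\n"
            if $this->{'_loading_foo'};
        local $this->{'_loading_foo'} = 1;
        $this->{'foo'} = $this->_loadFoo();
    }
    return $this->{'foo'};
}

# Hypothetical loader; a real one would query the database using the id.
sub _loadFoo {
    my ($this) = @_;
    $this->{'loads'}++;
    return "foo data for id " . $this->getId();
}

package main;

my $obj = Collection->new(id => 42);
print $obj->getFoo(), "\n";   # foo data for id 42 (loaded)
print $obj->getFoo(), "\n";   # foo data for id 42 (cached)
```

The key design choice is that the cheap identity data and the expensive lazy data live behind different accessors, so the loader depends only on the cheap one.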


My errors with R tend to fall into a small number of categories, and the error messages are generally informative. Out of memory means I really did blow out memory. Otherwise it's trivial syntax slips (typing the assignment <- as <= or =), passing NULLs (or no data at all) to a function that doesn't care for them, and the like.

On the other hand, I'm glad that I don't do a lot of Oracle (SQL) programming, or at least not a wide variety of it, because the error messages there are as clear as mud to me. Luckily, the mistakes I make come in only a few kinds; probably 95% are misspelling a table name or alias, misspelling a column name, letting Perl-isms slip in ($column), missing commas, and extraneous commas. The only one that is SQL-specific is botching GROUP BY columns and aggregate functions. The only runtime errors I tend to get are either minor hiccups from database inconsistencies or queries that never seem to return because of a botched join.
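Some of these slips can be caught before Oracle ever sees the query. A minimal sketch of a hypothetical `sql_lint` helper that flags a Perl-style `$variable` left in the SQL text and a comma dangling before a clause keyword or closing parenthesis. It is a crude regex scan under those assumptions, not a SQL parser, and it misses the other mistakes (misspelled names, GROUP BY problems) entirely.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check: a crude regex scan, not a SQL parser.
sub sql_lint {
    my ($sql) = @_;
    my @problems;
    # Report every $name left in the statement.
    while ($sql =~ /(\$\w+)/g) {
        push @problems, "Perl-style variable $1 in SQL";
    }
    # A comma followed only by whitespace and then a clause keyword or ')'.
    if ($sql =~ /,\s*((?:FROM|WHERE|GROUP|ORDER|HAVING)\b|\))/i) {
        push @problems, "extraneous comma before " . uc($1);
    }
    return @problems;
}

# Single quotes keep Perl from interpolating $column itself.
my $sql = 'SELECT name, $column, FROM samples WHERE id = 7';
print "$_\n" for sql_lint($sql);
# Perl-style variable $column in SQL
# extraneous comma before FROM
```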

It looks like I might have a real need to learn C#, which means reaching back a decade or so, to when I used C++. Learning the language is one thing; learning the hidden language of its error messages always takes a lot longer.

1 comment:

Pedro Beltrao said...

I could not find your contact information, so I am asking this in a comment.
I am organizing this month's edition of Bio::Blogs, a monthly bioinformatics blog carnival, and I am planning to include a link to this post.
Alongside the blog post I will try to make a PDF version for offline reading. Would you authorize me to include this blog post in the PDF version as well? Full attribution and a link to the online version will be included. If you read this today, you can reach me at bioblogs in gmail. Many thanks.