How I came to choose this topic to blog about first requires sharing the following story... which I'll preface with a little background.
Dr. Wife and I both do work related to mathematical biology, and frequently use computers to tackle some of the more unruly bits of math we encounter. Computer software for doing mathematics can roughly be divided into two main categories: The numerical software that crunches numbers (not a technical term, I just like calling them number crunchers), and the computer algebra systems (or CASs). This later variety do symbolic (or algebraic) manipulations like reducing fractions, and working with xs and ys instead of numbers.
For example, if you wanted to find a formula for the integral of
x^2+sin(x) from x=a to x=b,software like Maple, Mathematica or the free software Maxima would tell you the answer is
If it involves symbols, and algebraic manipulations - you want a CAS!
Now,if you just wanted to do something like plot data, generate random numbers or do statistical work -- computations using explicit numbers not just symbols -- then you'd want to use software that excels at crunching numbers. Something like Matlab, Octave, R or programming languages like fortran, C, or python (just to name a few).
Quick tangent: while I'm a fan of Maple and Matlab, they cost money. Enough money that I'd rather use free alternatives when I can, especially if I might want to share my code with friends or collaborators but don't want them to spend money just to run it. Anyway, that's my plug for Maxima (a CAS) and R (statistical software that is also a great alternative to Matlab) -- but lets get back to the story...
The other day, Dr. Wife and I were working at home. She had a bug in some R code that -- after much hair-pulling -- was finally attributed to the following unexpected behavior of R's round() function:
Notice there's something weird going on here... 1.5 gets rounded UP to 2 (as it should, if you think of rounding like most Americans), but 2.5 gets rounded DOWN to 2! What's going on with the round() function in R?! Just to make sure I wasn't screwed over by all of my math teachers, I also checked in Matlab which gave the expected result...> x = c(1, 1.5, 2, 2.5); > rbind(x=x, round = round(x)); [,1] [,2] [,3] [,4] x 1 1.5 2 2.5 round 1 2.0 2 2.0 >
So what's going on here???> x=[1 1.5 2 2.5]; > [x; round(x)] ans = 1.0000 1.5000 2.0000 2.5000 1.0000 2.0000 2.0000 3.0000 >
Quoting the wikipedia page on Rounding, there are different ways to round numbers depending on the task at hand...
Sadly, my favorite number crunching software (that would be R) uses one of the dumbest rules out there (well, dumb from a mathematical perspective) to decide what to do with numbers ending on ".5". Both R and Matlab use the "round to the nearest neighbor" rule -- but the way they deal with the half-way point turns out to be the the source of the discrepancy above.
- round to nearest: q is the integer that is closest to y...
- round towards zero (or truncate): q is the integer part of y, without its fraction digits.
- round down (or take the floor): q is the largest integer that does not exceed y.
- round up (or take the ceiling): q is the smallest integer that is not less than y.
- round away from 0: if y is an integer, q is y; else q is the integer that is closest to 0 and is such that y is between 0 and q.
In school, most of us learned this rule (again, from the wikipedia page on Rounding):
Round half upThis is what Matlab does, and is what most people think of when they round numbers.
The following tie-breaking rule, called round half up, is widely used in many disciplines:
That is, half-way values y are always rounded up. For example, by this rule the value 23.5 gets rounded to 24, but -23.5 gets rounded to -23...
- If the fraction of y is exactly .5, then q = y + 0.5
The numerical routines in R implement a different rule (italics added for emphasis):
Round half to evenThat's right - R rounds to even. Yes, it's primarily a stats platform. Yes, this is all explicitly stated in the documentation for the round() function, and of course I still love R, but seriously -- round to even? To make matters worse, that day we also had other bugs that were essentially caused by the fact that (in both Matlab and R)
A tie-breaking rule that is even less biased is round half to even, namely
Thus, for example, +23.5 becomes +24, +22.5 becomes +22, -22.5 becomes -22, and -23.5 becomes -24. This variant of the round-to-nearest method is also called unbiased rounding, convergent rounding, statistician's rounding, Dutch rounding, Gaussian rounding, or bankers' rounding...
- If the fraction of y is 0.5, then q is the even integer nearest to y.
So what's the moral of the story? Well I'm not quite sure yet ... I still like R, and I certainly will continue to use it. I'll probably end up reading through the documentation for frequently used functions -- not just unfamiliar functions -- and as much as I'd like to pretend it doesn't matter,I'll certainly keep an eye out for ways my code might get sabotaged by round-off error.> (0.1+0.05) > 0.15  TRUE >