The Obligate Scientist: R, I still love you, but I hate your round() function!

You all remember the concept of rounding, right? I first learned to round numbers by taking them to the nearest integer (e.g. rounding 2.1 gives 2, rounding 3.9 gives 4, and so on) and in cases like 2.4 -- rounding up to the nearest integer. I'll admit that as a child I didn't always pay attention in math class, but I was a bit surprised to learn recently that there are a number of different rules for rounding (and that being unaware of this fact can totally ruin your day).

How I came to choose this topic to blog about first requires sharing the following story... which I'll preface with a little background.

Dr. Wife and I both do work related to mathematical biology, and frequently use computers to tackle some of the more unruly bits of math we encounter. Computer software for doing mathematics can roughly be divided into two main categories: The numerical software that crunches numbers (not a technical term, I just like calling them number crunchers), and the computer algebra systems (or CASs). This later variety do symbolic (or algebraic) manipulations like reducing fractions, and working with xs and ys instead of numbers.

For example, if you wanted to find a formula for the integral of

x^2+sin(x) from x=a to x=b,

software like Maple, Mathematica or the free software Maxima would tell you the answer is

cos(a)-cos(b)-(a^3-b^3)/3.

If it involves symbols, and algebraic manipulations - you want a CAS!

Now,if you just wanted to do something like plot data, generate random numbers or do statistical work -- computations using explicit numbers not just symbols -- then you'd want to use software that excels at crunching numbers. Something like Matlab, Octave, R or programming languages like fortran, C, or python (just to name a few).

Quick tangent: while I'm a fan of Maple and Matlab, they cost money. Enough money that I'd rather use free alternatives when I can, especially if I might want to share my code with friends or collaborators but don't want them to spend money just to run it. Anyway, that's my plug for Maxima (a CAS) and R (statistical software that is also a great alternative to Matlab) -- but lets get back to the story...

The other day, Dr. Wife and I were working at home. She had a bug in some R code that -- after much hair-pulling -- was finally attributed to the following unexpected behavior of R's round() function:

> x = c(1, 1.5, 2, 2.5);
> rbind(x=x, round = round(x));
      [,1] [,2] [,3] [,4]
x        1  1.5    2  2.5
round    1  2.0    2  2.0
>

Notice there's something weird going on here... 1.5 gets rounded UP to 2 (as it should, if you think of rounding like most Americans), but 2.5 gets rounded DOWN to 2! What's going on with the round() function in R?! Just to make sure I wasn't screwed over by all of my math teachers, I also checked in Matlab which gave the expected result...

> x=[1 1.5 2 2.5];
> [x; round(x)]
ans =
    1.0000    1.5000    2.0000    2.5000
    1.0000    2.0000    2.0000    3.0000
>

So what's going on here???

Quoting the wikipedia page on Rounding, there are different ways to round numbers depending on the task at hand...

round to nearest: q is the integer that is closest to y...

round towards zero (or truncate): q is the integer part of y, without its fraction digits.

round down (or take the floor): q is the largest integer that does not exceed y.

round up (or take the ceiling): q is the smallest integer that is not less than y.

round away from 0: if y is an integer, q is y; else q is the integer that is closest to 0 and is such that y is between 0 and q.

Sadly, my favorite number crunching software (that would be R) uses one of the dumbest rules out there (well, dumb from a mathematical perspective) to decide what to do with numbers ending on ".5". Both R and Matlab use the "round to the nearest neighbor" rule -- but the way they deal with the half-way point turns out to be the the source of the discrepancy above.

In school, most of us learned this rule (again, from the wikipedia page on Rounding):

Round half up

The following tie-breaking rule, called round half up, is widely used in many disciplines:

If the fraction of y is exactly .5, then q = y + 0.5

That is, half-way values y are always rounded up. For example, by this rule the value 23.5 gets rounded to 24, but -23.5 gets rounded to -23...

This is what Matlab does, and is what most people think of when they round numbers.

The numerical routines in R implement a different rule (italics added for emphasis):

Round half to even

A tie-breaking rule that is even less biased is round half to even, namely

If the fraction of y is 0.5, then q is the even integer nearest to y.

Thus, for example, +23.5 becomes +24, +22.5 becomes +22, -22.5 becomes -22, and -23.5 becomes -24. This variant of the round-to-nearest method is also called unbiased rounding, convergent rounding, statistician's rounding, Dutch rounding, Gaussian rounding, or bankers' rounding...

That's right - R rounds to even. Yes, it's primarily a stats platform. Yes, this is all explicitly stated in the documentation for the round() function, and of course I still love R, but seriously -- round to even? To make matters worse, that day we also had other bugs that were essentially caused by the fact that (in both Matlab and R)

> (0.1+0.05) > 0.15
[1] TRUE 
>

So what's the moral of the story? Well I'm not quite sure yet ... I still like R, and I certainly will continue to use it. I'll probably end up reading through the documentation for frequently used functions -- not just unfamiliar functions -- and as much as I'd like to pretend it doesn't matter,I'll certainly keep an eye out for ways my code might get sabotaged by round-off error.

8 comments:

Posted by: James Dowden | 2/16/2010 8:10 AM: Have you considered adding 0.0001 before rounding?
Posted by: Paul | 2/16/2010 10:52 AM: We had the same thought, but ended up using signif() for dealing with potential round-off error, and redefined the round() function to "round up" instead of "round to even"...

> round = function(x) trunc(x+0.5);

That seems to have done the trick ;)
Posted by: Lucas | 7/16/2014 7:24 PM: This same thing just happened to me. Thanks for the post.
Posted by: Gauti | 10/01/2014 11:20 AM: Do you have any idea why you get:
> (0.1+0.05) > 0.15
[1] TRUE
>
A similar thing is giving me some problems.
Posted by: Paul | 10/01/2014 11:42 AM: Unfortunately, most software will give you those problems. I'm fairly certain this is from the round-off error that occurs during floating-point arithmetic:

> (0.1 + 0.05) > 0.15
[1] TRUE
> (0.1 + 0.05) - 0.15
[1] 2.775558e-17

For more, see http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f and the Goldberg 1991 paper that is mentioned there.
Posted by: Unknown | 5/02/2016 10:58 PM: Solution for rounding half away from zero
https://github.com/ajucode/learnR/blob/master/R_base/session_26.R
Posted by: John Walker | 6/03/2016 3:49 AM: The roundup method suffers from a lack of symmetry around zero and so introduces a bias into summation functions. To illustrate, I created symmetrical sequence around zero.

> mynums <- seq(-10,10,by=0.5)
> mynums
[1] -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5
[13] -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5
[25] 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5
[37] 8.0 8.5 9.0 9.5 10.0
>sum(mynums)
[1] 0
#using the standard R round function
> sum(round(mynums))
[1] 0

# so the round to even rule gives same sum as original sequence.

#now we define a simple rounding function that rounds up as specified
> roundUp <- function(x){ifelse(round(abs(x-trunc(x)),1) == 0.5, trunc(x+0.5),round(x))}

#apply it to mynums to check function
> roundUp(mynums)
[1] -10 -9 -9 -8 -8 -7 -7 -6 -6 -5 -5 -4 -4 -3 -3 -2 -2 -1 -1
[20] 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9
[39] 9 10 10
# using the round up rule -9.5 should be -9 and it is. 9.5 should be 10 and it is. So the function appears to work.

#now sum the rounded up sequence to check for bias

>sum(roundUp(mynums))
[1] 10

#The sum is not zero, hence the roundUp procedure and the rule it follows
#introduce a bias into summation functions. Not a very desirable #characteristic
Posted by: John Walker | 6/03/2016 4:22 AM: In my previous comment I used the term roundup method but I meant the round half up method as a look at the one line function, roundUp, will show. It's easy to show mathematically that the introduced error is proportional to the number of 0.5 values in a sequence (offset=n*5*(significantdigit), where significant digit indicates at what level the rounding occured. e.g. a sequence of numbers with 4 signifcant digits, 1.251 has a significantdigit of 0.001, so in a sequence that contains 5 numbers ending in a 5 we have an bias of 5*5*0.001=0.025). A correction may be made on this basis.

The Obligate Scientist

R, I still love you, but I hate your round() function!

8 comments:

Post a Comment

Suggestions? Questions?

Search this blog...

Creation "Museum" in KY?

Support Science!

Blog Archive

Topic Labels...