Showing posts with label math/science computation.

Shame on you, R... again! (But not really...)

Monday, January 17, 2011 at 2:25 PM
Remember how a few months ago I lamented the fact that the round() function in R uses a non-standard rule for rounding to the nearest integer?  Instead of rounding k+0.5 up to k+1 (k being an integer), R rounds to whichever of k or k+1 is even.  Well, here's another example of R offending our mathematical sensibilities... R seems to think that even though
1 * Inf = Inf
somehow it can get away with telling us that
1 * (Inf + 0i) = Inf + NaNi?


"Gasp!" I know, insane, right? What's going on here? Whatever happened to "anything times one is equal to that same number"? Granted, infinity isn't really a number so sometimes we can't assign a value to an expression like Inf*0, but deep down inside I can't shake the feeling that 1 * Inf really should be Inf!

It turns out that R and I are both right - we're just making different assumptions about how we interpret all these 1s, 0s and Infs in these two statements. Let me explain...

Despite using sound, puppy-approved logic in this case, R gives the offending result because of how it implements everyone's favorite section in Calculus class: computing limits.  To understand why, take a closer look at how the multiplication is happening in each case above.  The first case is hopefully straightforward.  In the second case, 1 is treated as a complex number instead of a scalar, which gives
1*(Inf+0i) = (1+0i)*(Inf+0i)
           = (1*Inf - 0*0) + (1*0 + 0*Inf)i
           = Inf + NaNi

We could also throw in a third case and multiply these two complex numbers in the more natural context of polar coordinates. Writing each in terms of their modulus r (distance from the origin) and argument θ (angle off of the positive real axis) instead of in terms of their real and imaginary parts, we have
   1*(Inf+0i) = (1+0i)*(Inf+0i)
              = 1 exp(i0) * Inf exp(i0)
              = (1*Inf) exp(i0)
              = Inf exp(i0)
              = Inf + 0i
Whew! So what's "wrong" with multiplying things in x+yi form??

R recognizes that any computation involving infinity really requires the algebra of limits, and acts appropriately (albeit conservatively) when evaluating such expressions. The discord comes from what R assumes is the result of taking some limit versus what is to be treated as a constant. Unless you've taught a calculus class recently, some explanation might be in order.

In general, expressions involving infinity are treated as limits where some unspecified variable is going to infinity.  For example, a statement like Inf*0 can't be assigned a value because in its most general interpretation we're asking "What is the limit of the product x*y as x→Inf and y→0?"  Here, whether y goes to 0 from above (e.g. y=1/x), from below (e.g. y=-1/x), or neither determines whether the limit of the product is zero, some non-zero number, plus or minus infinity, or doesn't exist at all. (Open any calculus text to the sections on limits for examples leading to these different outcomes.)  Note that this example does have an answer if y is assumed to always equal 0, since it's always the case that x*0=0.

That means that, depending on how we interpret the zero, our example might equal either
  Inf*0 = NaN
or
  Inf*0 = 0
This is exactly what's going on above.

Returning to the two statements at the top of this post, we can now understand why R gives two different answers.  By making the zero implicit vs. explicit, we change how R treats these expressions. R interprets Inf as "the limit of x + 0i as x→Inf," allowing for the result that
1 * Inf = Inf
whereas in the second case R treats Inf + 0i as "the limit of x + yi as x → Inf and y → 0," which has no general answer and therefore ends up with an imaginary part of NaN.

The take-home message: as soon as there's an Inf in an expression, R proceeds assuming everything is a limit, even though it might be clear to the user that some of those key 1s and 0s should be treated as constants.
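You can see the two behaviors side by side in a quick R session (note the NaN only shows up once the zero imaginary part is written out explicitly):
> 1 * Inf
[1] Inf
> 1 * (Inf + 0i)
[1] Inf+NaNi
> (1 + 0i) * (Inf + 0i)   ## same thing, with the implicit complex 1 written out
[1] Inf+NaNi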

Data Visualization in R: Part... 0

Friday, January 14, 2011 at 3:26 PM
I haven't forgotten that I promised to do a series of posts on data visualization using R - just a bit busy catching up after some excellent holiday R&R. Hopefully I'll get a post out soon!

In the meantime, check out these two posts from the R-bloggers network.

Data Visualization: 200 Years of Health and Wealth

Wednesday, December 15, 2010 at 10:48 PM
This video is super awesome!  It's part of BBC 4's program The Joy of Stats and you can learn a little more about the data here or play with it using this web app on http://www.gapminder.org/. Now don't you wish you could do that with data?


The reason I wanted to share this video (beyond the fact that it's so amazingly awesome) is to let you in on a little secret... are you ready? Here it comes...
Data visualization is easy, and anyone with a computer can do it!
Seriously, it is not that hard! YOU can make cool little wobbling bubble graphs just like in the video! Aren't you excited to learn how?! Yeah? Fantastic!

Now that you're all psyched to visualize some data, I should mention that I'm being a bit misleading here... it does require a bit of computer know-how, and sometimes (ok, almost always) takes a bit of tinkering to find the best way of boiling the data down to just the relevant information. But frankly, these things aren't all that hard to learn, and they aren't always necessary if we're just poking around to get a feel for the data, so none of these words of caution should give you much pause.  Add to that the fact that you can always hit up the internet for examples to download, study, and learn from, and many of these obstacles are reduced to mere speed bumps.  If you've got a computer, we can get it to plot some data.

Figure 1. Tourist hot spots based on Flickr data. #1 of flowingdata's Top Ten Data Visualization Projects of 2010.

So here's the deal... there are some really cool data available from http://www.gapminder.org/, and I'm going to have a little free time these next few weeks in between birding trips, visiting family and friends, and doing thesis work.  Assuming that free time stays free, I'm going to walk through an example or two of plotting some of these data in R.  If you'd like to follow along, you'll need to download and install R on your computer, and if you don't already have software that can open Excel spreadsheets, you'll also want to install something (free) like OpenOffice.

Sound good? Excellent!  Feel free to share any questions or suggestions in the comments section below.  Now hurry along and go install R!

Fast and Sloppy Root Finding

Saturday, December 4, 2010 at 3:21 PM
Disclaimer: While the approach to root finding mentioned below is both slow and imprecise, it's also a cheap and incredibly handy approach when all you need to do is get "close enough". If you like R quick and dirty (hey now, get your mind out of the gutter...) this is totally the root finding method for you!

I just read a post on Root Finding (original here) by way of R-bloggers.com which was timely given that only yesterday I'd needed to do some root finding in R to make a figure for a manuscript -- something like the following image.
The blog post prompted me to mention here how I did my root finding for two reasons:
  1. Precision and computation time sometimes don't matter all that much; and
  2. The way I did my root finding was way easier to implement than the convergence-based methods described in the post above.
So here's what I was aiming for, and how I implemented it in R.

The Task: Suppose you're plotting two curves (say, y=f(x) and y=g(x)) and would like to indicate their intersection with an empty circle (i.e. pch=21). In my case, the intersection of these two curves was an equilibrium point of a dynamic model, and I wanted to indicate it as such.

If you can find their intersection mathematically (i.e. set f(x)=g(x) and solve for x) then awesome -- do that if you can.  But if for some reason you can't, and you know a single root exists in some interval a≤x≤b, you can find that root quickly using some straightforward vector tricks.

The Solution: Let's use the example of finding the intersection of f(x) = x/(1+x) and g(x) = (5-x)/5 over the interval (0,5).

Step 1:  Define an x vector full of values between a and b.  The smaller the step size (or, the longer the list) the better.
> x = seq(0, 5, length=5000);
Step 2: Compute the square of the difference of your two functions over that interval using x.  This is as simple as the line of code...
> fgdiff = ( x/(1+x) - (5-x)/5 )^2;
Step 3: Using the which() function, we can pick out the index for the smallest value in our list of squared differences... Once we know this index (call it j) we know the intersection occurs at, or very near, the x value x[j], and we're basically done! 
> j = which(fgdiff==min(fgdiff))[1]; 
> j; x[j];  ## show the value of j, x[j]
For a closer look at what's going on with that which() statement, check out the help for which() and the following example.
## A Closer look at the which(fgdiff==min(fgdiff))
> ?which
> which(c(F,F,T,T,F))
[1] 3 4
> which(c(F,F,T,T,F))[1]
[1] 3
> xample = c(5:1,2:5); xample
[1] 5 4 3 2 1 2 3 4 5
> min(xample)
[1] 1
> xample==min(xample)
[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
> which(xample==min(xample))
[1] 5
> which(xample==min(xample))[1]
[1] 5
Step 4: Since both functions are (approximately) equal at this x value, it only remains to decide whether you want to indicate the point of intersection using (x, f(x)) or (x, g(x)).
> points(x[j], x[j]/(1+x[j]), pch=19, cex=2)
All done!

If you'd like to tinker with this example, here's the code to produce the image above.
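In case it helps, here's a bare-bones sketch along the same lines (the plotting details below are my own guesses, not the original script):
## Plot f(x) = x/(1+x) and g(x) = (5-x)/5 over (0,5), then mark their intersection
x = seq(0, 5, length=5000)                  # Step 1: a fine grid of x values
f = x/(1+x)
g = (5-x)/5
fgdiff = (f - g)^2                          # Step 2: squared difference
j = which(fgdiff == min(fgdiff))[1]         # Step 3: index of the (approximate) root

plot(x, f, type="l", lwd=2, ylim=c(0, 1), xlab="x", ylab="y")
lines(x, g, lwd=2, lty=2)
points(x[j], f[j], pch=21, bg="white", cex=2)   # Step 4: mark the intersection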

The Power of Data Visualization & Comparison

Tuesday, November 2, 2010 at 8:22 PM

David McCandless: The beauty of data visualization (TED Talk)

Computational statistics and computer programming abilities are -- and will continue to be -- valuable skills in the job market (and in the sciences).  If I could offer any career advice to students, it would be this: work hard to learn these two things well!

The Math Behind Morphing Faces: Linear Algebra

Sunday, October 24, 2010 at 3:08 PM
Animations of morphing faces or combinations of multiple images into one can be quite a thing of beauty.  But how exactly are those photos so carefully blended together? 

While the answer to that question is beyond the scope of what I could put into a single blog post, understanding that answer requires some basic knowledge of one very important area of mathematics: linear algebra.  It's important not just for the number-crunching tools it provides, but because it helps us think about things differently, ask the right questions, and know whether or not those questions have answers.  Before I get too far ahead of myself, let's first take a look at the video which motivated this post in the first place, which strings together 60 years of female actors from CBS (click the button in the lower right corner to watch it full-screen):


CBS - 60 Years of Actresses from Philip Scott Johnson on Vimeo.
More videos by Philip Scott Johnson (including CBS - 60 Years of Actors) can be found on Vimeo and on YouTube.

So how are these animations created?

If you replay part of the video, you'll notice that there are two things going on: 1) facial features in each image are stretched and rotated to line up with the facial features of the next image, and 2) there's a fade from one image to the next. The fade seems simple enough, so let's just focus on the first process of stretching and rotating facial features.
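As a toy illustration of the linear algebra at work (my own sketch, not how the video was actually produced), lining up features amounts to multiplying the landmark coordinates by a matrix that stretches and rotates them, and the fade is just a weighted average of the two images:
## Stretch and rotate a small set of 2D "landmark" points using matrices
theta = pi/8                                    # rotation angle
rot = matrix(c(cos(theta), sin(theta),
              -sin(theta), cos(theta)), 2, 2)   # 2x2 rotation matrix
stretch = diag(c(1.2, 0.9))                     # stretch x by 1.2, squash y by 0.9
pts = rbind(x = c(0, 1, 1, 0),                  # corners of a unit square,
            y = c(0, 0, 1, 1))                  #   standing in for facial landmarks
newpts = rot %*% stretch %*% pts                # stretch first, then rotate

## The fade between two (grayscale) images A and B is a weighted average,
## with t running from 0 to 1 over the transition:
##   morph = (1 - t)*A + t*B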

Home birth death toll rising in Colorado?

Friday, October 15, 2010 at 12:04 PM
Dr. Amy Tuteur, the Skeptical OB, has a blog post up entitled 'Inexcusable homebirth death toll keeps rising in Colorado.'  Now I'm a big fan of science-based medicine (and of Tuteur's blog), but I have to call foul when it comes to that "rising" part of her post.  Yes, I think it's a pretty minor point, since the real comparison to consider is the home birth vs. hospital birth mortality rate - but this is a nice opportunity to do some basic stats. Having left a few comments to that effect on her blog, I figured I would summarize them here.

Free Online Math Books?

Tuesday, October 5, 2010 at 7:48 PM
I was poking around the web for a copy of Euclid's Elements, and came across a nice list of over 75 freely available online math books. There's a good mix of material there, ranging from centuries-old classics up to modern-day course topics and application areas - something for everybody.  Check it out!

R Tip: Listing Loaded Packages

Monday, September 13, 2010 at 2:22 PM
A friend recently asked how you list the packages currently loaded into R's workspace, as opposed to listing all available packages which is what library() does. The answer?
> (.packages())
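For example (the exact list you see will depend on which packages you've attached):
> library(MASS)     # attach a package...
> (.packages())     # ...then list what's currently attached
[1] "MASS"      "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "methods"   "base"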

Software for Science & Math (part II): Getting started with R

Saturday, June 12, 2010 at 3:26 PM
A while back I wrote the first post in a series where I'll cover important concepts from Calculus, Probability and Statistics that (IMO) everyone should be familiar with. I wanted to occasionally involve two free software platforms (R and Maxima) in those posts, and I've finally gotten around to a post dedicated to getting started with R.

R is a handy computing platform and a great way to learn basic programming skills. It can do basic statistics, plot data or mathematical functions, and provides access to a menagerie of advanced tools via R packages. And it's all free. R's broad functionality and statistical capabilities make familiarity with R a valuable skill in the natural sciences.

Getting Started with R

If you haven't already installed R on your computer, check out this website on downloading and installing R, or just pick your nearest CRAN mirror (e.g. at UCLA, NCI in Maryland, etc.) and download and install the appropriate version per their instructions.  If the install isn't working, feel free to post questions in the comments below.

Basic Interactive Examples
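To get a feel for the interactive console in the meantime, here are a few warm-up commands (a minimal taste of my own choosing, nothing fancy):
> 2 + 2                      # R as a calculator
[1] 4
> x = c(1, 2, 3, 4, 5)       # make a vector of numbers...
> mean(x); sd(x)             # ...and compute some summary statistics
[1] 3
[1] 1.581139
> plot(x, x^2, type="b")     # a quick plot
> ?mean                      # open the help page for any function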

Software for Science & Math: R and Maxima (part I)

Sunday, April 11, 2010 at 7:11 PM
In the coming months, I plan to write a series of posts reviewing "must-know" mathematics everyone should be familiar with: important concepts from Calculus, Probability and Statistics.  Here I begin by introducing some free software you can use to follow along, or use for your own computational tasks. In future posts, I'll encourage a little hands-on learning of these applications by providing code and other information so you can recreate my figures and results. 

Science has emerged as humankind's most effective way of understanding reality.  The success of the scientific method is largely a product of two key components: (1) a strong reliance on empirical data, and (2) a precise and powerful theoretical framework to properly formulate hypotheses, make predictions about experimental outcomes, etc.

Skipping over the importance of data (for now), I'd like to introduce some computational tools that you might consider installing on your computer. The applications are the computing platform known simply as R, and the software for doing symbolic manipulations (e.g. algebra) known as Maxima. I should mention this software isn't just for goofing around and writing blog posts -- these applications can be used to do research-level mathematical, statistical and numerical work. So you may find one or both to be valuable assets.

Oh, right -- and did I mention they're both free?

How to make pi ... using R

Thursday, March 11, 2010 at 12:04 PM
There's another irresistible post over at Dot Physics, this time on a nifty way to estimate the value of pi using random numbers. Check out that post for details, then hop back over here!

(brief pause...)

Ok, you're back! And you loved the post - awesome.  But didn't it feel like it was lacking a little... pie?  I too had the very same feeling, so I wrote my own version of Rhett's simulations (using R) but with some fanciful graphics just to jazz things up a bit.

Figure 1: The fraction of points that fall "on the pie" (left) gives a reasonable
approximation of π=3.14159... Lines on the right show π and our estimate.

If you have R installed on your computer, run the code below for the animated version.
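The heart of the estimate is only a few lines, by the way -- here's a bare-bones, non-animated sketch of the same idea (no pie graphics, I'm afraid):
## Throw n random points into the unit square; the fraction landing inside
## the quarter circle of radius 1 approximates pi/4.
n = 100000
x = runif(n); y = runif(n)
inside = (x^2 + y^2 <= 1)
4 * mean(inside)    # should land close to 3.14159...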

R, I still love you, but I hate your round() function!

Monday, February 15, 2010 at 10:19 PM
You all remember the concept of rounding, right? I first learned to round numbers by taking them to the nearest integer (e.g. rounding 2.1 gives 2, rounding 3.9 gives 4, and so on) and, in half-way cases like 2.5, rounding up to the nearest integer.  I'll admit that as a child I didn't always pay attention in math class, but I was a bit surprised to learn recently that there are a number of different rules for rounding (and that being unaware of this fact can totally ruin your day).

How I came to choose this topic to blog about first requires sharing the following story... which I'll preface with a little background.

Dr. Wife and I both do work related to mathematical biology, and frequently use computers to tackle some of the more unruly bits of math we encounter.  Computer software for doing mathematics can roughly be divided into two main categories: the numerical software that crunches numbers (not a technical term, I just like calling them number crunchers), and the computer algebra systems (or CASs).  This latter variety does symbolic (or algebraic) manipulations like reducing fractions, and works with xs and ys instead of numbers.

For example, if you wanted to find a formula for the integral of 
x^2+sin(x) from x=a to x=b
software like Maple, Mathematica or the free software Maxima would tell you the answer is  
cos(a)-cos(b)-(a^3-b^3)/3.
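For the curious, that answer checks out by hand:
\int_a^b \left( x^2 + \sin x \right) dx = \left[ \tfrac{x^3}{3} - \cos x \right]_a^b = \frac{b^3 - a^3}{3} + \cos a - \cos b = \cos(a) - \cos(b) - \frac{a^3 - b^3}{3}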

If it involves symbols, and algebraic manipulations - you want a CAS!

Now, if you just wanted to do something like plot data, generate random numbers or do statistical work -- computations using explicit numbers, not just symbols -- then you'd want to use software that excels at crunching numbers.  Something like Matlab, Octave, R, or programming languages like Fortran, C, or Python (just to name a few).

Quick tangent: while I'm a fan of Maple and Matlab, they cost money.  Enough money that I'd rather use free alternatives when I can, especially if I might want to share my code with friends or collaborators but don't want them to spend money just to run it. Anyway, that's my plug for Maxima (a CAS) and R (statistical software that is also a great alternative to Matlab) -- but let's get back to the story...

The other day, Dr. Wife and I were working at home. She had a bug in some R code that -- after much hair-pulling -- was finally attributed to the following unexpected behavior of R's round() function:
> x = c(1, 1.5, 2, 2.5);
> rbind(x=x, round = round(x));
      [,1] [,2] [,3] [,4]
x        1  1.5    2  2.5
round    1  2.0    2  2.0
> 
Notice there's something weird going on here... 1.5 gets rounded UP to 2 (as it should, if you think of rounding like most Americans), but 2.5 gets rounded DOWN to 2! What's going on with the round() function in R?!  Just to make sure I wasn't screwed over by all of my math teachers, I also checked in Matlab which gave the expected result...
> x=[1 1.5 2 2.5];
> [x; round(x)]
ans =
    1.0000    1.5000    2.0000    2.5000
    1.0000    2.0000    2.0000    3.0000
>
So what's going on here???

Quoting the wikipedia page on Rounding, there are different ways to round numbers depending on the task at hand...
  1. round to nearest: q is the integer that is closest to y...
  2. round towards zero (or truncate): q is the integer part of y, without its fraction digits.
  3. round down (or take the floor): q is the largest integer that does not exceed y.
  4. round up (or take the ceiling): q is the smallest integer that is not less than y.
  5. round away from 0: if y is an integer, q is y; else q is the integer that is closest to 0 and is such that y is between 0 and q.
Sadly, my favorite number crunching software (that would be R) uses one of the dumbest rules out there (well, dumb from a mathematical perspective) to decide what to do with numbers ending in ".5".  Both R and Matlab use the "round to nearest" rule -- but the way they deal with the half-way point turns out to be the source of the discrepancy above.

In school, most of us learned this rule (again, from the wikipedia page on Rounding):
Round half up

The following tie-breaking rule, called round half up, is widely used in many disciplines:
  • If the fraction of y is exactly .5, then q = y + 0.5
That is, half-way values y are always rounded up. For example, by this rule the value 23.5 gets rounded to 24, but -23.5 gets rounded to -23...
This is what Matlab does, and is what most people think of when they round numbers.

The numerical routines in R implement a different rule (italics added for emphasis):
Round half to even

A tie-breaking rule that is even less biased is round half to even, namely
  • If the fraction of y is 0.5, then q is the even integer nearest to y.
Thus, for example, +23.5 becomes +24, +22.5 becomes +22, -22.5 becomes -22, and -23.5 becomes -24. This variant of the round-to-nearest method is also called unbiased rounding, convergent rounding, statistician's rounding, Dutch rounding, Gaussian rounding, or bankers' rounding...
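You can check those quoted examples directly in R:
> round(c(23.5, 22.5, -22.5, -23.5))
[1]  24  22 -22 -24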
That's right - R rounds to even.  Yes, it's primarily a stats platform. Yes, this is all explicitly stated in the documentation for the round() function, and of course I still love R, but seriously -- round to even?  To make matters worse, that day we also had other bugs that were essentially caused by the fact that (in both Matlab and R)
> (0.1+0.05) > 0.15
[1] TRUE 
>
So what's the moral of the story?  Well, I'm not quite sure yet... I still like R, and I certainly will continue to use it.  I'll probably end up reading through the documentation for frequently used functions -- not just unfamiliar ones -- and as much as I'd like to pretend it doesn't matter, I'll certainly keep an eye out for ways my code might get sabotaged by round-off error.
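And if you really do want the grade-school behavior, or a safer way to compare floating point numbers, there are simple work-arounds (two quick sketches of my own, certainly not the only options):
## Round half away from zero (the "schoolbook" rule) -- a simple sketch
round.half.up = function(x, digits = 0) {
  m = 10^digits
  sign(x) * floor(abs(x) * m + 0.5) / m
}
round.half.up(c(1.5, 2.5, -2.5))       # gives  2  3 -3

## Compare floating point numbers with a tolerance instead of trusting == or >
isTRUE(all.equal(0.1 + 0.05, 0.15))    # TRUE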

Shame on you, R...

Sunday, October 25, 2009 at 7:35 PM
... as such a fine, upstanding free software package that allows one access to all the latest statistical methods and modeling packages, you really should know better than to go about telling people that
> 1/0
[1] Inf
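To be fair, 1/0 isn't acting alone here -- this is just standard floating point (IEEE 754) arithmetic, and it comes with friends:
> -1/0
[1] -Inf
> 0/0
[1] NaN
> 1/Inf
[1] 0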

Math, Computers and Intelligent Design Pseudoscience

Friday, August 21, 2009 at 12:53 PM
Here's a recently published paper (PDF reprint) by intelligent design (creationism) proponent William Dembski. Strong criticisms of the paper (and how it's being misused by other intelligent design creationism proponents) have already popped up on blogs in some posts like these, and even in Dembski's own blog - which he promptly put a stop to by disabling comments.

Instead of focusing on the paper itself, I wanted to illustrate how it (and other mathematical or computational results) can be misused in promoting ID. Before I begin, feel free to read through the blog posts and skim the paper.

After that, we can start in on this blurb from the Discovery Institute as an example of this sort of misuse by asking "So does this paper support intelligent design creationism??"

The paper itself specifies (in the abstract) what they did, and how they applied it (here and below, I have used a bold font for emphasis):
This paper develops a methodology based on these information measures to gauge the effectiveness with which problem-specific information facilitates successful search. It then applies this methodology to various search tools widely used in evolutionary search.
I happen to know a thing or two about using mathematical models in science, and this paper is a fantastic example of what I consider mathematical equivocation - using the power and complexities of mathematics as a logical tool to try and back a claim that really isn't backed by the math. It's significant because, unlike typical rhetorical arguments, the math obfuscates the assumptions and logical arguments being made, and can at times require graduate level background to decipher - so the equivocation is a bit harder (if not impossible, for some) to actually notice.

While the paper says nothing about intelligent design creationism, others (including Dembski himself) claim that it applies. Let's start with the title of the Discovery Institute piece. First, they claim this is a pro-ID publication. Second, we have our first logical fallacy: the appeal to authority. It seems that the holy grail of ID creationist efforts is to have some science-cred to wave around, and a "peer-reviewed scientific article" is exactly that. So what about the pro-ID claim?

In this blog post we get the Discovery Institute's take on what the paper is really about:

A new article titled "Conservation of Information in Search: Measuring the Cost of Success," in the journal IEEE Transactions on Systems, Man and Cybernetics A: Systems & Humans by William A. Dembski and Robert J. Marks II uses computer simulations and information theory to challenge the ability of Darwinian processes to create new functional genetic information.
Understanding why the paper has absolutely nothing to do with real functional genetic information (despite this claim) requires knowing about information theory (a topic unknown to the vast majority of people, scientists included), and that the kind of "information" discussed in the paper is very different from genetic information in a biological context. Indeed, the word genetic only appears on the first page of the article when mentioning "genetic algorithms", and there's no mention of "functional genetic" anything in the paper.

Unfortunately, intelligent design creationists frequently misuse or improperly apply the concept of "information" (which can be defined in a number of ways). Demanding a clear definition is always a good way to keep on track with what's actually being said.

Here's the closest we come to definitions for interpreting information in this paper:
  1. endogenous information, which measures the difficulty of finding a target using random search;
  2. exogenous information, which measures the difficulty that remains in finding a target once a search takes advantage of problem specific information; and
  3. active information, which, as the difference between endogenous and exogenous information, measures the contribution of problem-specific information for successfully finding a target.
Getting back to the Disco Institute's blog post, besides the equivocation this blurb is also largely fueled by another common logical no-no found in many pro-ID arguments - the combined false dichotomy and straw man fallacies, whereby (in a debate context) you misrepresent your opponent's position with something you can refute, pretend you refuted your opponent's true position, and then assert that their being wrong makes your position correct.

Here's the Discovery Institute spinning the article against evolution "unguided" by an intelligent designer:
Darwinian evolution is, at [it's] heart, a search algorithm that uses a trial and error process of random mutation and unguided natural selection to find genotypes (i.e. DNA sequences) that lead to phenotypes (i.e. biomolecules and body plans) that have high fitness (i.e. foster survival and reproduction). Dembski and Marks' article explains that unless you start off with some information indicating where peaks in a fitness landscape may lie, any search — including a Darwinian one — is on average no better than a random search.
Note that the implication (at least to me) is that evolution requires "some information" (e.g. an intelligent designer?) if it's going to work "better than a random search." Also note the false dichotomy at play here, as the blog post seems to imply that it's a "pro-ID" publication because it allegedly refutes evolution.

Just to clarify the straw man: evolution by natural selection is equated with evolutionary algorithms, which are then criticized, and it's all put forth as being "pro-ID".

So far, I don't see how any of this supports intelligent design creationism or refutes evolution by natural selection. Feel free to correct me if I'm wrong here!

So what's really the take home message from this paper?

Returning to the matter of equivocation, MarkCC's blog post helps to clarify the meaning of "information" in the paper, although only implicitly:
In terms of information theory, you can look at a search algorithm and how it's shaped for its search space, and describe how much information is contained in the search algorithm about the structure of the space it's going to search.
What D&M do in this paper is work out a formalism for doing that - for quantifying the amount of information encoded in a search algorithm, and then show how it applies to a series of different kinds of search algorithms.
In terms of evolution, it's really just asking how much of the information about the space (i.e. the fitness landscape) is tied up in the algorithm (i.e. natural selection). Given that natural selection is all about the relative number of offspring contributed to the next generation, these results really don't seem surprising or problematic.

The conclusions of the paper seem to be awkwardly stated and kind of amusing:
CONCLUSION
Endogenous information represents the inherent difficulty of a search problem in relation to a random-search baseline. If any search algorithm is to perform better than random search, active information must be resident. If the active information is inaccurate (negative), the search can perform worse than random...
Hmm... So in order for evolution by natural selection (as an algorithm) to work better than random, it needs to include correct information from genetics, developmental biology, ecology, and so on? Got it. If we get it wrong, it'll perform worse than some other random hypotheses? Got it.

This section also has some either very ironic or very well chosen wording...
... Accordingly, attempts to characterize evolutionary algorithms as creators of novel information are inappropriate. To have integrity, search algorithms, particularly computer simulations of evolutionary search, should explicitly state as follows: 1) a numerical measure of the difficulty of the problem to be solved, i.e., the endogenous information, and 2) a numerical measure of the amount of problem-specific information resident in the search algorithm, i.e., the active information.
Nice - "attempts to characterize" are inappropriate because we're not using your particular definition of information? Gee, now THAT sounds familiar.

To be honest, I have no real expertise in algorithms, but from what I could dig up I don't think this use of "integrity" means anything out of the ordinary. I'll run this by some computer science friends of mine and any insights will appear below.

Until then, my best response to the question "So does this paper support intelligent design creationism??" is decidedly, No.

HIV: Modeling the experiment & the problem of evolution

Wednesday, February 11, 2009 at 8:22 PM
I just noticed a small article in the ScienceNOW Daily News on using microbicide gels to decrease the risk of contracting HIV. Give it a read!

So why did this article (and this more detailed information from the NIH) catch my attention?

Right now, as I type this, over 9,400 women in Africa are participating in a second, even larger clinical trial - the subject of some other interesting research I'll get into below. The results of that study will in large part determine whether or not this product makes it to market. Being my usual critical self, two questions immediately come to mind: "Will it be effective?" and "Is it safe?"

This first question will get a strong answer via this study - after all, 'effective' is a relatively straightforward thing to describe and measure. But what do we mean by "safe"?? This brings me to the other big reason this article grabbed my attention: Dr. Sally Blower.

This past fall I had the pleasure of meeting Dr. Sally Blower, a mathematical biologist at UCLA, while I was visiting Ohio State's Mathematical Biosciences Institute during a workshop. She presented some of her research taking a critical look at the second study mentioned in the ScienceNOW article. Her technical paper on the matter can be found on her website.

To briefly summarize the work she presented, she and her colleagues were interested in addressing the possible risk of drug resistant strains arising from the use of these microbicide gels. HIV has a relatively high mutation rate (leading to lots of genetic variation in a viral population) and anyone already infected with HIV who is exposed to anti-retroviral drugs (ARVs) could unknowingly be facilitating natural selection on the virus, leading to drug resistant strains of HIV. Unfortunately, this is a very real problem in the fight against HIV/AIDS, and to let a high-risk product pass clinical testing could come at a price in the long run.

So to understand how well the experiment could assess this risk, as well as the efficacy of the microbicide gels as a means of protection against HIV infection, she and her colleagues created a computer model of the experiment. They began by simulating a population of women and men in which HIV was being transmitted.

As the omniscient creators of this virtual world, they were able to include and manipulate many key factors in the transmission process, including other means of protection (e.g. condoms), the efficacy of the gels, and so on. They were able to "parameterize [the] transmission model using epidemiological, clinical, and behavioral data to predict the consequences of widescale usage of high-risk microbicides" in the population. They then collected data from a number of simulations, following the same type of protocol as the real study, which they could then compare to the actual transmission process in the simulated population.

This clever use of mechanistic models and real world data accomplished two things. First, the computer model allowed them to assess the limitations of the real world experimental protocol, which helps researchers in their interpretation of the real-world experimental results. Second, because they were free to vary the model parameters and run the simulated experiment repeatedly, they could explore the simulated transmission process under different scenarios and describe how the factors included in the model contribute to the eventual outcome.

So did we learn anything from all of this? Among their results, they found that the "planned trial designs could mask resistance risks and therefore enable high-risk microbicides to pass clinical testing" - unfortunate news. On the other hand, their findings suggest that "even if ARV-based microbicides are high-risk and only moderately efficacious, they could reduce HIV incidence."

I can't say what the future holds for these microbicide gels, although I certainly hope they prove to be another means to battle against HIV worldwide. If you'd like more information on HIV/AIDS, check out the 2008 report on the global AIDS epidemic (I'd recommend browsing the "Media kit") from the United Nations Programme on HIV/AIDS.