ЯтомизоnoR

R, Statistics

How to know the reliability of integrate function in R

Adam Fleischhacker’s question was impressive for me, because I’m using the R numerical integrate function. But I have not thought deeply about that. Now the time to dig it.


I planned an experiment. Reading the source code will reveal all questions, but it requires more time and patience. I love experiments than reading source codes.

Fig. 1. Step Function (offset=3)

Fig. 1. Step Function (offset=3)

  1. make a step function that have a square peak of unit width and height.
    position of the peak is (offset, offset + 1).
  2.  integrate the step function by the range of (0, offset + 1).
  3. check the results by variable offset.

The value of integrate must be exactly 1. But the value of numerical integrate differs.

Fig. 2. value and subdivisions of integrate function

Fig. 2. value and subdivisions of integrate function

sf0 <- function(x, offset)
  ifelse(x < offset, 0, ifelse(x < offset + 1, 1, 0))

sf1 <- function(n, ...)
  unclass(integrate(function(x) sf0(x, n), 0, n + 1, ...))

sf2 <- function(n, ...)
  sapply(as.list(n), function(a) sf1(a, ...)$value)

sf3 <- function(n, ...)
  sapply(as.list(n), function(a) sf1(a, ...)$subdivisions)

sf4 <- function(n, ...)
  t(sapply(as.list(n), function(a) {
    b <- sf1(a, ...)
    c(value=b$value, subdivisions=b$subdivisions)
  })
)

Fig.1 <- function() {
  StepFunction <- function(x) sf0(x,3)
  curve(StepFunction, 0, 5, col='purple')
} 

Fig.2 <- function() {
  par(mfrow=c(2,1), mar=c(4,4,1,1))
  curve(sf2, 0, 700, xlab='offset', ylab='value', col='blue')
  curve(sf3, 0, 700, xlab='offset', ylab='subdivisions', col='red')
}

At offset of 460, the value quickly turns to zero, and the subdivisions also drops to 1. This is the point of unreliability begins. The function cannot integrate the step function when the offset is greater than 459. At this offset, the integrand is 99.8% flat and the width of step peak is 0.2%. We must doubt the result, when the integrand has a very narrow peak that has a major contribution to the integrate while the other range is almost flat or periodical. In this case, the function is fooled, and thinks the entire range is flat.

> sf4(458:462)
         value subdivisions
[1,] 0.9999569           21
[2,] 0.9999932           21     offset 459
[3,] 0.0000000            1     offset 460
[4,] 0.0000000            1
[5,] 0.0000000            1

The subdivisions can likely be used as a signal of reliability. That is a number of division to calculate the whole integration. So if the number is 1 or a few, the integrate function thought the integrand function is flat and calculate the integration as a single rectangle. If the integrand is really flat, the value may take 1 and the value is reliable. At least, checking the lower value of subdivisions is a good way to ensure the integrate value.

Fig. 3. Colored by subdivisions number

Fig. 3. Colored by subdivisions number

I colored the result of integrate by this value. The red color is warnings. At the lowest point (offset=0), a red point is seen, but it is correct because the integrand is really a flat line. And the bad values are all colored by red.

Fig.3 <- function() {
  x<-sf4(0:700)
  y <- ifelse(x[,'subdivisions'] <= 1, 2, 1)
  plot(x[,'value'], col=c('#00ff0022','#ff000066')[y], 
       cex=c(6,4)[y], pch=c(4,21)[y], xlab='offset', ylab='value')
}

Leave a comment

Information

This entry was posted on April 23, 2013 by and tagged , .
The stupidest thing...

Statistics, genetics, programming, academics

ЯтомизоnoR

R, Statistics