ЯтомизоnoR

R, Statistics

Quantiles: median, quartiles, octiles, hexadeciles, …

Among statistical summaries, quantiles are an alternative to means and variances, and are sometimes preferred. Their benefit comes from the simplicity of their calculation.

  • Only the order of the data matters.
  • No assumptions are made about the shape of the distribution.


So a quantile summary gives you a quick and elegant look at data whose distribution is unknown. The best-known tool is the box-and-whisker plot with its five-number summary. R ships with the median, fivenum and boxplot functions in its standard stats and graphics packages.
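As a quick illustration of these built-in functions, here is a minimal sketch on a small made-up vector (the data are an assumption for the example, not taken from any real data set). With plot = FALSE, boxplot returns the statistics it would draw instead of drawing them.

```r
# A small made-up sample (hypothetical data, for illustration only)
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

# The median is the middle value of the sorted data
median(x)    # 7

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(x)   # 2 4 7 10 15

# boxplot computes the same five numbers when there are no outliers;
# drop plot = FALSE to actually draw the box-and-whisker plot
b <- boxplot(x, plot = FALSE)
as.vector(b$stats)
```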

What about the next ones?
The nine-number summary, the seventeen-number summary, and so on. Here I introduce a midpoints function that can generate many of them.

Fig. 1. Median, Quartiles, Octiles and Hexadeciles


> midpoints(1:17,1)
[1]  1  9 17
> midpoints(1:17,2)
[1]  1  5  9 13 17
> midpoints(1:17,3)
[1]  1  3  5  7  9 11 13 15 17
> midpoints(1:17,4)
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
> midpoints(1:17,5)
[1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5
[15]  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5
[29] 15.0 15.5 16.0 16.5 17.0

The midpoints(x, n) function calculates the (2^n+1)-number summary, that is, the quantiles that cut the data into 2^n parts. Using this function, both the ninenum and seventeennum functions can be defined easily.

ninenum <- function(x, ...) midpoints(x, 3, ...)
seventeennum <- function(x, ...) midpoints(x, 4, ...)

The full code is available here.

It calculates in much the same way as R's fivenum function. Although calculating quantiles is very simple, there are still several slightly different conventions. They differ mainly in how a fractional index is rounded.
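To make the fivenum-style rounding concrete, here is a minimal sketch of one way such a recursive summary could be written. It is not the author's code: the name midpoints_sketch and the bisection rule in it are assumptions, chosen so that depth 2 agrees with fivenum and the 1:17 examples above come out the same.

```r
# Hypothetical sketch, NOT the author's midpoints implementation.
# Positions are found by repeatedly bisecting index intervals; a half-integer
# position is resolved by averaging its two neighbours, as fivenum does.
midpoints_sketch <- function(x, n = 1) {
  x <- sort(x[!is.na(x)])
  N <- length(x)
  if (N == 0) return(rep(NA_real_, 2^n + 1))

  # Interior positions of the index interval [lo, hi], bisected `depth` times.
  # The lower half ends at floor(mid) and the upper half starts at ceiling(mid).
  bisect <- function(lo, hi, depth) {
    if (depth == 0) return(numeric(0))
    mid <- (lo + hi) / 2
    c(bisect(lo, floor(mid), depth - 1),
      mid,
      bisect(ceiling(mid), hi, depth - 1))
  }

  pos <- c(1, bisect(1, N, n), N)
  0.5 * (x[floor(pos)] + x[ceiling(pos)])
}

midpoints_sketch(1:17, 2)   # 1 5 9 13 17
midpoints_sketch(1:17, 3)   # 1 3 5 7 9 11 13 15 17
```

Under this rule the quartile position is (1 + floor(m)) / 2, where m is the median position, which is the same depth that fivenum uses. Whether deeper levels match the author's function exactly on every sample size is not something this sketch can confirm.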

I also tried another definition (Zar 2010) to check how much they differ. These charts use a plain sequence, namely (1:n), as the data, so that they show the difference in the index itself.

Fig. 2. Differences between two methods of rounding


Fig. 3. The relative difference converges


The differences depend on the number of samples and show a clear periodicity. Because their absolute values are at most 0.5, the difference relative to the sample range becomes negligible as the number of samples increases.
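The same kind of comparison can be sketched with functions that ship with R. The Zar (2010) definition is not reproduced here; as a stand-in assumption, this compares the lower hinge from fivenum with the lower quartile from quantile(type = 7), R's default, on the sequence 1:n.

```r
# Compare two quartile conventions on the index sequence 1:k.
# Assumption: quantile(type = 7) stands in for the second definition;
# it is NOT the Zar (2010) rule used for the figures.
n <- 5:40

# Lower hinge as computed by fivenum
hinge <- sapply(n, function(k) fivenum(1:k)[2])

# Lower quartile as computed by R's default quantile type
q7 <- sapply(n, function(k) quantile(1:k, 0.25, type = 7, names = FALSE))

d   <- hinge - q7    # absolute difference in index units
rel <- d / (n - 1)   # difference relative to the range of 1:k

print(data.frame(n = n, diff = d, relative = round(rel, 4)))
```

On this sequence the difference is 0 for odd n and -0.25 for even n, so it stays bounded while the relative difference shrinks roughly like 1/n.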

References

  1. Zar, J.H., 2010. Biostatistical Analysis, Prentice-Hall/Pearson. ISBN 9780131008465
This entry was posted on April 28, 2013.