R - Wrong result from mean(x, na.rm = TRUE)


I want to compute the mean, min and max of a series of manager returns, as follows:

managerret <-data.frame(diff(managerprices)/lag(managerprices,k=-1)) 

I then replace returns equal to 0 with NaN, since the data are extracted from a database and not all dates are populated.

managerret = replace(managerret, managerret == 0, NaN)
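A minimal base-R sketch of the zero-to-NaN step on a plain numeric vector (the two leading zeros are hypothetical placeholders; the other values are the first returns from the printed series). It shows that is.na() treats NaN as missing, so na.rm = TRUE is expected to drop those entries:

```r
# Hypothetical series: two unpopulated dates stored as 0, then three returns
ret <- c(0, 0, 6.3832e-04, -4.4625e-06, 2.8142e-03)

# Same step as in the question: treat 0 as "no data" by turning it into NaN
ret <- replace(ret, ret == 0, NaN)

print(is.na(ret))               # NaN counts as missing for is.na()
print(mean(ret, na.rm = TRUE))  # mean of the three populated returns only
```

On a plain vector this works as intended, which is why the NaN from mean() in the question is surprising.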

Then I call the following three functions:

> min(managerret, na.rm = TRUE)
[1] -0.0091716
> max(managerret, na.rm = TRUE)
[1] 0.007565
> mean(managerret, na.rm = TRUE)*252
[1] NaN

Why does the mean function return NaN while min and max perform the calculation properly?

Below you can find the zoo object managerret:

> managerret
               manager
2011-10-04         NaN
2011-10-05         NaN
2011-10-06         NaN
2011-10-07         NaN
2011-10-11         NaN
2011-10-12         NaN
2011-10-13         NaN
2011-10-14         NaN
2011-10-17         NaN
2011-10-18         NaN
2011-10-19         NaN
2011-10-20         NaN
2011-10-21         NaN
2011-10-24         NaN
2011-10-25         NaN
2011-10-26         NaN
2011-10-27         NaN
2011-10-28         NaN
2011-10-31  6.3832e-04
2011-11-01 -4.4625e-06
2011-11-02  2.8142e-03
2011-11-03  5.1114e-04
2011-11-04 -1.0105e-03
2011-11-07  7.5650e-03
2011-11-08  2.1002e-03
2011-11-09 -9.1716e-03
2011-11-10  1.1173e-03
2011-11-14 -6.9207e-03
2011-11-15  2.6241e-04
2011-11-16  1.7520e-03
2011-11-17 -2.6443e-05
2011-11-18 -1.4169e-03
2011-11-21  3.7602e-04
2011-11-22  4.3982e-05
2011-11-23 -6.7328e-06
2011-11-25  1.1571e-05
2011-11-28  1.4016e-07
2011-11-29 -2.0426e-07

Additional info, as requested:

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252
[3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] gWidgetsRGtk2_0.0-81       gWidgets_0.0-52
 [3] RGtk2_2.20.25              lattice_0.20-15
 [5] moments_0.13               data.table_1.8.8
 [7] tseries_0.10-30            timeDate_2160.97
 [9] PerformanceAnalytics_1.1.0 xts_0.9-3
[11] zoo_1.7-9                  RODBC_1.3-6

loaded via namespace (and not attached):
[1] grid_2.15.2    quadprog_1.5-4

You should be using colMeans for this:

colMeans(managerret, na.rm=TRUE)
##       manager
## -6.826297e-05
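For comparison, a sketch that rebuilds the posted series as a plain one-column matrix. This is an assumption for illustration: it keeps the values but drops the zoo class and index. On a plain matrix, both colMeans() and mean() with na.rm = TRUE return the same value as the colMeans() call above, which points at zoo's subsetting rather than at the data:

```r
# The 20 populated returns from the printed series; the first 18 dates are NaN
vals <- c(0.00063832, -4.4625e-06, 0.0028142, 0.00051114, -0.0010105,
          0.007565, 0.0021002, -0.0091716, 0.0011173, -0.0069207,
          0.00026241, 0.001752, -2.6443e-05, -0.0014169, 0.00037602,
          4.3982e-05, -6.7328e-06, 1.1571e-05, 1.4016e-07, -2.0426e-07)

# Plain one-column matrix: same values as the zoo object, without class or index
m <- matrix(c(rep(NaN, 18), vals), ncol = 1, dimnames = list(NULL, "manager"))

print(colMeans(m, na.rm = TRUE))  # manager: -6.826297e-05
print(mean(m, na.rm = TRUE))      # -6.826297e-05 as well on a plain matrix
```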

If this had been a data.frame, you would have received a warning (but the correct output).

Here you have exposed an inconsistency in the way data.frame and zoo objects are subsetted by [ with a logical matrix index. It appears to be a bug in [.zoo. I have emailed the maintainer.
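A small sketch of the data.frame side of this inconsistency, using a hypothetical four-row data.frame built from the first values of the series. Indexing it with the logical matrix returned by !is.na() selects only the non-missing cells:

```r
# Hypothetical data.frame holding the first four values of the series
df <- data.frame(manager = c(NaN, NaN, 6.3832e-04, -4.4625e-06))

keep <- !is.na(df)   # is.na() on a data.frame returns a logical matrix
vals <- df[keep]     # a logical-matrix index selects the matching cells

print(vals)          # the two populated returns, as a plain numeric vector
```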

The problem occurs at this step within mean.default:

if (na.rm) x <- x[!is.na(x)]

Here's what's going awry:

managerret[!is.na(managerret)]
##   1
## NaN

!is.na(managerret) looks as expected, but it isn't:

class(!is.na(managerret))
[1] "matrix"
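A base-R sketch of why this matters, with a plain matrix standing in for the zoo object (an assumption). The values of !is.na() are logical, but class() reports the dimensioned shape, so a test written as all(class(i) == "logical") does not match. R 4.0.0 and later report c("matrix", "array") instead of the "matrix" shown above, which fails the same test. The is.logical() line only illustrates a type-based check; it is not zoo's code:

```r
m <- matrix(c(NaN, NaN, 6.3832e-04), ncol = 1)  # plain stand-in for the zoo object
keep <- !is.na(m)

print(typeof(keep))                   # "logical": the stored values are logical
print(class(keep))                    # "matrix" (c("matrix", "array") in R >= 4.0.0)
print(all(class(keep) == "logical"))  # FALSE: a class()-based test misses it
print(is.logical(keep))               # TRUE: a type-based test would catch it
```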

This class is unexpected in [.zoo, which contains these lines:

if (all(class(i) == "logical"))
    i <- which(rep(i, length.out = n2))
else if (inherits(i, "zoo") && all(class(coredata(i)) == "logical")) {
    i <- which(coredata(merge(zoo(, time(x)), i)))
} else if (!((all(class(i) == "numeric") || all(class(i) == "integer"))))
    i <- which(match(index(x), i, nomatch = 0L) > 0L)

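The last branch is the one that runs in this case. Because class(i) is "matrix", neither of the first two tests matches, so the logical matrix is treated as a set of index values: match() coerces TRUE/FALSE to 1/0, only the observation whose index value is 1 is selected, and that observation is NaN. This produces the incorrect result.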
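The last branch can be reproduced in base R to see why exactly one element comes back. This sketch assumes the integer index 1:38 from the dput() output and the 18 leading NaN values; keep is a hand-built stand-in for !is.na(managerret):

```r
idx  <- 1:38  # index(x): the integer index shown in the dput() output
# Stand-in for !is.na(managerret): 18 missing dates, then 20 populated ones
keep <- matrix(c(rep(FALSE, 18), rep(TRUE, 20)), ncol = 1)

# Last branch of `[.zoo`: treat i as a set of index values to look up.
# match() coerces the logical matrix to integer 0/1, so among 1:38 only the
# index value 1 is found in it.
rows <- which(match(idx, keep, nomatch = 0L) > 0L)

print(rows)  # 1: only the first observation, which is NaN
```

Since the lookup is by index value rather than by position, the mean is taken over that single NaN, and the mean of NaN is NaN.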

The structure of the object:

> dput(managerret)
structure(c(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 0.00063832, -4.4625e-06,
0.0028142, 0.00051114, -0.0010105, 0.007565, 0.0021002, -0.0091716,
0.0011173, -0.0069207, 0.00026241, 0.001752, -2.6443e-05, -0.0014169,
0.00037602, 4.3982e-05, -6.7328e-06, 1.1571e-05, 1.4016e-07,
-2.0426e-07), .Dim = c(38L, 1L), .Dimnames = list(c("2011-10-04",
"2011-10-05", "2011-10-06", "2011-10-07", "2011-10-11", "2011-10-12",
"2011-10-13", "2011-10-14", "2011-10-17", "2011-10-18", "2011-10-19",
"2011-10-20", "2011-10-21", "2011-10-24", "2011-10-25", "2011-10-26",
"2011-10-27", "2011-10-28", "2011-10-31", "2011-11-01", "2011-11-02",
"2011-11-03", "2011-11-04", "2011-11-07", "2011-11-08", "2011-11-09",
"2011-11-10", "2011-11-14", "2011-11-15", "2011-11-16", "2011-11-17",
"2011-11-18", "2011-11-21", "2011-11-22", "2011-11-23", "2011-11-25",
"2011-11-28", "2011-11-29"), "manager"), index = 1:38, class = "zoo")

Old code (colMeans is the proper way to do this): specifying the "column" with $ also gets around the problem:

mean(managerret, na.rm=TRUE)
## [1] NaN
mean(managerret$manager, na.rm=TRUE)
## [1] -6.826297e-05
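A plain-matrix sketch of why extracting the column helps. It assumes, as the working output suggests, that the extracted column carries no dim attribute: !is.na() then returns an ordinary logical vector whose class is "logical", which is the case the first branch of [.zoo handles:

```r
m <- matrix(c(NaN, NaN, 6.3832e-04, -4.4625e-06), ncol = 1,
            dimnames = list(NULL, "manager"))

v <- m[, "manager"]   # one column extracted; the dim attribute is dropped
keep <- !is.na(v)

print(class(keep))            # "logical": the case the first branch of `[.zoo` expects
print(mean(v, na.rm = TRUE))  # mean of the two populated returns
```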
