R - Linear regression on a log-log plot: plot lm() coefficients manually, as abline() is producing a bad fit
I apologise, as I have asked a question along the same lines before, but the answer had been working until now. I have produced 6 plots and looked at them using that method, and I've gotten 2 weird ones. You can see the "lack of fit" in this example:
```r
x <- c(9222, 187720, 42162, 7005, 3121, 7534, 21957, 272901, 109667, 1394312,
       12230, 69607471, 79183, 6389, 64859, 32479, 3535, 9414098, 2464, 67917,
       59178, 2278, 33064, 357535, 11876, 21036, 11018, 12499632, 5160, 84574)
y <- c(0, 4, 1, 0, 1, 0, 0, 1, 5, 13, 0, 322, 0, 0, 1, 1, 1, 32, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 33, 1, 1)
lin <- lm(y ~ x)
plot(x, y, log = "xy")
abline(lin, col = "blue", untf = TRUE)
```
This is the plot I have produced using the real data (log-log scale on the left, normal scale on the right):
I wasn't concerned about the missing 0 values, as I assumed lin would still take these into account, but you can see that on the log plot the line does not start near (1,1). From the way it looks, I would expect to see points at around (1000,10).
Does anyone know what's going on? Would manually plotting the coefficients of lin help? If so, can someone explain to me how to do this?
First, let's look at the leverage plot of the linear model:
```r
plot(lin, which = 5)
```
As you can see, points 12 (y=322) and 28 (y=33) are influential. Furthermore, the scatter around the fitted line becomes larger with increasing x values. Thus, weighted regression seems appropriate:
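The quantities behind the leverage plot can also be inspected directly. The sketch below uses the base R functions hatvalues() and cooks.distance(); the 4/n cut-off for Cook's distance is a common rule of thumb assumed here for illustration, not something the answer relies on. The residual plot at the end is one way to look at the growing scatter.

```r
# Data from the question
x <- c(9222, 187720, 42162, 7005, 3121, 7534, 21957, 272901, 109667, 1394312,
       12230, 69607471, 79183, 6389, 64859, 32479, 3535, 9414098, 2464, 67917,
       59178, 2278, 33064, 357535, 11876, 21036, 11018, 12499632, 5160, 84574)
y <- c(0, 4, 1, 0, 1, 0, 0, 1, 5, 13, 0, 322, 0, 0, 1, 1, 1, 32, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 33, 1, 1)
lin <- lm(y ~ x)

# Leverage (hat values) and Cook's distance, the quantities in plot(lin, which = 5)
h  <- hatvalues(lin)
cd <- cooks.distance(lin)

# Rule-of-thumb cut-off (an assumption): flag Cook's distance above 4/n
n <- length(y)
influential <- which(cd > 4 / n)
# Row names of this table are the observation indices
print(data.frame(x = x[influential], y = y[influential],
                 leverage = round(h[influential], 3),
                 cooks = round(cd[influential], 3)))

# Residuals against x on a log axis: a widening funnel suggests variance growing with x
plot(x, resid(lin), log = "x", xlab = "x", ylab = "residual")
abline(h = 0, lty = 2)
```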
```r
lin2 <- lm(y ~ x, weights = 1/x)
summary(lin2)
```

```
Call:
lm(formula = y ~ x, weights = 1/x)

Weighted Residuals:
      Min        1Q    Median        3Q       Max
-0.006699 -0.003383 -0.002407  0.002521  0.012733

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.099e-01  1.092e-01   2.838  0.00835 **
x           4.317e-06  5.850e-07   7.381 4.89e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.005674 on 28 degrees of freedom
Multiple R-squared:  0.6605,  Adjusted R-squared:  0.6484
F-statistic: 54.47 on 1 and 28 DF,  p-value: 4.888e-08
```

```r
plot(lin2, which = 5)
```
This looks better.
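For reference, weights = 1/x makes lm() minimise the weighted sum of squares sum(w_i * (y_i - a - b*x_i)^2) with w_i = 1/x_i, which corresponds to assuming that the error variance is proportional to x. The sketch below checks this by rescaling each row by sqrt(w) and solving the resulting ordinary least-squares problem with a QR decomposition; the variable names are illustrative.

```r
# Data from the question
x <- c(9222, 187720, 42162, 7005, 3121, 7534, 21957, 272901, 109667, 1394312,
       12230, 69607471, 79183, 6389, 64859, 32479, 3535, 9414098, 2464, 67917,
       59178, 2278, 33064, 357535, 11876, 21036, 11018, 12499632, 5160, 84574)
y <- c(0, 4, 1, 0, 1, 0, 0, 1, 5, 13, 0, 322, 0, 0, 1, 1, 1, 32, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 33, 1, 1)

w <- 1 / x
lin2 <- lm(y ~ x, weights = w)

# Weighted least squares minimises sum(w * (y - a - b * x)^2).
# Scaling each row of the design matrix and the response by sqrt(w) turns this
# into ordinary least squares, solved here with a QR decomposition.
X <- cbind(intercept = 1, x = x)
beta <- qr.coef(qr(sqrt(w) * X), sqrt(w) * y)

# The two columns should agree up to floating-point error
print(cbind(manual = beta, lm = coef(lin2)))
```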
```r
plot(x, y, log = "xy", ylim = c(0.1, 350))
abline(lin, col = "blue", untf = TRUE)
abline(lin2, col = "green", untf = TRUE)
```
(Keep in mind that the 0 values are not plotted here.)
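As for plotting the coefficients of lin by hand, one way is sketched below (the 200-point grid and the variable names are arbitrary choices). It evaluates a + b*x, the untransformed straight line that abline(..., untf = TRUE) draws, on a grid evenly spaced in log(x). A straight line in x becomes a curve on log-log axes and cannot be drawn wherever a + b*x <= 0, so if the fitted intercept is negative the line can only appear to the right of x = -a/b.

```r
# Data from the question
x <- c(9222, 187720, 42162, 7005, 3121, 7534, 21957, 272901, 109667, 1394312,
       12230, 69607471, 79183, 6389, 64859, 32479, 3535, 9414098, 2464, 67917,
       59178, 2278, 33064, 357535, 11876, 21036, 11018, 12499632, 5160, 84574)
y <- c(0, 4, 1, 0, 1, 0, 0, 1, 5, 13, 0, 322, 0, 0, 1, 1, 1, 32, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 33, 1, 1)
lin <- lm(y ~ x)
a <- coef(lin)[["(Intercept)"]]
b <- coef(lin)[["x"]]

# Evaluate the fitted straight line on a grid evenly spaced in log(x)
x_grid <- exp(seq(log(min(x)), log(max(x)), length.out = 200))
y_grid <- a + b * x_grid

# Zero counts cannot be shown on a log axis, so only positive points are plotted;
# non-positive fitted values are likewise dropped before drawing the line.
keep <- y > 0
plot(x[keep], y[keep], log = "xy", xlab = "x", ylab = "y")
ok <- y_grid > 0
lines(x_grid[ok], y_grid[ok], col = "blue")
```

Over the range of the data this should trace the same curve as abline(lin, col = "blue", untf = TRUE); replacing lin with lin2 draws the weighted fit instead.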
Depending on what your data describe, you might consider using a generalized linear model.
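The answer does not say which GLM to use. If y are counts (an assumption; the question does not say), one option is a Poisson GLM with a log link and log(x) as the predictor, sketched below. Its fitted mean, exp(a) * x^b, is a straight line on log-log axes, and the zero counts enter the fit directly instead of being dropped.

```r
# Data from the question
x <- c(9222, 187720, 42162, 7005, 3121, 7534, 21957, 272901, 109667, 1394312,
       12230, 69607471, 79183, 6389, 64859, 32479, 3535, 9414098, 2464, 67917,
       59178, 2278, 33064, 357535, 11876, 21036, 11018, 12499632, 5160, 84574)
y <- c(0, 4, 1, 0, 1, 0, 0, 1, 5, 13, 0, 322, 0, 0, 1, 1, 1, 32, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 33, 1, 1)

# Assumption: y are counts, so a Poisson GLM (log link) is used;
# log(mu) = a + b * log(x), i.e. mu = exp(a) * x^b
fit <- glm(y ~ log(x), family = poisson)
summary(fit)

# Fitted mean on a log-spaced grid of x
x_grid <- exp(seq(log(min(x)), log(max(x)), length.out = 200))
mu <- predict(fit, newdata = data.frame(x = x_grid), type = "response")

# Only positive counts can be shown on log axes; the fitted mean is always positive
keep <- y > 0
plot(x[keep], y[keep], log = "xy", xlab = "x", ylab = "y",
     ylim = range(c(y[keep], mu)))
lines(x_grid, mu, col = "red")
```

If the residual deviance is much larger than its degrees of freedom, the counts are overdispersed, and family = quasipoisson (or MASS::glm.nb) may be more appropriate.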