r - Linear regression on a log-log plot - plot lm() coefficients manually as abline() is producing a bad fit


I apologise, as I have asked a question along the same lines before, but that answer was working until now. I have produced six plots that looked fine using that method, but now I've gotten two weird ones. You can see the "lack of fit" using this example:

x=c(9222,187720,42162,7005,3121,7534,21957,272901,109667,1394312,12230,69607471,79183,6389,64859,32479,3535,9414098,2464,67917,59178,2278,33064,357535,11876,21036,11018,12499632,5160,84574)
y=c(0,4,1,0,1,0,0,1,5,13,0,322,0,0,1,1,1,32,0,0,0,0,0,0,0,0,0,33,1,1)
lin=lm(y~x)
plot(x, y, log="xy")
abline(lin, col="blue", untf=TRUE)

This is the plot I have produced using my real data (log-log on the left, normal on the right):

[Figure: freaky slope]

I wasn't concerned about the missing 0 values, as I assumed lin would still take these into account, but you can see on the log plot that the line does not start anywhere near (1,1). From how it looks, I would expect to see points at around (1000,10).

Does anyone know what's going on? Would manually plotting the coefficients of lin help? If so, can someone explain to me how to do this?

First, let's look at the leverage plot of the linear model:

plot(lin,which=5) 

[Figure: leverage plot of the linear model]

As you can see, points 12 (y=322) and 28 (y=33) are very influential. Furthermore, the scatter around the fitted line becomes larger with increasing x values. Thus, it seems appropriate to do a weighted regression:

lin2 <- lm(y~x,weights=1/x)
summary(lin2)

Call:
lm(formula = y ~ x, weights = 1/x)

Weighted Residuals:
      Min        1Q    Median        3Q       Max
-0.006699 -0.003383 -0.002407  0.002521  0.012733

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.099e-01  1.092e-01   2.838  0.00835 **
x           4.317e-06  5.850e-07   7.381 4.89e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.005674 on 28 degrees of freedom
Multiple R-squared: 0.6605, Adjusted R-squared: 0.6484
F-statistic: 54.47 on 1 and 28 DF,  p-value: 4.888e-08

plot(lin2,which=5)

[Figure: leverage plot of the weighted linear model]

This is better.
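The comparison between the two leverage plots can also be made numerically. The following is a minimal sketch, not part of the original answer, that tabulates Cook's distance and leverage for both fits using the base R functions cooks.distance() and hatvalues(). The data frame name infl and its column names are made up for illustration.

```r
x <- c(9222,187720,42162,7005,3121,7534,21957,272901,109667,1394312,12230,69607471,79183,6389,64859,32479,3535,9414098,2464,67917,59178,2278,33064,357535,11876,21036,11018,12499632,5160,84574)
y <- c(0,4,1,0,1,0,0,1,5,13,0,322,0,0,1,1,1,32,0,0,0,0,0,0,0,0,0,33,1,1)

lin  <- lm(y ~ x)                   # unweighted fit
lin2 <- lm(y ~ x, weights = 1 / x)  # weighted fit, as in the answer

# Cook's distance and leverage (hat values) per observation, for both models.
# 'infl' and its column names are illustrative, not from the original post.
infl <- data.frame(
  obs        = seq_along(x),
  x          = x,
  y          = y,
  cooks_lin  = cooks.distance(lin),
  cooks_lin2 = cooks.distance(lin2),
  lev_lin    = hatvalues(lin),
  lev_lin2   = hatvalues(lin2)
)

# Observations ranked by Cook's distance in the unweighted fit
head(infl[order(infl$cooks_lin, decreasing = TRUE), ], 5)
```

Comparing the cooks_lin and cooks_lin2 columns row by row shows how much the weighting changes the influence of each individual point.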

plot(x, y, log="xy",ylim=c(0.1,350))
abline(lin, col="blue", untf=TRUE)
abline(lin2, col="green", untf=TRUE)

[Figure: results (keep in mind that the 0 values are not plotted here)]
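As for plotting the coefficients manually: a straight line y = a + b*x is a curve on log-log axes, so it has to be evaluated on a grid of x values rather than drawn from two points. The sketch below is one way this might be done and is not part of the original answer. The helper fit_line and the grid xs are hypothetical names. Non-positive predictions are set to NA because they cannot be shown on a log axis.

```r
x <- c(9222,187720,42162,7005,3121,7534,21957,272901,109667,1394312,12230,69607471,79183,6389,64859,32479,3535,9414098,2464,67917,59178,2278,33064,357535,11876,21036,11018,12499632,5160,84574)
y <- c(0,4,1,0,1,0,0,1,5,13,0,322,0,0,1,1,1,32,0,0,0,0,0,0,0,0,0,33,1,1)

lin  <- lm(y ~ x)
lin2 <- lm(y ~ x, weights = 1 / x)

# Log-spaced grid covering the range of x (hypothetical helper variable)
xs <- 10^seq(log10(min(x)), log10(max(x)), length.out = 200)

# Evaluate a simple linear fit from its coefficients: intercept + slope * x
fit_line <- function(model, x) {
  b <- unname(coef(model))
  b[1] + b[2] * x
}

# Values <= 0 cannot be drawn on a log axis, so mask them before plotting
on_log_axis <- function(v) ifelse(v > 0, v, NA)

plot(x, y, log = "xy", ylim = c(0.1, 350))  # zeros in y are omitted with a warning
lines(xs, on_log_axis(fit_line(lin,  xs)), col = "blue")
lines(xs, on_log_axis(fit_line(lin2, xs)), col = "green")
```

This should give the same curves as abline(..., untf=TRUE), which already draws the untransformed line on the log axes, so manual plotting by itself is not expected to change the fit.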

Depending on what your data describe, you might consider using a generalized linear model.
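To illustrate what that could look like, here is a rough sketch under a strong assumption that the original post does not confirm: that y is a count, so a Poisson GLM with a log link and log(x) as predictor is a plausible candidate. The model name glm1 is made up for this example, and whether this family suits the real data has to be checked (for instance, for overdispersion).

```r
x <- c(9222,187720,42162,7005,3121,7534,21957,272901,109667,1394312,12230,69607471,79183,6389,64859,32479,3535,9414098,2464,67917,59178,2278,33064,357535,11876,21036,11018,12499632,5160,84574)
y <- c(0,4,1,0,1,0,0,1,5,13,0,322,0,0,1,1,1,32,0,0,0,0,0,0,0,0,0,33,1,1)

# ASSUMPTION: y is a count; a Poisson GLM (log link) is only one possible choice
glm1 <- glm(y ~ log(x), family = poisson)
summary(glm1)

# Predicted means on a log-spaced grid, back on the response scale
xs   <- 10^seq(log10(min(x)), log10(max(x)), length.out = 200)
yhat <- predict(glm1, newdata = data.frame(x = xs), type = "response")

plot(x, y, log = "xy", ylim = c(0.1, 350))  # zeros in y are omitted with a warning
lines(xs, yhat, col = "red")
```

Unlike the linear fits, this model uses the zero counts on their natural scale and its predictions are always positive, so the whole curve can be drawn on the log-log plot.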

