
Create rad models for logseries and lognormal #48

Open
piklprado opened this issue May 21, 2015 · 15 comments
Comments

@piklprado (Contributor)

Check May (1975) and Wilson, J. B. (1991) for analytic expressions.

@andrechalom andrechalom added this to the sads 0.2.0 milestone May 22, 2015
@andrechalom (Member)

Examining the code from radpred, I believe this is a very general way of defining "rank distribution" functions:

drlnorm <- function(x, meanlog, sdlog, S) {
 qlnorm(ppoints(S)[x], meanlog, sdlog, lower.tail=FALSE) / sum(qlnorm(ppoints(S), meanlog, sdlog))
}

Does this make any sense?
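If this normalization is right, the density should sum to 1 over ranks 1..S, since ppoints() is symmetric and the upper-tail quantiles are the lower-tail quantiles in reverse order. A quick sanity check (my own sketch, base R only):

```r
# Proposed rank "density" for the lognormal, as above
drlnorm <- function(x, meanlog, sdlog, S) {
  qlnorm(ppoints(S)[x], meanlog, sdlog, lower.tail = FALSE) /
    sum(qlnorm(ppoints(S), meanlog, sdlog))
}
# Over all ranks, the numerator enumerates the same quantiles as the
# denominator (in reverse order), so the values should sum to 1:
S <- 30
sum(drlnorm(1:S, meanlog = 1, sdlog = 0.4, S = S))
```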

@piklprado (Contributor, author)

Checking against the Broken-Stick model gave good, but not perfect, agreement:

> drbs2 <- function(x, N, S) {
+  qbs(ppoints(S)[x], N, S, lower.tail=FALSE) / sum(qbs(ppoints(S), N, S))
+ }
> all.equal(drbs2(1:10, N=100, S=10), drbs(1:10, N=100, S=10))
[1] "Mean relative difference: 0.043779"

@piklprado (Contributor, author)

But moving on to fitting, something went wrong, at least for the logseries:

> drls <- function(x, alpha, N, S) {
+  qls(ppoints(S)[x], N, alpha, lower.tail=FALSE) / sum(qlnorm(ppoints(S), N, alpha))
+ }
> okland.ranks <- rep(1:length(okland),sort(okland, decreasing=TRUE))
> LL2 <- function(alpha)
+     -sum(log(drls(okland.ranks, alpha, sum(okland), length(okland))))
> (okland.ls <- fitls(okland))
Maximum likelihood estimation
Type: discrete species abundance distribution
Call:
mle2(minuslogl = function (N, alpha) 
-sum(dls(x, N, alpha, log = TRUE)), start = list(alpha = 3.37806926793836), 
    method = "Brent", fixed = list(N = 1689), data = list(x = list(
        438, 257, 248, 157, 143, "etc")), lower = 0, upper = 21L)

Coefficients:
          N       alpha 
1689.000000    3.378058 

Log-likelihood: -112.41 
> (okland.rls <- mle2(LL2, list(alpha=3.37)))
Error in optim(par = 3.37, fn = function (p)  : 
  initial value in 'vmmin' is not finite
## Trying one-dimensional fit
> (okland.rls <- mle2(LL2, list(alpha=3.37), method="Brent", lower=1, upper=10))

Call:
mle2(minuslogl = LL2, start = list(alpha = 3.37), method = "Brent", 
    lower = 1, upper = 10)

Coefficients:
alpha 
   10 

Log-likelihood: -Inf 
There were 37 warnings (use warnings() to see them)

Also, fitting was extremely slow, but this can be worked around by using the estimated coefficients from the faster fitls with eval.only=TRUE.

@piklprado (Contributor, author)

Same problem when trying to fit the lognormal:

> drlnorm <- function(x, meanlog, sdlog, S) {
+  qlnorm(ppoints(S)[x], meanlog, sdlog, lower.tail=FALSE) / sum(qlnorm(ppoints(S), meanlog, sdlog))
+ }
> okland.ranks <- rep(1:length(okland),sort(okland, decreasing=TRUE))
> LL1 <- function(mu, sigma)
+     -sum(log(drlnorm(okland.ranks, mu, sigma ,length(okland))))
> (okland.ln <- fitlnorm(okland))
Maximum likelihood estimation
Type: continuous species abundance distribution
Call:
mle2(minuslogl = function (meanlog, sdlog) 
-sum(dlnorm(x, meanlog, sdlog, log = TRUE)), start = list(meanlog = 3.36487212445623, 
    sdlog = 1.54523635728825), data = list(x = list(438, 257, 
    248, 157, 143, "etc")))

Coefficients:
 meanlog    sdlog 
3.364872 1.507996 

Log-likelihood: -109.09 
> (okland.rln <- mle2(LL1, list(mu=1, sigma=1)))

Call:
mle2(minuslogl = LL1, start = list(mu = 1, sigma = 1))

Coefficients:
      mu    sigma 
1.000000 1.368092 

Log-likelihood: -3926.41 
> (okland.rln <- mle2(LL1, list(mu=100, sigma=1)))

Call:
mle2(minuslogl = LL1, start = list(mu = 100, sigma = 1))

Coefficients:
        mu      sigma 
100.000000   1.368092 

Log-likelihood: -3926.41 

@andrechalom (Member)

The difference for drbs2 is probably due to qbs returning integers, while drbs returns floats:

> drbs(1:10, 100, 10)*100
 [1] 29.29 19.29 14.29 10.96 8.46 6.46 4.79 3.36 2.11 1.00
> qbs(ppoints(10,a=1/2),100,10, lower.tail=F)
 [1] 28 19 14 11  8  6  4  3  1  1

However, drlnorm seems to be constant in meanlog. It appears that qlnorm and sum(qlnorm) scale by the same factor as meanlog changes:

> drlnorm(1:5, 1, 0.4, 30)
[1] 0.07234592 0.05963141 0.05370137 0.04974786 0.04675000
> drlnorm(1:5, 100, 0.4, 30)
[1] 0.07234592 0.05963141 0.05370137 0.04974786 0.04675000
> drlnorm(1:5, 0.01, 0.4, 30)
[1] 0.07234592 0.05963141 0.05370137 0.04974786 0.04675000

@andrechalom (Member)

Wilson (1991) gives the following expression for the log-abundance of the i-th species out of S:

ln A_i = meanlog + sdlog \Phi^{-1}(ppoints(S)[S - i])

As we are interested only in the effect of meanlog, I will write this as

ln A_i = meanlog + F(etc)

In order to transform this into a density, we need to normalize it:

density(i, meanlog, etc) = exp(meanlog + F(etc)) / sum(exp(meanlog + F(etc)))
                         = exp(meanlog) * exp(F(etc)) / [ exp(meanlog) * sum(exp(F(etc))) ]
                         = exp(F(etc)) / sum(exp(F(etc)))

So density(i, meanlog, etc) is indeed constant in meanlog. I find this very intriguing.
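The same cancellation can be checked numerically: since qlnorm(p, meanlog, sdlog) = exp(meanlog + sdlog * qnorm(p)), changing meanlog rescales every quantile by the same factor exp(meanlog), which cancels in the normalized ratio. A minimal base-R check (my own sketch):

```r
p <- ppoints(30)
# Quantiles scale multiplicatively with exp(meanlog)...
all.equal(qlnorm(p, meanlog = 5, sdlog = 0.4),
          exp(5) * qlnorm(p, meanlog = 0, sdlog = 0.4))
# ...so the normalized ratio is invariant to meanlog:
r <- function(m) qlnorm(p, m, 0.4) / sum(qlnorm(p, m, 0.4))
all.equal(r(0), r(100))
```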

@andrechalom (Member)

OK, back to the logseries: first, note that there is a typo in drls, where it calls sum(qlnorm()) instead of sum(qls()). Fixed it as

> drls <- function(x, alpha, N, S) {
+  qls(ppoints(S)[x], N, alpha, lower.tail=FALSE) / sum(qls(ppoints(S), N, alpha))
+ }

The original function is also slow as hell, because qls() is called for EACH value in okland.ranks, a vector with many repeated elements. The following function is optimized for this situation:

> drls2 <- function(x, alpha, N, S) {
+  myq <- qls(ppoints(S), N, alpha, lower.tail=FALSE) / sum(qls(ppoints(S), N, alpha))
+  myq[x]
+ }
> library(microbenchmark)
> microbenchmark(drls2(okland.ranks, alpha, N, S), drls(okland.ranks, alpha, N, S), times=5)

However, this function is very ill-behaved, and seems to converge to an alpha around 27:

> (okland.rls <- mle2(LL2, list(alpha=27.53629), method="Brent", lower=2, upper=35, eval.only=T))
> bbmle::profile(okland.rls, prof.lower=0, prof.upper=30, std.err=0.17)->okland.p
> plotprofmle(okland.p)

@andrechalom (Member)

After some days working on this issue with very little progress, I'm voting to move it to the next milestone, and to release sads 0.2.0 on CRAN after merging #53.

@piklprado (Contributor, author)

There are many open theoretical and computational questions. This needs further thinking, so it is no longer a milestone for version 0.2.

@piklprado piklprado removed this from the sads 0.2.0 milestone Jun 11, 2015
@andrechalom (Member)

Ideas to test:

  • Do two random lognormal communities with the same sigma but different mu have the same rad distribution?
  • Do fitrlnorm(...) and fitrlnorm(start = fitlnorm(...), eval.only=TRUE) have the same AIC?
  • Can we develop a general "sadtorad" method based on the formula above and eval.only=TRUE?
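One possible shape for the third idea, sketched with a hypothetical helper (drad_general is my own name, not package API): build the rank density from any quantile function, reusing the compute-once-then-index pattern from drls2 above.

```r
# Hypothetical generic "sadtorad" density: qdist is any quantile function
# with the usual (p, ..., lower.tail) signature, e.g. qlnorm or qls.
drad_general <- function(x, S, qdist, ...) {
  q <- qdist(ppoints(S), ..., lower.tail = FALSE)  # compute once, then index
  (q / sum(q))[x]
}
# Rank density induced by a lognormal sad, for ranks 1..5 out of S = 30:
drad_general(1:5, S = 30, qdist = qlnorm, meanlog = 1, sdlog = 0.4)
```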

@andrechalom andrechalom added this to the sads 0.3.0 milestone Jun 13, 2015
@piklprado (Contributor, author)

The main purpose here is to make likelihoods from sad and rad fits comparable.
An alternative solution is to look for an as.fitsad method or function to coerce fitrad objects to fitsad, maybe with eval.only=TRUE.

@andrechalom (Member)

The answer to the first question above is: yes.

N = 1e5 
x1 <- rsad(S=N, frac=1, sad="lnorm", meanlog=5)
x2 <- rsad(S=N, frac=1, sad="lnorm", meanlog=7)
x3 <- rsad(S=N, frac=1, sad="lnorm", meanlog=9)
plot(rad(x1), type='l', xlim=c(0, 1e5), ylim=c(1, max(x3)))
lines(rad(x2), col='blue')
lines(rad(x3), col='red')


@andrechalom (Member)

Implemented a tentative as.fitrad on branch https://github.com/piLaboratory/sads/tree/rlnorm. Right now, as.fitrad(fitls(...)) and as.fitrad(fitlnorm(...)) seem to be working well. Both logLik() and AIC() return reasonable values for these objects, well in range of the other fitrads.

I have used a general function drad, and this code can easily be adapted to other distributions. The main issue that we should work out now is how to deal with the existing methods for fitrad objects, such as plots, radpreds, etc. In the way we have built the package so far, these functions expect the function dXXX to provide the density for the rad distribution XXX, but now we need to use drad("XXX", ...) for rad distributions that have been generated from a sads distribution.

The performance is also a little better than in the first versions, as I have used the optimized code above, and eval.only=TRUE, so as.fitrad(fitls(birds)) is only taking a couple of seconds on my computer.

@andrechalom (Member)

It is a bit strange to see that fitlnorm seems to provide a better fit for birds, while the fitrad versions show the opposite ordering. (Note also that the rank likelihood sums one term per individual, while the sad likelihood sums one term per species, so the absolute AIC scales are presumably not comparable.) The ordering of bs/rbs seems more consistent:

> AICtab(fitls(birds), fitlnorm(birds), fitbs(birds), base=T)
                AIC   dAIC  df
fitlnorm(birds) 934.8   0.0 2 
fitls(birds)    945.4  10.6 1 
fitbs(birds)    992.8  58.0 0 
> AICtab(as.fitrad(fitls(birds)), as.fitrad(fitlnorm(birds)), fitrbs(birds), base=T)
                           AIC      dAIC     df
as.fitrad(fitls(birds))     98555.6      0.0 2 
as.fitrad(fitlnorm(birds))  99163.4    607.8 2 
fitrbs(birds)              102071.2   3515.6 0 

The relevant code from drad seems ok, so maybe this is a problem similar to the "trueLL" weirdness.

Should we merge this? Should we add more distributions to the "as.fitrad" code? Should we investigate something else?

@piklprado (Contributor, author)

Too weird; back to the drawing board. Moving the milestone to 1.0.0.
