
Create rad models for logseries and lognormal #48

Open
piklprado opened this issue May 21, 2015 · 15 comments
Comments

@piklprado (Contributor)

Check May (1975) and Wilson, J. B. (1991) for analytic expressions.

@andrechalom andrechalom added this to the sads 0.2.0 milestone May 22, 2015
@andrechalom (Member)

Examining the code from radpred, I believe this is a very general way of defining "rank distribution" functions:

drlnorm <- function(x, meanlog, sdlog, S) {
 qlnorm(ppoints(S)[x], meanlog, sdlog, lower.tail=FALSE) / sum(qlnorm(ppoints(S), meanlog, sdlog))
}

Does this make any sense?
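If this normalization is right, the density should sum to 1 over ranks 1..S, since ppoints() is symmetric and the upper-tail quantiles are the lower-tail quantiles in reverse order. A quick sanity check (my own sketch, base R only):

```r
# Proposed rank "density" for the lognormal, as above
drlnorm <- function(x, meanlog, sdlog, S) {
  qlnorm(ppoints(S)[x], meanlog, sdlog, lower.tail = FALSE) /
    sum(qlnorm(ppoints(S), meanlog, sdlog))
}
# Over all ranks, the numerator enumerates the same quantiles as the
# denominator (in reverse order), so the values should sum to 1:
S <- 30
sum(drlnorm(1:S, meanlog = 1, sdlog = 0.4, S = S))
```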

@piklprado (Contributor, author)

Checking against the Broken-Stick model gave good, but not perfect, agreement:

> drbs2 <- function(x, N, S) {
+  qbs(ppoints(S)[x], N, S, lower.tail=FALSE) / sum(qbs(ppoints(S), N, S))
+ }
> all.equal(drbs2(1:10, N=100, S=10), drbs(1:10, N=100, S=10))
[1] "Mean relative difference: 0.043779"

@piklprado (Contributor, author)

But moving on to fitting, something went wrong, at least for the logseries:

> drls <- function(x, alpha, N, S) {
+  qls(ppoints(S)[x], N, alpha, lower.tail=FALSE) / sum(qlnorm(ppoints(S), N, alpha))
+ }
> okland.ranks <- rep(1:length(okland),sort(okland, decreasing=TRUE))
> LL2 <- function(alpha)
+     -sum(log(drls(okland.ranks, alpha, sum(okland), length(okland))))
> (okland.ls <- fitls(okland))
Maximum likelihood estimation
Type: discrete species abundance distribution
Call:
mle2(minuslogl = function (N, alpha) 
-sum(dls(x, N, alpha, log = TRUE)), start = list(alpha = 3.37806926793836), 
    method = "Brent", fixed = list(N = 1689), data = list(x = list(
        438, 257, 248, 157, 143, "etc")), lower = 0, upper = 21L)

Coefficients:
          N       alpha 
1689.000000    3.378058 

Log-likelihood: -112.41 
> (okland.rls <- mle2(LL2, list(alpha=3.37)))
Error in optim(par = 3.37, fn = function (p)  : 
  initial value in 'vmmin' is not finite
## Trying one-dimensional fit
> (okland.rls <- mle2(LL2, list(alpha=3.37), method="Brent", lower=1, upper=10))

Call:
mle2(minuslogl = LL2, start = list(alpha = 3.37), method = "Brent", 
    lower = 1, upper = 10)

Coefficients:
alpha 
   10 

Log-likelihood: -Inf 
There were 37 warnings (use warnings() to see them)

Also, fitting was extremely slow, but this can be worked around by using the estimated coefficients from the faster fitls with eval.only=TRUE.

@piklprado (Contributor, author)

Same problem when trying to fit the lognormal:

> drlnorm <- function(x, meanlog, sdlog, S) {
+  qlnorm(ppoints(S)[x], meanlog, sdlog, lower.tail=FALSE) / sum(qlnorm(ppoints(S), meanlog, sdlog))
+ }
> okland.ranks <- rep(1:length(okland),sort(okland, decreasing=TRUE))
> LL1 <- function(mu, sigma)
+     -sum(log(drlnorm(okland.ranks, mu, sigma ,length(okland))))
> (okland.ln <- fitlnorm(okland))
Maximum likelihood estimation
Type: continuous species abundance distribution
Call:
mle2(minuslogl = function (meanlog, sdlog) 
-sum(dlnorm(x, meanlog, sdlog, log = TRUE)), start = list(meanlog = 3.36487212445623, 
    sdlog = 1.54523635728825), data = list(x = list(438, 257, 
    248, 157, 143, "etc")))

Coefficients:
 meanlog    sdlog 
3.364872 1.507996 

Log-likelihood: -109.09 
> (okland.rln <- mle2(LL1, list(mu=1, sigma=1)))

Call:
mle2(minuslogl = LL1, start = list(mu = 1, sigma = 1))

Coefficients:
      mu    sigma 
1.000000 1.368092 

Log-likelihood: -3926.41 
> (okland.rln <- mle2(LL1, list(mu=100, sigma=1)))

Call:
mle2(minuslogl = LL1, start = list(mu = 100, sigma = 1))

Coefficients:
        mu      sigma 
100.000000   1.368092 

Log-likelihood: -3926.41 

@andrechalom (Member)

The difference for drbs2 is probably due to qbs returning integers, while drbs returns floats:

> drbs(1:10, 100, 10)*100
 [1] 29.29 19.29 14.29 10.96 8.46 6.46 4.79 3.36 2.11 1.00
> qbs(ppoints(10,a=1/2),100,10, lower.tail=F)
 [1] 28 19 14 11  8  6  4  3  1  1

However, drlnorm seems to be constant in meanlog. It appears that qlnorm and sum(qlnorm) scale by the same factor as meanlog changes:

> drlnorm(1:5, 1, 0.4, 30)
[1] 0.07234592 0.05963141 0.05370137 0.04974786 0.04675000
> drlnorm(1:5, 100, 0.4, 30)
[1] 0.07234592 0.05963141 0.05370137 0.04974786 0.04675000
> drlnorm(1:5, 0.01, 0.4, 30)
[1] 0.07234592 0.05963141 0.05370137 0.04974786 0.04675000

@andrechalom (Member)

Wilson (1991) gives the following expression for the log-abundance of the i-th species out of S:

ln A_i = meanlog + sdlog \Phi^{-1}(ppoints(S)[S - i])

As we are interested only in the effect of meanlog, I will write this as

ln A_i = meanlog + F(etc)

In order to transform this into a density, we need to normalize it:

density(i, meanlog, etc) = exp(meanlog + F(etc)) / sum(exp(meanlog + F(etc)))
                         = exp(meanlog) * exp(F(etc)) / [ exp(meanlog) * sum(exp(F(etc))) ]
                         = exp(F(etc)) / sum(exp(F(etc)))

So density(i, meanlog, etc) is indeed constant in meanlog. I find this very intriguing.
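The same cancellation can be checked numerically: since qlnorm(p, meanlog, sdlog) = exp(meanlog + sdlog * qnorm(p)), changing meanlog rescales every quantile by the same factor exp(meanlog), which cancels in the normalized ratio. A minimal base-R check (my own sketch):

```r
p <- ppoints(30)
# Quantiles scale multiplicatively with exp(meanlog)...
all.equal(qlnorm(p, meanlog = 5, sdlog = 0.4),
          exp(5) * qlnorm(p, meanlog = 0, sdlog = 0.4))
# ...so the normalized ratio is invariant to meanlog:
r <- function(m) qlnorm(p, m, 0.4) / sum(qlnorm(p, m, 0.4))
all.equal(r(0), r(100))
```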

@andrechalom (Member)

OK, back to the logseries: first, note that there is a typo in drls, where it calls sum(qlnorm()) instead of sum(qls()). Fixed it as

> drls <- function(x, alpha, N, S) {
+  qls(ppoints(S)[x], N, alpha, lower.tail=FALSE) / sum(qls(ppoints(S), N, alpha))
+ }

The original function is also slow as hell, because qls() is called for EACH value in okland.ranks, a vector with many repeated elements. The following function is optimized for this situation:

> drls2 <- function(x, alpha, N, S) {
+  myq <- qls(ppoints(S), N, alpha, lower.tail=FALSE) / sum(qls(ppoints(S), N, alpha))
+  myq[x]
+ }
> library(microbenchmark)
> microbenchmark(drls2(okland.ranks, alpha, N, S), drls(okland.ranks, alpha, N, S), times=5)

However, this function is very ill-behaved, and seems to converge to an alpha around 27:

> (okland.rls <- mle2(LL2, list(alpha=27.53629), method="Brent", lower=2, upper=35, eval.only=T))
> bbmle::profile(okland.rls, prof.lower=0, prof.upper=30, std.err=0.17)->okland.p
> plotprofmle(okland.p)

@andrechalom (Member)

After some days working on this issue with very little progress, I'm voting to move it to the next milestone, and to release sads 0.2.0 on CRAN after merging #53.

@piklprado (Contributor, author)

There are many open theoretical and computational questions. This needs further thinking, so it is no longer a milestone for version 0.2.

@piklprado piklprado removed this from the sads 0.2.0 milestone Jun 11, 2015
@andrechalom (Member)

Ideas to test:

  • Do two random lognormal communities with the same sigma but different mu have the same rad distribution?
  • Do fitrlnorm(...) and fitrlnorm(start = fitlnorm(...), eval.only=TRUE) have the same AIC?
  • Can we develop a general "sadtorad" method based on the formula above and eval.only=TRUE?
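One possible shape for the third idea, sketched with a hypothetical helper (drad_general is my own name, not package API): build the rank density from any quantile function, reusing the compute-once-then-index pattern from drls2 above.

```r
# Hypothetical generic "sadtorad" density: qdist is any quantile function
# with the usual (p, ..., lower.tail) signature, e.g. qlnorm or qls.
drad_general <- function(x, S, qdist, ...) {
  q <- qdist(ppoints(S), ..., lower.tail = FALSE)  # compute once, then index
  (q / sum(q))[x]
}
# Rank density induced by a lognormal sad, for ranks 1..5 out of S = 30:
drad_general(1:5, S = 30, qdist = qlnorm, meanlog = 1, sdlog = 0.4)
```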

@andrechalom andrechalom added this to the sads 0.3.0 milestone Jun 13, 2015
@piklprado (Contributor, author)

The main purpose here is to make likelihoods from sad and rad fits comparable.
An alternative solution is to look for an as.fitsad method or function to coerce fitrad objects to fitsad, maybe with eval.only=TRUE.

@andrechalom (Member)

The answer to the first question above is: yes.

N = 1e5 
x1 <- rsad(S=N, frac=1, sad="lnorm", meanlog=5)
x2 <- rsad(S=N, frac=1, sad="lnorm", meanlog=7)
x3 <- rsad(S=N, frac=1, sad="lnorm", meanlog=9)
plot(rad(x1), type='l', xlim=c(0, 1e5), ylim=c(1, max(x3)))
lines(rad(x2), col='blue')
lines(rad(x3), col='red')


@andrechalom (Member)

Implemented a tentative as.fitrad on branch https://github.com/piLaboratory/sads/tree/rlnorm. Right now, as.fitrad(fitls(...)) and as.fitrad(fitlnorm(...)) seem to be working well. Both logLik() and AIC() return reasonable values for these objects, well in range of the other fitrads.

I have used a general function drad, and this code can easily be adapted to other distributions. The main issue that we should work out now is how to deal with the existing methods for fitrad objects, such as plots, radpreds, etc. In the way we have built the package so far, these functions expect the function dXXX to provide the density for the rad distribution XXX, but now we need to use drad("XXX", ...) for rad distributions that have been generated from a sads distribution.

The performance is also a little better than in the first versions, as I have used the optimized code above, and eval.only=TRUE, so as.fitrad(fitls(birds)) is only taking a couple of seconds on my computer.

@andrechalom (Member)

It is a bit strange to see that fitlnorm seems to provide a better fit for birds, while the fitrad versions show the opposite ordering. (Note also that the rank likelihood sums one term per individual, while the sad likelihood sums one term per species, so the absolute AIC scales are presumably not comparable.) The ordering of bs/rbs seems more consistent:

> AICtab(fitls(birds), fitlnorm(birds), fitbs(birds), base=T)
                AIC   dAIC  df
fitlnorm(birds) 934.8   0.0 2 
fitls(birds)    945.4  10.6 1 
fitbs(birds)    992.8  58.0 0 
> AICtab(as.fitrad(fitls(birds)), as.fitrad(fitlnorm(birds)), fitrbs(birds), base=T)
                           AIC      dAIC     df
as.fitrad(fitls(birds))     98555.6      0.0 2 
as.fitrad(fitlnorm(birds))  99163.4    607.8 2 
fitrbs(birds)              102071.2   3515.6 0 

The relevant code from drad seems ok, so maybe this is a problem similar to the "trueLL" weirdness.

Should we merge this? Should we add more distributions to the "as.fitrad" code? Should we investigate something else?

@piklprado (Contributor, author)

Too weird; back to the drawing board. Moving the milestone to 1.0.0.
