Forum: Missing values in cca and rda: please test

Posted by: Jari Oksanen
Date: 2009-09-09 09:27
Summary: Missing values in cca and rda: please test
Project: vegan - Community Ecology Package


Content:

Functions cca() and rda() can handle missing data in constraints or conditions (external variables) in the current R-Forge version of vegan (1.16-27 and later). The functions have a new argument, na.action. The default is na.fail, which stops the analysis if there are missing values in the external variables. The alternatives na.omit and na.exclude remove all observations (rows) that have missing values in the external variables used (constraints, conditions).

With na.exclude, the results for sites are padded to the same number of rows as the original data. The LC scores are linear combinations of constraints, so they are NA for rows with missing constraints. The WA scores depend on the community data, so in non-partial models they can be estimated even for rows with missing constraints. In partial models with a Condition() term, the effect of the conditions is removed from the community data; this cannot be done for rows with missing data, and therefore WA scores are not available for those rows.
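The three na.action choices described above can be tried with a small sketch like the following. It assumes a vegan version with the na.action argument and uses the dune example data shipped with vegan; the two NA values are inserted artificially, and the choice of rows 3 and 7 is arbitrary.

```r
library(vegan)

# Example data shipped with vegan; two constraint values are set to NA
# purely for illustration (rows 3 and 7 are an arbitrary choice).
data(dune)
data(dune.env)
env <- dune.env
env$A1[c(3, 7)] <- NA

# Default na.action = na.fail: the fit is expected to stop with an error.
res_fail <- try(cca(dune ~ A1 + Management, data = env), silent = TRUE)
inherits(res_fail, "try-error")

# na.omit: rows with missing constraints are dropped before fitting.
m_omit <- cca(dune ~ A1 + Management, data = env, na.action = na.omit)

# na.exclude: same rows dropped for fitting, but site results are padded
# back to the number of rows in the original data.
m_excl <- cca(dune ~ A1 + Management, data = env, na.action = na.exclude)

lc <- scores(m_excl, display = "lc", choices = 1:2)
wa <- scores(m_excl, display = "wa", choices = 1:2)

lc[c(3, 7), ]  # LC scores for the rows with missing constraints
wa[c(3, 7), ]  # WA scores for the same rows (non-partial model)
```

Comparing the last two printed blocks should show the difference between LC and WA scores for the rows with missing constraints.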

The change was made so that the old internal structures of cca and rda result objects are unchanged, except that rows with missing data are removed. Therefore functions that handle cca objects directly should work normally even with na.action set to na.exclude or na.omit. Functions that use the scores() interface are affected by na.action: with na.omit, the number of rows of site scores does not match the original data; with na.exclude, the site scores have the same number of rows as the original data but may contain NA values.
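The effect on scores() can be checked by comparing row counts, as in this sketch. It again uses the dune data with artificially inserted NA values, and the object names (s_omit, s_excl, combined) are only illustrative. Matching the na.omit scores back to the original data by site names assumes that the score matrix keeps the row names of the community data.

```r
library(vegan)

data(dune)
data(dune.env)
env <- dune.env
env$A1[c(3, 7)] <- NA  # artificial missing values, for illustration only

m_omit <- rda(dune ~ A1 + Management, data = env, na.action = na.omit)
m_excl <- rda(dune ~ A1 + Management, data = env, na.action = na.exclude)

s_omit <- scores(m_omit, display = "sites", choices = 1:2)
s_excl <- scores(m_excl, display = "sites", choices = 1:2)

# Number of rows in the original data and in the two sets of site scores
c(original = nrow(env), omit = nrow(s_omit), exclude = nrow(s_excl))

# With na.exclude the scores line up with the original data row by row.
combined <- cbind(env, s_excl)

# With na.omit the original data must first be subset to the retained
# sites (assumes the scores carry the site names as row names).
combined_omit <- cbind(env[rownames(s_omit), ], s_omit)
```

In practice this means na.exclude is the more convenient choice when the scores are to be plotted or tabulated against the original environmental data.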

At the moment there are no open issues, and the functions are ready for testing. Please try these functions, and report problems to the vegan forums in R-Forge or directly to Jari Oksanen at Oulu.fi.

Happy testing! Jari Oksanen

open issues: step and anova(..., by = "term")
By: Jari Oksanen on 2009-09-08 07:54

[forum:1907]

After the previous message, I found two open issues. In some sequential analyses the number of observations can change from one model to the next because of missing values. This affects at least anova.cca(..., by = "term") and step (calling add1 and drop1).

Now the anova.ccabyterm function checks whether there are missing values, and stops with an error if it finds any. The only alternative may be listwise deletion of missing values over all terms, but this needs a redesign of anova.ccabyterm.
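Until such a redesign, listwise deletion can be done by hand before fitting, so that every term sees the same rows. This is only a user-side workaround sketch, not what anova.ccabyterm does internally. It assumes a recent vegan release in which anova() for cca objects accepts by = "terms" and a permutations argument; the names keep, comm_cc and env_cc are hypothetical.

```r
library(vegan)

data(dune)
data(dune.env)
env <- dune.env
env$A1[c(3, 7)] <- NA  # artificial missing values, for illustration only

# Listwise deletion: keep only rows complete in ALL terms of the model.
keep <- complete.cases(env[, c("A1", "Management")])
comm_cc <- dune[keep, , drop = FALSE]
env_cc <- env[keep, , drop = FALSE]

# The model now has no missing values, so na.action is not needed and
# every sequential term is tested on the same set of observations.
mod <- cca(comm_cc ~ A1 + Management, data = env_cc)
tab <- anova(mod, by = "terms", permutations = 99)
tab
```

The cost is that rows are lost even for terms in which they were complete, which is the usual trade-off of listwise deletion.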

The step() issue is more difficult, because here we mainly rely on the standard stats functions step(), add1.default() and drop1.default(). The vegan functions add1.cca and drop1.cca are only used for additional analyses if the user asks for permutation tests (test = "permutation"). Therefore the change in data would have to be detected in these stats functions. They find the number of observations as length(object$residuals), and vegan cca and rda objects do not have that item and cannot meaningfully have one. Nor can we bail out on missing data as we did in anova.ccabyterm, because this would happen only after fitting models with differing numbers of observations.
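The problem of a changing number of observations can be illustrated with a base R analogue. The sketch below uses lm() only because lm objects carry a residuals component of the kind described above; the data frame d is made up for illustration and has nothing to do with vegan.

```r
# Toy data: x2 has missing values, x1 does not.
d <- data.frame(
  y  = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8, 7.1, 8.0),
  x1 = 1:8,
  x2 = c(2, NA, 1, 4, NA, 3, 5, 2)
)

# Two nested models fitted with the same na.action.
m1 <- lm(y ~ x1, data = d, na.action = na.omit)
m2 <- lm(y ~ x1 + x2, data = d, na.action = na.omit)

# Number of observations as seen through the residuals component:
# adding x2 silently drops the two rows where x2 is missing.
n1 <- length(m1$residuals)
n2 <- length(m2$residuals)
c(n1 = n1, n2 = n2)

# The models are fitted to different data, so comparing them in a
# stepwise search is not meaningful.
n1 != n2
```

For lm objects this difference is visible in the fitted object itself; the difficulty described above is that cca and rda results offer no comparable item for the stats functions to inspect.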

In both cases, ideas are welcome.

Jari Oksanen