windlkak.blogg.se - Data dredging eda

As far as I understand, collinearity or multicollinearity (hereafter referred to simply as collinearity) cannot be prevented or avoided during data analysis, because collinearity is a built-in "feature" of the data. Therefore, a particular data set has a certain level of collinearity (or a lack thereof). However, collinearity can be prevented or avoided to some degree prior to data analysis, that is, during research design planning or, possibly, the exploratory data analysis (EDA) phase. This is likely what Ieno and Zuur (2015) mean by their phrase, which you've cited in your question above.

Potential solutions for preventing, avoiding, or dealing with collinearity include using appropriate research designs that reduce collinearity. However, while I have come across mentions of this approach several times, it was unclear to me exactly which designs are helpful in that regard and why (StatsStudent mentions one such method, stratified sampling, but relevant sources are not provided).

Before mentioning other solutions, it is worth saying that the sometimes-recommended option of dropping predictors is considered a rather bad one: see this blog post or the blog author's book (Baguley, 2012). He also mentions that doing nothing should be considered one of the valid approaches to dealing with collinearity. Other approaches to dealing with (mainly reducing) collinearity include: increasing the sample size and transforming predictors (Baguley, 2012); using principal component analysis (PCA), using simple regression between highly correlated variables (sequential regression), and calculating the ratio of correlated variables (Balling, n.d.); and a priori modeling and ridge regression (Graham, 2003).

While much of the literature focuses on dealing with collinearity in multiple regression settings, it should be noted that researchers who use structural equation modeling (SEM) in their studies face similar collinearity issues (Grewal, Cote & Baumgartner, 2004). This is despite the fact that latent variable modeling (LVM) is itself considered an approach to reducing collinearity (see below).

Finally, I highly recommend a very comprehensive paper on the topic by Dormann, Elith, Bacher, Buchmann, Carl, Carré et al.
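To make the idea of collinearity as a property of the data a bit more concrete, here is a minimal Python sketch using only NumPy. It is not taken from any of the sources cited in this post. The simulated predictors `x1`, `x2`, `x3`, the helper names `vif` and `ridge_fit`, and the penalty value `alpha=10.0` are all assumptions chosen for illustration. The sketch computes variance inflation factors (a common collinearity diagnostic, not discussed in the post itself) and then compares ordinary least squares with ridge regression, one of the approaches mentioned in this post.

```python
import numpy as np

def vif(X):
    """Variance inflation factors, taken from the diagonal of the
    inverse of the predictors' correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients on centered data.
    alpha=0.0 reduces to ordinary least squares."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)

# Simulated data (an assumption for illustration): x2 is nearly a copy of x1,
# while x3 is unrelated to both.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

vifs = vif(X)
beta_ols = ridge_fit(X, y, alpha=0.0)
beta_ridge = ridge_fit(X, y, alpha=10.0)

print("VIF per predictor:  ", np.round(vifs, 2))
print("OLS coefficients:   ", np.round(beta_ols, 3))
print("Ridge coefficients: ", np.round(beta_ridge, 3))
```

In this setup the two near-duplicate predictors should show very large VIFs, while the unrelated one stays close to 1. The ridge penalty shrinks the coefficient vector and typically stabilizes the estimates for the correlated pair, at the cost of some bias. In real work the penalty would normally be chosen by cross-validation rather than fixed by hand.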