propmiss <- function(dataframe) lapply(dataframe,function(x) data.frame(nmiss=sum(is.na(x)), n=length(x), propmiss=sum(is.na(x))/length(x)))
Let's try it out.
#simulate some fake data
fakedata=data.frame(var1=c(1,2,NA,4,NA,6,7,8,9,10),var2=c(11,NA,NA,14,NA,16,17,NA,19,NA))
print(fakedata)
   var1 var2
1     1   11
2     2   NA
3    NA   NA
4     4   14
5    NA   NA
6     6   16
7     7   17
8     8   NA
9     9   19
10   10   NA
# summarize the missing data
propmiss(fakedata)
$var1
  nmiss  n propmiss
1     2 10      0.2

$var2
  nmiss  n propmiss
1     5 10      0.5
Running that function returns a list of data.frame objects, one per variable. You can access the proportion missing for var1 by running propmiss(fakedata)$var1$propmiss.
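To make the list structure concrete, here is a minimal sketch that repeats the function and the fake data from above so it runs on its own. The names result and all_props are only illustrative, and the sapply() step is just one assumed way of pulling every proportion out of the list at once.

```r
# Same function as above: one small data.frame per variable, returned as a list
propmiss <- function(dataframe) lapply(dataframe, function(x) data.frame(nmiss = sum(is.na(x)), n = length(x), propmiss = sum(is.na(x)) / length(x)))

# Same fake data as above
fakedata <- data.frame(var1 = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
                       var2 = c(11, NA, NA, 14, NA, 16, 17, NA, 19, NA))

result <- propmiss(fakedata)

# Proportion missing for a single variable
result$var1$propmiss

# Proportion missing for every variable, collected into a named numeric vector
all_props <- sapply(result, function(d) d$propmiss)
all_props
```

Because each list element is a one-row data.frame, you have to go through two levels of $ to reach a single number, which is part of why a single data frame can be more convenient to work with.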
*Edit 2011-02-23*
Commenter A. Friedman asked for a version of this function that gives you the output as a data frame. The function's a bit uglier because something was being coerced as a list, but this does the trick:
propmiss <- function(dataframe) {
  m <- sapply(dataframe, function(x) {
    data.frame(
      nmiss=sum(is.na(x)),
      n=length(x),
      propmiss=sum(is.na(x))/length(x)
    )
  })
  d <- data.frame(t(m))
  d <- sapply(d, unlist)
  d <- as.data.frame(d)
  d$variable <- row.names(d)
  row.names(d) <- NULL
  d <- cbind(d[ncol(d)],d[-ncol(d)])
  return(d[order(d$propmiss), ])
}
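If the list coercion gets in your way, one possible alternative is to skip sapply() entirely and count missing values with colSums(is.na(...)). This is only a sketch and not the function above: the name propmiss_simple is made up for illustration, and it assumes you want the same four columns (variable, nmiss, n, propmiss) sorted by propmiss.

```r
# Hypothetical alternative (not the original function): build the summary
# data frame directly from column-wise counts of missing values.
propmiss_simple <- function(dataframe) {
  # is.na() on a data frame gives a logical matrix; colSums() counts the TRUEs
  nmiss <- unname(colSums(is.na(dataframe)))
  n <- nrow(dataframe)
  d <- data.frame(
    variable = names(dataframe),
    nmiss = nmiss,
    n = n,
    propmiss = nmiss / n,
    stringsAsFactors = FALSE
  )
  # Sort from least to most missing, as the function above does
  d[order(d$propmiss), ]
}

# Same fake data as earlier in the post
fakedata <- data.frame(var1 = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
                       var2 = c(11, NA, NA, 14, NA, 16, 17, NA, 19, NA))

d <- propmiss_simple(fakedata)
print(d)
```

Either version gives you one row per variable, so you can subset or sort the result like any other data frame.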