Is there a better way to find the percent of one column that meets a criteria for each value…
I have a data frame with columns grade.equivalent and scaled.score, both numeric. I'd like to find the percent of students at or above a given scaled.score for all students at or above each grade.equivalent.
For example, given the following data frame:
df.ex <- data.frame(grade.equivalent=c(2.4,2.7,3.1,2.5,1.4,2.2,2.3,1.7,1.3,2.2),
                    scaled.score=c(187,277,308,268,236,305,298,246,241,138)
)
I'd like to know, for each grade.equivalent, what percent of students scored at or above 301 out of all students at or above that grade.equivalent.
To do this I did the following:
find.percent.basic <- function(cut.ge, data, cut.scaled.score){
  # Students at or above the grade.equivalent cut, ignoring missing scores
  df.sub <- subset(data, grade.equivalent >= cut.ge & !is.na(scaled.score))
  denom <- nrow(df.sub)
  # Of those, students at or above the scaled.score cut
  df.sub <- subset(df.sub, scaled.score >= cut.scaled.score)
  numer <- nrow(df.sub)
  return(numer/denom)
}
grade.equivs <- unique(df.ex$grade.equivalent)
grade.equivs <- grade.equivs[order(grade.equivs)]
just.percs <- sapply(grade.equivs, find.percent.basic, data=df.ex, cut.scaled.score=301)
new.df <- data.frame(grade.equivalent=grade.equivs, perc=just.percs)
I plan to wrap this in a function and use it with plyr.
My question is, is there a better way to do this? It seems like there might be a base R function or a common package for this that I just don't know about.
Thanks for any thoughts.
EDIT for clarification
The code above produces the following result, which is what I'm looking for:
  grade.equivalent      perc
1              1.3 0.2000000
2              1.4 0.2222222
3              1.7 0.2500000
4              2.2 0.2857143
5              2.3 0.2000000
6              2.4 0.2500000
7              2.5 0.3333333
8              2.7 0.5000000
9              3.1 1.0000000
Edited for clarification a second time, per observations from @DWin
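For what it's worth, here is a rough sketch of one way the same table could be built in base R without subsetting once per grade.equivalent. It counts students per grade.equivalent with tapply and then takes reverse cumulative sums to get the "at or above" counts. The helper name find.percent.cum and its arguments are made up for illustration, not taken from any package, and it assumes rows with a missing scaled.score should simply be dropped before grouping.

```r
# Example data from the question.
df.ex <- data.frame(
  grade.equivalent = c(2.4, 2.7, 3.1, 2.5, 1.4, 2.2, 2.3, 1.7, 1.3, 2.2),
  scaled.score     = c(187, 277, 308, 268, 236, 305, 298, 246, 241, 138)
)

# Hypothetical helper: the name and arguments are illustrative only.
find.percent.cum <- function(data, cut.scaled.score) {
  # Assumption: rows with a missing score are dropped up front.
  d <- data[!is.na(data$scaled.score), ]

  # Per grade.equivalent: number of students, and number at or above the cut.
  # tapply returns the groups in increasing order of grade.equivalent.
  n.all  <- tapply(d$scaled.score, d$grade.equivalent, length)
  n.pass <- tapply(d$scaled.score >= cut.scaled.score, d$grade.equivalent, sum)

  # Reverse cumulative sums turn per-group counts into "at or above" counts.
  denom <- rev(cumsum(rev(n.all)))
  numer <- rev(cumsum(rev(n.pass)))

  data.frame(grade.equivalent = as.numeric(names(n.all)),
             perc = as.vector(numer / denom))
}

new.df2 <- find.percent.cum(df.ex, cut.scaled.score = 301)
```

On the example data this gives the same nine rows as the sapply version. One difference to be aware of: a grade.equivalent that only appears on rows with a missing scaled.score is left out here, whereas the original approach would still list it.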