Data processing using R


I have a .ped file that contains several columns, and I want to extract information from it. Here is a sample of the data (there is no header):

1  1  1
1  2  1
2  3  2
3  4  1
3  5  2
...

The first column is the family ID, the second is the individual ID, and the third is the sex of the individual.

I read the table into a data frame:

ped <- read.table("pedigree.ped", header=FALSE)

How can I compute the number of families in the data (a family can appear more than once, and I want to count it only once)? In the sex column, 1 designates male and 2 designates female; how can I get the distribution of males and females in the data set?

I'm new to R, so it would help if you could give some code!

Thanks in advance.

Since you are new to R, I suggest looking at Excel first. The operations you are asking about are simple and can be done in Excel.

If you want to use R, look into data.frame indexing, subsetting, etc.
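As a rough illustration of what indexing and subsetting look like, here is a minimal sketch. It builds a toy data frame from the five sample rows in the question instead of reading pedigree.ped, and the variable names (family_ids, family3, females) are only placeholders chosen for this example.

```r
# Toy data mirroring the sample rows in the question (an assumption;
# the real file would be read with read.table("pedigree.ped", header = FALSE)).
# Without a header, read.table names the columns V1, V2, V3.
ped <- read.table(text = "
1 1 1
1 2 1
2 3 2
3 4 1
3 5 2
", header = FALSE)

# Column indexing by position; ped$V1 would give the same vector
family_ids <- ped[, 1]

# Row subsetting with a logical condition: all members of family 3
family3 <- ped[ped$V1 == 3, ]

# subset() expresses the same kind of filter: all individuals coded as female
females <- subset(ped, V3 == 2)
```

Both forms return a data frame containing only the matching rows, so nrow() on the result gives a count.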

If you are familiar with SQL, look into the sqldf package.

number of families:

numfamilies <- length(unique(ped[,1])) 

number of males & females:

nummales <- sum(ped[,3] == 1)
numfemales <- sum(ped[,3] == 2)
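If you want the whole distribution in one step, base R's table() may be more convenient than separate sums. The sketch below again uses a toy data frame built from the sample rows in the question, and the "male"/"female" labels follow the 1/2 coding described there; the variable names are illustrative only.

```r
# Toy data mirroring the sample rows in the question (an assumption)
ped <- read.table(text = "
1 1 1
1 2 1
2 3 2
3 4 1
3 5 2
", header = FALSE)

# Label the sex codes (1 = male, 2 = female) and count each level
sex <- factor(ped[, 3], levels = c(1, 2), labels = c("male", "female"))
sex_counts <- table(sex)

# Convert the counts to proportions
sex_props <- prop.table(sex_counts)

# Number of rows per family; the number of distinct families is its length
family_sizes <- table(ped[, 1])
numfamilies <- length(family_sizes)
```

Note that these counts are per row. If the same individual could appear on more than one line of your file, you would probably want to drop duplicated rows first, for example with unique(ped[, 1:3]).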
