A little trick in reading data

· Data manipulation, GNU-R

The Census 1991 Primary Census Abstract files identify each village with a single variable, an 18-digit string. The string concatenates the codes for district, tehsil, block, panchayat and village. The village and town directories, however, identify each village with three separate variables that hold the district, block and village codes. To match the two sets of data, the 18-digit string had to be split into five different variables. Here is a little piece of code that did it, by writing the codes to a text file and reading them back as fixed-width fields.

#take out the variable code from distvc into a separate data frame called code
data.frame(distvc$CODE)->code
#write this data frame into a text file
write.table(code,"code",col.names=FALSE,row.names=FALSE,quote=FALSE)

#read this text file using read.fwf, reading five different variables
read.fwf("code",widths=c(2,4,4,4,4))->code2

#assign names to these variables
names(code2)=c("DIST_CODE", "THSIL_CODE", "BLOCK_CODE", "PANCH_CODE", "VILL_CODE")

#combine the new data frame code2 with the old data frame distvc
cbind(distvc,code2)->distvc

#delete the temporary objects and the temporary text file
rm(code,code2)
file.remove("code")
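
The same split can probably also be done in memory, without the temporary text file. What follows is only a sketch under assumptions: the toy `distvc` data frame and its two `CODE` values are made up for illustration, and `CODE` is assumed to be stored as text (an 18-digit number would not survive as a numeric value without losing precision). It uses base R's `substring()` with the same field widths as above.

```r
# Toy stand-in for distvc; these CODE values are invented for illustration.
distvc <- data.frame(
  CODE = c("010002000300040005", "120034005600780090"),
  stringsAsFactors = FALSE
)

# Same field widths as in the read.fwf call: 2 + 4 + 4 + 4 + 4 = 18.
widths <- c(2, 4, 4, 4, 4)
ends   <- cumsum(widths)      # last character of each field
starts <- ends - widths + 1   # first character of each field

# Cut every code string into its five fields; substring() is vectorised
# over the codes, so each list element is one new column.
codes <- as.character(distvc$CODE)
parts <- lapply(seq_along(widths),
                function(i) substring(codes, starts[i], ends[i]))
names(parts) <- c("DIST_CODE", "THSIL_CODE", "BLOCK_CODE",
                  "PANCH_CODE", "VILL_CODE")

# Attach the five new variables to the original data frame.
distvc <- cbind(distvc, as.data.frame(parts, stringsAsFactors = FALSE))
```

One difference to keep in mind: this sketch keeps the pieces as character strings, so leading zeros are preserved, whereas `read.fwf` converts all-digit fields to numbers by default. If the directory files store their codes as numbers, wrapping the columns in `as.integer()` before matching should bring the two in line.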
