Awk script to read schdata
March 30, 2007 at 4:43 pm | In Announcements for courses
Motiniva found that selecting the subset for one State out of schdata was taking a long time in R. Here is an awk script that does it almost instantly. The script creates a text file that can then be read into R.
Save the following script in a file. Make the file executable (chmod +x filename). Edit the file to change the output file name (in two places) and the name of the State. Run it (./filename).
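#!/bin/sh
## awk script starts here
# Copy the survey file into the current directory; the quotes protect the space in "Seventh AISES"
cp "/nfs/data/Education/Seventh AISES/flash/schdata.txt" .
awk '
# Fields in schdata.txt are separated by colons; use the same separator for output
BEGIN { FS = ":"; OFS = ":" }
# Write the header row (column names) to the output file
BEGIN { print ("STCODE","STNAME","DISTCODE","DISTNAME","TCODE","BLOCKCODE","BLOCKNAME","VILCODE","VILNAME","WCODE","SCHCODE","SCHNAME","SCHADD","AREA","SCHCAT","SCHBULD","FTMALE","FTFEMALE","PMALE","PFEMALE","PTMALE","PTFEMALE","EB1","EB2","EB3","EB4","EB5","EB6","EB7","EB8","EB9","EB10","EB11","EB12","EG1","EG2","EG3","EG4","EG5","EG6","EG7","EG8","EG9","EG10","EG11","EG12","TOT1","TOT2","TOT3","TOT4","TOT5","TOT6","TOT7","TOT8","TOT9","TOT10","TOT11","TOT12","TOTB","TOTG","TOTSCH") > "orissa.csv" }
# Append every record whose second field (STNAME) is the chosen State
$2 == "Orissa" { print ($0) >> "orissa.csv" }
' schdata.txt
# Remove the local copy of the large data file
rm schdata.txt
## awk script ends here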
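To read the output data into R, use the following (change orissa.csv to your output file name). header = TRUE takes the column names from the first line; quote = "" and comment.char = "" stop apostrophes or # signs in school names and addresses from being read as quotes or comments.
read.table("orissa.csv", sep = ":", header = TRUE, quote = "", comment.char = "") -> statedata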
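If you need several States, editing the script each time gets tedious. Below is a minimal sketch of the same filter as a shell function, assuming (as the script above does) that field 2 of schdata.txt is the State name. extract_state is a hypothetical helper name, and the trimming of spaces around the name is only a precaution in case the field is padded, which is an assumption about the file.

```shell
# Hypothetical helper: write the records of one State to a file.
#   $1: State name as it appears in field 2 (e.g. Orissa)
#   $2: colon-separated input file (e.g. schdata.txt)
#   $3: output file (overwritten)
extract_state() {
    awk -v state="$1" '
        BEGIN { FS = ":" }
        {
            name = $2
            # Strip leading/trailing spaces in case the field is padded (assumption)
            gsub(/^ +| +$/, "", name)
            # Print the whole record unchanged when the State matches
            if (name == state) print
        }
    ' "$2" > "$3"
}
```

For example, extract_state Orissa schdata.txt orissa.txt. Unlike the script above, this writes no header row, so read the result in R with header = FALSE and name the columns there if you need to.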
V.
May be a little late today. Please wait.
March 29, 2007 at 6:44 am | In Announcements for courses
My daughter’s school is closed this week. I have to arrange for somebody to babysit while I take the class. I will try to come in time, but if I am late, please do wait. Go through the recent posts on http://vikasrawal.wordpress.com and start working on the assignment data while you are waiting for me.
Also, for the same reason, can we hold tomorrow’s (Friday) class at 11.15 am?
Sorry for this.
V.
R: Further on import/export of data
March 28, 2007 at 2:12 pm | In Announcements for courses
The manual dealing with import/export of data to/from R (R Data Import/Export) is here. Please go through it for guidance on importing data into R. I hope all of you have subscribed to the r-help mailing list. Search the archives of the list for specific problems not discussed in the manual. Discuss any problems that still remain in class, or see me in my office.
V.
Reading data from Microsoft Access files in Linux
March 28, 2007 at 12:23 pm | In Announcements for courses, Data manipulation, GNU-R
Some of the Census 2001 data are in Microsoft Access files (with the filename extension .mdb). A Microsoft Access file can contain several tables, each of which holds data. A package called mdbtools can be used to read Access files.
The command mdb-tables lists the names of the tables in a file, and the command mdb-export exports an Access table to a CSV file. Use man mdb-tables and man mdb-export to see the manual pages.
If mdbtools is not installed on your machine, you can use it on the server. Log in to the server using ssh (ssh -X username@cespserv) and then extract the tables using mdb-tables and mdb-export.
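To export every table in a file in one go, the two commands can be combined in a loop. This is a minimal sketch, assuming mdbtools is installed; export_tables is a hypothetical helper name, and each table is written to a CSV file named after it in the current directory.

```shell
# Hypothetical helper: export every table of an Access (.mdb) file to CSV.
#   $1: path to the .mdb file
export_tables() {
    mdb_file="$1"
    # -1 prints one table name per line, so names containing spaces survive the loop
    mdb-tables -1 "$mdb_file" | while IFS= read -r table; do
        # mdb-export writes the table as comma-separated text with a header row
        mdb-export "$mdb_file" "$table" > "$table.csv"
    done
}
```

For example, export_tables census.mdb, where census.mdb stands for whichever Access file you are working with.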
V.
End semester assignment data
March 28, 2007 at 12:09 pm | In Announcements for courses, GNU-R
I have read schdata.dbf (the data file from the School Education Survey) into R and saved it as the file sch.Rdata in /nfs/data/vikas.
You can use load("/nfs/data/vikas/sch.Rdata") to bring it into your R session. This will create a dataframe called schdata. Please note that this is a rather large data set. If each of you keeps a copy, we are going to eat up all the space on the server, so please do not copy the file into your home directory. Also, once the data are in your R session, use the subset command to extract the data for the State(s) you are interested in. Save the subset into another dataframe and remove the original schdata dataframe (rm("schdata")).
With this, you do not need to use the read.dbf function to read the data file, as I explained in the previous post.
V.
End semester assignment data
March 28, 2007 at 11:55 am | In Announcements for courses, GNU-R
I have read schdata.dbf (the data file from the School Education Survey) into R and saved it as the file sch.Rdata in /nfs/data/vikas.
You can use load("/nfs/data/vikas/sch.Rdata") to bring it into your R session. This will create a dataframe called schdata. Please note that this is a rather large data set. If each of you keeps a copy, we are going to eat up all the space on the server, so please do not copy the file into your home directory. Also, once the data are in your R session, use the subset command to extract the data for the State(s) you are interested in. Save the subset into another dataframe and remove the original schdata dataframe (rm("schdata")).
V.
Reading School Education Survey data in R
March 27, 2007 at 4:04 pm | In Announcements for courses
The School Education Survey data are in a large dbf file called "schdata.dbf" in the directory /nfs/data/Education/Seventh AISES/flash
A simple function for reading dbf files into R (read.dbf) is available in the package foreign. foreign has been installed as part of the R installation on cespserv. Note that it may not be installed on every client terminal in the computer centre. If it is not installed on your machine, you can run R on cespserv as follows and load the package with library(foreign).
> ssh -X username@cespserv
Enter your password when prompted. This will take you to the terminal prompt of the server. (The prompt will now change to "username@cespserv:~$".)
Type R here to start R.
library(foreign) # load the package foreign
?read.dbf # open the help page for read.dbf
# the following command should read the dbf file and create the schdata
# dataframe; as.is = TRUE keeps character columns as strings instead of factors
read.dbf("/nfs/data/Education/Seventh AISES/flash/schdata.dbf", as.is = TRUE) -> schdata
If you want to go back and do the rest of the work in your own machine’s R installation, quit R and save the workspace. Exit from the server and start R on your own machine; schdata should reappear. Otherwise, it is fine to do all the work in R on the server.
V.
Steps to manually mount a USB flash drive in GNU/Linux
March 26, 2007 at 4:36 pm | In Announcements for courses, Linux, Ubuntu
For students who have been having problems with this, see the following:
http://linuxhelp.blogspot.com/2007/03/steps-to-manually-mount-usb-flash-drive.html
Please read the commands carefully. The user id and group id in the command will have to be replaced by your own uid and gid. You can find your uid and gid with the following command (they are the third and fourth colon-separated fields of the line it prints):
grep "^$USER:" /etc/passwd
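The id command gives the same numbers more directly. This is a minimal sketch, assuming a FAT-formatted (vfat) flash drive, whose mount options accept uid= and gid=; usb_mount_options is a hypothetical helper name.

```shell
# Hypothetical helper: print mount options that make the mounted files yours.
usb_mount_options() {
    # id -u and id -g print the numeric user id and group id of the current user
    printf 'uid=%s,gid=%s\n' "$(id -u)" "$(id -g)"
}
```

It could then be used as sudo mount -t vfat -o "$(usb_mount_options)" /dev/sdb1 /media/usb, where /dev/sdb1 and /media/usb are placeholders: use the device name from the linked post's steps, and create the mount point first.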
V.
simple.scatterplot: Two way distributions
March 21, 2007 at 12:34 pm | In GNU-R, Graphics, Statistics
John Verzani’s book has a title page that shows a scatterplot with histograms of the x and y variables along the two axes. It is a very powerful way of looking at two distributions together. The plot was generated with a function called simple.scatterplot, which is part of the UsingR package; the package can be installed from CRAN. The code of simple.scatterplot is quite simple and can be modified to, for example, show boxplots instead of histograms along the sides. That would be really interesting!!
V.
Workshop sessions on R
March 20, 2007 at 5:14 pm | In Announcements for courses
I want to start the workshop sessions on R simultaneously. It will be useful if people look at my course website and start going through the "Introductory texts".
Also, I hope all of you have looked at the School Education Survey data. I want to get you started on the assignment in the next class.
V.
Blog at WordPress.com. | Theme: Pool by Borja Fernandez.