Module #8: Input/Output, String Manipulation and the plyr Package
String Manipulation and the plyr Package
1) There are three main steps in this week's assignment.
Step #1: Import the data, then use the plyr package to compute the mean grade with Sex as the grouping category. The last task in this step is to write the resulting summary to a file.
Step #2: Filter the data set down to the students whose names contain the letter "i", and store those rows in a new data set.
Step #3: Write the filtered data set to a comma-separated values (CSV) file.
With all of that laid out, let's write some code to make it happen.
> #Module 8 assignment: Input/Output, string manipulation and plyr package
> # Install the necessary packages only if they are not already installed
> for (pkg in c("plyr", "data.table")) if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
> library(plyr)
> library(data.table)
>
> # Step 1: Read the data from the file
> students <- fread("CENSORED/Assignment 6 Dataset.txt", header = TRUE, sep = ",") # Using fread from data.table package
> print(students)
Name Age Sex Grade
<char> <int> <char> <int>
1: Raul 25 Male 80
2: Booker 18 Male 83
3: Lauri 21 Female 90
4: Leonie 21 Female 91
5: Sherlyn 22 Female 85
6: Mikaela 20 Female 69
7: Raphael 23 Male 91
8: Aiko 24 Female 97
9: Tiffaney 21 Female 78
10: Corina 23 Female 81
11: Petronila 23 Female 98
12: Alecia 20 Female 87
13: Shemika 23 Female 97
14: Fallon 22 Female 90
15: Deloris 21 Female 67
16: Randee 23 Female 91
17: Eboni 20 Female 84
18: Delfina 19 Female 93
19: Ernestina 19 Female 93
20: Milo 19 Male 67
Name Age Sex Grade
>
> # Calculate the mean grade for each sex category
> students_gendered_mean <- ddply(students, "Sex", summarise, Grade.Average = mean(Grade))
>
> # Step 1: Write the output to a file
> write.table(students_gendered_mean, "Students_Gendered_Mean.txt", row.names = FALSE, sep = "\t")
>
> # Step 2: Filter the dataset for names containing the letter "i"
> i_students <- subset(students, grepl("i", Name, ignore.case = TRUE))
> print(i_students)
Name Age Sex Grade
<char> <int> <char> <int>
1: Lauri 21 Female 90
2: Leonie 21 Female 91
3: Mikaela 20 Female 69
4: Aiko 24 Female 97
5: Tiffaney 21 Female 78
6: Corina 23 Female 81
7: Petronila 23 Female 98
8: Alecia 20 Female 87
9: Shemika 23 Female 97
10: Deloris 21 Female 67
11: Eboni 20 Female 84
12: Delfina 19 Female 93
13: Ernestina 19 Female 93
14: Milo 19 Male 67
>
> # Step 3: Write the filtered data to a CSV file
> write.csv(i_students, "i_students.csv", row.names = FALSE)
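For readers without plyr, base R's aggregate() can produce the same per-sex averages as the ddply() call in Step 1. The sketch below is self-contained: it re-enters the 20 rows printed above by hand as a plain data.frame (so it does not depend on the original file path), and it writes to a temporary directory under a hypothetical file name, gendered_mean_base.txt.

```r
# Stand-in data: the 20 rows printed above, typed in by hand (not read from the original file)
students <- data.frame(
  Name = c("Raul", "Booker", "Lauri", "Leonie", "Sherlyn", "Mikaela", "Raphael",
           "Aiko", "Tiffaney", "Corina", "Petronila", "Alecia", "Shemika", "Fallon",
           "Deloris", "Randee", "Eboni", "Delfina", "Ernestina", "Milo"),
  Age = c(25, 18, 21, 21, 22, 20, 23, 24, 21, 23, 23, 20, 23, 22, 21, 23, 20, 19, 19, 19),
  Sex = c("Male", "Male", "Female", "Female", "Female", "Female", "Male", "Female",
          "Female", "Female", "Female", "Female", "Female", "Female", "Female",
          "Female", "Female", "Female", "Female", "Male"),
  Grade = c(80, 83, 90, 91, 85, 69, 91, 97, 78, 81, 98, 87, 97, 90, 67, 91, 84, 93, 93, 67)
)

# Formula interface: split Grade by Sex and apply mean() to each group
gendered_mean <- aggregate(Grade ~ Sex, data = students, FUN = mean)
names(gendered_mean)[2] <- "Grade.Average"  # match the column name used with ddply()
print(gendered_mean)  # Female 86.9375, Male 80.25

# Hypothetical output location: a temporary directory, not the working directory
out_file <- file.path(tempdir(), "gendered_mean_base.txt")
write.table(gendered_mean, out_file, row.names = FALSE, sep = "\t")
```

With this data the averages work out to 86.9375 for the 16 female students and 80.25 for the 4 male students.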
2) Just like that, we have a new comma-separated file, i_students.csv, holding the full rows (name, age, sex, and grade) of every student whose name contains the letter "i". Not much more to talk about this week.
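The assignment description mentions writing the names themselves, while the write.csv() call above saves whole rows. A minimal sketch of a names-only variant, assuming only the Name column matters: it rebuilds that column from the printed table, uses drop = FALSE so the result stays a one-column data frame, and reads the CSV back to confirm what was written. The file name i_names.csv and the temporary directory are hypothetical choices.

```r
# Stand-in data: only the Name column from the table printed above, typed in by hand
students <- data.frame(
  Name = c("Raul", "Booker", "Lauri", "Leonie", "Sherlyn", "Mikaela", "Raphael",
           "Aiko", "Tiffaney", "Corina", "Petronila", "Alecia", "Shemika", "Fallon",
           "Deloris", "Randee", "Eboni", "Delfina", "Ernestina", "Milo")
)

# TRUE for every name containing "i" or "I" anywhere in the string
has_i <- grepl("i", students$Name, ignore.case = TRUE)

# drop = FALSE keeps a one-column data frame instead of a bare character vector
i_names <- students[has_i, "Name", drop = FALSE]

# Hypothetical output location: a temporary directory, not the working directory
out_file <- file.path(tempdir(), "i_names.csv")
write.csv(i_names, out_file, row.names = FALSE)

# Read the file back to confirm what was written
check <- read.csv(out_file)
print(nrow(check))  # 14
print(check$Name)
```

Because ignore.case = TRUE is set, a name such as "Ingrid" would also match; none of the names here contain a capital I, so the result is the same 14 students either way.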