From ccbbc4294df7dc8b41daa7cd0f7f72aed000c19a Mon Sep 17 00:00:00 2001
From: Kirsten
Date: Fri, 26 Sep 2014 19:51:47 -0400
Subject: [PATCH 1/5] added sample codebook using PROPPR

---
 PROPPR_codebook.md | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 PROPPR_codebook.md

diff --git a/PROPPR_codebook.md b/PROPPR_codebook.md
new file mode 100644
index 00000000..ff80a9dd
--- /dev/null
+++ b/PROPPR_codebook.md
@@ -0,0 +1,9 @@
+Other sample code book.
+
+This sample is based on the PROPPR (https://clinicaltrials.gov/ct2/show/NCT01545232?term=PROPPr&rank=1) study of blood component ratios in transfusions in trauma patients. Warning: this is complicated.
+
+Experimental design and background: Blood is given to trauma patients as blood components: red blood cells, plasma and platelets. This experiment was aimed at seeing if increasing the ratio of red blood cells over that in normal whole blood would lower mortality. There were two treatment groups, normal and increased red cells. Patients were screened at 12 participating hospitals. Randomization was done by per mutated random blocks, stratified by site.
+
+Raw data: randomization assignment, date and time of admission to the hospital, type of injury, fluids given by EMT/paramedic before admission, hospital, time of blood product administration, lot number of blood product, time of hemostasis achieved, other fluids, pre-existing blood clotting diseases, pre-existing blood clotting inhibitors, date and time of discharge from ICU, date and time of ventilator start and stop, date and time of discharge from hospital and date and time of death (if death occurred within 30 days).
+
+Processed data: Assignment was converted to treatment group (factor variable), type of injury was coded as a factor penetrating (1) or blunt (0), hospital was coded as a factor (1-12), EMT fluids were binned in 500ml bins, pre-existing clotting diseases were coded as a factor (each assigned a number), pre-existing clotting medications were coded as a factor. Other fluids were binned in 1L bins. Date of discharge from ICU was converted to ICU-free days (out of 30), Ventilator times were converted to ventilator-free days (out of 30), date of discharge was converted to hospital-free days (out of 30). Amount of blood products was summed in two groups, amount given until hemostasis and amount given from hemostasis until 24 hours, these were considered as numeric amounts, these were calculated from summing the blood product lot numbers, sorted by time of administration. Date of death was converted to 24-hour mortality (factor: yes=1 no=0) and 30-day mortality (factor: yes=1, no=0).
\ No newline at end of file
From 61e2560d9f9df3e58908a42c425775082b7d7493 Mon Sep 17 00:00:00 2001
From: Kirsten
Date: Fri, 26 Sep 2014 20:11:47 -0400
Subject: [PATCH 2/5] added a sample codebook using dogwalking

---
 Codebook_dogwalking.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 Codebook_dogwalking.md

diff --git a/Codebook_dogwalking.md b/Codebook_dogwalking.md
new file mode 100644
index 00000000..a772071f
--- /dev/null
+++ b/Codebook_dogwalking.md
@@ -0,0 +1,18 @@
+I feel the urge to write a sample codebook (for David's FAQ).
+
+It's very similar to a Statistical Analysis Plan, actually.
+
+Setup, there is a dogwalking business. It wants to analyze its work.
+
+Raw data is: name of dog, address of owner, time walked, date walked, size of dog (small, medium, or large), health of dog (well or sick) on that date and time, comments, and pay.
+
+The business wants to assign ID# to the dogs, and codewords to the address to make this data anonymous. There isn't anything to do to the comments--since free text is all over the place.
+
+Codebook:
+The dog's name was transformed into an IDNumber (unique) (1-50),
+the address was transformed into a factor, OwnerName (levels Alice, Bob, Charlie, Deborah, Ernest and Fred),
+time and date walked were counted to make WalksPerWeek1, WalksPerWeek2, and WalksPerWeek3. Week1 begins at 00:01UTC on July1, 2014, Week2 begins at 00:01UTC on July8, 2014, Week3 begins at 00:01UTC on July15, 2014.
+Health was summarized as HealthWeek1, HealthWeek2, and HealthWeek3. It is a factor with two levels, Well and Sick. If the dog was sick at any walk during that week, dog was marked sick, else dog was marked well.
+Dog Size was converted into a factor: Large, Medium and Small are the levels.
+Comments are dropped.
+Pay is transformed into PayWeek1, PayWeek2, PayWeek3, which is a factor that has two levels (Yes, and No) for correct pay paid during that week.
From 69bcffbeffbd48895acd95a994fdcb639d513141 Mon Sep 17 00:00:00 2001
From: Kirsten
Date: Fri, 3 Oct 2014 18:31:45 -0400
Subject: [PATCH 3/5] Update getclean.md

Added link to two gists of codebooks.
---
 getclean.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/getclean.md b/getclean.md
index 2211ea77..7591c7b6 100644
--- a/getclean.md
+++ b/getclean.md
@@ -8,4 +8,5 @@ permalink: /getclean/
 - [Apples to Oranges Data Organisation Challenge](https://github.com/thoughtfulbloke/faoexample)
 - [dplyr Video Tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) and [R Markdown document](http://rpubs.com/justmarkham/dplyr-tutorial): An [update](http://blog.rstudio.org/2014/01/17/introducing-dplyr/) to the plyr package, useful for subsetting, sorting, summarizing, and merging data using a more intuitive syntax than plyr or base R.
 - [Downloading files general advice](http://rpubs.com/thoughtfulbloke/downloadtips)
-
+- [Codebook sample](https://gist.github.com/kirstenfrank/218c36a1938055d0f4e4)
+- [Second Codebook sample](https://gist.github.com/kirstenfrank/699abe3e16fd1dc36e5d)
From 18a576c2c38d7b5ec32a1c0dd80de60f49f890b7 Mon Sep 17 00:00:00 2001
From: Kirsten
Date: Fri, 3 Oct 2014 18:53:29 -0400
Subject: [PATCH 4/5] removed two sample files

---
 Codebook_dogwalking.md | 18 ------------------
 PROPPR_codebook.md | 9 ---------
 2 files changed, 27 deletions(-)
 delete mode 100644 Codebook_dogwalking.md
 delete mode 100644 PROPPR_codebook.md

diff --git a/Codebook_dogwalking.md b/Codebook_dogwalking.md
deleted file mode 100644
index a772071f..00000000
--- a/Codebook_dogwalking.md
+++ /dev/null
@@ -1,18 +0,0 @@
-I feel the urge to write a sample codebook (for David's FAQ).
-
-It's very similar to a Statistical Analysis Plan, actually.
-
-Setup, there is a dogwalking business. It wants to analyze its work.
-
-Raw data is: name of dog, address of owner, time walked, date walked, size of dog (small, medium, or large), health of dog (well or sick) on that date and time, comments, and pay.
-
-The business wants to assign ID# to the dogs, and codewords to the address to make this data anonymous. There isn't anything to do to the comments--since free text is all over the place.
-
-Codebook:
-The dog's name was transformed into an IDNumber (unique) (1-50),
-the address was transformed into a factor, OwnerName (levels Alice, Bob, Charlie, Deborah, Ernest and Fred),
-time and date walked were counted to make WalksPerWeek1, WalksPerWeek2, and WalksPerWeek3. Week1 begins at 00:01UTC on July1, 2014, Week2 begins at 00:01UTC on July8, 2014, Week3 begins at 00:01UTC on July15, 2014.
-Health was summarized as HealthWeek1, HealthWeek2, and HealthWeek3. It is a factor with two levels, Well and Sick. If the dog was sick at any walk during that week, dog was marked sick, else dog was marked well.
-Dog Size was converted into a factor: Large, Medium and Small are the levels.
-Comments are dropped.
-Pay is transformed into PayWeek1, PayWeek2, PayWeek3, which is a factor that has two levels (Yes, and No) for correct pay paid during that week.
diff --git a/PROPPR_codebook.md b/PROPPR_codebook.md
deleted file mode 100644
index ff80a9dd..00000000
--- a/PROPPR_codebook.md
+++ /dev/null
@@ -1,9 +0,0 @@
-Other sample code book.
-
-This sample is based on the PROPPR (https://clinicaltrials.gov/ct2/show/NCT01545232?term=PROPPr&rank=1) study of blood component ratios in transfusions in trauma patients. Warning: this is complicated.
-
-Experimental design and background: Blood is given to trauma patients as blood components: red blood cells, plasma and platelets. This experiment was aimed at seeing if increasing the ratio of red blood cells over that in normal whole blood would lower mortality. There were two treatment groups, normal and increased red cells. Patients were screened at 12 participating hospitals. Randomization was done by per mutated random blocks, stratified by site.
-
-Raw data: randomization assignment, date and time of admission to the hospital, type of injury, fluids given by EMT/paramedic before admission, hospital, time of blood product administration, lot number of blood product, time of hemostasis achieved, other fluids, pre-existing blood clotting diseases, pre-existing blood clotting inhibitors, date and time of discharge from ICU, date and time of ventilator start and stop, date and time of discharge from hospital and date and time of death (if death occurred within 30 days).
-
-Processed data: Assignment was converted to treatment group (factor variable), type of injury was coded as a factor penetrating (1) or blunt (0), hospital was coded as a factor (1-12), EMT fluids were binned in 500ml bins, pre-existing clotting diseases were coded as a factor (each assigned a number), pre-existing clotting medications were coded as a factor. Other fluids were binned in 1L bins. Date of discharge from ICU was converted to ICU-free days (out of 30), Ventilator times were converted to ventilator-free days (out of 30), date of discharge was converted to hospital-free days (out of 30). Amount of blood products was summed in two groups, amount given until hemostasis and amount given from hemostasis until 24 hours, these were considered as numeric amounts, these were calculated from summing the blood product lot numbers, sorted by time of administration. Date of death was converted to 24-hour mortality (factor: yes=1 no=0) and 30-day mortality (factor: yes=1, no=0).
\ No newline at end of file
From 5226adfa2b7d92a484f824b0fad53319f2d02ce1 Mon Sep 17 00:00:00 2001
From: Kirsten
Date: Tue, 14 Oct 2014 17:46:03 -0400
Subject: [PATCH 5/5] cleaned up merge text

---
 getclean.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/getclean.md b/getclean.md
index 04aca7bc..380a0515 100644
--- a/getclean.md
+++ b/getclean.md
@@ -8,10 +8,6 @@ permalink: /getclean/
 - [Apples to Oranges Data Organisation Challenge](https://github.com/thoughtfulbloke/faoexample)
 - [dplyr Video Tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) and [R Markdown document](http://rpubs.com/justmarkham/dplyr-tutorial): An [update](http://blog.rstudio.org/2014/01/17/introducing-dplyr/) to the plyr package, useful for subsetting, sorting, summarizing, and merging data using a more intuitive syntax than plyr or base R.
 - [Downloading files general advice](http://rpubs.com/thoughtfulbloke/downloadtips)
-<<<<<<< HEAD
 - [Codebook sample](https://gist.github.com/kirstenfrank/218c36a1938055d0f4e4)
 - [Second Codebook sample](https://gist.github.com/kirstenfrank/699abe3e16fd1dc36e5d)
-=======
 - [Query string (and other fields-within-fields) unrolling](http://rpubs.com/schnee/32988)
-
->>>>>>> 89a253d761447641cb7a05338057d73a26a5959d