Screening call

Collection protocol

Details about the data collection protocol for participant screening and recruiting can be found on the PLAY Project website.

Retrieve from KoBoToolbox (KBT)

We use the targets package to download data files from KoBoToolbox and to save local XLSX and CSV copies. This lets us download and process those files on a regular schedule.

Setup

Load functions needed to download KBT screening/demographic questionnaire files.

fl <- list.files(file.path(here::here(), "R"), "^kobo_|^file_", full.names = TRUE)
purrr::walk(fl, source)

Download

We have two targets specified in _targets.R that handle the regular downloading of screening data files.

First, we generate a data frame of KoBoToolbox forms that contain the screening (“Demographic”) data. Here is the target for that process:

# Not evaluated
  tar_target(
    kb_screen_df,
    kobo_list_data_filtered("[Dd]emographic"),
    cue = tarchetypes::tar_cue_age(
      name = kb_screen_df,
      age = as.difftime(update_interval, units = update_interval_units)
    )
  ),

Next, we download the raw XLSX files and save them to ../data/xlsx/screening using kobo_retrieve_save_many_xlsx(kb_screen_df, save_dir = "../data/xlsx/screening"). We then convert the XLSX files to CSVs via file_load_xlsx_save_many_csv("../data/xlsx/screening", "../data/csv/screening", "Demographic"). Both steps are handled by the wrapper function screen_download_convert(kb_screen_df, "data/xlsx/screening", "data/csv/screening"). Here is the accompanying target:

# Download screening/demographic survey
  tar_target(
    screen_download,
    screen_download_convert(kb_screen_df, "data/xlsx/screening", "data/csv/screening"),
    cue = tarchetypes::tar_cue_age(
      name = screen_download,
      age = as.difftime(update_interval, units = update_interval_units)
    )
  ),
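
The age argument in both cues is an ordinary difftime object. The sketch below shows how the two settings combine. The values assigned to update_interval and update_interval_units are assumptions for illustration, since their actual definitions in _targets.R are not shown here.

```r
# Hypothetical values; the real ones are set elsewhere in _targets.R.
update_interval <- 2
update_interval_units <- "days"

# as.difftime() accepts "secs", "mins", "hours", "days", or "weeks" as units.
max_age <- as.difftime(update_interval, units = update_interval_units)

# The age cue reruns a target once its last run is older than max_age.
print(max_age)
as.numeric(max_age, units = "hours")
```

Keeping the interval and its units in two variables lets every age-based cue in the pipeline share one refresh schedule.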

Survey questions

The form for the survey questions can be downloaded from the following URL:

https://kf.kobotoolbox.org/api/v2/assets/aGLEqT7eRBhuPgizCQBeqA.xls

Clean

As of 2023-09-22, the cleaning and merging process for the screening/demographic data is handled here, in this document, not as a target.

Setup

The cleaning functions all have the screen_ prefix.

purrr::walk(list.files(file.path(here::here(), "R"), 
                    "^screen_", full.names = TRUE), source)

There are three CSV files to clean:

(fn <- list.files("../data/csv/screening", "[Dd]emographic", full.names = TRUE))
## [1] "../data/csv/screening/275882_PLAY_Demographic_Questionnaire.csv"        
## [2] "../data/csv/screening/334134_PLAY_Demographic_Questionnaire_Spanish.csv"
## [3] "../data/csv/screening/359546_PLAY_Demographic_Questionnaire.csv"

We clean them separately, as needed, then merge them.

Clean variable names

df1 <-
  readr::read_csv(fn[1],
                  col_types = readr::cols(.default = 'c'),
                  show_col_types = FALSE)
df2 <-
  readr::read_csv(fn[2],
                  col_types = readr::cols(.default = 'c'),
                  show_col_types = FALSE)
df3 <-
  readr::read_csv(fn[3],
                  col_types = readr::cols(.default = 'c'),
                  show_col_types = FALSE)
head(names(df1), 15)
##  [1] "start"                                                                        
##  [2] "end"                                                                          
##  [3] "c_today"                                                                      
##  [4] "play_phone_questionnaire/update_date"                                         
##  [5] "play_phone_questionnaire/group_contact_info/instructions_contactinfo"         
##  [6] "play_phone_questionnaire/group_contact_info/parent_phone"                     
##  [7] "play_phone_questionnaire/group_contact_info/parent_email"                     
##  [8] "play_phone_questionnaire/group_contact_info/group_address/note_parent_address"
##  [9] "play_phone_questionnaire/group_contact_info/group_address/parent_address_1"   
## [10] "play_phone_questionnaire/group_contact_info/group_address/parent_address_2"   
## [11] "play_phone_questionnaire/group_contact_info/group_address/city"               
## [12] "play_phone_questionnaire/group_contact_info/group_address/state"              
## [13] "play_phone_questionnaire/group_contact_info/acknowledge_contact_info"         
## [14] "play_phone_questionnaire/instructions_guid"                                   
## [15] "play_phone_questionnaire/child_first_name"
head(names(df2), 15)
##  [1] "start"                                                                                                                                                                                                                                    
##  [2] "end"                                                                                                                                                                                                                                      
##  [3] "c_today"                                                                                                                                                                                                                                  
##  [4] "play_demo_questionnaire/update_date"                                                                                                                                                                                                      
##  [5] "play_demo_questionnaire/NOTA_El_cuestionario_demogr_fico_debe_ser_completado_por_el_investigador_que_va_a_ir_a_la_visita_al_hogar_con_la_madre_por_tel_fono_La_madre_deber_a_ser_la_madre_que_va_a_participar_en_el_estudio_con_su_ni_o_a"
##  [6] "play_demo_questionnaire/group_siteinfo/site_id"                                                                                                                                                                                           
##  [7] "play_demo_questionnaire/group_siteinfo/subject_number"                                                                                                                                                                                    
##  [8] "play_demo_questionnaire/exp_name"                                                                                                                                                                                                         
##  [9] "play_demo_questionnaire/group_contact_info/instructions_contactinfo"                                                                                                                                                                      
## [10] "play_demo_questionnaire/group_contact_info/parent_phone"                                                                                                                                                                                  
## [11] "play_demo_questionnaire/group_contact_info/parent_email"                                                                                                                                                                                  
## [12] "play_demo_questionnaire/group_contact_info/group_address/note_parent_address"                                                                                                                                                             
## [13] "play_demo_questionnaire/group_contact_info/group_address/parent_address_1"                                                                                                                                                                
## [14] "play_demo_questionnaire/group_contact_info/group_address/parent_address_2"                                                                                                                                                                
## [15] "play_demo_questionnaire/group_contact_info/group_address/city"
head(names(df3), 15)
##  [1] "start"                                                                       
##  [2] "end"                                                                         
##  [3] "c_today"                                                                     
##  [4] "play_demo_questionnaire/update_date"                                         
##  [5] "play_demo_questionnaire/note_exp"                                            
##  [6] "play_demo_questionnaire/group_siteinfo/site_id"                              
##  [7] "play_demo_questionnaire/group_siteinfo/subject_number"                       
##  [8] "play_demo_questionnaire/group_siteinfo/exp_name"                             
##  [9] "play_demo_questionnaire/group_contact_info/instructions_contactinfo"         
## [10] "play_demo_questionnaire/group_contact_info/parent_phone"                     
## [11] "play_demo_questionnaire/group_contact_info/parent_email"                     
## [12] "play_demo_questionnaire/group_contact_info/group_address/note_parent_address"
## [13] "play_demo_questionnaire/group_contact_info/group_address/parent_address_1"   
## [14] "play_demo_questionnaire/group_contact_info/group_address/parent_address_2"   
## [15] "play_demo_questionnaire/group_contact_info/group_address/city"

There is a separate set of variable-by-variable cleaning functions in R/screen_clean_utils.R.

We remove the unneeded ‘play_demo_*’ and ‘play_phone_questionnaire_’ variable headers using screen_remove_variable_headers().
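
As a rough illustration of what stripping these headers involves, the base R sketch below removes the top-level questionnaire prefix from a few column names like those shown above. The function name strip_questionnaire_prefix() is hypothetical, and the project's screen_remove_variable_headers() may well do more; for example, it also appears to drop nested group prefixes such as group_siteinfo/.

```r
# Hypothetical helper: drop the leading "play_demo_questionnaire/" or
# "play_phone_questionnaire/" prefix from each column name.
strip_questionnaire_prefix <- function(col_names) {
  sub("^play_(demo|phone)_questionnaire/", "", col_names)
}

old_names <- c(
  "start",
  "play_phone_questionnaire/update_date",
  "play_demo_questionnaire/group_siteinfo/site_id"
)
new_names <- strip_questionnaire_prefix(old_names)
print(new_names)
```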

We remove fields that contain administrative metadata with screen_remove_metadata_fields().

We remove fields used only by staff in uploading data to Databrary using screen_remove_databrary_fields().

The screening data include names and contact information (e.g., ‘..parent_phone’ and ‘..parent_email’).

In a future workflow, we will add Census FIPS IDs for the state and county before removing the address information.

The Census queries stopped working around 2023-06-16.

For now, we remove identifiers without querying the Census.

We remove identifiable information using screen_remove_identifiers().
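
A minimal sketch of this kind of identifier removal, in base R, is shown below. The function name drop_identifier_cols() and the pattern list are assumptions based on the column names printed earlier; the actual screen_remove_identifiers() may target a different set of fields.

```r
# Hypothetical helper: drop any column whose name matches an identifier pattern.
drop_identifier_cols <- function(df,
                                 patterns = c("parent_phone", "parent_email",
                                              "parent_address", "child_first_name")) {
  is_identifier <- grepl(paste(patterns, collapse = "|"), names(df))
  df[, !is_identifier, drop = FALSE]
}

# Toy record; all values are invented.
toy <- data.frame(
  site_id = "NYUNI",
  parent_phone = "555-0100",
  parent_email = "parent@example.com",
  parent_address_1 = "1 Main St",
  child_sex = "female",
  stringsAsFactors = FALSE
)
deidentified <- drop_identifier_cols(toy)
names(deidentified)
```

Matching on name patterns rather than exact names means the same helper works whether or not the questionnaire prefixes have already been stripped.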

Then, we use dplyr::full_join() to combine the set of individually cleaned data files. The screen_clean_raw_csv() function combines the previous screen_remove* functions. The screen_clean_raw_join() function cleans each CSV then joins them.

(scr_df <- screen_clean_raw_join())
## Cleaning '/Users/rog1/rrr/KoBoToolbox/data/csv/screening/275882_PLAY_Demographic_Questionnaire.csv'.
## Removed n = 31 of 168 columns.
## Cleaning '/Users/rog1/rrr/KoBoToolbox/data/csv/screening/334134_PLAY_Demographic_Questionnaire_Spanish.csv'.
## Removed n = 47 of 236 columns.
## Cleaning '/Users/rog1/rrr/KoBoToolbox/data/csv/screening/359546_PLAY_Demographic_Questionnaire.csv'.
## Removed n = 47 of 239 columns.
## Joining with `by = join_by(start, end,
## c_today, update_date, site_id,
## subject_number, state, check_childage,
## check_childage_weeks, child_sex, day,
## day2, day1, play_id, language_spoken_mom,
## `language_spoken_mom/english`,
## `language_spoken_mom/spanish`,
## `language_spoken_mom/other`,
## language_spoken_mom_other,
## language_spoken_child,
## `language_spoken_child/english`,
## `language_spoken_child/spanish`,
## `language_spoken_child/other`,
## language_spoken_child_other,
## language_spoken_house,
## `language_spoken_house/english`,
## `language_spoken_house/spanish`,
## `language_spoken_house/other`,
## language_spoken_house_other,
## `child_information/child_bornonduedate`,
## `child_information/child_onterm`,
## `child_information/child_duedate`,
## `child_information/child_birthage`,
## `child_information/child_weight_pounds`,
## `child_information/child_weight_ounces`,
## `child_information/child_birth_complications`,
## `child_information/specify_birth_complications`,
## `child_information/hearing_disabilities`,
## `child_information/specify_hearing`,
## `child_information/vision_disabilities`,
## `child_information/specify_vision`,
## `child_information/major_illnesses_injuries`,
## `child_information/specify_illnesses_injuries`,
## `child_information/other_developmentaldelays`,
## `child_information/specify_developmentaldelays`,
## `child_information/child_race`,
## `child_information/child_sleep_time`,
## `child_information/child_wake_time`,
## `child_information/child_nap_hours`,
## `child_information/child_sleep_location`,
## `child_information/specify_child_sleep_location`,
## `child_information/indicate_child`,
## `child_information/indicate_child_2`,
## `group_family_structure/only_child`,
## `group_family_structure/specify_onlychild`,
## `group_family_structure/household_members`,
## `group_family_structure/household_members/father_biological`,
## `group_family_structure/household_members/male_partner`,
## `group_family_structure/household_members/mother_biological`,
## `group_family_structure/household_members/female_partner`,
## `group_family_structure/household_members/other_partner`,
## `group_family_structure/household_members/sibling_biological`,
## `group_family_structure/household_members/sibling_nonbiological`,
## `group_family_structure/household_members/grandmother`,
## `group_family_structure/household_members/grandfather`,
## `group_family_structure/household_members/great_grandmother`,
## `group_family_structure/household_members/great_grandfather`,
## `group_family_structure/household_members/aunt`,
## `group_family_structure/household_members/uncle`,
## `group_family_structure/household_members/cousin`,
## `group_family_structure/household_members/relative`,
## `group_family_structure/household_members/non_relative`,
## `group_family_structure/household_members/none`,
## `group_family_structure/other_relatives`,
## `group_family_structure/other_non_relatives`,
## `group_family_structure/indicate_familystructure`,
## `group_mominfo/mom_biological`,
## `group_mominfo/mom_relation`,
## `group_mominfo/mom_datecare`,
## `group_mominfo/mom_childbirth_age`,
## `group_mominfo/check_mom_childbirth_age`,
## `group_mominfo/mom_race`,
## `group_mominfo/mom_birth_country`,
## `group_mominfo/specify_mom_birth_country`,
## `group_mominfo/mom_education`,
## `group_mominfo/mom_employment`,
## `group_mominfo/mom_occupation`,
## `group_mominfo/mom_jobs_number`,
## `group_mominfo/mom_training`,
## `group_mominfo/mom_condition`,
## `group_mominfo/specify_mom_condition`,
## `group_mominfo/indicate_mom`,
## `group_biodad/biodad_childbirth_age`,
## `group_biodad/check_biodad_childbirth_age`,
## `group_biodad/biodad_race`,
## `group_biodad/indicate_biodad`,
## `group_biomom/biomom_childbirth_age`,
## `group_biomom/check_biomom_childbirth_age`,
## `group_biomom/biomom_race`,
## `group_biomom/indicate_biomom`,
## `group_nonbiopartner/nonbiopartner_race`,
## `group_nonbiopartner/indicate_nonbiopartner`,
## `group_genpartner_info/partner_education`,
## `group_genpartner_info/partner_employment`,
## `group_genpartner_info/partner_occupation`,
## `group_genpartner_info/partner_jobs_number`,
## `group_genpartner_info/partner_program`,
## `group_genpartner_info/indicate_genpartner`,
## `group_child_care_arrangements/childcare_types`,
## `group_child_care_arrangements/childcare_types/nanny_home`,
## `group_child_care_arrangements/childcare_types/nanny_nothome`,
## `group_child_care_arrangements/childcare_types/relative`,
## `group_child_care_arrangements/childcare_types/childcare`,
## `group_child_care_arrangements/childcare_types/none`,
## `group_child_care_arrangements/childcare_location`,
## `group_child_care_arrangements/childcare_hours`,
## `group_child_care_arrangements/childcare_number`,
## `group_child_care_arrangements/childcare_age`,
## `group_child_care_arrangements/childcare_language`,
## `group_inclusioncheck/inclusionreason`,
## `group_inclusioncheck/indicate_inclusion`,
## indicate_demoquestionnaire,
## indicate_databrary, `_id`)`
## Joining with `by = join_by(start, end,
## c_today, update_date, site_id,
## subject_number, state, check_childage,
## check_childage_weeks, child_sex, day,
## day2, day1, language_spoken_child,
## `language_spoken_child/english`,
## `language_spoken_child/spanish`,
## `language_spoken_child/other`,
## language_spoken_child_other,
## `child_information/child_bornonduedate`,
## `child_information/child_onterm`,
## `child_information/child_duedate`,
## `child_information/child_birthage`,
## `child_information/child_weight_pounds`,
## `child_information/child_weight_ounces`,
## `child_information/child_birth_complications`,
## `child_information/specify_birth_complications`,
## `child_information/hearing_disabilities`,
## `child_information/specify_hearing`,
## `child_information/vision_disabilities`,
## `child_information/specify_vision`,
## `child_information/major_illnesses_injuries`,
## `child_information/specify_illnesses_injuries`,
## `child_information/child_sleep_time`,
## `child_information/child_wake_time`,
## `child_information/child_nap_hours`,
## `child_information/child_sleep_location`,
## `child_information/specify_child_sleep_location`,
## `child_information/indicate_child`,
## `child_information/indicate_child_2`,
## `group_family_structure/household_members`,
## `group_family_structure/household_members/male_partner`,
## `group_family_structure/household_members/female_partner`,
## `group_family_structure/household_members/other_partner`,
## `group_family_structure/household_members/grandmother`,
## `group_family_structure/household_members/grandfather`,
## `group_family_structure/household_members/great_grandmother`,
## `group_family_structure/household_members/great_grandfather`,
## `group_family_structure/household_members/aunt`,
## `group_family_structure/household_members/uncle`,
## `group_family_structure/household_members/cousin`,
## `group_family_structure/household_members/relative`,
## `group_family_structure/household_members/non_relative`,
## `group_family_structure/other_relatives`,
## `group_family_structure/other_non_relatives`,
## `group_family_structure/indicate_familystructure`,
## `group_child_care_arrangements/childcare_types`,
## `group_child_care_arrangements/childcare_types/relative`,
## `group_child_care_arrangements/childcare_types/childcare`,
## `group_child_care_arrangements/childcare_types/none`,
## `group_child_care_arrangements/childcare_hours`,
## `group_child_care_arrangements/childcare_number`,
## `group_child_care_arrangements/childcare_age`,
## `group_child_care_arrangements/childcare_language`,
## `_id`)`
## # A tibble: 781 × 162
##    start  end   c_today update_date site_id
##    <chr>  <chr> <chr>   <chr>       <chr>  
##  1 2020-… 2020… 2020-0… <NA>        NYUNI  
##  2 2020-… 2020… 2020-0… <NA>        NYUNI  
##  3 2019-… 2020… 2020-0… <NA>        VCOMU  
##  4 2020-… 2020… 2020-0… <NA>        GEORG  
##  5 2020-… 2020… 2020-0… <NA>        NYUNI  
##  6 2020-… 2020… 2020-0… <NA>        NYUNI  
##  7 2020-… 2020… 2020-0… <NA>        GEORG  
##  8 2020-… 2020… 2020-0… <NA>        NYUNI  
##  9 2020-… 2020… 2020-0… <NA>        NYUNI  
## 10 2020-… 2020… 2020-0… <NA>        VCOMU  
## # ℹ 771 more rows
## # ℹ 157 more variables:
## #   subject_number <chr>, state <chr>,
## #   check_childage <chr>,
## #   check_childage_weeks <chr>,
## #   child_sex <chr>, day <chr>,
## #   day2 <chr>, day1 <chr>, …
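
The clean-then-join pattern above can be sketched with toy data. The example assumes the join keys are simply the columns that two data frames share, which is what the join messages above suggest. The data values are invented, and the real screen_clean_raw_join() may differ in detail.

```r
# Toy stand-ins for two cleaned questionnaire files; all values are invented.
form_a <- data.frame(id = c("1", "2"), site_id = c("NYUNI", "GEORG"),
                     play_id = c("P1", "P2"), stringsAsFactors = FALSE)
form_b <- data.frame(id = "3", site_id = "VCOMU",
                     mom_race = "white", stringsAsFactors = FALSE)

# Join successive data frames on whatever columns they have in common.
join_shared <- function(x, y) {
  dplyr::full_join(x, y, by = intersect(names(x), names(y)))
}
joined <- Reduce(join_shared, list(form_a, form_b))
print(joined)
```

Passing by explicitly uses the same keys as the default natural join but suppresses the long "Joining with" message.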

Clean individual fields

Now we can clean up the merged data frame. Each function called below cleans a set of ‘construct-specific’ variables, as indicated by the function names.

scr_df <- scr_df |> 
    screen_clean_child_info() |>
    screen_clean_lang_info() |>
    screen_clean_mom_info() |>
    screen_clean_biodad_father_info() |>
    screen_clean_childcare_info() |>
    screen_clean_family_structure() |>
    screen_clean_play_id() |>
    screen_remove_selected_cols() |>
    screen_select_reorder_cols()

For convenience, we package this sequence in its own function, screen_clean_fields().

Note that all of the variables are stored as character strings. The tidyverse functions usually guess column types well, but they sometimes guess wrong. So, in these preliminary stages, it has proved easier to make everything a character string.

scr_df <- screen_clean_fields(scr_df)
str(scr_df)
## tibble [781 × 51] (S3: tbl_df/tbl/data.frame)
##  $ submit_date                       : chr [1:781] "2020-02-27T08:24:10.551-05:00" "2020-02-27T08:43:34.275-05:00" "2019-12-18T16:37:17.554-05:00" "2020-01-01T12:54:04.550-05:00" ...
##  $ site_id                           : chr [1:781] "NYUNI" "NYUNI" "VCOMU" "GEORG" ...
##  $ participant_ID                    : chr [1:781] "230" "229" "002" "010" ...
##  $ play_id                           : chr [1:781] NA NA NA NA ...
##  $ child_age_mos                     : chr [1:781] "32.677240850317695" "32.38138286215201" "26.725838264299803" "12.754766600920446" ...
##  $ child_sex                         : chr [1:781] "male" "female" "female" "male" ...
##  $ child_bornonduedate               : chr [1:781] "yes" "yes" "yes" "yes" ...
##  $ child_onterm                      : chr [1:781] NA NA NA NA ...
##  $ child_birthage                    : chr [1:781] "-7" "-10" "0" "1" ...
##  $ child_weight_pounds               : chr [1:781] "8" "7" "6" "7" ...
##  $ child_weight_ounces               : chr [1:781] "12" "10" "12" "15" ...
##  $ child_birth_complications         : chr [1:781] "no" "no" "no" "no" ...
##  $ child_birth_complications_specify : chr [1:781] NA NA NA NA ...
##  $ child_hearing_disabilities        : chr [1:781] "yes" "no" "no" "no" ...
##  $ child_hearing_disabilities_specify: chr [1:781] "Temporary hearing loss so tubes in his ears, one still remaining" NA NA NA ...
##  $ child_vision_disabilities         : chr [1:781] "no" "no" "no" "no" ...
##  $ child_vision_disabilities_specify : chr [1:781] NA NA NA NA ...
##  $ child_major_illnesses_injuries    : chr [1:781] "yes" "no" "no" "no" ...
##  $ child_illnesses_injuries_specify  : chr [1:781] "Pneumonia at 5 months" "" "" "" ...
##  $ child_developmentaldelays         : chr [1:781] NA NA NA NA ...
##  $ child_developmentaldelays_specify : chr [1:781] NA NA NA NA ...
##  $ child_sleep_time                  : chr [1:781] "20:15:00.000-05:00" "20:15:00.000-05:00" "19:00:00.000-05:00" "19:00:00.000-05:00" ...
##  $ child_wake_time                   : chr [1:781] "07:15:00.000-05:00" "06:45:00.000-05:00" "07:30:00.000-05:00" "06:15:00.000-05:00" ...
##  $ child_nap_hours                   : chr [1:781] "2" "3" "1.5" "2" ...
##  $ child_sleep_location              : chr [1:781] "crib_separate" "crib_separate" "crib_separate" "crib_separate" ...
##  $ mom_bio                           : chr [1:781] "yes" "yes" "yes" "yes" ...
##  $ mom_childbirth_age                : chr [1:781] "33.41" "38.84" "32.2" "30.38" ...
##  $ mom_race                          : chr [1:781] "white" "white" "white" "white" ...
##  $ mom_birth_country                 : chr [1:781] "unitedstates" "unitedstates" "unitedstates" "unitedstates" ...
##  $ mom_birth_country_specify         : chr [1:781] "" "" "" "" ...
##  $ mom_education                     : chr [1:781] "masters" "doctorate" "doctorate" "masters" ...
##  $ mom_employment                    : chr [1:781] "full_time" "full_time" "full_time" "full_time" ...
##  $ mom_occupation                    : chr [1:781] "Architect" "Spanish teacher" "Nurse instructor" "School Psychologist" ...
##  $ mom_jobs_number                   : chr [1:781] "1" "1" "1" "1" ...
##  $ mom_training                      : chr [1:781] "no" "no" "no" "no" ...
##  $ biodad_childbirth_age             : chr [1:781] "33.85_NA" "30.42_NA" "37.32_NA" "30.53_NA" ...
##  $ biodad_race                       : chr [1:781] "white_NA" "white_NA" "NA_NA" "NA_NA" ...
##  $ language_spoken_mom               : chr [1:781] "english" "english spanish" "english" "english" ...
##  $ language_spoken_mom_comments      : chr [1:781] NA NA NA NA ...
##  $ language_spoken_child             : chr [1:781] "english" "spanish" "english" "english" ...
##  $ language_spoken_home_comments     : chr [1:781] "FALSE_47706996" "TRUE_47707423" "FALSE_43397823" "FALSE_44074192" ...
##  $ language_spoken_child_comments    : chr [1:781] NA NA NA NA ...
##  $ language_spoken_home              : chr [1:781] "english" "english spanish" "english" "english" ...
##  $ language_spoken_house_other       : chr [1:781] NA NA NA NA ...
##  $ language_spoken_home_other        : chr [1:781] NA NA NA NA ...
##  $ childcare_types                   : chr [1:781] NA NA NA NA ...
##  $ childcare_location                : chr [1:781] NA NA NA NA ...
##  $ childcare_hours                   : chr [1:781] NA NA NA NA ...
##  $ childcare_number                  : chr [1:781] NA NA NA NA ...
##  $ childcare_age                     : chr [1:781] NA NA NA NA ...
##  $ childcare_language                : chr [1:781] NA NA NA NA ...
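
Since every column is stored as character, numeric fields need an explicit conversion before analysis. The sketch below shows one way to do that in base R for a few of the columns listed above. The choice of columns and the helper name convert_numeric_cols() are assumptions, not part of the current workflow.

```r
# Hypothetical helper: convert the named character columns to numeric.
convert_numeric_cols <- function(df, cols) {
  for (col in intersect(cols, names(df))) {
    # Strings that cannot be parsed become NA (with a warning).
    df[[col]] <- as.numeric(df[[col]])
  }
  df
}

# Toy rows using values like those in the str() output above.
toy <- data.frame(
  child_age_mos = c("32.677240850317695", "12.754766600920446"),
  child_weight_pounds = c("8", "7"),
  child_sex = c("male", "female"),
  stringsAsFactors = FALSE
)
typed <- convert_numeric_cols(toy, c("child_age_mos", "child_weight_pounds"))
str(typed)
```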

There is more work to do, but we have a version worth exporting.

Merge

Let’s add the Databrary volume ID info.

scr_df <- scr_df |>
  screen_add_db_vol_id()

Then filter out rows that do not have valid volume IDs.

valid_db_vol <- !is.na(scr_df$vol_id)

scr_df <- scr_df[valid_db_vol,]

There are \(n=\) 772 valid Databrary volume IDs out of a total of \(n=\) 781 screening records.

Next, we add a play_status field based on the group-name field from Databrary. We use group-name to indicate “Gold”, “Silver”, or “Not run.”

Two targets in _targets.R are relevant for this operation:

  tar_target(
    play_vols_df,
    readr::read_csv("data/csv/_meta/play_site_vols.csv",
                    show_col_types = FALSE)
  ),
  tar_target(
    databrary_session_csvs,
    purrr::map(play_vols_df$site_id, databrary_get_save_session_csv),
    cue = tarchetypes::tar_cue_age(
      name = databrary_session_csvs,
      age = as.difftime(update_interval, units = update_interval_units)
    )
  )

These targets generate site-specific CSVs in data/csv/site_sessions based on the database of PLAY sites contained in data/csv/_meta/play_site_vols.csv. We load these CSVs into a single data frame.

Load site session data

session_fns <-
  list.files("../data/csv/site_sessions", "\\.csv$", full.names = TRUE)

df_sessions <-
  purrr::map(
    session_fns,
    readr::read_csv,
    col_types = readr::cols(.default = 'c'),
    show_col_types = FALSE
  ) |>
  purrr::list_rbind()

The group_name variable contains status information about the sessions.

xtabs(~ group_name, data=df_sessions)
## group_name
##    No visit    No_visit    No_Visit 
##           1          25          68 
##   PLAY_Gold PLAY_Silver 
##         482          96

There are three different spellings of the no-visit status: “No visit”, “No_visit”, and “No_Visit”. In addition, there are \(n=\) 70 sessions with NA in group_name. These could be sessions that are still in QA or that are scheduled, or there could be some other anomaly. Here, we select only the sessions that occurred and that passed QA, that is, those for which group_name is either ‘PLAY_Gold’ or ‘PLAY_Silver’.

df_sessions <- df_sessions |>
  dplyr::filter(stringr::str_detect(group_name, "PLAY_"))
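
If the no-visit sessions are needed later, the spelling variants could be collapsed before filtering. The base R sketch below is one way to do that; the helper name normalize_group_name() and the choice of "No_visit" as the canonical label are assumptions. It also makes the handling of missing values explicit. In the dplyr::filter() call above, rows with NA in group_name are dropped because the condition evaluates to NA.

```r
# Status values as they appear in the tabulation above, plus a missing value.
group_name <- c("No visit", "No_visit", "No_Visit", "PLAY_Gold", "PLAY_Silver", NA)

# Hypothetical helper: collapse the spelling variants into a single label.
normalize_group_name <- function(x) {
  is_no_visit <- grepl("^no[ _]visit$", x, ignore.case = TRUE)
  x[is_no_visit] <- "No_visit"
  x
}
normalized <- normalize_group_name(group_name)

# Keep only sessions that passed QA; missing values are treated as "not kept".
keep <- !is.na(normalized) & startsWith(normalized, "PLAY_")
table(normalized, useNA = "ifany")
```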

Sharing by session status

Here is information about the sharing status.

xtabs(~ group_name + session_release, df_sessions)
##              session_release
## group_name    EXCERPTS PRIVATE SHARED
##   PLAY_Gold        369       0    113
##   PLAY_Silver       71       3     22

There were three sessions marked PRIVATE, all of them PLAY_Silver.

df_sessions |>
  dplyr::filter(session_release == "PRIVATE") |>
  dplyr::select(vol_id, session_id, session_name, group_name)
## # A tibble: 3 × 4
##   vol_id session_id session_name group_name
##   <chr>  <chr>      <chr>        <chr>     
## 1 1656   70116      PLAY_ASUNI_… PLAY_Silv…
## 2 979    66932      PLAY_PRINU_… PLAY_Silv…
## 3 1397   70234      PLAY_UHOUS_… PLAY_Silv…

Now, we join the screening data with the Databrary session data.

screen_datab_df <- dplyr::left_join(df_sessions, scr_df, by = c('vol_id', 'participant_ID'))
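
Because this is a left join, every Gold or Silver session is kept even when no screening record matches it. A quick check for such unmatched sessions might look like the sketch below. The toy values are invented, and dplyr::anti_join() is offered as one possible diagnostic rather than a step in the current workflow.

```r
# Toy session and screening tables; all IDs and values are invented.
sessions <- data.frame(
  vol_id = c("899", "899", "954"),
  participant_ID = c("001", "002", "001"),
  group_name = c("PLAY_Gold", "PLAY_Silver", "PLAY_Gold"),
  stringsAsFactors = FALSE
)
screening <- data.frame(
  vol_id = c("899", "954"),
  participant_ID = c("001", "001"),
  child_sex = c("male", "female"),
  stringsAsFactors = FALSE
)
keys <- c("vol_id", "participant_ID")

# All sessions are retained; screening fields are NA where nothing matched.
joined <- dplyr::left_join(sessions, screening, by = keys)

# Sessions with no screening record at all.
unmatched <- dplyr::anti_join(sessions, screening, by = keys)
print(unmatched)
```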

Next, we consolidate redundant columns, e.g., the three exclusion-reason columns, into one.

screen_datab_df <- screen_datab_df |>
  tidyr::unite(exclusion_reason, c("exclusion1_reason", "exclusion2_reason", "exclusion_reason")) |>
  dplyr::mutate(exclusion_reason = stringr::str_remove_all(exclusion_reason, "NA|_"))
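
One caveat with the pattern above is that removing every "NA" and "_" also alters legitimate text that contains those characters. A possible alternative, sketched here with invented values, is to let tidyr::unite() drop missing values itself via na.rm = TRUE. The "; " separator is an assumption.

```r
# Toy exclusion columns; the reasons are invented.
toy <- data.frame(
  exclusion1_reason = c(NA, "NAP_conflict", NA),
  exclusion2_reason = c(NA, NA, "moved"),
  exclusion_reason  = c(NA_character_, NA, NA),
  stringsAsFactors = FALSE
)

# na.rm = TRUE skips missing values instead of pasting the string "NA".
united <- tidyr::unite(
  toy, "exclusion_reason",
  c("exclusion1_reason", "exclusion2_reason", "exclusion_reason"),
  sep = "; ", na.rm = TRUE
)

# Rows where all three inputs were missing come back as "", so restore NA.
united$exclusion_reason[united$exclusion_reason == ""] <- NA
print(united)
```

With this approach a reason such as "NAP_conflict" survives intact, whereas stripping "NA" and "_" would reduce it to "Pconflict".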

Export cleaned file

We date-stamp the exported file so we can monitor progress as this workflow develops.

sfn <- paste0("PLAY-screening-datab-", Sys.Date(), ".csv")
readr::write_csv(screen_datab_df, file.path(here::here(), "data/csv/screening/agg/", sfn))

We also save a copy with “latest” in the file name in place of the date.

sfn <- "PLAY-screening-datab-latest.csv"
readr::write_csv(screen_datab_df, file.path(here::here(), "data/csv/screening/agg/", sfn))
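
The two export steps could be wrapped in a small helper that also makes sure the output directory exists. This is only a sketch: the function name export_with_latest() and its arguments are hypothetical, and the example writes to a temporary directory rather than to data/csv/screening/agg.

```r
# Hypothetical helper: write a date-stamped CSV plus a "latest" copy.
export_with_latest <- function(df, out_dir, stem = "PLAY-screening-datab") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  dated <- file.path(out_dir, paste0(stem, "-", Sys.Date(), ".csv"))
  latest <- file.path(out_dir, paste0(stem, "-latest.csv"))
  readr::write_csv(df, dated)
  # Copy rather than rewrite, so both files are byte-for-byte identical.
  file.copy(dated, latest, overwrite = TRUE)
  invisible(c(dated = dated, latest = latest))
}

# Usage with a throwaway directory and toy data.
out_paths <- export_with_latest(
  data.frame(site_id = "NYUNI", child_sex = "female"),
  file.path(tempdir(), "agg")
)
file.exists(out_paths)
```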