4 Cohort
The focus of clusteR is the cohort file: a data file where information about
your survey participants is stored. This section focuses on updating,
maintaining, and exporting cohort information. For creating a cohort file, see
setup_cohort.
4.1 update_cohort
update_cohort (and its partner, update_confirm) is the workhorse of
clusteR. Using the data source specified in setup_get and the
Editor, update_cohort reviews new information and applies changes to the
cohort file. It also archives the cohort file so you can recover past versions.
The function creates a report displaying changes to be made to the cohort file.
The first section displays changes from your survey data source, and the second
displays changes from Editor. These sections are broken down into subsections
by cohort file field, with each subsection containing a table of changes to be
made (if any). Each table displays the ID, the existing value (.coh), and the
new value (.src in the data source section, .man in the Editor section).
update_cohort also stores versions of the cohort file, Editor, and data
source in memory, so update_cohort and update_confirm must be run in the
same R session.
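The change tables described above can be pictured with a small base R sketch. This is only an illustration of the .coh/.src layout, not how update_cohort is implemented; the toy data frames and the phone column are assumptions made up for the example.

```r
# Toy cohort file and toy data source; the columns are hypothetical.
cohort <- data.frame(
  id    = c(1, 2, 3),
  phone = c("555-0101", "555-0102", "555-0103"),
  stringsAsFactors = FALSE
)
source_data <- data.frame(
  id    = c(1, 2, 3),
  phone = c("555-0101", "555-0199", "555-0103"),
  stringsAsFactors = FALSE
)

# Join on ID so the field gets a .coh (existing) and a .src (new) column.
both <- merge(cohort, source_data, by = "id", suffixes = c(".coh", ".src"))

# Keep only the rows where the source value differs from the cohort value.
changes <- both[both$phone.coh != both$phone.src,
                c("id", "phone.coh", "phone.src")]
print(changes)
```

Only participant 2 appears in the result, because that is the only row whose value differs between the two inputs.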
When you have reviewed the Update Errors report and it displays only accurate
information, run update_confirm to write the updates to the cohort file.
If anything in the report is inaccurate, correct it in your data source or in
Editor first.
4.1.1 Editor
Editor is an automatically-generated CSV file located in the Cohort folder.
It will be read by update_cohort, and changes you make in Editor will override
any existing or future updates from your data source. It is recommended to make
changes to the cohort file with Editor rather than in the data source itself to
preserve the integrity of your survey response data. You may also add entries
to the cohort file using Editor.
clusteR will not overwrite cohort information from your data source with blank cells in Editor. It does not overwrite entire rows, and it never overwrites answers to survey questions that are not stored in your cohort file.
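The blank-cell rule can be sketched for a single field in base R. This is a minimal illustration under assumed data, not clusteR's own code: the email values are invented, and treating both NA and empty strings as "blank" is an assumption.

```r
# Values currently held for one field (hypothetical data).
from_source <- c("a@example.org", "b@example.org", "c@example.org")
# Matching Editor cells; "" and NA stand for cells left blank.
from_editor <- c("", "new_b@example.org", NA)

# Treat a cell as blank when it is NA or empty after trimming whitespace.
is_blank <- is.na(from_editor) | trimws(from_editor) == ""

# Non-blank Editor cells win; blank cells leave the existing value alone.
resolved <- ifelse(is_blank, from_source, from_editor)
print(resolved)
```

Only the second value changes, since it is the only Editor cell that was filled in.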
4.1.2 view_progress
You can generate and view a simple report on survey completion, including data
and maps by cluster, by running view_progress. The color scale on the
completion maps is customizable via breaks and colors, which are passed to
ggplot2::scale_fill_stepsn.
4.1.3 view_map
You may need to view the U.S. Census blocks used as clusters for your survey.
view_map displays these blocks on a simple map with county lines. You can
customize the title, subtitle, fill color, and background color. The map
in view_progress is more comprehensive.
4.2 Maintaining the cohort
Although update_cohort and update_confirm handle most cohort maintenance
automatically, you may occasionally need to perform manual maintenance
tasks. The functions in this section assist with manual cohort monitoring and
maintenance.
4.2.1 restore_cohort
Each time the cohort file is updated, an archived version is saved in
Cohort/Archive. You can restore an archived version using restore_cohort. If
you do not name a specific archive, the most recent one is displayed. To
restore an older version, name a file in Cohort/Archive. The displayed archive
replaces the cohort file only if you type “save” in the console.
4.2.2 view_cohort
You can view the cohort at any time with view_cohort. It also invisibly
returns the cohort file, so you can use the assignment operator <- to
save it to memory. If you would like to do this silently, use
view_cohort(FALSE).
4.2.3 search_cohort
You can search (more accurately, filter) the cohort with search_cohort, which
displays the results in a viewer and invisibly returns the filtered results.
search_cohort accepts normal dplyr::filter syntax.
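For readers new to dplyr::filter syntax, here is a short sketch of the kinds of conditions involved. It calls dplyr::filter directly on a made-up data frame; the column names and values are assumptions, and the exact way search_cohort receives these conditions is not shown here.

```r
library(dplyr)

# Hypothetical stand-in for a cohort file.
cohort <- data.frame(
  id      = 1:4,
  cluster = c(1, 1, 2, 2),
  status  = c("Enrolled", "Completed - enrolled", "Not enrolled", "Enrolled"),
  stringsAsFactors = FALSE
)

# Comma-separated conditions are combined with AND.
enrolled_in_2 <- filter(cohort, cluster == 2, status == "Enrolled")

# %in% matches any of several values.
enrolled_any <- filter(cohort, status %in% c("Enrolled", "Completed - enrolled"))

print(enrolled_in_2)
print(enrolled_any)
```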
4.3 Reaching the cohort
clusteR includes functions to output data about cohort participants. This can often be done in PDF or CSV format for convenience.
make_email, make_mailing, and make_phone allow you to output cohort
information filtered by individual characteristics. You can filter manually by
any column in the cohort file, or use .status to include only participants:
- who have completed the survey (status “Completed - enrolled”, “Completed - re-enroll”, or “Completed - unenroll”)
- whose responses are pending (status “Enrolled”, “Re-enroll”, or “Not enrolled”)
- who are enrolled in the cohort (status “Enrolled” or “Completed - enrolled”)
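The three status groups overlap, which is easy to miss in prose. The base R sketch below spells them out on a made-up cohort; it only illustrates the groupings listed above and does not show how .status is passed to the make_ functions.

```r
# Status groups as described in the text.
completed_statuses <- c("Completed - enrolled", "Completed - re-enroll",
                        "Completed - unenroll")
pending_statuses   <- c("Enrolled", "Re-enroll", "Not enrolled")
enrolled_statuses  <- c("Enrolled", "Completed - enrolled")

# Hypothetical cohort with one participant per line.
cohort <- data.frame(
  id     = 1:5,
  status = c("Enrolled", "Completed - enrolled", "Not enrolled",
             "Completed - unenroll", "Re-enroll"),
  stringsAsFactors = FALSE
)

completed_ids <- cohort$id[cohort$status %in% completed_statuses]
pending_ids   <- cohort$id[cohort$status %in% pending_statuses]
enrolled_ids  <- cohort$id[cohort$status %in% enrolled_statuses]
```

Participant 1 is both pending and enrolled, and participant 2 is both completed and enrolled, so the groups are not mutually exclusive.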
make_groups, together with make_walklist, outputs cohort information filtered
by cluster characteristics. The filt argument allows users to filter clusters
by a summary table of cluster characteristics, which has these columns:
- n: the total number of cohort members in each cluster
- completed, completed_pct: the number or percent (out of 100) of participants with any Completed status
- pending, pending_pct: the number or percent (out of 100) of participants with an Enrolled, Re-enroll, or Not enrolled status
- enrolled, enrolled_pct: the number or percent (out of 100) of participants with a Completed - enrolled or Enrolled status
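To make the summary columns concrete, the sketch below builds an equivalent table in base R from a made-up cohort and then filters it. The data, the helper count_by_cluster, and the filter condition are all assumptions for illustration; the way filt itself receives a condition is not shown here.

```r
# Hypothetical cohort: four members in cluster A, two in cluster B.
cohort <- data.frame(
  cluster = c("A", "A", "A", "A", "B", "B"),
  status  = c("Enrolled", "Completed - enrolled", "Completed - unenroll",
              "Not enrolled", "Re-enroll", "Completed - re-enroll"),
  stringsAsFactors = FALSE
)

is_completed <- startsWith(cohort$status, "Completed")
is_pending   <- cohort$status %in% c("Enrolled", "Re-enroll", "Not enrolled")
is_enrolled  <- cohort$status %in% c("Enrolled", "Completed - enrolled")

# Hypothetical helper: count TRUE values within each cluster.
count_by_cluster <- function(flag) as.vector(tapply(flag, cohort$cluster, sum))

summary_tbl <- data.frame(
  cluster   = sort(unique(cohort$cluster)),
  n         = count_by_cluster(rep(TRUE, nrow(cohort))),
  completed = count_by_cluster(is_completed),
  pending   = count_by_cluster(is_pending),
  enrolled  = count_by_cluster(is_enrolled),
  stringsAsFactors = FALSE
)

# Percent columns are out of 100, as in the text.
for (col in c("completed", "pending", "enrolled")) {
  summary_tbl[[paste0(col, "_pct")]] <- 100 * summary_tbl[[col]] / summary_tbl$n
}

# Example condition: clusters with at least 3 members and at most half completed.
keep <- summary_tbl[summary_tbl$n >= 3 & summary_tbl$completed_pct <= 50, "cluster"]
print(summary_tbl)
print(keep)
```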
4.3.1 make_email
make_email outputs each participant’s ID, name, and email address. See
Reaching the cohort for filtering options.
4.3.2 make_mailing
make_mailing outputs each participant’s ID, name, mailing address, city,
state, and ZIP code. See Reaching the cohort for filtering options.
4.3.3 make_phone
make_phone outputs each participant’s ID, name, and phone number. See
Reaching the cohort for filtering options.
4.3.4 make_groups
make_groups filters clusters by a summary table of cluster characteristics
(using the filt argument), groups clusters, and creates Assignments.csv in the
Contacts folder. It also outputs a plot of these automatic groups. make_groups
should be followed by make_walklist.
By default, make_groups uses the internal function mult_kmeans, which uses
a k-means algorithm to try to sort clusters into groups of roughly equal size
based on their proximity. You can specify the number of groups to create with
k, the number of times to run the k-means algorithm with runs, and the number
of iterations (times centers are calculated and groups are assigned) with
iter.max.
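The idea of running k-means several times and keeping one result can be sketched with stats::kmeans. This is not the code of mult_kmeans, which is internal and not shown here. The random points, the use of raw longitude and latitude as coordinates, and the rule of keeping the run with the most even group sizes are all assumptions made for illustration.

```r
set.seed(42)

# Hypothetical cluster locations: 30 points in two loose clumps.
pts <- cbind(
  long = c(rnorm(15, -84.5, 0.05), rnorm(15, -84.2, 0.05)),
  lat  = c(rnorm(15, 39.1, 0.05), rnorm(15, 39.3, 0.05))
)

k        <- 3   # number of groups
runs     <- 10  # number of k-means runs
iter.max <- 20  # iterations allowed within each run

# Run k-means `runs` times, each from a different random start.
fits <- lapply(seq_len(runs), function(i) {
  kmeans(pts, centers = k, iter.max = iter.max)
})

# Score each run by the gap between its largest and smallest group
# (an assumed criterion for "approximately even").
unevenness <- vapply(fits, function(f) as.numeric(max(f$size) - min(f$size)),
                     numeric(1))

# Keep the most even run and read off the group of each point.
best   <- fits[[which.min(unevenness)]]
groups <- best$cluster
print(table(groups))
```

Raising runs gives the selection step more candidates to choose from, while iter.max only limits how long each individual run may refine its centers.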
clusteR is highly extensible. You can create and use other
grouping functions using the fn argument and the dynamic dots (...) to
specify any necessary arguments beyond the defaults. make_groups requires the
function to output a data frame with the following columns:
- cluster, matching the cohort file cluster identifier
- geoid, matching the Census block GEOID
- lat and long, the latitude and longitude of a point within the cluster
- ur, showing whether the block is urban or rural
- geometry, an sf geometry field to draw the block shape
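When writing a custom grouping function, a quick check of its output against the column list can catch mistakes early. The helper below is hypothetical and not part of clusteR; the candidate data frame and its GEOID values are made up.

```r
# Column names required of a grouping function's output, per the list above.
required_cols <- c("cluster", "geoid", "lat", "long", "ur", "geometry")

# Hypothetical helper: report which required columns are missing.
missing_cols <- function(df) setdiff(required_cols, names(df))

# A deliberately incomplete result from an imagined custom function.
candidate <- data.frame(
  cluster = c("A", "B"),
  geoid   = c("390610001001000", "390610001001001"),
  lat     = c(39.10, 39.12),
  long    = c(-84.51, -84.49),
  stringsAsFactors = FALSE
)

print(missing_cols(candidate))  # "ur" "geometry"
```

This only checks column names; it does not verify that geometry is a valid sf geometry column.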
4.3.5 make_walkmap
make_walkmap rebuilds a map very similar to the one produced by make_groups,
but lets you choose which groups to include and save the map as a PNG file.
To save the map, specify a path as save, including the .png extension.
To customize the groups included on the map, pass them as a vector to groups.
As with view_map, you can specify the title, subtitle, and
background colors (though not the fill color, since that is automatically
determined by group).
4.3.6 make_walklist
make_walklist, unlike make_mailing, exports addresses in a
formatted PDF intended to be used in door-to-door outreach. You can specify the
cohort file columns to be included in the output by changing cols.
The default walk list template is very simple. custom_walklist creates a
template in the Scripts folder that you can modify and use by specifying its
path as template. If there are additional objects needed for your template,
they can be provided to make_walklist as named arguments.
4.4 export_cohort
Cohort information is stored as a plain text file. The data can be exported to
a CSV file with export_cohort.
By default, the exported file will be named after the name specified during
setup. You can change it by passing a string to .name.
By default, the status of the cohort file is exported as-is. If .status is set
to TRUE, participant status will be updated to remove information about survey
completion, and participants with an unenroll or do-not-contact flag will be
removed.
Setting .removed to TRUE will create removed.csv, which lists all participants
removed by setting .status to TRUE.
All other arguments passed to export_cohort are passed on to dplyr::mutate.
To remove columns from the output, you can set them to NULL.
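The sketch below shows the dplyr::mutate behavior this relies on: assigning NULL drops a column, and other expressions change or add columns. It calls dplyr::mutate directly on a made-up data frame, since the exact form of an export_cohort call is not shown here; the column names are assumptions.

```r
library(dplyr)

# Hypothetical stand-in for a cohort file.
cohort <- data.frame(
  id    = 1:2,
  name  = c("Ana", "Ben"),
  phone = c("555-0101", "555-0102"),
  stringsAsFactors = FALSE
)

# phone = NULL drops the column; name is transformed in place.
out <- mutate(cohort, phone = NULL, name = toupper(name))
print(names(out))
```

Arguments of this kind, passed to export_cohort, would presumably act on the exported data in the same way.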