Normalize data and filter it by prevalence level
Normalizes data and filters it by prevalence level.
Usage
norm_data(
  data_with_annotation,
  col_module_id,
  prev_list = c(0.3),
  annotation_level,
  data_type = "shotgun"
)
Arguments
- data_with_annotation
Data frame. The abundance table merged with the module names. Required format: modules are the rows and samples are the columns. The first column must be the module name (e.g. species), the second the module ID (e.g. msp), and each subsequent column a sample.
- col_module_id
String. The name of the column with the module names in annotation_table
- prev_list
Numeric vector. The prevalence levels to study. Required format is decimal: 0.20 for a prevalence of 20%.
- annotation_level
String. Annotation level to aggregate the taxa
- data_type
String. Set to "16S" to enable the treatment of 16S data. The default value is "shotgun".
Value
List of data frames. Each element of the list is a normalized 'data_with_annotation' for one level of prevalence.
Examples
tiny_data <- data.frame(
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact"),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0)
)
tiny_normed <- norm_data(
  tiny_data,
  col_module_id = "msp_name",
  annotation_level = "species",
  prev_list = c(0.20, 0.30)
)
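
To make the decimal format of prev_list concrete, the following base R sketch computes a prevalence per module on the same toy table and filters it at each level. It is not the internal code of norm_data(): it assumes that prevalence means the share of samples in which a module has a non-zero abundance, and that a module is kept when its prevalence is at least the requested level. The helper filter_by_prevalence() is hypothetical and exists only for this illustration.

```r
# Illustration only: NOT the implementation of norm_data().
# Assumption: prevalence = share of samples with non-zero abundance.
tiny_data <- data.frame(
  species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact"),
  msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
  SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
  SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
  SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
  SAMPLE4 = c(0, 0, 2.98320e-05, 0)
)

# Sample columns are all columns except the module name and the module ID
sample_cols <- setdiff(names(tiny_data), c("species", "msp_name"))

# Per-module prevalence: fraction of samples where the abundance is > 0
prevalence <- rowMeans(tiny_data[, sample_cols] > 0)
names(prevalence) <- tiny_data$msp_name

# Hypothetical helper: keep modules whose prevalence reaches the threshold
# (the ">=" comparison is an assumption of this sketch)
filter_by_prevalence <- function(df, prev, threshold) {
  df[prev >= threshold, , drop = FALSE]
}

# Same decimal format as the prev_list argument: 0.20 means 20%
prev_list <- c(0.20, 0.30)
filtered <- lapply(prev_list, function(p) {
  filter_by_prevalence(tiny_data, prevalence, p)
})
names(filtered) <- as.character(prev_list)

print(prevalence)
print(sapply(filtered, nrow))
```

Under these assumptions, msp_1 is present in one sample out of four (prevalence 0.25), so it is kept at 0.20 and dropped at 0.30, while the other three modules pass both levels.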