List the simulated count tables by level of prevalence
Usage
simulate_by_prevalence(
data_with_annotation,
prev_list,
graph_file = NULL,
col_module_id,
annotation_level,
sample_size = 500,
seed,
verbatim = FALSE,
data_type = "shotgun"
)
Arguments
- data_with_annotation
Dataframe. The abundance table merged with the module names. Required format: modules are the rows and samples are the columns. The first column must hold the module names (e.g. species), the second the module IDs (e.g. msp), and each subsequent column is a sample.
- prev_list
List of numeric. The prevalence levels to study, given as decimal fractions: 0.20 for a prevalence of 20%.
- graph_file
Dataframe. The object generated by the graph_step() function. Defaults to NULL.
- col_module_id
String. The name of the column with the module IDs in data_with_annotation (e.g. "msp_name")
- annotation_level
String. The name of the column with the level to be studied. Examples: species, genus, level_1
- sample_size
Numeric. The sample size to consider. A value of 500 is recommended.
- seed
Numeric. The seed number, ensuring reproducibility
- verbatim
Boolean. Controls verbosity
- data_type
String. Set to "16S" to enable the treatment of 16S data. The default value is "shotgun".
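The input layout described above can be checked before calling the function. The following is a minimal sketch in base R, not part of the package: the helper check_layout() and the toy table are hypothetical, and it assumes that "prevalence" means the fraction of samples in which a module has non-zero abundance, which this page does not state explicitly.

```r
# Toy table in the layout described above:
# module names, module IDs, then one column per sample.
toy <- data.frame(
  species  = c("Species A", "Species B", "Species C"),
  msp_name = c("msp_1", "msp_2", "msp_3"),
  SAMPLE1  = c(0, 2e-06, 1e-07),
  SAMPLE2  = c(3e-07, 0, 4e-07),
  SAMPLE3  = c(0, 0, 5e-06),
  SAMPLE4  = c(0, 1e-06, 2e-05)
)

# Hypothetical helper (an assumption, not a package function):
# stops with an error if the table or the prevalences look malformed.
check_layout <- function(df, prev_list) {
  stopifnot(ncol(df) >= 3)                  # names + IDs + at least one sample
  sample_cols <- df[, -(1:2), drop = FALSE]
  prev <- unlist(prev_list)
  stopifnot(
    all(vapply(sample_cols, is.numeric, logical(1))),  # samples are numeric
    !anyDuplicated(df[[2]]),                           # module IDs are unique
    is.numeric(prev),
    all(prev >= 0 & prev <= 1)                         # decimals, not percentages
  )
  invisible(TRUE)
}

check_layout(toy, prev_list = c(0.20, 0.30))

# Observed prevalence per module, under the assumption stated above:
# the share of samples with non-zero abundance.
observed_prev <- rowMeans(toy[, -(1:2)] > 0)
names(observed_prev) <- toy$msp_name
observed_prev
```

Passing a percentage such as 20 instead of 0.20 makes check_layout() stop with an error, which is the mistake the decimal format requirement is meant to prevent.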
Value
List of dataframes. Each element of the list corresponds to a level of prevalence and is a simulated abundance table
Examples
tiny_data <- data.frame(
species = c("One bacteria", "One bacterium L", "One bacterium G", "Two bact"),
msp_name = c("msp_1", "msp_2", "msp_3", "msp_4"),
SAMPLE1 = c(0, 1.328425e-06, 0, 1.527688e-07),
SAMPLE2 = c(1.251707e-07, 1.251707e-07, 3.985320e-07, 0),
SAMPLE3 = c(0, 0, 4.926046e-09, 5.626392e-06),
SAMPLE4 = c(0, 0, 2.98320e-05, 0)
)
# Build the graph object first; warnings are suppressed for this tiny example
tiny_graph <- suppressWarnings(graph_step(
  tiny_data,
  col_module_id = "msp_name", annotation_level = "species", seed = 20242025
))
# Simulate one abundance table per prevalence level (20% and 30%)
tiny_sims <- simulate_by_prevalence(
  tiny_data,
  prev_list = c(0.20, 0.30), graph_file = tiny_graph,
  col_module_id = "msp_name", annotation_level = "species",
  sample_size = 500, seed = 20242025
)