
View source: R/calculate_diff_abundance.R

Performs differential abundance calculations and statistical hypothesis tests on data frames with protein, peptide or precursor data. Different methods for statistical testing are available.
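As a rough illustration of what a differential abundance calculation with a Welch test involves, the following base R sketch computes the difference in mean log2 intensity, its standard error, the t-statistic and the p-value by hand for a single peptide, then cross-checks the result against `stats::t.test()`. The intensity values and all variable names are hypothetical, and this is not the package's internal implementation.

```r
# Hypothetical log2 intensities for one peptide (illustrative values only)
treated <- c(20.1, 20.4, 19.8, 20.6)
control <- c(18.9, 19.2, 19.0, 19.5)

# Differential abundance: treated mean minus control mean (a log2 fold change)
diff_abundance <- mean(treated) - mean(control)

# Per-group variance of the mean
v_t <- var(treated) / length(treated)
v_c <- var(control) / length(control)

# Welch standard error and t-statistic
std_error <- sqrt(v_t + v_c)
t_statistic <- diff_abundance / std_error

# Welch-Satterthwaite degrees of freedom and two-sided p-value
df <- (v_t + v_c)^2 /
  (v_t^2 / (length(treated) - 1) + v_c^2 / (length(control) - 1))
pval <- 2 * pt(-abs(t_statistic), df)

# Cross-check against base R's Welch test (unequal variances)
welch <- t.test(treated, control, var.equal = FALSE)
```

Because the intensities are already on the log2 scale, the difference of means can be read directly as a log2 fold change between the two conditions.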

```
calculate_diff_abundance(
  data,
  sample,
  condition,
  grouping,
  intensity_log2,
  missingness = missingness,
  comparison = comparison,
  mean = NULL,
  sd = NULL,
  n_samples = NULL,
  ref_condition = "all",
  filter_NA_missingness = TRUE,
  method = c("moderated_t-test", "t-test", "t-test_mean_sd", "proDA"),
  p_adj_method = "BH",
  retain_columns = NULL
)
```

`data` |
a data frame containing at least the input variables that are required for the
selected method. Ideally the output of |

`sample` |
a character column in the |

`condition` |
a character or numeric column in the |

`grouping` |
a character column in the |

`intensity_log2` |
a numeric column in the |

`missingness` |
a character column in the |

`comparison` |
a character column in the |

`mean` |
a numeric column in the |

`sd` |
a numeric column in the |

`n_samples` |
a numeric column in the |

`ref_condition` |
optional, character value providing the condition that is used as a
reference for differential abundance calculation. Only required for |

`filter_NA_missingness` |
a logical value, default is |

`method` |
a character value, specifies the method used for statistical hypothesis testing.
Methods include Welch test ( |

`p_adj_method` |
a character value, specifies the p-value correction method. Possible
methods are c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). Default
method is |

`retain_columns` |
a vector indicating if certain columns should be retained from the input
data frame. Default is not retaining additional columns |
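The methods listed for `p_adj_method` match those of base R's `stats::p.adjust()`. The sketch below shows how two of them behave on a small vector of p-values. The p-values are made up for illustration, and whether the function calls `p.adjust()` internally is an assumption, not something this page confirms.

```r
# Hypothetical raw p-values, e.g. one per peptide
pvals <- c(0.001, 0.008, 0.039, 0.041, 0.6)

# Benjamini-Hochberg adjustment (the documented default, "BH")
adj_bh <- p.adjust(pvals, method = "BH")

# Bonferroni adjustment for comparison (more conservative)
adj_bonferroni <- p.adjust(pvals, method = "bonferroni")

# All methods accepted by p.adjust() in base R
available_methods <- p.adjust.methods
```

"BH" controls the false discovery rate, while "bonferroni" controls the family-wise error rate and therefore yields larger adjusted p-values.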

A data frame that contains differential abundances (`diff`), p-values (`pval`) and adjusted p-values (`adj_pval`) for each protein, peptide or precursor (depending on the `grouping` variable) and the associated treatment/reference pair. Depending on the method, the data frame contains additional columns:

- "t-test": The `std_error` column contains the standard error of the differential abundances. `n_obs` contains the number of observations for the specific protein, peptide or precursor (depending on the `grouping` variable) and the associated treatment/reference pair.

- "t-test_mean_sd": Columns labeled as control refer to the second condition of the comparison pairs. Treated refers to the first condition. `mean_control` and `mean_treated` columns contain the means for the reference and treatment condition, respectively. `sd_control` and `sd_treated` columns contain the standard deviations for the reference and treatment condition, respectively. `n_control` and `n_treated` columns contain the numbers of samples for the reference and treatment condition, respectively. The `std_error` column contains the standard error of the differential abundances. `t_statistic` contains the t-statistic for the t-test.

- "moderated_t-test": `CI_2.5` and `CI_97.5` contain the 2.5% and 97.5% confidence interval borders for differential abundances. `avg_abundance` contains average abundances for treatment/reference pairs (mean of the two group means). `t_statistic` contains the t-statistic for the t-test. `B` contains the B-statistic, which is the log-odds that the protein, peptide or precursor (depending on `grouping`) has a differential abundance between the two groups. Suppose B = 1.5. The odds of differential abundance are exp(1.5) = 4.48, i.e., about four and a half to one. The probability that there is a differential abundance is 4.48/(1 + 4.48) = 0.82, i.e., the probability is about 82% that this group is differentially abundant. A B-statistic of zero corresponds to a 50-50 chance that the group is differentially abundant. `n_obs` contains the number of observations for the specific protein, peptide or precursor (depending on the `grouping` variable) and the associated treatment/reference pair.

- "proDA": The `std_error` column contains the standard error of the differential abundances. `avg_abundance` contains average abundances for treatment/reference pairs (mean of the two group means). `t_statistic` contains the t-statistic for the t-test. `n_obs` contains the number of observations for the specific protein, peptide or precursor (depending on the `grouping` variable) and the associated treatment/reference pair.
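The B-statistic arithmetic described above can be reproduced in a few lines of base R. This is a minimal sketch: the helper name `b_to_probability` is hypothetical and not part of the package.

```r
# Convert a B-statistic (log-odds) to a probability.
# plogis(b) is the logistic function, exp(b) / (1 + exp(b)).
b_to_probability <- function(b) {
  plogis(b)
}

# Worked example from the text: B = 1.5
odds <- exp(1.5)                       # about 4.48 to one
probability <- b_to_probability(1.5)   # about 0.82

# A B-statistic of zero corresponds to a 50-50 chance
probability_at_zero <- b_to_probability(0)
```

Applied to a whole `B` column, the same helper works element-wise, since `plogis()` is vectorised.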

```
set.seed(123) # Makes example reproducible

# Create synthetic data
data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

# Assign missingness information
data_missing <- assign_missingness(
  data,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity = peptide_intensity_missing,
  ref_condition = "all",
  retain_columns = c(protein, change_peptide)
)

# Calculate differential abundances
# Using "moderated_t-test" and "proDA" improves
# true positive recovery progressively
diff <- calculate_diff_abundance(
  data = data_missing,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  missingness = missingness,
  comparison = comparison,
  method = "t-test",
  retain_columns = c(protein, change_peptide)
)

head(diff, n = 10)
```
