Title: | Known Population Median Test |
---|---|
Description: | Functions that implement the known population median test. |
Authors: | Matthew M Parks <[email protected]> |
Maintainer: | Matthew M Parks <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-11-02 05:57:58 UTC |
Source: | https://github.com/cran/kpmt |
Various quantitative features of the transcripts of the set of all 19,962 protein coding genes in the annotated human genome, according to Ensembl 84. Features include length of the Coding Sequence (CDS), lengths of the 5' and 3' UTRs, and the relative usage for each amino acid.
genefeat
matrix.
Yates et al. (2016) "Ensembl 2016". Nucleic Acids Res. 44 (Database issue): D710-6. (doi)
The names of all 868 protein coding genes annotated as members of Gene Ontology (GO) term GO:0007186 (G-protein coupled receptor signaling pathway), according to the Gene Ontology consortium.
GO0007186
character vector.
Ashburner et al. (2000) Gene ontology: tool for the unification of biology. Nat Genet 25(1):25-9. (doi:10.1038/75556)
The Gene Ontology Consortium (2015) Gene Ontology Consortium: going forward. Nucl Acids Res 43 (Database issue). (doi)
Performs the known population median test.
kpmt(pop, obs = NULL, med = NULL, size = NULL, tail = "two-sided", verbose = FALSE)
pop |
[data frame, matrix, vector] numeric values for the whole population. If a data frame or matrix is given, each row should be a population member (row names are the member names) and each column a population feature; the test is performed on each column separately. If a named vector is given, its names should be the population member names.
|
obs |
[character vector or named list of character vectors] a character vector of population member names, or a named list of character vectors of population member names.
|
med |
[number or vector] pre-computed minimal medians of pop. |
size |
[integer] size of the set which generated the observed median. If obs is a sample, i.e. contains population member names, then size must be NULL. |
tail |
["two-sided", "lower", "upper"] which tail of the test to report. If NULL, the minimum of the lower and upper p-values is reported. |
verbose |
[logical] if TRUE, display extra messages for tracking execution. |
data frame with one row per population feature, i.e. column of pop, and the following columns:
"name" name of the feature, i.e. a column of pop
"median.sample" minimal median of the sample
"median.all" minimal median of the whole population
"median.background" minimal median of the non-sampled members
"logp" log of the p-value for the test that the sample median differs from the population median
"p.value" p-value for the test that the sample median differs from the population median
"FDR" false discovery rate; reported only if pop has more than 30 features, i.e. columns
data(genefeat)
data(GO0007186)
res <- kpmt(pop = genefeat, obs = GO0007186)
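To make the median columns of the returned data frame concrete, the following sketch recomputes them by hand for a toy population. It is an illustration only: the toy matrix, the gene names, and the helper `lower_median` are hypothetical, and "minimal median" is assumed to mean the lower of the two middle values for even-length input (consistent with minMedian(1:6) returning 3). It does not compute p-values and does not call kpmt.

```r
# Hypothetical toy population: 20 members (rows) by 2 features (columns).
pop <- matrix(
  c(1:20, seq(40, 2, by = -2)),
  nrow = 20,
  dimnames = list(paste0("gene", 1:20), c("featA", "featB"))
)

# Hypothetical sample: names of five population members.
obs <- paste0("gene", 1:5)

# Assumed definition of the minimal median: the lower middle value
# of the sorted input.
lower_median <- function(v) sort(v)[ceiling(length(v) / 2)]

# Split the population into sampled and non-sampled (background) members.
in_sample <- rownames(pop) %in% obs

# One row per feature, mirroring the three median columns described above.
summary_sketch <- data.frame(
  name = colnames(pop),
  median.sample = apply(pop[in_sample, , drop = FALSE], 2, lower_median),
  median.all = apply(pop, 2, lower_median),
  median.background = apply(pop[!in_sample, , drop = FALSE], 2, lower_median),
  row.names = NULL
)
print(summary_sketch)
```

The sample median of each feature is what the test compares against the known population; the background median is shown only for orientation.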
Computes the log of the Stirling approximation of n!.
logStirling(n)
n |
integer or vector of integers. |
Stirling approximation of log(n!). If n <= 14, then computes log(n!) directly, i.e. no Stirling approximation.
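The behaviour described above can be sketched as follows. This is not the package source: the function name `log_stirling_sketch` is hypothetical, and the use of the first correction term 1/(12n) of the Stirling series is an assumption, since the number of terms kept by logStirling is not stated here.

```r
# Sketch of a log-factorial with a direct computation for small n and a
# Stirling-series approximation for larger n (assumed two-term form).
log_stirling_sketch <- function(n, cutoff = 14) {
  n <- as.numeric(n)
  out <- numeric(length(n))

  # Small n: compute log(n!) directly.
  small <- n <= cutoff
  out[small] <- log(factorial(n[small]))

  # Larger n: n*log(n) - n + 0.5*log(2*pi*n) + 1/(12*n).
  big <- n[!small]
  out[!small] <- big * log(big) - big + 0.5 * log(2 * pi * big) + 1 / (12 * big)

  out
}

# Compare against base R's lfactorial() on both sides of the cutoff.
n_values <- c(5, 14, 15, 100)
approx_vs_exact <- log_stirling_sketch(n_values) - lfactorial(n_values)
print(approx_vs_exact)
```

With this form the approximation error shrinks roughly like 1/(360 n^3), which is why a direct computation is only needed for small n.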
Computes the minimal median of a vector or matrix.
minMedian(x)
x |
a vector or matrix of real numbers. If a matrix, the minimal median is computed for each column. |
the minimal median of x.
minMedian(1:6) # returns 3
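A minimal sketch of a function with the same documented behaviour is shown below. The name `min_median_sketch` is hypothetical, and the definition used (the lower of the two middle values when the length is even) is an assumption inferred from the example above, not taken from the package source.

```r
# Sketch: minimal median of a vector, or of each column of a matrix.
min_median_sketch <- function(x) {
  # Lower middle element of the sorted values; for odd lengths this is
  # the ordinary median.
  lower_middle <- function(v) {
    v <- sort(v)
    v[ceiling(length(v) / 2)]
  }
  if (is.matrix(x)) apply(x, 2, lower_middle) else lower_middle(x)
}

min_median_sketch(1:6)         # 3
min_median_sketch(c(5, 1, 3))  # 3
min_median_sketch(cbind(a = 1:6, b = c(10, 30, 20, 40, 60, 50)))
```

Unlike the conventional median, this value is always an element of the input, so it never requires averaging two observations.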
The implementation of the exact analytical solution of the Known Population Median Test relies on approximations using the Stirling series and is therefore susceptible to numerical error. This function creates a population, empirically estimates p-values via resampling, compares them with the p-values calculated by the Known Population Median Test, and returns the error in log(p). WARNING: this takes a long time.
validate_accuracy(N = 50, n = 10, nrep = 1e+08)
N |
population size |
n |
sample size |
nrep |
number of resampling samples for empirically estimating p-values. |
data frame with log(p) error and other information.
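The idea of checking an analytical p-value against resampling can be sketched on a small scale. Everything below is an assumption for illustration rather than the code of validate_accuracy: the population is taken to be the distinct values 1..N, only a lower-tail probability is checked, the reference value comes from the hypergeometric distribution via base R's phyper, and nrep is far smaller than the default so that it runs in seconds.

```r
set.seed(42)

N <- 50        # population size
n <- 10        # sample size
nrep <- 20000  # number of resamples (much smaller than the 1e8 default)

pop <- seq_len(N)    # distinct values, so each value equals its rank
k <- ceiling(n / 2)  # position of the lower median in a sorted sample

# Empirical distribution of the sample's lower median, sampling
# without replacement from the known population.
sim_medians <- replicate(nrep, sort(sample(pop, n))[k])

# Hypothetical observed median whose lower-tail probability is checked.
obs_median <- 15
p_empirical <- mean(sim_medians <= obs_median)

# Exact lower tail: the lower median is <= the value of rank r exactly
# when at least k sample members fall among the r smallest population
# values, which is a hypergeometric tail probability.
p_exact <- phyper(k - 1, obs_median, N - obs_median, n, lower.tail = FALSE)

# Error on the log(p) scale, analogous to what the function reports.
log_p_error <- log(p_empirical) - log(p_exact)
print(c(p_empirical = p_empirical, p_exact = p_exact, log_p_error = log_p_error))
```

The resampling error shrinks only with the square root of nrep, which is why resolving small p-values accurately requires very many resamples and a long run time.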