Title: | Information Theory Tools for Forensic Analysis |
---|---|
Description: | The 'forensIT' package is a comprehensive statistical toolkit tailored for handling missing person cases. By leveraging information theory metrics, it enables accurate assessment of kinship, particularly when limited genetic evidence is available. With a focus on optimizing statistical power, 'forensIT' empowers investigators to effectively prioritize family members, enhancing the reliability and efficiency of missing person investigations. |
Authors: | Franco Marsico [aut, cre], Ariel Chernomoretz [aut] |
Maintainer: | Franco Marsico |
License: | GPL (>= 3) |
Version: | 1.1.1 |
Built: | 2025-03-06 06:20:33 UTC |
Source: | https://github.com/cran/forensIT |
Build ensemble of CPTs from a list of simulations
buildEnsembleCPTs(lsimu, lminimalProbGenoMOI)
lsimu | list of simulations |
lminimalProbGenoMOI | list of minimal probabilities of genotypes given MOI |
list of CPTs
library(forrel)
library(mispitools)
freqs <- lapply(getfreqs(Argentina)[1:15], function(x) {x[x != 0]})
fam <- linearPed(2)
fam <- addChildren(fam, father = 1, mother = 2)
fam <- pedtools::setMarkers(fam, locusAttributes = freqs)
ped <- profileSim(fam, N = 1, ids = c(6), numCores = 1, seed = 123)
lsimEnsemble <- simTestIDMarkers(ped, 2, numSim = 5, seed = 123)
lensembleIT <- buildEnsembleITValues(lsimu = lsimEnsemble, ITtab = simME$ITtable, bFullIT = TRUE)
lensembleCPTs <- buildEnsembleCPTs(lsimu = lsimEnsemble, lminimalProbGenoMOI = simME$lprobGenoMOI)
Build ensemble of IT values from a list of simulations
buildEnsembleITValues( lsimu = lsimulation, ITtab = sim$ITtable, bFullIT = FALSE )
lsimu | list of simulations |
ITtab | IT table |
bFullIT | boolean to return full IT table |
list of IT values
library(forrel)
library(mispitools)
freqs <- lapply(getfreqs(Argentina)[1:15], function(x) {x[x != 0]})
fam <- linearPed(2)
fam <- addChildren(fam, father = 1, mother = 2)
fam <- pedtools::setMarkers(fam, locusAttributes = freqs)
ped <- profileSim(fam, N = 1, ids = c(6), numCores = 1, seed = 123)
lsimEnsemble <- simTestIDMarkers(ped, 2, numSim = 5, seed = 123)
lensembleIT <- buildEnsembleITValues(lsimu = lsimEnsemble, ITtab = simME$ITtable, bFullIT = TRUE)
Compare population and Bayesian network genotype probability density functions
compareBnetPopGenoPDFs(lprobTable)
lprobTable | list of probability tables |
list of KL divergences
Cross entropy
crossH(px, py, epsilon = 1e-20)
px | probability distribution |
py | probability distribution |
epsilon | small number to avoid log(0) |
cross entropy
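As an illustration of what a cross entropy with an epsilon guard looks like, here is a minimal base-R sketch. It is not the package's crossH() implementation: the helper name cross_entropy_sketch, the use of the natural logarithm, and the choice to add epsilon inside the logarithm are all assumptions made for this example.

```r
# Hypothetical helper (not part of forensIT): H(p, q) = -sum(p * log(q)).
# Assumption: epsilon is added inside the log so that q = 0 stays finite.
cross_entropy_sketch <- function(px, py, epsilon = 1e-20) {
  stopifnot(length(px) == length(py))
  -sum(px * log(py + epsilon))
}

px <- c(0.5, 0.25, 0.25)
py <- c(0.25, 0.25, 0.5)

h_self  <- cross_entropy_sketch(px, px)  # equals the entropy of px
h_cross <- cross_entropy_sketch(px, py)  # never smaller than the entropy of px
h_zero  <- cross_entropy_sketch(c(1, 0), c(0, 1))  # finite thanks to epsilon
```

The difference between the two values, h_cross - h_self, is the KL divergence of py from px, which is why the cross entropy is a natural building block for the other metrics in this package.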
distKL: KL distribution obtained for specific relative contributor
distKL(ped, missing, relative, frequency, numsims = 100, cores = 1)
ped | Reference pedigree, either read with the read_fam() function or built with pedtools. |
missing | Missing person. |
relative | Selected relative. |
frequency | Allele frequency database. |
numsims | Number of simulated genotypes. |
cores | Enables parallelization. |
An object of class data.frame with KLs.
library(forrel)
x <- linearPed(2)
x <- setMarkers(x, locusAttributes = NorwegianFrequencies[1:2])
x <- profileSim(x, N = 1, ids = 2)
distKL(ped = x, missing = 5, relative = 1, cores = 1, frequency = NorwegianFrequencies[1:2], numsims = 3)
Eliminate Mendelian errors using Lange-Goradia algorithm
elimLangeGoradia(ped, iMarker = 1, bitera = TRUE, bverbose = TRUE)
ped | pedigree |
iMarker | index of marker to be used |
bitera | iterate until no more errors are found |
bverbose | print progress |
pedigree with Mendelian errors eliminated
Export a pedigree to a file
exportPed(ped, fname, iMarker = 1)
ped | pedigree |
fname | file name |
iMarker | index of marker to be used |
pedigree with Mendelian errors eliminated
The 'forensIT' package, available on CRAN, is a statistical toolkit for missing person cases. It uses information theory metrics to assess kinship, particularly when limited genetic evidence is available. With a focus on optimizing statistical power, it helps investigators prioritize family members, improving the reliability and efficiency of missing person investigations.
Maintainer: Franco Marsico
Authors:
Ariel Chernomoretz
Calculate genotype probabilities from parental probabilities
genotypeProbs(probP, probM)
probP | vector of parental probabilities |
probM | vector of parental probabilities |
matrix of genotype probabilities
Genotype Probability Table
genotypeProbTable(bbn1, resQQ, bplot = FALSE, numMarkers = 4, lLoci)
bbn1 | Bayesian network |
resQQ | results from bn |
bplot | boolean to plot |
numMarkers | number of markers |
lLoci | list of loci |
Genotype Probability Table
Calculate the probability of genotypes given the MOI
genotypeProbTable_bis(bbn1, resQQ, bplot = FALSE, numMarkers = 4, freq)
bbn1 | Bayesian network |
resQQ | list of results from the inference |
bplot | plot results |
numMarkers | number of markers |
freq | allele frequencies |
matrix of genotype probabilities
Get alleles from genotypes
getAllelesFromGenotypes(g)
g | genotypes |
alleles
Entropy of a discrete probability distribution
H(px, epsilon = 1e-20, normalized = FALSE)
px | probability distribution |
epsilon | small number to avoid log(0) |
normalized | boolean to normalize entropy |
entropy
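The following base-R sketch shows one way an entropy with an epsilon guard and an optional normalization could be written. It is not the package's H() implementation: the helper name entropy_sketch, the natural logarithm, and the normalization by log(n) (the entropy of a uniform distribution over n outcomes) are assumptions made for illustration.

```r
# Hypothetical helper (not part of forensIT): Shannon entropy -sum(p * log(p)).
# Assumption: normalized = TRUE divides by log(n), the maximum attainable
# entropy for n outcomes, so the result lies in [0, 1].
entropy_sketch <- function(px, epsilon = 1e-20, normalized = FALSE) {
  h <- -sum(px * log(px + epsilon))
  if (normalized) h <- h / log(length(px))
  h
}

h_uniform <- entropy_sketch(rep(0.25, 4), normalized = TRUE)  # maximal uncertainty
h_certain <- entropy_sketch(c(1, 0))                          # no uncertainty
h_skewed  <- entropy_sketch(c(0.5, 0.25, 0.25))
```

A marker whose genotype distribution has low entropy is nearly determined already, so typing it adds little information.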
index2Genotypes2
index2Genotypes2(ped, id, iMarker, alleleSet)
ped | pedigree |
id | individual id |
iMarker | marker index |
alleleSet | allele set |
genotypes
index2Genotypes
index2Genotypes2.pedtools(ped, id, iMarker, alleleSet)
ped | pedigree |
id | individual id |
iMarker | marker index |
alleleSet | allele set |
genotypes
KL divergence
KLd(ppx, ppy, epsilon = 1e-20, bsigma = FALSE)
ppx | probability distribution |
ppy | probability distribution |
epsilon | small number to avoid log(0) |
bsigma | boolean to compute sigma |
KL divergence
KL divergence
KLde(px, py, epsilon = 1e-20)
px | probability distribution |
py | probability distribution |
epsilon | small number to avoid log(0) |
KL divergence
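KLd and KLde are both documented as returning a KL divergence. The base-R sketch below shows the textbook Kullback-Leibler divergence with an epsilon guard. It is not the package's code: the helper name kl_sketch, the natural logarithm, and the placement of epsilon in both numerator and denominator are assumptions.

```r
# Hypothetical helper (not part of forensIT):
# KL(p || q) = sum(p * log(p / q)), guarded against zero probabilities.
kl_sketch <- function(px, py, epsilon = 1e-20) {
  stopifnot(length(px) == length(py))
  sum(px * log((px + epsilon) / (py + epsilon)))
}

p <- c(0.5, 0.5)
q <- c(0.9, 0.1)

kl_pq   <- kl_sketch(p, q)  # divergence of q from p
kl_qp   <- kl_sketch(q, p)  # generally a different value: KL is not symmetric
kl_self <- kl_sketch(p, p)  # zero for identical distributions
```

Because the divergence is asymmetric, the order of the two arguments matters when comparing a pedigree-conditioned genotype distribution with a population distribution.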
perMarkerKLs
perMarkerKLs(ped, MP, frequency)
ped | Reference pedigree. |
MP | missing person |
frequency | Allele frequency database. |
An object of class data.frame with KLs.
library(forrel)
x <- linearPed(2)
plot(x)
x <- setMarkers(x, locusAttributes = NorwegianFrequencies[1:5])
x <- profileSim(x, N = 1, ids = 2)
perMarkerKLs(x, MP = 5, NorwegianFrequencies[1:5])
Plot KL distances.
plotKL(res)
res | output from distKL function. |
A scatterplot.
library(forrel)
x <- linearPed(2)
plot(x)
x <- setMarkers(x, locusAttributes = NorwegianFrequencies[1:5])
x <- profileSim(x, N = 1, ids = 2)
res <- distKL(ped = x, missing = 5, relative = 1, cores = 1, frequency = NorwegianFrequencies[1:5], numsims = 5)
plotKL(res)
Px
Px(p1, p0, dbg = FALSE)
p1 | probability distribution |
p0 | probability distribution |
dbg | boolean to compute sigma |
Px
Run information theory (IT) metrics
runIT(lped = NULL, freqs, QP, dbg, numCores, bOnlyIT = FALSE, lprobg_ped = NULL, bsigma = FALSE, blog = FALSE, dep = TRUE)
lped | list of pedigree objects |
freqs | list of allele frequencies |
QP | QP |
dbg | debug |
numCores | number of cores |
bOnlyIT | boolean to only run IT |
lprobg_ped | list of probG |
bsigma | boolean to compute sigma |
blog | boolean to write log |
dep | check fbnet dependency |
runIT
Simulate LR
simLR( lprobg_ped, numSim = 10000, epsilon = 1e-20, bplot = FALSE, bLRs = FALSE, seed = 123457 )
lprobg_ped | list of probability distributions |
numSim | number of simulations |
epsilon | small number to avoid log(0) |
bplot | boolean to plot |
bLRs | boolean to return LRs |
seed | seed |
LRs
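To make the idea of simulating likelihood ratios from genotype distributions concrete, here is a single-marker base-R sketch. It is not simLR()'s implementation. The genotype probabilities are made-up illustrative numbers, and the variable names (p_ped, p_pop, num_sim) are hypothetical. It also shows why KL divergences can be read in log10(LR) units: the expected log10(LR) when the pedigree hypothesis is true equals KL(p_ped || p_pop) computed with base-10 logarithms.

```r
set.seed(123457)

# Hypothetical single-marker genotype distributions (illustrative values only)
p_ped <- c(AA = 0.60, AB = 0.30, BB = 0.10)  # given the reference pedigree
p_pop <- c(AA = 0.25, AB = 0.50, BB = 0.25)  # general population
num_sim <- 10000

# Genotypes drawn as if the tested person is the missing person (H1)
g_h1 <- sample(names(p_ped), num_sim, replace = TRUE, prob = p_ped)
# Genotypes drawn as if the tested person is unrelated (H2)
g_h2 <- sample(names(p_pop), num_sim, replace = TRUE, prob = p_pop)

# LR for each simulated genotype, on the log10 scale
log10_lr_h1 <- log10(p_ped[g_h1] / p_pop[g_h1])
log10_lr_h2 <- log10(p_ped[g_h2] / p_pop[g_h2])

# Exact expectation under H1: KL(p_ped || p_pop) in log10 units
kl_h1 <- sum(p_ped * log10(p_ped / p_pop))
summary_h1 <- c(simulated = mean(log10_lr_h1), exact = kl_h1)
```

With several independent markers, the per-marker log10(LR) values would be summed. That extension is left out to keep the sketch short.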
simME: output from simMinimalEnsemble considering an uncle
simME
A list with minimalEnsemble of genotypes
Simulate minimal ensembles of genotypes
simMinimalEnsemble( ped, QP, testID, freqs, numCores = 1, seed = 123457, bVerbose = TRUE, bJustGetNumber = FALSE, bdbg = FALSE, dep = TRUE )
ped | pedigree |
QP | QP |
testID | test ID |
freqs | frequencies |
numCores | number of cores |
seed | seed |
bVerbose | boolean to print information |
bJustGetNumber | boolean to just get the number of runs |
bdbg | boolean to debug |
dep | check dependency fbnet |
list of results
Simulate testID markers
simTestIDMarkers(ped, testID, numSim = 10, seed = 123457)
ped | pedigree |
testID | test ID |
numSim | number of simulations |
seed | seed |
list of simulations
library(forrel)
library(mispitools)
freqs <- lapply(getfreqs(Argentina)[1:15], function(x) {x[x != 0]})
fam <- linearPed(2)
fam <- addChildren(fam, father = 1, mother = 2)
fam <- pedtools::setMarkers(fam, locusAttributes = freqs)
ped <- profileSim(fam, N = 1, ids = c(6), numCores = 1, seed = 123)
lsimEnsemble <- simTestIDMarkers(ped, 2, numSim = 5, seed = 123)
strsplit2
strsplit2(x, split)
x | character vector |
split | character |
matrix
Check for Mendelian errors in trios
trioCheckFast(ffa, mmo, oof)
ffa | father's alleles |
mmo | mother's alleles |
oof | offspring's alleles |
TRUE if there is a Mendelian error
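The logic of a trio check can be sketched in a few lines of base R. This is not trioCheckFast()'s implementation: the helper name trio_has_mendelian_error and the input format (each genotype as a length-2 vector of alleles) are assumptions, and the sketch ignores mutations, silent alleles, and missing data.

```r
# Hypothetical helper (not part of forensIT).
# A child genotype is Mendelian-consistent if one allele can come from the
# father and the other from the mother. Returns TRUE when that is impossible.
trio_has_mendelian_error <- function(father, mother, child) {
  stopifnot(length(father) == 2, length(mother) == 2, length(child) == 2)
  consistent <-
    (child[1] %in% father && child[2] %in% mother) ||
    (child[2] %in% father && child[1] %in% mother)
  !consistent
}

# Allele 12 can come from the father and 13 from the mother: no error
ok_trio  <- trio_has_mendelian_error(c(10, 12), c(11, 13), c(12, 13))
# Both child alleles are found only in the father: error
bad_trio <- trio_has_mendelian_error(c(10, 12), c(11, 13), c(10, 12))
```

Like the documented function, the sketch returns TRUE when an inconsistency is found, so it can be used directly as a filter on simulated trios.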
unidimKLplot: KL distributions presented in the same units (Log10(LR))
unidimKLplot(res)
res | output from distKL function. |
A scatterplot.