Relationships Methodology
Relationships are the core building block of System. This section outlines how they are computed.
Criteria
The criteria for a piece of statistical evidence to become a relationship on System are:
Source | Strength | Significance |
---|---|---|
Statistical associations computed by System from a public dataset | Strong or Very Strong | Significant |
Statistical associations computed by System from the test set of a machine learning model | Strong or Very Strong | Significant |
Statistical associations retrieved from a (peer-reviewed) scientific paper | Very Weak, Weak, Medium, Strong, or Very Strong | Significant |
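The criteria above can be read as a simple filter. The sketch below is one way to express it and is not System's actual implementation: the `Evidence` class, the source-type keys, and the function name are hypothetical, and the middle strength bucket is accepted under either name ("Medium" or "Moderate").

```python
from dataclasses import dataclass

# Strength labels accepted for each source type, following the criteria table.
# The keys "dataset", "model", and "paper" are illustrative, not System identifiers.
ALLOWED_STRENGTHS = {
    "dataset": {"Strong", "Very Strong"},
    "model": {"Strong", "Very Strong"},
    "paper": {"Very Weak", "Weak", "Medium", "Moderate", "Strong", "Very Strong"},
}


@dataclass
class Evidence:
    """Hypothetical container for one piece of statistical evidence."""
    source_type: str   # "dataset", "model", or "paper"
    strength: str      # one of the strength labels
    significant: bool  # whether the association is statistically significant


def qualifies_as_relationship(evidence: Evidence) -> bool:
    """Return True if the evidence meets the criteria for its source type."""
    allowed = ALLOWED_STRENGTHS.get(evidence.source_type, set())
    return evidence.significant and evidence.strength in allowed
```

Under these assumptions, a significant but Weak association from a paper qualifies, while the same association computed from a dataset does not.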
What evidence System collects and computes
System programmatically collects and computes the following evidence from each type of source:
Source | Statistical information collected and computed |
---|---|
Peer-Reviewed Scientific Articles | |
Dataset | |
Model | |
How System determines “strength”
Strength is an algorithm-agnostic measure of the magnitude of the effect implied by an association. System's methodology differs based on the type of the association.
For correlation-style associations (such as Pearson's R or Kendall's Tau), we use commonly accepted community guidelines to bucket each association into one of the following five categories:
STRENGTH | PEARSON-R | KENDALL-TAU | CRAMER-V | EFFECT SIZE |
---|---|---|---|---|
Very Weak | [0, 0.1) | [0, 0.1) | [0, 0.05) | [0, 0.1) |
Weak | [0.1, 0.3) | [0.1, 0.3) | [0.05, 0.1) | [0.1, 0.3) |
Medium | [0.3, 0.6) | [0.3, 0.6) | [0.1, 0.15) | [0.3, 0.6) |
Strong | [0.6, 0.9) | [0.6, 0.9) | [0.15, 0.25) | [0.6, 0.9) |
Very Strong | [0.9, 1] | [0.9, 1] | [0.25, …) | [0.9, 1] |
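The bucketing in the table above amounts to a threshold lookup. The following is a minimal sketch of that lookup, not System's own code: the function name and the statistic keys are hypothetical, and taking the absolute value (so that negative correlations are bucketed by magnitude) is an assumption.

```python
import bisect

LABELS = ["Very Weak", "Weak", "Medium", "Strong", "Very Strong"]

# Lower bounds of the Weak, Medium, Strong, and Very Strong buckets,
# copied from the table. The dictionary keys are illustrative names.
CUTOFFS = {
    "pearson_r": [0.1, 0.3, 0.6, 0.9],
    "kendall_tau": [0.1, 0.3, 0.6, 0.9],
    "cramer_v": [0.05, 0.1, 0.15, 0.25],
    "effect_size": [0.1, 0.3, 0.6, 0.9],
}


def strength_bucket(statistic: str, value: float) -> str:
    """Map a statistic's value to a strength label using half-open intervals."""
    magnitude = abs(value)  # assumption: the sign of the association is ignored
    # bisect_right counts the cutoffs <= magnitude, so a value equal to a
    # lower bound falls into the bucket that starts at that bound.
    return LABELS[bisect.bisect_right(CUTOFFS[statistic], magnitude)]


print(strength_bucket("pearson_r", 0.983))  # Very Strong
print(strength_bucket("pearson_r", 0.867))  # Strong
```

Keeping the cutoffs in a per-statistic table means that supporting another statistic only requires a new row of thresholds.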
For associations derived from predictive models, we use the evidence already on System to bin a feature’s importance into one of the buckets above. The feature importance value (e.g. permutation score) is multiplied by the performance of the model that the association was derived from (e.g. R2 score for regressors, F1 score for classifiers), and the product is compared with the maximum among similar associations on System.
STRENGTH | REGRESSORS (R2 SCORE * PERMUTATION SCORE) | CLASSIFIERS (F1 SCORE * PERMUTATION SCORE) |
---|---|---|
Very Weak | [0, 0.1) of max on System | [0, 0.1) of max on System |
Weak | [0.1, 0.3) of max on System | [0.1, 0.3) of max on System |
Medium | [0.3, 0.6) of max on System | [0.3, 0.6) of max on System |
Strong | [0.6, 0.9) of max on System | [0.6, 0.9) of max on System |
Very Strong | [0.9, 1] of max on System | [0.9, 1] of max on System |
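As a rough illustration of the model-based buckets, the sketch below multiplies a model's performance score by a feature's permutation score and expresses the product as a fraction of the largest comparable product. It is not System's implementation: the function name and parameters are hypothetical, `max_on_system` is assumed to be supplied by the caller, and clamping the fraction to [0, 1] (for example, when a permutation score is negative) is an assumption.

```python
import bisect

LABELS = ["Very Weak", "Weak", "Medium", "Strong", "Very Strong"]
CUTOFFS = [0.1, 0.3, 0.6, 0.9]  # lower bounds of Weak .. Very Strong


def model_strength(model_score: float,
                   permutation_score: float,
                   max_on_system: float) -> str:
    """Bucket (model_score * permutation_score) as a fraction of the maximum.

    model_score is the R2 score for regressors or the F1 score for classifiers.
    max_on_system is the largest such product among comparable associations;
    how that maximum is obtained is outside the scope of this sketch.
    """
    if max_on_system <= 0:
        raise ValueError("max_on_system must be positive")
    fraction = (model_score * permutation_score) / max_on_system
    fraction = min(max(fraction, 0.0), 1.0)  # assumption: clamp to [0, 1]
    return LABELS[bisect.bisect_right(CUTOFFS, fraction)]


# Hypothetical numbers, for illustration only.
print(model_strength(0.8, 0.5, 0.5))  # Strong
```

Because the buckets are relative, a small raw product can still land in a high bucket if it is close to the largest comparable product.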
Examples
Source Type | Statistical Association Retrieved | Strength | Significance |
---|---|---|---|
Dataset | PEARSON R: 0.983 between primary_school_life_expectancy_years and primary_school_completion_rate_female | Very Strong | P < 0.001 |
Dataset | Max PEARSON R (when one feature is lagged): 0.867 between Confirmed Cases Of COVID-19 and Deaths From COVID-19 + 23 lag | Strong | P < 0.001 |
Model | R2 Permutation Score: 0.225 between Two_Week_Prior_Weekly_Deaths and Weekly_Deaths | Very Strong | P < 0.001 |
Paper | Adjusted Odds Ratio: 1.94 between Individual Is A Lifetime Cigarette Smoker and Individual Consumes Caffeinated Coffee | Very Strong | P = 0.003 |