System Docs

Investigating relationships

System gathers raw evidence of statistical associations from public datasets, public models, and scientific papers. These are added to System programmatically (through connectors and packages) and manually by users. The source of every piece of evidence and how it was added to System are included alongside the evidence.
The current beta release supports only quantitative evidence, and only a small community of users can add to System today (to mitigate unintended consequences we are actively working to prevent). In upcoming releases, anyone will be able to add to System, and the platform will support qualitative evidence.
Once in System’s graph database (SystemDB), each piece of evidence of statistical association is given a strength score, a significance flag, and a reproducibility flag (see below). Pieces of evidence that meet the criteria for strength and significance outlined here become “relationships” between pairs of metrics and pairs of topics on System. System always includes a reference to the original source of the evidence, its authorship, and the raw values of statistical association.
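A minimal sketch of an evidence record and the promotion rule described above. The field names and the strength threshold are illustrative assumptions, not SystemDB's actual schema or criteria.

```python
from dataclasses import dataclass

# Hypothetical evidence record; field names are illustrative,
# not SystemDB's actual schema.
@dataclass
class Evidence:
    source: str          # original source of the evidence
    added_by: str        # connector, package, or user that added it
    raw_value: float     # raw statistical association (e.g. Pearson R)
    strength: float      # computed strength score
    significant: bool    # significance flag
    reproducible: bool   # reproducibility flag

def qualifies_as_relationship(e: Evidence, min_strength: float = 0.2) -> bool:
    """Evidence becomes a 'relationship' only if it meets the strength
    and significance criteria (the threshold here is assumed)."""
    return e.significant and e.strength >= min_strength
```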
Relationships are displayed in two forms: a line (or edge) on the graph and a relationship card on pages.
Clicking on either will show more information about the relationship. Relationships are annotated and contextualized with the following attributes:
  • Strength
  • Sign
  • Direction
  • Population
  • Controls
  • Reproducibility
Strength measures how statistically strong a relationship is within a defined context. The platform computes strength based on this methodology.
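As a purely hypothetical illustration of what mapping different association statistics onto one comparable scale can look like (System's actual strength methodology is linked above and is not reproduced here):

```python
import math

def illustrative_strength(value: float, statistic: str) -> float:
    """Map association statistics onto a common 0..1 scale.
    This rule is an assumption for illustration only: |r| for
    correlations, and a squashed |log(OR)| for odds ratios so
    that OR = 1 (no association) maps to 0."""
    if statistic == "pearson_r":
        return min(abs(value), 1.0)
    if statistic == "odds_ratio":
        return 1.0 - math.exp(-abs(math.log(value)))
    raise ValueError(f"unknown statistic: {statistic}")
```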
The sign of a piece of evidence indicates whether an increase in one metric increases (positive) or decreases (negative) the other metric.
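The sign rule can be sketched as follows. The neutral points (1 for odds ratios, 0 for correlations) are the standard statistical conventions; the function name is hypothetical.

```python
def evidence_sign(value: float, statistic: str) -> str:
    """Illustrative sign rule: a correlation is positive above 0,
    an odds ratio is positive above 1 (OR = 1 means no effect)."""
    neutral = 1.0 if statistic == "odds_ratio" else 0.0
    if value > neutral:
        return "positive"
    if value < neutral:
        return "negative"
    return "none"
```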
The direction of a relationship indicates how information flows between a pair of objects (like metrics). System infers and reports statistical direction (e.g. from independent to dependent variables); these directions should not be interpreted as causal links.
Directions are either inferred directly by the platform or entered manually from the source. For example, a predictive model generates evidence directed from features to the model's target (Feature → Target), and a study reporting the finding of a randomized controlled trial creates evidence directed from the intervention to the outcome (Intervention → Outcome).
Some evidence types, particularly those generated from interactions between features in a dataset (e.g. Pearson R correlations), are bidirectional (<-->). If the directionality of the evidence is not known, the line linking a pair of metrics has no arrow.
At the metric and topic relationship levels, which are built by combining multiple pieces of evidence between the same sets of objects, directionality is inferred when there is a consensus between the underlying pieces of evidence. Otherwise, the directionality is presently marked as unknown.
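The consensus rule might look like the following sketch. Treating bidirectional evidence as compatible with either direction is an assumption here, not System's documented behavior.

```python
def consensus_direction(directions: list) -> str:
    """Infer a relationship-level direction only when every directed
    piece of evidence agrees; otherwise mark it unknown. Bidirectional
    evidence ('<-->') is assumed compatible with either direction."""
    directed = [d for d in directions if d in ("->", "<-")]
    if directed and len(set(directed)) == 1:
        return directed[0]       # unanimous directed evidence
    if directed:
        return "unknown"         # conflicting directions
    if "<-->" in directions:
        return "<-->"            # only bidirectional evidence
    return "unknown"
```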
The context and population (subjects) from which the evidence was gathered are an important consideration in understanding, comparing, and using a relationship, and in evaluating its representativeness. Populations may include parameters like who (e.g. Women aged 18-54), where (e.g. South Korea), and when (e.g. May-July 2021). This is currently stored and displayed as unstructured text; we are working to make it structured.
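A sketch of what a structured population field could look like, using the who/where/when parameters above. This schema is hypothetical, since the field is unstructured text today.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical structured form of the population field; System
# currently stores this as unstructured text.
@dataclass
class Population:
    who: Optional[str] = None    # e.g. "Women aged 18-54"
    where: Optional[str] = None  # e.g. "South Korea"
    when: Optional[str] = None   # e.g. "May-July 2021"

    def describe(self) -> str:
        """Render the population back into display text."""
        parts = [p for p in (self.who, self.where, self.when) if p]
        return "; ".join(parts) or "unspecified population"
```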
A control variable (or scientific constant) in scientific experimentation is an experimental element that is held constant and unchanged throughout the investigation. If not held constant, control variables could strongly influence the results, so they are fixed in order to isolate the relationship between the dependent and independent variables. The control variables themselves are not of primary interest to the experimenter. (Source: Wikipedia)
The reproducibility of each piece of evidence is established by checking for the presence or absence of information that would enable other researchers to reproduce the outcome of the original research. The following fields are what System considers when determining the reproducibility of a piece of evidence:
  • Study Details: Information about the methodology and description of the study. This can be satisfied by submitting the Digital Object Identifier (DOI) of the study, or a detailed description of the methodology and the source code.
  • Control Variables (covariates): The variables that are controlled for in the model where the associations are measured.
  • Accessibility of Data (training dataset source, parent dataset): Providing the dataset that the model was trained on, or that the inference was based on, increases the reproducibility of associations.
  • Status of Statistical Significance: We encourage authors to report all point estimates with confidence intervals or significance (p-value and confidence level).
  • ML Hyperparameters (for ML models): Hyperparameters used when training an ML model.
  • Partial Dependence Plots (for ML models): Partial Dependence Plots (PDPs) are among the best ways to report the contribution of a feature to the overall performance of an ML model.
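The fields above can be sketched as a simple checklist score. The field keys and the equal-weight scoring rule are illustrative assumptions, not System's actual reproducibility computation.

```python
# Illustrative reproducibility checklist over the fields listed above;
# equal weighting is an assumption for this sketch.
REPRODUCIBILITY_FIELDS = [
    "study_details",             # DOI, or methodology description + source code
    "control_variables",         # covariates controlled for in the model
    "dataset_accessibility",     # training / parent dataset provided
    "statistical_significance",  # p-value and confidence level reported
    "ml_hyperparameters",        # for ML models
    "partial_dependence_plots",  # for ML models
]

def reproducibility_score(evidence: dict) -> float:
    """Fraction of reproducibility fields present on a piece of evidence."""
    present = sum(1 for f in REPRODUCIBILITY_FIELDS if evidence.get(f))
    return present / len(REPRODUCIBILITY_FIELDS)
```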
Relationship Summaries & Meta-Analysis
System generates a natural language version of the statistical evidence of a relationship for easier understanding. For example:
The relationship between COVID-19 and Heart Failure is established by 2 sources.
  • 4.29 Odds Ratio between Individual has History of Heart Failure and Patient was Hospitalized for COVID-19 Treatment
  • Multiple Odds Ratios (2.01, 1.76, 2.57, 1.99, 1.87) between Individual has History of Heart Failure and Patient Died After COVID-19 Diagnosis
Implementation of Physical Distancing Measures is associated with a 62% decrease in the odds of Happiness in 1 source.
  • 0.38 Odds Ratio between Implementation of Physical Distancing Measures and Happiness
Life Expectancy in Bottom Income Quartile in Zip Code slightly increases Income Segregation of Zip Code in 1 source.
  • 0.26 Pearson R between Income Segregation of Zip Code and Life Expectancy in Bottom Income Quartile in Zip Code
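The odds-ratio summaries above can be produced mechanically. A sketch, with a hypothetical helper name:

```python
def summarize_odds_ratio(a: str, b: str, odds_ratio: float) -> str:
    """Render an odds ratio as a percent change in the odds,
    mirroring the example summaries above. An OR below 1 reads
    as a decrease; above 1, an increase."""
    pct = round(abs(odds_ratio - 1.0) * 100)
    change = "increase" if odds_ratio > 1.0 else "decrease"
    return (f"{a} is associated with a {pct}% {change} "
            f"in the odds of {b}")
```

With the 0.38 odds ratio from the example, this yields the “62% decrease in the odds of Happiness” wording.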