How do we ensure and measure accuracy?

Model accuracy

Because the models we use to extract and link information include LLMs, there is a risk that some of the information on our platform will be inaccurate. To reduce the risk of sharing inaccurate content, we have set up multiple guardrails, starting at the source of extraction, with processes in place for monitoring, evaluating, and gathering feedback on inaccuracies.

Because our models look for specific statistical information in a given text, we incorporate validation methods both pre- and post-extraction to ensure the extracted content is actually reported in the abstract, design our models to extract information that meets a specific format, and conduct additional post-processing to filter out potentially erroneous results. Together, these guardrails help us significantly reduce the risk of hallucination.
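A minimal sketch of what post-extraction checks like these can look like, assuming a dictionary-shaped extraction with fields such as statistic_value and p_value (the field names, patterns, and rules below are illustrative, not our production pipeline):

import re

# Illustrative post-extraction guardrails; field names and patterns are assumptions.
P_VALUE_PATTERN = re.compile(r"p\s*[=<>]\s*0?\.\d+", re.IGNORECASE)

def validate_extraction(abstract: str, extraction: dict) -> list[str]:
    """Return a list of reasons to reject an extracted statistical relationship."""
    problems = []

    # 1. Grounding check: the reported statistic must appear verbatim in the
    #    source abstract, which guards against hallucinated numbers.
    statistic = extraction.get("statistic_value", "")
    if statistic and statistic not in abstract:
        problems.append(f"statistic {statistic!r} not found in source text")

    # 2. Format check: p-values must match an expected pattern (e.g. "p<0.05"),
    #    so malformed values are filtered or flagged for review.
    p_value = extraction.get("p_value", "")
    if p_value and not P_VALUE_PATTERN.fullmatch(p_value.replace(" ", "")):
        problems.append(f"p-value {p_value!r} does not match expected format")

    # 3. Completeness check: every component of the relationship must be present.
    for field in ("exposure", "outcome", "statistic_type", "statistic_value"):
        if not extraction.get(field):
            problems.append(f"missing required field {field!r}")

    return problems

An extraction is only kept when the list of problems comes back empty; anything else is dropped or routed for review rather than published.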

Additionally, by incorporating clear instructions and rules, we have achieved between 84% and 90% accuracy in our extraction models, meaning that each component of the extracted statistical relationship is accurate to the source. Inaccuracies can include misleading variable names, incomplete statistic types (e.g., “odds ratio” instead of “adjusted odds ratio”), or p-values with the wrong sign (p=0.05 instead of p<0.05).
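For illustration, per-component accuracy of this kind can be estimated by comparing extractions against a manually reviewed sample; the field names and structure below are assumptions, not our actual evaluation harness:

def component_accuracy(extractions: list[dict], gold: list[dict]) -> dict[str, float]:
    """Fraction of extractions whose value matches the reviewed annotation, per field."""
    fields = ("variable_name", "statistic_type", "p_value")
    correct = {f: 0 for f in fields}
    for extracted, reference in zip(extractions, gold):
        for f in fields:
            if extracted.get(f) == reference.get(f):
                correct[f] += 1
    # Assumes a non-empty reviewed sample.
    return {f: correct[f] / len(gold) for f in fields}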

Tracking and fixing inaccuracies

To continue improving these models, we routinely conduct external evaluations on large samples of statistical relationships and use the results to refine our models. Additionally, we have created front-end processes that let users of System Pro flag any inaccuracies and suggest revisions to extracted information; flagged evidence is then removed from the site until the content can be reviewed and updated.
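Conceptually, the flagging workflow behaves like the sketch below: a user report hides the evidence until a reviewer resolves it. The Evidence model and its fields are hypothetical, shown only to illustrate the flow.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Evidence:
    evidence_id: str
    visible: bool = True
    flags: list[dict] = field(default_factory=list)

def flag_evidence(evidence: Evidence, user_id: str, suggested_revision: str) -> None:
    """Record a user report and hide the evidence pending review."""
    evidence.flags.append({
        "user_id": user_id,
        "suggested_revision": suggested_revision,
        "flagged_at": datetime.now(timezone.utc).isoformat(),
    })
    evidence.visible = False  # hidden until a reviewer accepts or corrects the extraction

def resolve_flag(evidence: Evidence) -> None:
    """After review (and any correction), clear the report and restore visibility."""
    evidence.flags.clear()
    evidence.visible = True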
