Browsing the catalog
System Catalog organizes all the raw material and metadata used to generate relationships on System. It is a novel way of cataloging and discovering the output of research and data science.
Use System Catalog to find research material about a topic of interest, investigate material more closely with visually rich summaries, track changes in dynamic material, and compare certain types of material.
You can search System Catalog and sort the results by criteria such as most recently updated, most used or popular, or alphabetical order.
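The three sort orders mentioned above can be pictured with a small sketch. This is not System's API: the `CatalogEntry` class, its fields, the `sort_entries` function, and the sample entries are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    # Hypothetical fields for illustration; not System's actual schema.
    name: str
    last_updated: date
    usage_count: int


def sort_entries(entries, by):
    """Return a new list of entries in one of three illustrative orders."""
    if by == "recent":
        # Most recently updated first.
        return sorted(entries, key=lambda e: e.last_updated, reverse=True)
    if by == "popular":
        # Most used first.
        return sorted(entries, key=lambda e: e.usage_count, reverse=True)
    if by == "alphabetical":
        # Case-insensitive A to Z.
        return sorted(entries, key=lambda e: e.name.lower())
    raise ValueError(f"unknown sort order: {by!r}")


# Made-up entries, used only to exercise the function.
entries = [
    CatalogEntry("Unemployment rate", date(2021, 11, 30), 42),
    CatalogEntry("air quality index", date(2022, 1, 15), 7),
    CatalogEntry("Housing starts", date(2020, 6, 1), 120),
]

for order in ("recent", "popular", "alphabetical"):
    print(order, [e.name for e in sort_entries(entries, order)])
```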
The primary objects cataloged by System Catalog are:
  • Data
    • Datasets
    • Features
  • Studies
  • Models
  • Metrics
  • Topics
Each object in System Catalog has a card view and a page view. Clicking an object category in the catalog headings (shown above) takes you to the card views for the objects in that category. The card view summarizes how the object is used and defined on System. The page view expands on the information available in the card view. You will see examples of page views for each object below.
Data
Datasets - A dataset on System is a representation of a table of data. This could be a live table in a database, a CSV file, or, say, the result of a survey. It captures the high level information about the table (e.g. name, URL) as well as its content (e.g. columns). Datasets contain a set of features. System only stores metadata about a dataset, not the data itself.
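As a rough illustration of the "metadata only" idea, the sketch below models a dataset as a name, a URL, and a list of features, with no rows of data attached. The class names (`DatasetMeta`, `FeatureMeta`), their fields, and the example values are assumptions made for this sketch and do not describe System's internal schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureMeta:
    # Hypothetical description of one column: its name and a coarse type.
    name: str
    dtype: str


@dataclass
class DatasetMeta:
    # High-level information about the table (illustrative fields only).
    name: str
    url: str
    # Content description: the columns, not the values stored in them.
    features: List[FeatureMeta] = field(default_factory=list)

    def column_names(self):
        """List the column names recorded for this dataset."""
        return [f.name for f in self.features]


# Made-up example: a survey result described purely by its metadata.
survey = DatasetMeta(
    name="example_survey",
    url="https://example.com/survey.csv",
    features=[
        FeatureMeta("respondent_id", "integer"),
        FeatureMeta("age", "integer"),
        FeatureMeta("region", "categorical"),
    ],
)

print(survey.column_names())
```

Note that nothing in the record holds observations; it only describes where the table lives and what its columns are.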
Below is an example of a dataset page. You can find important information such as population, description, shape, methodology, version, license, available formats, the owner and the link to the data source. When possible, you can also examine how the dataset is being used and queried.
Data Provenance & Usage
When datasets are connected to System through the platform’s database connectors, we provide additional metadata in the form of queries to the database that mention that particular dataset. This information is sourced from usage logs stored by cloud database services, which are processed and filtered to show how that specific dataset is used. These queries often contain information about how the table or view is created in the database, and they can sometimes tell you which tables or views are created from this dataset (the dataset’s “provenance”). The Usage tab on the dataset page displays the queries mentioning that dataset that were run most frequently in the most recent 24 hours, and also provides links to other datasets and features on System that are mentioned in those queries.
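The kind of filtering described above can be sketched in a few lines. This is only an illustration under stated assumptions: the log format (a list of timestamp and query-text pairs), the `top_queries` function, and the simple word-boundary match on the table name are all hypothetical, and System's actual processing of usage logs is not documented here.

```python
import re
from collections import Counter
from datetime import datetime, timedelta


def top_queries(log, table, now, window=timedelta(hours=24), limit=5):
    """Count queries in the window that mention `table`, most frequent first."""
    # Match the table name as a whole word, ignoring case.
    pattern = re.compile(rf"\b{re.escape(table)}\b", re.IGNORECASE)
    cutoff = now - window
    counts = Counter(
        " ".join(query.split())  # collapse whitespace so near-identical text groups together
        for ts, query in log
        if cutoff <= ts <= now and pattern.search(query)
    )
    return counts.most_common(limit)


# Made-up log entries for illustration.
now = datetime(2022, 1, 2, 12, 0)
log = [
    (datetime(2022, 1, 2, 9, 0), "SELECT * FROM orders"),
    (datetime(2022, 1, 2, 10, 0), "SELECT *  FROM orders"),
    (datetime(2022, 1, 2, 11, 0),
     "CREATE VIEW big_orders AS SELECT * FROM orders WHERE total > 100"),
    (datetime(2021, 12, 30, 9, 0), "SELECT * FROM orders"),      # older than 24 hours
    (datetime(2022, 1, 2, 11, 30), "SELECT * FROM customers"),   # different table
]

for query, count in top_queries(log, "orders", now):
    print(count, query)
```

In this toy log, the `CREATE VIEW` statement is the sort of query that hints at provenance: it shows a view being built from the `orders` table.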
Data Quality
When metadata on datasets and features is updated on System (most often via retrieval through the platform’s database integrations), the platform automatically checks for indications of changes in data quality. These checks track changes in the number of missing observations for all feature types, as well as changes in the statistical properties of numerical, categorical, and time series feature types (including minimum/maximum values, the mean and median values, etc.). Alerts and notifications are generated when the value of any of these properties changes by more than a set threshold. These alerts and notifications are currently sent to users in testing. The default thresholds are calibrated for each feature type and statistical property and are configurable (e.g., you can set an alert when the mean value of a feature changes by +/- 10% for a specific set of features).
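A threshold check of this kind might look like the sketch below. It assumes the threshold is a relative change between two metadata snapshots, which matches the +/- 10% example but is otherwise a guess; the function names, the snapshot dictionaries, and the numbers are hypothetical and not taken from System.

```python
def relative_change(old, new):
    """Signed relative change from old to new (0.12 means +12%)."""
    if old == 0:
        # No meaningful ratio exists; treat any move away from zero as unbounded.
        return float("inf") if new != 0 else 0.0
    return (new - old) / abs(old)


def find_alerts(previous, current, thresholds):
    """Return (property, change) pairs whose absolute change exceeds its threshold."""
    alerts = []
    for prop, limit in thresholds.items():
        if prop not in previous or prop not in current:
            continue  # property not tracked in both snapshots
        change = relative_change(previous[prop], current[prop])
        if abs(change) > limit:
            alerts.append((prop, change))
    return alerts


# Made-up statistics for one numerical feature at two retrieval times.
previous = {"mean": 50.0, "max": 100.0, "missing": 4}
current = {"mean": 56.0, "max": 101.0, "missing": 4}

# Alert when any tracked property moves by more than +/- 10%.
thresholds = {"mean": 0.10, "max": 0.10, "missing": 0.10}

alerts = find_alerts(previous, current, thresholds)
for prop, change in alerts:
    print(f"{prop} changed by {change:+.0%}")
```

Here only the mean crosses the threshold (it rises from 50 to 56, a 12% change), so it is the only property reported.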
Features - A feature on System represents data in a column of a dataset.
Below is an example of a feature page. The page shows a distribution graph from Nov 30 1999 to Nov 30 2021, generated by System upon retrieval.
Models
A model on System is a representation of a statistical model. Models can be machine learning models (e.g. predictive, supervised or unsupervised), any fitted statistical model (e.g. Generalized Linear Regressions, Logistic Regressions, etc.), or any model in production with an accessible endpoint (e.g. models deployed to AWS SageMaker). A model can be as simple as a 2-variable correlation analysis or as complex as a 100-feature Random Forest classifier.
System’s model object is designed to help you:
  • find the most important features from a variety of different feature importance metrics computed by System, at a moment in time and over time (if the model is in production)
  • measure and visualize the interaction between features
  • track drift in performance and feature importance
  • compare different models predicting the same target
For System to generate a model object, the model has to first be trained or fitted in one of the platform’s supported statistical or machine learning libraries (learn more here) and added to System via the platform’s Python or R packages.
Below is an annotated walkthrough of a model page.
Compare Models
Model Compare lets you closely examine the knowledge produced by multiple models. It provides easy-to-read, side-by-side comparisons of the metadata for two or more models that have been added to System. You can quickly see differences in model performance metrics, feature importance values, and hyperparameters. It also links to the datasets, features, and metrics used in each model to give you a more complete understanding of the different data sources and specifications for a set of models.
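To make the side-by-side idea concrete, here is a minimal sketch that lines up metadata for two models and picks out the entries that differ. It does not use or describe System's Python package: the `compare_models` and `differing_keys` functions, the flat dictionary layout, and the model names and values are all hypothetical.

```python
def compare_models(models):
    """Build one row per metadata key, with one value per model.

    `models` maps a model name to a flat dict of metadata
    (metrics, hyperparameters, and so on). Missing keys become None.
    """
    keys = sorted({key for meta in models.values() for key in meta})
    return {
        key: {name: meta.get(key) for name, meta in models.items()}
        for key in keys
    }


def differing_keys(rows):
    """Return the keys whose values are not identical across all models."""
    return [
        key for key, values in rows.items()
        if len({repr(v) for v in values.values()}) > 1
    ]


# Made-up metadata for two versions of a model.
models = {
    "rf_v1": {"auc": 0.81, "n_estimators": 100, "max_depth": 8},
    "rf_v2": {"auc": 0.84, "n_estimators": 300, "max_depth": 8},
}

rows = compare_models(models)
for key in differing_keys(rows):
    print(key, rows[key])
```

Keeping every key in `rows` and filtering afterwards means the full table is still available when you want to see what the models share as well as where they differ.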