Data-Driven Discovery of Models

Automating Data Science and Predicting Behavior through Empirical Models

What is D3M?

The DARPA Data-Driven Discovery of Models (D3M) program automates methods in data science to enable domain experts to incorporate their knowledge into the modeling process and create meaningful and valid predictive models of real, complex processes without the need for expert data scientists.

Getting Started

Analytic Platforms for Domain SMEs

Three fully integrated, intuitive interactive platforms for domain SMEs to curate, select, edit and explain: (1) data and problems, (2) features and relationships, and (3) models.

Einblick, founded out of years of research at MIT and Brown, is changing the way people work and play with data by providing a fast and collaborative approach to understand the past, predict the future, and optimize decisions.

TwoRavens is a platform for machine learning that allows a domain expert, in concert with our system, to complete a high-quality, predictive and interpretable model without any statistical or machine learning expertise. To do so, the system facilitates intuitive machine learning and model interpretation, model discovery, and data exploration. A demo video is available.

Distil is a mixed-initiative modeling workbench developed by Uncharted Software. Through an interactive analytic-question-first workflow, it enables subject matter experts to discover underlying dynamics of complex systems and generate data-driven models from tabular, time series, image and multispectral satellite image datasets.

AutoML Engines for Data Scientists

Automated machine learning engines that quickly discover pipelines that outperform human experts, supporting 20+ problem types, and built with an extensible machine learning library of over 300 automatically discoverable modeling primitives.


AutonML is an automated machine learning system developed by the Carnegie Mellon University Auton Lab to power data scientists with efficient model discovery and advanced data analytics. “AutonML takes your machine learning capacity to the nth power.”

Alpha-AutoML is an extensible open-source AutoML system developed at the NYU VIDA Center. It leverages the reinforcement learning and neural network components of AlphaD3M, but uses standard, open-source infrastructure to specify and run pipelines. It is compatible with state-of-the-art ML techniques: by using the scikit-learn pipeline infrastructure, Alpha-AutoML is fully compatible with other standard libraries such as XGBoost, Hugging Face, Keras, and PyTorch. In addition, primitives can be added on the fly through scikit-learn's standard fit/predict API, making it possible for Alpha-AutoML to leverage new developments in machine learning and keep up with the fast pace of the field.
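As a rough illustration of what the fit/predict convention looks like, the sketch below defines a small custom estimator and runs it inside a standard scikit-learn pipeline. The class name `NearestCentroidPrimitive` and the toy data are hypothetical and chosen only for this example; how a primitive is actually registered with Alpha-AutoML is not shown here and should be taken from the project's own documentation.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class NearestCentroidPrimitive(ClassifierMixin, BaseEstimator):
    """Hypothetical primitive: assigns each row to the class with the closest mean."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Learned attributes end with an underscore, per scikit-learn convention.
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every row to every class centroid: shape (n_rows, n_classes).
        distances = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[distances.argmin(axis=1)]


# Toy data (illustrative only): two well-separated clusters.
X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])

# Because the class follows fit/predict, it drops into a Pipeline like any built-in step.
pipeline = Pipeline([("scale", StandardScaler()), ("model", NearestCentroidPrimitive())])
pipeline.fit(X, y)
predictions = pipeline.predict(np.array([[0.1, 0.1], [5.1, 5.0]]))
print(predictions)
```

Any object exposing the same two methods can stand in for the final step, which is what lets a pipeline-based system pick up new components without changes to the surrounding infrastructure.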


Explore the D3M AutoML Ecosystem

Dataset Search and Augmentation

Learn more about Data Discovery and Augmentation tools that enhance the D3M data preparation and modeling process.

  • NYU Auctus, an open-source dataset search engine

  • ISI Datamart, a publicly available knowledge graph with Wikidata at its core

Metalearning Database

A large database of millions of auto-generated pipelines to solve 20+ problem types across 200+ datasets.

Extend the D3M AutoML Ecosystem

How to add datasets, problems, primitives, and pipelines using the D3M standard JSON schemas. The AutoML engines will then use existing and new primitives to auto-solve the new problems.

MARVIN visual front end to D3M AutoML ecosystem

The MARVIN GUI provides a query, exploration, and analytics interface to the datasets, problems, primitives, and pipeline solutions in the D3M ecosystem.

Machine Learning Primitives

Discoverable library of fundamental ML elements used to build models and pipelines.

Other D3M AutoML Engines

Cutting-edge systems leverage state-of-the-art research in ML architecture search and metalearning to comprehensively explore combinations of ML primitives for high-quality models.