Data-Driven Discovery of Models
Automating Data Science and Predicting Behavior through Empirical Models
What is D3M?
The DARPA Data Driven Discovery of Models (D3M) program automates methods in data science to enable domain experts to incorporate their knowledge into the modeling process and create meaningful and valid predictive models of real, complex processes without the need for expert data scientists.
Getting Started
Analytic Platforms for Domain SMEs
Three fully integrated, intuitive, interactive platforms that let domain SMEs curate, select, edit, and explain: (1) data and problems, (2) features and relationships, and (3) models.
Einblick, founded out of years of research at MIT and Brown, is changing the way people work and play with data by providing a fast and collaborative approach to understand the past, predict the future, and optimize decisions.
TwoRavens is a machine learning platform that lets a domain expert, working in concert with the system, build a high-quality, predictive, and interpretable model without any statistical or machine learning expertise. To do so, the system supports intuitive machine learning and model interpretation, model discovery, and data exploration.
Distil is a mixed-initiative modeling workbench developed by Uncharted Software. Through an interactive, analytic-question-first workflow, it enables subject matter experts to discover the underlying dynamics of complex systems and generate data-driven models from tabular, time-series, image, and multispectral satellite image datasets.
AutoML Engines for Data Scientists
Automated machine learning engines that quickly discover pipelines capable of outperforming those built by human experts. They support 20+ problem types and are built on an extensible machine learning library of over 300 automatically discoverable modeling primitives.
Alpha-AutoML is an extensible open-source AutoML system developed at the NYU VIDA Center. It leverages the reinforcement learning and neural network components of AlphaD3M, but uses standard, open-source infrastructure to specify and run pipelines. Because it builds on the scikit-learn pipeline infrastructure, Alpha-AutoML is compatible with state-of-the-art ML techniques and with other standard libraries such as XGBoost, Hugging Face, Keras, and PyTorch. In addition, primitives can be added on the fly through the standard scikit-learn fit/predict API, which lets Alpha-AutoML take advantage of new developments in machine learning and keep pace with this fast-moving field.
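As a minimal sketch of what adding a primitive "through the fit/predict API" can look like, the Python below defines a hypothetical primitive (`CentroidPrimitive` is our own illustrative name, not part of Alpha-AutoML) as a scikit-learn-compatible estimator and drops it into a standard scikit-learn `Pipeline`. It relies only on scikit-learn and NumPy; how Alpha-AutoML itself registers or discovers such a primitive is not shown here and should be taken from its own documentation.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class CentroidPrimitive(BaseEstimator, ClassifierMixin):
    """Hypothetical primitive: predicts the class whose mean feature vector is closest."""

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = np.unique(y)
        # One centroid (mean feature vector) per class.
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        check_is_fitted(self, "centroids_")
        X = check_array(X)
        # Euclidean distance from every sample to every centroid: shape (n_samples, n_classes).
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dists, axis=1)]


# Because the primitive follows the fit/predict convention, it composes with stock steps.
pipe = Pipeline([("scale", StandardScaler()), ("clf", CentroidPrimitive())])
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
pipe.fit(X, y)
print(pipe.predict(np.array([[0.1, 0.0], [5.1, 5.0]])))  # → [0 1]
```

Inheriting from `BaseEstimator` and `ClassifierMixin` gives the primitive `get_params`/`set_params` and a default `score` method, which is what lets generic tooling such as cross-validation or pipeline search treat it like any built-in estimator.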
Explore the D3M AutoML Ecosystem
Dataset Search and Augmentation
Data discovery and augmentation tools enhance the D3M data preparation and modeling process:
NYU Auctus, an open-source dataset search engine
ISI Datamart, a publicly available knowledge graph with Wikidata at its core
A large database of millions of auto-generated pipelines that solve 20+ problem types across 200+ datasets.
Extend the D3M AutoML Ecosystem
How to add datasets, problems, primitives, and pipelines using the D3M standard JSON schemas. The AutoML engines will then use existing and new primitives to auto-solve the new problems.
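To make the idea of describing a problem in a standard JSON schema concrete, the Python below builds an illustrative problem description as a dictionary and serializes it to JSON. The field names (`about`, `inputs`, `targets`, `performanceMetrics`, and so on) follow our reading of the D3M problem schema and should be treated as assumptions to check against the official schema; the IDs, column, and metric are hypothetical, and `check_problem_doc` is a made-up sanity check, not the official validator.

```python
import json

# Illustrative problem description. Field names follow our reading of the D3M
# problem schema (v4) and are assumptions; verify them against the official schema.
problem_doc = {
    "about": {
        "problemID": "example_problem",  # hypothetical ID
        "problemName": "example_problem",
        "problemSchemaVersion": "4.0.0",  # assumed schema version
        "taskKeywords": ["classification", "binary", "tabular"],
    },
    "inputs": {
        "data": [
            {
                "datasetID": "example_dataset",  # hypothetical dataset
                "targets": [
                    {"targetIndex": 0, "resID": "learningData",
                     "colIndex": 3, "colName": "label"},
                ],
            }
        ],
        "performanceMetrics": [{"metric": "accuracy"}],
    },
    "expectedOutputs": {"predictionsFile": "predictions.csv"},
}


def check_problem_doc(doc):
    """Hypothetical sanity check: NOT the official D3M schema validator."""
    about, inputs = doc["about"], doc["inputs"]
    if not about.get("problemID") or not about.get("taskKeywords"):
        return False
    # Every input entry must name a dataset and at least one target column.
    for entry in inputs["data"]:
        if not entry.get("datasetID") or not entry.get("targets"):
            return False
    return bool(inputs.get("performanceMetrics"))


serialized = json.dumps(problem_doc, indent=2)  # text one would save alongside the dataset
print(check_problem_doc(problem_doc))  # → True
```

In practice, a real schema validator would replace the hand-written check; the point of the sketch is only that a problem is declared as data (dataset, target, task type, metric) rather than as code, which is what lets the AutoML engines pick it up and solve it automatically.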
MARVIN visual front end to the D3M AutoML ecosystem
The MARVIN GUI provides a query, exploration, and analytics interface to the datasets, problems, primitives, and pipeline solutions in the D3M ecosystem.
A discoverable library of fundamental ML elements (primitives) used to build models and pipelines.
Cutting-edge systems draw on state-of-the-art research in ML architecture search and metalearning to comprehensively explore combinations of ML primitives and produce high-quality models.
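For intuition about what "exploring combinations of ML primitives" means, the Python below enumerates every pairing of a few candidate preprocessing and modeling primitives, scores each resulting pipeline by cross-validation, and ranks them. This is plain exhaustive search with scikit-learn, chosen for clarity; it is not the architecture-search or metalearning method the D3M systems use, and the candidate primitives and `search_pipelines` helper are illustrative assumptions.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Candidate primitives for each pipeline step (illustrative choices).
preprocessors = {"standard": StandardScaler, "minmax": MinMaxScaler}
estimators = {
    "logreg": lambda: LogisticRegression(max_iter=1000),
    "tree": lambda: DecisionTreeClassifier(random_state=0),
}


def search_pipelines(X, y, cv=5):
    """Hypothetical helper: score every preprocessor x estimator pipeline, best first."""
    results = []
    for (p_name, make_p), (e_name, make_e) in product(preprocessors.items(), estimators.items()):
        pipe = make_pipeline(make_p(), make_e())
        # Mean accuracy over cv folds is the score used to rank pipelines.
        score = cross_val_score(pipe, X, y, cv=cv).mean()
        results.append((score, p_name, e_name))
    results.sort(reverse=True)
    return results


X, y = load_iris(return_X_y=True)
for score, p, e in search_pipelines(X, y):
    print(f"{p:>8} + {e:<6} mean CV accuracy = {score:.3f}")
```

The number of candidate pipelines grows multiplicatively with the number of options per step, so brute-force enumeration quickly becomes impractical for libraries with hundreds of primitives; guided approaches such as architecture search and metalearning are one way to explore that space more selectively.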