Empirical Hardness of Finding Optimal Bayesian Network Structures

This is an online supplement for the paper "Empirical Hardness of Finding Optimal Bayesian Network Structures", B. Malone, K. Kangas, M. Järvisalo, M. Koivisto, P. Myllymäki (under revision). We study the empirical hardness of learning an optimal structure of a Bayesian network from statistical data via score-based methods. Specifically, we evaluate three state-of-the-art approaches or "solvers" for learning optimal structures on a collection of benchmark data. We then learn models for predicting the runtime of each solver on a given problem instance, based on various instance features, and use the models to construct algorithm portfolios.

This supplement provides the experiment data, including the benchmark data sets used, the feature values of all instances, the predictive models learned, as well as the measured and predicted runtimes of solvers. We have also made our data available as a scenario in the ASlib algorithm selection library.

Supplemental material for the preliminary AAAI-14 version of this work can be found here.

Datasets in .csv format (compressed as .tar.gz)

Experiment results

Results table in .csv format

The table contains one line for each BNSL instance, and each line has the following attributes:

name		name of the instance
dataset		name of the source dataset
category		dataset class, one of: real, sampled, synthetic
var		number of variables
rec		number of records in the source dataset
f		scoring function, one of: bdeu, bic
ess		equivalent sample size (? for BIC scores)
pl		parent limit
[solver]	(8 attributes)	running time of [solver], either a number or "to" (timeout) or "mo" (memout)
[feature]	(86 attributes)	value of [feature]
[feature-set].[model-class]_[solver]_prediction	(144 attributes)	predicted runtime of [solver] by the predictor trained using [model-class] and [feature-set]

Models

The models produced by auto-sklearn:

Random forests (820 MB)
Ensemble of various models (45 GB)