Empirical Hardness of Finding Optimal Bayesian Network Structures

This is an online supplement for the paper "Empirical Hardness of Finding Optimal Bayesian Network Structures", B. Malone, K. Kangas, M. Järvisalo, M. Koivisto, P. Myllymäki (under revision). We study the empirical hardness of learning an optimal structure of a Bayesian network from statistical data via score-based methods. Specifically, we evaluate three state-of-the-art approaches or "solvers" for learning optimal structures on a collection of benchmark data. We then learn models for predicting the runtime of each solver on a given problem instance, based on various instance features, and use the models to construct algorithm portfolios.

This supplement provides the experiment data, including the benchmark data sets used, the feature values of all instances, the predictive models learned, as well as the measured and predicted runtimes of solvers. We have also made our data available as a scenario in the ASlib algorithm selection library.

Supplemental material for the preliminary AAAI-14 version of this work can be found here.

Datasets in .csv format (compressed as .tar.gz)

Experiment results

The table contains one line for each BNSL instance, and each line has the following attributes:

namename of the instance
datasetname of the source dataset
categorydataset class, one of: real, sampled, synthetic
varnumber of variables
recnumber of records in the source dataset
fscoring function, one of: bdeu, bic
essequivalent sample size (? for BIC scores)
plparent limit
[solver](8 attributes)running time of [solver], either a number or "to" (timeout) or "mo" (memout)
[feature](86 attributes)value of [feature]
[feature-set].[model-class]_[solver]_prediction(144 attributes)predicted runtime of [solver] by the predictor trained using [model-class] and [feature-set]

Models

The models produced by auto-sklearn: