Data Format (for MLC task only)

Data is specified in a separate file. This file has the same name as the original network file, with an added .data suffix. For instance, the training data for problem.uai is stored in the file problem.uai.data.
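The naming rule amounts to appending the suffix to the network file name. A minimal sketch, assuming the network path is available as a string; `data_file_for` is a hypothetical helper name:

```python
def data_file_for(network_path: str) -> str:
    """Return the data file name for a network file by appending '.data'."""
    # e.g. "problem.uai" -> "problem.uai.data"
    return network_path + ".data"
```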

The data file consists of T+3 lines, where T is the number of data points.

  • The first line in the file will specify the number of data points.
  • The second line in the file will begin with the number of observed (evidence) variables, followed by the indices of the observed variables. The indices correspond to the ones implied by the original problem file.
  • The third line in the file will begin with the number of query variables (or labels), followed by the indices of the query variables. Again, the indices correspond to the ones implied by the original problem file.
  • The remaining T lines will specify the data points, one per line. Each line will contain an assignment (q,e) to the query and observed variables, written as index/value pairs (each variable's index followed by its value), followed by the weight of the assignment. In the example below, the pairs for the evidence variables precede the pairs for the query variables. The weight of the assignment (q,e) is given by log10 Pr(q,e) + log10 Z, where Z is the partition function of the Markov network.
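The layout above can be read with a short parser. This is a sketch under stated assumptions, not a reference implementation: `parse_mlc_data`, `DataPoint`, and `MLCData` are hypothetical names; each data point is assumed to occupy exactly one line; and each index/value pair is matched to the evidence or query set by its index, so the parser does not rely on the order of the pairs within a line.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataPoint:
    evidence: Dict[int, int]  # evidence variable index -> observed value
    query: Dict[int, int]     # query variable index -> label value
    weight: float             # log10 Pr(q,e) + log10 Z, as stored in the file


@dataclass
class MLCData:
    evidence_vars: List[int]
    query_vars: List[int]
    points: List[DataPoint]


def _header(line: str) -> List[int]:
    """Parse a '<count> <idx_1> ... <idx_count>' header line."""
    tokens = [int(t) for t in line.split()]
    count, indices = tokens[0], tokens[1:]
    if len(indices) != count:
        raise ValueError(f"expected {count} indices, got {len(indices)}")
    return indices


def parse_mlc_data(text: str) -> MLCData:
    # Assumption: blank lines carry no meaning and can be skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    num_points = int(lines[0])
    evidence_vars = _header(lines[1])
    query_vars = _header(lines[2])
    n_pairs = len(evidence_vars) + len(query_vars)

    points = []
    for line in lines[3:3 + num_points]:
        tokens = line.split()
        if len(tokens) != 2 * n_pairs + 1:
            raise ValueError(f"malformed data line: {line!r}")
        # All tokens except the last are index/value pairs; the last is the weight.
        values = {int(tokens[i]): int(tokens[i + 1]) for i in range(0, 2 * n_pairs, 2)}
        points.append(DataPoint(
            evidence={v: values[v] for v in evidence_vars},
            query={v: values[v] for v in query_vars},
            weight=float(tokens[-1]),
        ))
    if len(points) != num_points:
        raise ValueError(f"expected {num_points} data points, got {len(points)}")
    return MLCData(evidence_vars, query_vars, points)
```

Reading problem.uai.data then amounts to passing the file's contents to `parse_mlc_data`. Keying values by variable index keeps the evidence/query split tied to the header lines rather than to token positions.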

For example, consider a Markov network with 10 variables, and let the indices of the evidence (observed) and query variables be (1,4,7) and (5,6,8), respectively. Given the following 2 data points:

  • The first data point is an assignment of values (0,2,1) to the evidence variables (1,4,7) and (2,1,0) to the query variables (5,6,8). The weight of the assignment is -48.21.
  • The second data point is an assignment of values (1,0,1) to the evidence variables (1,4,7) and (1,0,3) to the query variables (5,6,8). The weight of the assignment is -76.27.

the data file will contain the following:

2
3 1 4 7
3 5 6 8
1 0 4 2 7 1 5 2 6 1 8 0 -48.21
1 1 4 0 7 1 5 1 6 0 8 3 -76.27
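The example file can also be produced programmatically. A minimal sketch, assuming the evidence pairs are written before the query pairs, each in the order given on the header lines (as in the example), and that the weight is printed with Python's default float formatting (the format does not state a precision); `format_mlc_data` is a hypothetical name:

```python
from typing import Dict, Sequence, Tuple


def format_mlc_data(
    evidence_vars: Sequence[int],
    query_vars: Sequence[int],
    points: Sequence[Tuple[Dict[int, int], Dict[int, int], float]],
) -> str:
    """Serialize (evidence, query, weight) triples in the .data layout."""
    lines = [str(len(points))]
    lines.append(" ".join(map(str, [len(evidence_vars), *evidence_vars])))
    lines.append(" ".join(map(str, [len(query_vars), *query_vars])))
    for evidence, query, weight in points:
        tokens = []
        # Evidence index/value pairs first, then query pairs, as in the example.
        for var in evidence_vars:
            tokens += [str(var), str(evidence[var])]
        for var in query_vars:
            tokens += [str(var), str(query[var])]
        tokens.append(str(weight))  # weight = log10 Pr(q,e) + log10 Z
        lines.append(" ".join(tokens))
    return "\n".join(lines) + "\n"
```

Called with evidence variables (1,4,7), query variables (5,6,8), and the two data points listed above, this returns the five lines of the example file.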