WEKA: Train and test set are not compatible when classifying boolean data
I am getting the following message in the Weka Explorer when I try to use my training data to classify new test data:
Problem evaluating classifier: Train and test set are not compatible. Attributes differ at position 6: Labels differ at position 1: true != false
I am using the J48 classifier to classify RSS feeds according to the popularity of keywords, both in boolean form and numerically. The problem occurs with the boolean variant. The training data looks like this:
@relation _dm_3793_855329_11032013_1362993476361_boolean-weka.filters.unsupervised.attribute.NumericToNominal-R65
@attribute bin {false,true}
@attribute kill {false,true}
@attribute laden {false,true}
@attribute video {false,true}
@attribute pakistan {false,true}
@attribute imf {true,false}
…
whereas the equivalent testing data is:
@relation _dm_4993_179211_18032013_1363611143017_boolean-weka.filters.unsupervised.attribute.NumericToNominal-R65
@attribute bin {false,true}
@attribute kill {false,true}
@attribute laden {false,true}
@attribute video {false,true}
@attribute pakistan {false,true}
@attribute imf {false,true}
…
In the last line shown, for the attribute ‘imf’, the labels are reversed, which I assume is the cause of the problem. How can I solve it?
Both the training and the testing data are labelled, and a typical row resembles the following:
@data
false,false,false,true,false,false,false,false,true,false,false, …, ‘name of class’
My .arff files are created dynamically in Java code as follows:
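    import java.io.File;
    import weka.core.Instances;
    import weka.core.Utils;
    import weka.core.converters.ArffSaver;
    import weka.core.converters.CSVLoader;
    import weka.filters.unsupervised.attribute.NumericToNominal;

    // Create the .arff file.
    CSVLoader loader = new CSVLoader();
    loader.setSource(new File(cf.getCsvFile()));
    Instances data = loader.getDataSet();

    // The class attribute (last column), if numeric, must be converted to nominal.
    NumericToNominal numToNom = new NumericToNominal();
    String[] options = Utils.splitOptions("-R " + columnNames.length);
    numToNom.setOptions(options);
    numToNom.setInputFormat(data);
    data = NumericToNominal.useFilter(data, numToNom);

    // Write the filtered data set out as ARFF.
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File(cf.getArffFile()));
    saver.writeBatch();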
So can anyone tell me whether I'm using the filter incorrectly or missing something? The equivalent .arff files with numeric frequencies, generated through the same code, are compatible.
Thanks,
Mr Morgan.
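"In the last line shown, for the attribute ‘imf’, the labels are reversed, which I assume is the cause of the problem. How can I solve it?"
Change the test ARFF file so that it has the same header as the training file. The training and testing files should have identical header information apart from the relation name, and that includes the order of the labels inside each nominal attribute. Make the ‘imf’ line of the test header read: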
@attribute imf {true,false}
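To find such differences before Weka reports them, the two headers can be compared line by line. The following is only a minimal sketch in plain Java, without Weka on the classpath; the class name ArffHeaderCheck and its methods are hypothetical, and it assumes one @attribute declaration per line. It compares the declarations textually, so it is stricter than necessary about spacing inside the braces. For data sets that are already loaded, Weka's own Instances.equalHeaders performs the compatibility check.

```java
import java.util.ArrayList;
import java.util.List;

public class ArffHeaderCheck {

    /** Collects the @attribute lines of an ARFF header, in declaration order. */
    public static List<String> attributeLines(String arffText) {
        List<String> result = new ArrayList<>();
        for (String line : arffText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.regionMatches(true, 0, "@data", 0, 5)) {
                break; // the header ends where the data section starts
            }
            if (trimmed.regionMatches(true, 0, "@attribute", 0, 10)) {
                // Collapse runs of whitespace so indentation differences are ignored.
                result.add(trimmed.replaceAll("\\s+", " "));
            }
        }
        return result;
    }

    /** Describes the first differing attribute, or returns null if all match. */
    public static String firstMismatch(String trainArff, String testArff) {
        List<String> train = attributeLines(trainArff);
        List<String> test = attributeLines(testArff);
        if (train.size() != test.size()) {
            return "attribute count differs: " + train.size() + " != " + test.size();
        }
        for (int i = 0; i < train.size(); i++) {
            if (!train.get(i).equals(test.get(i))) {
                return "position " + (i + 1) + ": " + train.get(i) + " != " + test.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Shortened, made-up headers that reproduce the label-order mismatch.
        String train = "@relation train\n@attribute pakistan {false,true}\n"
                + "@attribute imf {true,false}\n@data\n";
        String test = "@relation test\n@attribute pakistan {false,true}\n"
                + "@attribute imf {false,true}\n@data\n";
        System.out.println(firstMismatch(train, test));
    }
}
```

The relation line is deliberately ignored, since the relation name is allowed to differ between the two files.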
I had the same problem; see this question and answer. Decide on the header information once and put it in a file. Every data set should then use that same header information, whether you generate the ARFF files in code or create them by hand.
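If you go the coding route, one way to guarantee an identical header is to write it yourself rather than let it be inferred from each data set. The label order in the question probably differs because it was derived from the values as they appeared in each CSV file, though that is an assumption about the loader, not something confirmed here. The sketch below is plain Java with hypothetical names (FixedHeaderArff, header, and the class labels popular and unpopular); it assumes every non-class attribute is boolean and that no name or label needs ARFF quoting, which would not hold for a label containing spaces.

```java
import java.util.Arrays;
import java.util.List;

public class FixedHeaderArff {

    /**
     * Builds an ARFF header in which every boolean attribute is declared
     * with the same label order, followed by a nominal class attribute.
     */
    public static String header(String relation,
                                List<String> booleanAttributes,
                                List<String> classLabels) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append('\n');
        for (String name : booleanAttributes) {
            // Fixed order, regardless of which value occurs first in the data.
            sb.append("@attribute ").append(name).append(" {false,true}\n");
        }
        sb.append("@attribute class {")
          .append(String.join(",", classLabels))
          .append("}\n");
        sb.append("@data\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> attributes = Arrays.asList("pakistan", "imf");
        List<String> classes = Arrays.asList("popular", "unpopular");
        // Only the relation name differs between the two headers.
        System.out.print(header("train", attributes, classes));
        System.out.print(header("test", attributes, classes));
    }
}
```

The data rows can then be appended below the @data line of each file, with the training and test files sharing the same attribute list and class labels.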