illinois-ml-nlp-users AT lists.siebelschool.illinois.edu
Subject: Support for users of CCG software closed 7-27-20
[Illinois-ml-nlp-users] Seg fault on N=2.5 million subset of DNA dataset from pascal large scale learning challenge
- From: Hugh Perkins <hughperkins AT gmail.com>
- To: "illinois-ml-nlp-users AT cs.uiuc.edu" <illinois-ml-nlp-users AT cs.uiuc.edu>
- Subject: [Illinois-ml-nlp-users] Seg fault on N=2.5 million subset of DNA dataset from pascal large scale learning challenge
- Date: Sat, 16 Feb 2013 23:09:42 +0800
- List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users/>
- List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>
Procedure followed:
- download dna from http://largescale.ml.tu-berlin.de/instructions/
- run convert.py script from site above to create dna_train_svmlight.txt
head -n 5000000 dna_train_svmlight.txt > 2M5traintest.txt
head -n 2500000 2M5traintest.txt > train2M5.txt
tail -n 2500000 2M5traintest.txt > test2M5.txt
$install/sbm_exp-1.0/cdblock/blockspliter $data/dna/train2M5.txt $data/dna/train2M5splitbase
$install/sbm_exp-1.0/cdblock/blocktrain -c 1 -B 1 -w1 329 $data/dna/train2M5splitbase $data/dna/sbmexpmodel2m5.txt
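For anyone reproducing the split without the intermediate 2M5traintest.txt file, the head/tail steps above can be sketched in Python roughly as follows. This is only an illustration, not part of the original procedure: the function name split_train_test is hypothetical, and it assumes the source file has at least 2*n lines. With fewer lines, the shell version would produce overlapping train and test sets, while this sketch writes only the remaining lines to the test file.

```python
from itertools import islice


def split_train_test(src_path, train_path, test_path, n):
    """Write the first n lines of src_path to train_path and the
    next n lines to test_path. Returns (train_lines, test_lines).

    Hypothetical helper; mirrors head/tail only when the source
    file has at least 2*n lines.
    """
    n_train = 0
    n_test = 0
    with open(src_path, "r") as src:
        # islice consumes exactly n lines from the shared file iterator,
        # so the second loop continues where the first one stopped.
        with open(train_path, "w") as train:
            for line in islice(src, n):
                train.write(line)
                n_train += 1
        with open(test_path, "w") as test:
            for line in islice(src, n):
                test.write(line)
                n_test += 1
    return n_train, n_test
```

Calling split_train_test("dna_train_svmlight.txt", "train2M5.txt", "test2M5.txt", 2500000) should then correspond to the three head/tail commands, and the returned counts give a quick check that both files really contain 2.5 million lines.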
blockspliter runs OK.
blocktrain segfaults after a few minutes:
(juncluster)hughperkins@juncluster3:~/install/sbm_exp-1.0/cdblock$ $install/sbm_exp-1.0/cdblock/blockspliter $data/dna/train2M5.txt $data/dna/train2M5splitbase
.................................................................................................................................................................................$install/sbm_exp-1.0/cdblock/blocktrain -c 1 -B 1 -w1 329 $data/dna/train2M5splitbase $data/dna/sbmexpmodel2m5.txt
.........................................................................
time : 142
(juncluster)hughperkins@juncluster3:~/install/sbm_exp-1.0/cdblock$ $install/sbm_exp-1.0/cdblock/blocktrain -c 1 -B 1 -w1 329 $data/dna/train2M5splitbase $data/dna/sbmexpmodel2m5.txt
0
n:800
Testing file is missingREALLOCATE MEMORY
Segmentation fault (core dumped)
I also tried the liblinear version of cdblock and got approximately the same result.
I would like to compare this dataset/solver against a novel solver for a possible KDD paper this week. I know it's short notice!