illinois-ml-nlp-users - [Illinois-ml-nlp-users] Seg fault on N=2.5 million subset of DNA dataset from pascal large scale learning challenge

illinois-ml-nlp-users AT lists.siebelschool.illinois.edu

Subject: Support for users of CCG software closed 7-27-20

  • From: Hugh Perkins <hughperkins AT gmail.com>
  • To: "illinois-ml-nlp-users AT cs.uiuc.edu" <illinois-ml-nlp-users AT cs.uiuc.edu>
  • Subject: [Illinois-ml-nlp-users] Seg fault on N=2.5 million subset of DNA dataset from pascal large scale learning challenge
  • Date: Sat, 16 Feb 2013 23:09:42 +0800
  • List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users/>
  • List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>

Procedure followed:
- download the dna dataset from http://largescale.ml.tu-berlin.de/instructions/
- run the convert.py script from the site above to create dna_train_svmlight.txt

head -n 5000000 dna_train_svmlight.txt > 2M5traintest.txt
head -n 2500000 2M5traintest.txt > train2M5.txt
tail -n 2500000 2M5traintest.txt > test2M5.txt
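The three head/tail commands above can be wrapped in a small reusable function. This is a minimal sketch, assuming bash and GNU coreutils. The function name `split_svmlight` is hypothetical and not part of sbm_exp. It adds one check that the original commands skip: if the source file has fewer than 2N lines, `head` and `tail` return overlapping training and test sets without any warning.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sbm_exp): take the first 2*N lines of an
# SVMlight file and split them into N training lines and N test lines,
# mirroring the head/tail commands above.
# Usage: split_svmlight dna_train_svmlight.txt 2500000 train2M5.txt test2M5.txt
set -euo pipefail

split_svmlight() {
  local src=$1 n=$2 train=$3 test=$4
  local subset total
  subset=$(mktemp)
  head -n $((2 * n)) "$src" > "$subset"
  # Arithmetic expansion strips any padding that wc may print.
  total=$(( $(wc -l < "$subset") ))
  # With fewer than 2*N lines, the head and tail ranges would overlap.
  if [ "$total" -ne $((2 * n)) ]; then
    echo "only $total lines available, need $((2 * n))" >&2
    rm -f "$subset"
    return 1
  fi
  head -n "$n" "$subset" > "$train"   # lines 1..N
  tail -n "$n" "$subset" > "$test"    # lines N+1..2N
  rm -f "$subset"
}
```

Staging the 2N-line subset in a temporary file keeps the behaviour identical to the original commands while making the line-count check explicit.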

$install/sbm_exp-1.0/cdblock/blockspliter $data/dna/train2M5.txt $data/dna/train2M5splitbase
$install/sbm_exp-1.0/cdblock/blocktrain -c 1 -B 1 -w1 329 $data/dna/train2M5splitbase $data/dna/sbmexpmodel2m5.txt

blockspliter runs OK.
blocktrain segfaults after a few minutes:

(juncluster)hughperkins@juncluster3:~/install/sbm_exp-1.0/cdblock$ $install/sbm_exp-1.0/cdblock/blockspliter $data/dna/train2M5.txt $data/dna/train2M5splitbase
.................................................................................................................................................................................$install/sbm_exp-1.0/cdblock/blocktrain -c 1 -B 1 -w1 329 $data/dna/train2M5splitbase $data/dna/sbmexpmodel2m5.txt
.........................................................................
time : 142
(juncluster)hughperkins@juncluster3:~/install/sbm_exp-1.0/cdblock$ $install/sbm_exp-1.0/cdblock/blocktrain -c 1 -B 1 -w1 329 $data/dna/train2M5splitbase $data/dna/sbmexpmodel2m5.txt
0
n:800
Testing file is missingREALLOCATE MEMORY
Segmentation fault (core dumped)

I also tried the liblinear version of cdblock and got approximately the same result.

I'd like to compare this solver on this dataset against a novel solver for a possible KDD paper this week. I know it's rather short notice!



