illinois-ml-nlp-users AT lists.siebelschool.illinois.edu
Subject: Support for users of CCG software closed 7-27-20
- From: Greg Durrett <gdurrett AT eecs.berkeley.edu>
- To: illinois-ml-nlp-users AT cs.uiuc.edu
- Subject: [Illinois-ml-nlp-users] NER tagger not preserving line breaks
- Date: Sat, 12 Apr 2014 16:10:41 -0700
- List-archive: <http://lists.cs.uiuc.edu/pipermail/illinois-ml-nlp-users/>
- List-id: Support for users of CCG software <illinois-ml-nlp-users.cs.uiuc.edu>
Hi,
I'm trying to use the NER tagger:
I want to pass it data that has already been tokenized and sentence-split, but I can't figure out how to get it to respect line breaks. I've attached the input (.sentences; the same thing happens with just one line break between sentences) and the output (.tagged) from a run with the given config. The config is basically the default ontonotes file with forceNewSentenceOnLineBreaks set to true and, additionally, pathToTokenNormalizationData set to false (since I thought this might cause retokenization or re-splitting of sentences).
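A quick way to confirm the mismatch is to compare the number of non-empty lines in the input and output files. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes both files are plain text with one sentence per non-empty line, which is what the input is meant to be but may not hold for the tagger's actual output.

```python
from pathlib import Path


def count_sentence_lines(text: str) -> int:
    """Count non-empty lines, treating each one as a single pre-split sentence."""
    return sum(1 for line in text.splitlines() if line.strip())


def line_breaks_preserved(source_text: str, tagged_text: str) -> bool:
    """Return True when both texts contain the same number of non-empty lines.

    Blank lines are ignored, so one or two line breaks between sentences
    are treated the same way.
    """
    return count_sentence_lines(source_text) == count_sentence_lines(tagged_text)


def check_files(source_path: str, tagged_path: str) -> bool:
    """Read both files (assumed UTF-8 plain text) and compare their line counts."""
    source_text = Path(source_path).read_text(encoding="utf-8")
    tagged_text = Path(tagged_path).read_text(encoding="utf-8")
    return line_breaks_preserved(source_text, tagged_text)
```

Calling check_files("conll-2012-dev-short.sentences", "conll-2012-dev-short.tagged") would return False if the tagger merged or re-split sentences. A matching count does not prove the tokenization was left untouched, only that the number of sentence lines is the same.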
In the log, the tool states that one of the parameters is
keepOriginalFileTokenizationAndSentenceSplitting=false
which seems bad, but the system doesn't seem to accept this as an argument.
Any advice?
Thanks!
Greg
Attachment: conll-2012-dev-short.sentences (Description: Binary data)
Attachment: conll-2012-dev-short.tagged (Description: Binary data)
Attachment: greg.config (Description: Binary data)