illinois-ml-nlp-users AT lists.siebelschool.illinois.edu
Subject: Support for users of CCG software closed 7-27-20
- From: Lev Ratinov <why2ask AT gmail.com>
- To: Arvind Rajasekaran <arvind.rajasekaran AT hgdata.com>, illinois-ml-nlp-users <illinois-ml-nlp-users AT cs.uiuc.edu>, Mark Sammons <mssammon AT illinois.edu>
- Cc: Yuxiao Zhang <yuxiao.zhang AT hgdata.com>
- Subject: Re: [[Illinois-ml-nlp-users] ] Wikifier Missing Files
- Date: Mon, 1 Aug 2016 16:03:10 -0400
Dear Dr. Lev Ratinov,
I am a UCSB graduate student. We are trying to use the Wikifier on our dataset and to update some of its source files. The following files are missing, and we only partially understand how they are created. It would be great if you could give us hints on how to create them. If you could share the files with us, that would be immensely helpful as well.
The files are listed in order of importance.
"./WikiData/Index/CompleteWikipediaIndexVer2.2/";
"./WikiData/Index/SurfaceToTitleIdMap.txt";
"./WikiData/Index/LinkabilityScoresWithGoogleProb.txt";
"./WikiData/Index/TitleIdToSurfaceMap.txt";
categories.tokens.hist.txt
titletoken.hist.txt
WikiArticleWithTopicabilityAndTypes
SurfaceToTitleIdMap.txt
LinkabilityScoresWithGoogleProb.txt
Regards
Arvind
P.S. We analyzed which fields of these files are actually used. Our analysis is below. We would greatly appreciate your help.
pathToCategoryKeywordCounts = categories.tokens.hist.txt
- Field1: count (Integer)
- Field2: word (String)
- used in: addCategoryTokensNormalizationData
pathToAllTokensKeywordsCount = titletoken.hist.txt
- Field1: count
- Field2: tokens
- used in: addTokenInfoAndArticleCountInfo
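
A minimal sketch of how the two histogram files above could be read, based only on the field order we inferred (count first, then the word or token). The whitespace delimiter, the function name `parse_histogram`, and the sample lines are assumptions for illustration; they are not taken from the Wikifier source.

```python
from collections import Counter
from typing import Iterable


def parse_histogram(lines: Iterable[str]) -> Counter:
    """Parse 'count token' lines into a Counter keyed by token."""
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: integer count first, then the token, split on whitespace.
        count_text, token = line.split(None, 1)
        counts[token] += int(count_text)
    return counts


# Made-up sample lines, not real Wikifier data.
sample = ["12 science", "3 history", "", "5 science"]
hist = parse_histogram(sample)
```

If the real files are tab-separated or put the token first, only the `split` line would need to change.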
pathToCompleteIndexOldVersion = CompleteWikipediaIndexVer2.2
- used in: AggregateData and BuildProtobufferIndices
- should contain the following fields (found in BuildProtobufferIndices):
  - titleID
  - titleAppearanceCount
  - categoriesIDs
  - categoriesTitles
  - Text
  - leftContext
  - rightContext
  - linkedFromIDs
  - linkedFromTitles
  - linkedToIDs
  - linkedToTitles
  - titleForm (found in BuildWikiTrainingDataFile)
pathTitleToSurfaceFormTextFile = TitleIdToSurfaceMap.txt (?? not sure about the structure of the fields)
- field: “TitleId” (String)
- field: “Surface” (String) [surface1 number1 surface2 number2 ..]
- field: ConditionalSurfaceFormProb (Double) (avg(number1, number2, ..))
- used in: IndexSurfaceFormsData
pathToSurfaceFormInfoTextFile = SurfaceToTitleIdMap.txt (?? not sure about the structure of the fields)
- field: surface form (String) [surface1 number1 surface2 number2 ..]
- field: TitleId (Integer)
- field: ConditionalTitleAppearance (Double) (avg(number1, number2, ..))
- used in: IndexSurfaceFormsData
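
To make our guess about the two map files concrete, here is a rough sketch that reads one line as a key followed by alternating name/number pairs and averages the numbers. Everything about the layout is an assumption: the tab delimiter (chosen because surface forms may contain spaces), the key-first ordering, the helper names `parse_pairs` and `parse_map_line`, and the sample line are all hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def parse_pairs(fields: List[str]) -> List[Tuple[str, float]]:
    """Turn [name1, number1, name2, number2, ...] into (name, number) pairs."""
    if len(fields) % 2 != 0:
        raise ValueError("expected an even number of fields")
    return [(fields[i], float(fields[i + 1])) for i in range(0, len(fields), 2)]


def parse_map_line(line: str) -> Tuple[str, List[Tuple[str, float]], float]:
    """Split one tab-separated line into key, pairs, and the mean of the numbers."""
    # Assumption: the first field is the key, the rest alternate name/number.
    key, *rest = line.rstrip("\n").split("\t")
    pairs = parse_pairs(rest)
    average = mean(number for _, number in pairs) if pairs else 0.0
    return key, pairs, average


# Made-up sample line, not real Wikifier data.
key, pairs, average = parse_map_line("42\tBig Apple\t0.25\tNYC\t0.75")
```

Whether the averaged value is what the code stores as the conditional probability is exactly the part we are unsure about.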
pathToLinkabilityFile = LinkabilityScoresWithGoogleProb.txt
- Field1: surface form
- Field2: var27.nextToken (unused?)
- Field3: LinkedAppearanceCount (Integer)
- Field4: TotalAppearanceCount (Integer)
- Field5: LogProbOnWebGoogle (Double)
- used in: IndexSurfaceFormsData
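
The five linkability fields listed above could be read roughly as follows. This is a sketch under assumptions: the tab delimiter, the names `LinkabilityRecord` and `parse_linkability_line`, and the sample line are hypothetical, and `linked_fraction` is only an illustrative ratio of Field3 to Field4, not a formula confirmed from the Wikifier code.

```python
from dataclasses import dataclass


@dataclass
class LinkabilityRecord:
    surface_form: str
    unused: str  # Field2: read by the tokenizer but apparently never used
    linked_count: int
    total_count: int
    log_prob_web: float

    def linked_fraction(self) -> float:
        """Share of appearances that are links (illustrative, not Wikifier's formula)."""
        return self.linked_count / self.total_count if self.total_count else 0.0


def parse_linkability_line(line: str) -> LinkabilityRecord:
    """Parse one line, assuming exactly five tab-separated fields."""
    surface, unused, linked, total, log_prob = line.rstrip("\n").split("\t")
    return LinkabilityRecord(surface, unused, int(linked), int(total), float(log_prob))


# Made-up sample line, not real Wikifier data.
record = parse_linkability_line("New York\t0\t30\t120\t-7.5")
```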
- Re: [[Illinois-ml-nlp-users] ] Wikifier Missing Files, Lev Ratinov, 08/01/2016
- RE: [[Illinois-ml-nlp-users] ] Wikifier Missing Files, Sammons, Mark, 08/03/2016