SRILM-USER Archives

Re: Naive question about unknown words

From: "Anand Venkataraman (Roaming)" <anand at ADDRESS HIDDEN>
Date: Tue, 11 Oct 2005 09:19:19 -0700

Arnaud

When you created the language model, you specified that you wanted to
create an unknown-word token (<unk>, a placeholder for out-of-vocabulary
items) with a non-zero probability. Since you did not also invoke ngram
with the -unk option, it treats the LM as closed-vocabulary and warns
that this supposedly closed-vocabulary LM nevertheless assigns a non-zero
probability to <unk>. You can avoid the warning by specifying -unk for
ngram as well, or alternatively by building a closed-vocabulary LM to
start with (i.e., running ngram-count without -unk). Although you state
that you want a non-zero weight for unknown unigrams, I would recommend
that, if at all possible, you predetermine the domain vocabulary and
build a closed-vocabulary LM.
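To make the open/closed distinction concrete, here is a toy Python sketch, not SRILM code. It uses a maximum-likelihood unigram model with no smoothing, and the helper names build_model and perplexity, the min_count threshold, and the rule that rare training words become <unk> are all hypothetical choices for illustration. In this sketch, an open-vocabulary evaluation scores test OOVs as <unk>, a closed-vocabulary evaluation counts and skips them, and evaluating an <unk>-bearing model as closed raises the same warning text quoted in this thread.

```python
import math
import warnings
from collections import Counter

UNK = "<unk>"

def build_model(train_tokens, min_count, open_vocab):
    """Toy ML unigram model (hypothetical; no discounting or backoff).

    Words seen fewer than min_count times are out of vocabulary. With
    open_vocab=True their counts go to <unk>; otherwise they are dropped.
    """
    counts = Counter(train_tokens)
    vocab = {w for w, c in counts.items() if c >= min_count}
    mapped = Counter()
    for w, c in counts.items():
        if w in vocab:
            mapped[w] += c
        elif open_vocab:
            mapped[UNK] += c  # rare words become training data for <unk>
    total = sum(mapped.values())
    return {w: c / total for w, c in mapped.items()}

def perplexity(test_tokens, model, open_vocab):
    """Return (perplexity, words scored, OOVs skipped) for the toy model."""
    if not open_vocab and model.get(UNK, 0.0) > 0.0:
        # The mismatch from this thread: an LM with <unk> used as closed-vocab.
        warnings.warn("non-zero probability for <unk> in closed-vocabulary LM")
    logprob, scored, oovs = 0.0, 0, 0
    for w in test_tokens:
        if w not in model:
            if open_vocab and UNK in model:
                w = UNK  # open vocabulary: score the OOV as <unk>
            else:
                oovs += 1  # closed vocabulary: count the OOV and skip it
                continue
        logprob += math.log10(model[w])
        scored += 1
    ppl = 10 ** (-logprob / scored) if scored else float("inf")
    return ppl, scored, oovs
```

With training text "a a a b b c", min_count=2 and test words "a b d", the open model scores all three words (d as <unk>), while the closed model scores two and reports one OOV; evaluating the open model as closed scores two words, reports one OOV, and emits the warning. Real SRILM models also apply discounting and backoff, which this sketch deliberately omits.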

Regards

&

gaudinat wrote:
> Sorry for this naive question:
>
> I create my LM with this command:
> ngram-count -text learningdb.txt -lm GT -unk
>
> I evaluate a sentence with the following command:
> ngram -lm GT -ppl sentence.txt
>
> I obtain coherent results but I get also the following warning message:
> "warning: non-zero probability for <unk> in closed-vocabulary LM"
>
> Can anyone give me some information about this warning and how to avoid it?
> Of course I need to give a weight for the unknown words.
>
> Thanks in advance,
>
> Arnaud.

©2006 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation.

Last modified Dec 02, 2008