Ney's absolute discounting and zeroton words

From: Tanel Alumäe <tanel.alumae at ADDRESS HIDDEN>
Date: Mon, 13 Jun 2005 18:20:09 +0300

Hello,

I continue my quest with zeroton words. I want to control the amount of
probability that is distributed upon words that are in the vocabulary
but are not in the training corpus. It seems that Ney's absolute
discounting is good for that.
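
For readers following along, here is a minimal sketch of the textbook idea, not SRILM's actual implementation. It assumes a constant D is subtracted from every seen unigram count and that the freed mass is spread uniformly over in-vocabulary words with zero training count. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def absolute_discount_unigrams(tokens, vocab, d):
    """Unigram probabilities under a simplified absolute-discounting scheme.

    Assumption (illustrative only): the mass freed by discounting is split
    evenly among vocabulary words that never occur in `tokens`.
    """
    vocab = set(vocab)
    # Only count tokens that belong to the vocabulary.
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values())
    seen = [w for w in vocab if counts[w] > 0]
    unseen = [w for w in vocab if counts[w] == 0]

    # Each seen word gives up a constant d from its count.
    probs = {w: (counts[w] - d) / total for w in seen}

    # Total freed mass is d per seen word type, normalised by the token count.
    freed = d * len(seen) / total
    # If there are no unseen words, this sketch leaves the freed mass unassigned.
    for w in unseen:
        probs[w] = freed / len(unseen)
    return probs

tokens = "a a b c".split()
vocab = ["a", "b", "c", "x", "y"]
p_01 = absolute_discount_unigrams(tokens, vocab, 0.1)
p_001 = absolute_discount_unigrams(tokens, vocab, 0.01)
print(math.log10(p_01["x"]), math.log10(p_001["x"]))
```

Under these assumptions the unseen-word probability is proportional to D, so its base-10 logarithm drops by one for every tenfold reduction of the constant.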

So, I started experimenting with the constant for Ney's discounting.
Here are the unigram log probabilities (base 10) of an unseen word for
different discounting constants:
0.1       -1.410174
0.01      -2.410174
0.001     -3.410148
0.0001    -4.410249
0.00001   -5.409665
0.000001  -1.278751
0.0000001 -1.278753

As you can see, there is an abrupt increase in probability when the
constant gets to 0.000001, which is unexpected. Is this how it should be,
or is it caused by some numerical problem? I'm using SRILM on a 32-bit
x86 processor.
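
The jump can be made explicit with a small check on the figures quoted in this message. The sketch below assumes the reported values are base-10 log probabilities and that the unseen-word probability should scale linearly with the constant, as the first rows suggest. The helper name and the tolerance are hypothetical choices, not part of SRILM.

```python
import math

# (discounting constant, reported log probability of an unseen word)
observed = [
    (0.1, -1.410174),
    (0.01, -2.410174),
    (0.001, -3.410148),
    (0.0001, -4.410249),
    (0.00001, -5.409665),
    (0.000001, -1.278751),
    (0.0000001, -1.278753),
]

def flag_outliers(rows, tol=0.01):
    """Return rows whose log probability departs from the log-linear trend.

    The trend is anchored on the first row: expected = log10(d) + offset,
    where offset = logprob_0 - log10(d_0).
    """
    ref_d, ref_lp = rows[0]
    offset = ref_lp - math.log10(ref_d)
    flagged = []
    for d, lp in rows:
        expected = math.log10(d) + offset
        if abs(lp - expected) > tol:
            flagged.append((d, lp, expected))
    return flagged

for d, lp, expected in flag_outliers(observed):
    print(f"D={d:g}: reported {lp:.6f}, trend predicts {expected:.6f}")
```

Only the last two constants are flagged. If the trend held, the value for 0.000001 would be close to -6.41 rather than -1.28.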

The numbers here are given for a small test set but I've seen similar
behaviour for large sets.

Regards,

Tanel
