Search SRILM-USER Archives


Re: once occurring trigram discarded

From: Andreas Stolcke <stolcke>
Date: Mon, 31 Jan 2005 10:01:16 PST

In message <41FE6F03.5040103 at ADDRESS HIDDEN> you wrote:
> Hi,
> I made a trigram model using Kneser-Ney modified smoothing and
> interpolation and I don't understand why there are only 5828 trigrams in
> the model whereas there are 102520 trigrams in the corpus. I think that
> the trigrams discarded occur just once because there are 96692 trigrams
> occurring once, which is the difference between the trigrams in the corpus
> and the trigrams in the model. I tried to use other smoothing and even no
> smoothing but every time the trigrams are discarded.
> I don't understand why, since the bigrams occurring once (there are 58764
> of such bigrams) are not discarded in the bigram model I built using
> Kneser-Ney modified smoothing and interpolation.

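The default cutoff for trigrams (and higher) is count 2, so trigrams
that occur only once are left out of the model.
The default cutoff for unigrams and bigrams is count 1, which is why
your bigrams occurring once were kept.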

Use ngram-count -gt3min 1 to include all trigrams.

ngram-count -help displays the default values for all the options.
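
To make the effect of the cutoffs concrete, here is a minimal Python sketch of the filtering step only. It is not SRILM code and does no smoothing or backoff; the names count_ngrams, apply_cutoffs, DEFAULT_MIN_COUNTS and HIGHER_ORDER_DEFAULT are hypothetical, and the defaults are just the values stated above. It matches your numbers: 102520 trigrams in the corpus minus 96692 occurring once leaves 5828.

```python
from collections import Counter

# Minimum counts per N-gram order, taken from the defaults described above.
# These names are illustrative only and are not part of SRILM.
DEFAULT_MIN_COUNTS = {1: 1, 2: 1}
HIGHER_ORDER_DEFAULT = 2  # assumed for trigrams and higher


def count_ngrams(tokens, order):
    """Count every N-gram of the given order in a token list."""
    return Counter(
        tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)
    )


def apply_cutoffs(counts, overrides=None):
    """Keep only N-grams whose count reaches the minimum for their order.

    `overrides` maps an order to a minimum count, loosely analogous to
    passing -gt3min 1 on the command line.
    """
    min_counts = {**DEFAULT_MIN_COUNTS, **(overrides or {})}
    return {
        ngram: count
        for ngram, count in counts.items()
        if count >= min_counts.get(len(ngram), HIGHER_ORDER_DEFAULT)
    }


if __name__ == "__main__":
    tokens = "a b c a b c a b d".split()
    trigrams = count_ngrams(tokens, 3)
    print(len(trigrams), "distinct trigrams in the toy corpus")
    print(len(apply_cutoffs(trigrams)), "kept with the default cutoff")
    print(len(apply_cutoffs(trigrams, {3: 1})), "kept with a minimum count of 1")
```

In the toy corpus the trigram occurring once is dropped under the default and kept once the minimum count for order 3 is lowered to 1, which is what -gt3min 1 does for the real model.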

--Andreas

©2006 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation. Privacy policy

Last modified Dec 02, 2008