Re: Perplexity calculation: Strange behavior

From: Stefan Hahn <hahn at ADDRESS HIDDEN>
Date: Thu, 1 Sep 2005 11:52:29 +0200

Hi again!

Your guess was perfectly right: I simply forgot to specify the -order
option for the perplexity calculation.

Thanks again,
Stefan

> In message <200508312031.45859.hahn at ADDRESS HIDDEN> you wrote:
> > Hi!
> >
> > While doing some language modeling with the SRI Toolkit (V.1.4.3 and V.1.4.5)
> > on i686 Intel GNU/Linux, I encountered some strange behavior concerning
> > perplexity calculation:
> > For any order greater than 3, the perplexity calculated with ngram seems
> > to be fixed and wrong.
> > For example, I used Defoe's "Robinson Crusoe" to create modified
> > Kneser-Ney discounted language models for orders 1 up to 6 and calculated
> > the perplexity of the same text using "ngram" and our own software:
> >
> >         +------------------------+
> >         |       perplexity       |
> > +-------+-------------+----------+
> > | order | SRI-Toolkit | our Tool |
> > +-------+-------------+----------+
> > |   1   |   394.79    | 394.794  |
> > +-------+-------------+----------+
> > |   2   |   68.0706   | 68.071   |
> > +-------+-------------+----------+
> > |   3   |   54.29     | 54.2903  |
> > +-------+-------------+----------+
> > |   4   |   57.1554   | 52.6306  |
> > +-------+-------------+----------+
> > |   5   |   57.1554   | 52.6502  |
> > +-------+-------------+----------+
> > |   6   |   57.1554   | 52.7033  |
> > +-------+-------------+----------+
>
> I haven't looked at your script, but my guess is that you didn't specify
> the -order option when evaluating the LM.  By default, ngram only uses
> up to trigram probabilities, regardless of what is in the LM file
> (for historical reasons).  So of course you get the same result for
> any LM order >= 4.  Also, because of KN, you are getting a degradation
> relative to the trigram, as the lower-order probabilities in a KN model
> are optimized to complement the higher-order estimates (they are meant
> for backing off), not to be used on their own.
>
> If this is not the case, then we may have a bug, but I can assure you that
> we use order >= 4 all the time.
>
> --Andreas
>
> > The script I used to download "Robinson Crusoe", create the LMs, and
> > compute the SRI results:
> >
> > wget "http://www-i6.informatik.rwth-aachen.de/~gollan/make-lm-01.sh"
> > chmod a+x make-lm-01.sh
> > ./make-lm-01.sh
> >
> > Is there any error in my script?
> > Thanks,
> >  Stefan
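The effect Andreas describes can be illustrated with a small toy. What follows is a minimal Python sketch, not SRILM code. It assumes standard ARPA-style backoff and ignores sentence-boundary tokens and OOV handling. It uses a hypothetical model whose n-grams and log10 values are invented for illustration. The `max_order` parameter plays the role of ngram's -order option: the history is truncated to `max_order - 1` words before lookup, so 4-gram (or higher) entries in the model are never consulted when `max_order` is 3. The function names `log10_prob` and `perplexity` are illustrative only.

```python
"""Toy ARPA-style backoff lookup with a maximum evaluation order.

Hypothetical example: the n-grams and log10 values below are invented.
Sentence boundaries (<s>, </s>) and OOV handling are ignored for brevity.
"""

# n-gram tuple -> (log10 probability, log10 backoff weight)
TOY_LM = {
    ("a",): (-0.5, -0.3),
    ("b",): (-0.4, -0.2),
    ("a", "b"): (-0.2, -0.1),
    ("b", "a"): (-0.3, -0.1),
    ("a", "b", "a"): (-0.1, -0.05),
    ("b", "a", "b"): (-0.15, -0.05),
    ("a", "b", "a", "b"): (-0.01, 0.0),  # a 4-gram entry
}


def log10_prob(lm, word, history, max_order):
    """log10 p(word | history), truncating the history to max_order - 1 words."""
    ctx = tuple(history[-(max_order - 1):]) if max_order > 1 else ()
    backoff = 0.0
    while True:
        if ctx + (word,) in lm:
            return backoff + lm[ctx + (word,)][0]
        if not ctx:
            raise KeyError(f"word not in model: {word!r}")
        # Back off to a shorter context; a context absent from the model
        # contributes a backoff weight of 1 (log10 weight 0).
        backoff += lm.get(ctx, (0.0, 0.0))[1]
        ctx = ctx[1:]


def perplexity(lm, words, max_order):
    """Perplexity of a word sequence: 10 ** (-average log10 probability)."""
    total = sum(log10_prob(lm, w, words[:i], max_order) for i, w in enumerate(words))
    return 10 ** (-total / len(words))


if __name__ == "__main__":
    text = ["a", "b", "a", "b", "a"]
    trigrams_only = {ng: v for ng, v in TOY_LM.items() if len(ng) <= 3}
    # At max_order 3 the 4-gram entry is never looked up, so the result
    # matches a model with all higher-order entries removed.
    print(perplexity(TOY_LM, text, 3), perplexity(trigrams_only, text, 3))
    # With max_order 4 the 4-gram entry is used and the perplexity changes.
    print(perplexity(TOY_LM, text, 4))
```

The first line printed shows two identical perplexities. This mirrors how every LM of order 4 or more gave the same 57.1554 when it was evaluated at the default trigram order. In SRILM itself, the corresponding fix is to pass `-order N` to ngram along with `-lm` and `-ppl`.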

©2006 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation. Privacy policy

Last modified Dec 02, 2008