
Language Model output problem using FLM

From: Antoine Ghaoui <Antoine.Ghaoui at ADDRESS HIDDEN>
Date: Thu, 15 Feb 2007 10:09:39 +0200

Hello,

I'm trying to use fngram-count to build a morphology-based language
model. As a first step, I'm generating a plain word trigram model to get
familiar with the tool.

The factor file is:

## word trigram
1
W : 2 W(-1) W(-2) ntextfile_99.flm.cnt ntextfile_99.flm.lm 3
W1W2    W2      kndiscount gtmin 1 interpolate
W1      W1      kndiscount gtmin 1 interpolate
0       0       kndiscount gtmin 1

The command line used is:
fngram-count -factor-file flm_spc.1 -text ntextfile_99.flm -lm ntextfile_99.flm.lm -vocab ntextfile.vocab.flm

The generated LM file looks a little strange. Part of it is shown
below:
\data\
ngram 0x0=18119
ngram 0x1=2855740
ngram 0x2=0
ngram 0x3=6490198

\0x0-grams:
-2.313375       </s>
-99     <s>
.
.
\0x1-grams:
-0.9892201      <s> W-LTN       -1.629908
.
.
\0x2-grams:

\0x3-grams:
-0.9725394      <s> <s> W-LTN   -1.654503
.
.
\end\

Can you please help with this? Is it normal to have ngram 0x2=0? And
how can I get the old format?
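
On the first question, one plausible reading is that the hexadecimal
section labels (0x0, 0x1, 0x2, 0x3) are bitmasks over the parents listed
in the factor file, one section per possible backoff-graph node. The
Python sketch below illustrates that reading only. It assumes bit 0
stands for the first parent, W(-1) (written W1 in the backoff lines),
and bit 1 for W(-2) (written W2). The function name node_mask and the
variable names are hypothetical and not part of SRILM.

```python
import re

def node_mask(label, parents):
    """Return the bitmask for a backoff-graph node label such as 'W1W2'.

    Assumption: bit i is set when the i-th entry of `parents` is present.
    """
    if label == "0":  # the node with no parents (unigram level)
        return 0
    mask = 0
    for name in re.findall(r"[A-Za-z]+\d+", label):
        mask |= 1 << parents.index(name)
    return mask

parents = ["W1", "W2"]             # W(-1) and W(-2) from the factor file
graph_nodes = ["W1W2", "W1", "0"]  # first column of the three backoff lines

# Masks of the nodes that the factor file actually defines.
used = {node_mask(n, parents) for n in graph_nodes}

# Walk over every possible mask and report whether the graph contains it.
for mask in range(1 << len(parents)):
    status = "in backoff graph" if mask in used else "not in backoff graph"
    print(f"0x{mask:x}: {status}")
```

Under that assumption, 0x2 would be the node conditioned on W(-2)
alone. The factor file above never lists such a node (W1W2 drops W2,
then W1 drops W1), so an empty 0x2 section would be expected rather than
a sign of an error.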

Thanks for your help

Antoine

