Re: ngram manipulation

From: Andreas Stolcke <stolcke at ADDRESS HIDDEN>
Date: Thu, 08 Mar 2007 08:21:42 PST

There is a hack to do it.
Remove from your LM any ngrams involving the <s> or </s> token (without
changing the other probabilities and backoff weights).
Then feed your ngrams, one per line, to "ngram -debug 1 -ppl".  The "sentence"
log probabilities will now correspond to joint ngram probabilities,
since the initial word will back off to a unigram probability, and
the final </s> will count as an OOV and not contribute to the total
log probability.
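
Spelled out for a trigram, this is just the chain rule. A sketch of the sum that the "sentence" score should then amount to, assuming the boundary ngrams have been removed as described and that log probabilities are base 10 as in the ARPA file:

```latex
\log_{10} p(\mathrm{wd1}, \mathrm{wd2}, \mathrm{wd3})
  = \log_{10} p(\mathrm{wd1})
  + \log_{10} p(\mathrm{wd2} \mid \mathrm{wd1})
  + \log_{10} p(\mathrm{wd3} \mid \mathrm{wd1}, \mathrm{wd2})
```

The first term is a plain unigram probability because no ngram starting with <s> is left to condition on, and no term for </s> is added because it no longer has a probability in the LM.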

It would be easy to add an option somewhere to make this more convenient,
without the need to hack the LM itself.

--Andreas
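
The LM edit described in the reply above could be sketched roughly as follows. This is not part of SRILM; the function name `strip_boundary_ngrams` is made up for illustration, and the sketch assumes a plain-text ARPA file with the usual `\data\`, `\N-grams:` and `\end\` markers. It drops every ngram containing <s> or </s>, leaves all other probabilities and backoff weights alone, and rewrites the `ngram N=` counts in the header to match what is left.

```python
import re

BOUNDARY = {"<s>", "</s>"}

def strip_boundary_ngrams(lines):
    """Return ARPA lines with every n-gram containing <s> or </s> removed.

    Probabilities and backoff weights of the remaining n-grams are left
    untouched; only the "ngram N=count" header lines are recomputed.
    """
    kept = []        # output lines; header counts are patched at the end
    counts = {}      # order -> number of n-grams kept
    header_pos = {}  # order -> index in `kept` of its "ngram N=" line
    order = 0        # order of the section being read (0 = outside a section)
    for line in lines:
        line = line.rstrip("\n")
        header = re.match(r"ngram\s+(\d+)\s*=", line)
        section = re.match(r"\\(\d+)-grams:", line)
        if header and order == 0:
            header_pos[int(header.group(1))] = len(kept)
        elif section:
            order = int(section.group(1))
            counts[order] = 0
        elif line.startswith("\\end\\"):
            order = 0
        elif order and line.strip():
            # Entry layout: logprob, N words, optional backoff weight.
            words = line.split()[1:1 + order]
            if BOUNDARY & set(words):
                continue  # drop this n-gram entirely
            counts[order] += 1
        kept.append(line)
    for n, pos in header_pos.items():
        kept[pos] = f"ngram {n}={counts.get(n, 0)}"
    return kept
```

The result can be written to a new file (say `stripped.lm`, a placeholder name) and scored with something like `ngram -lm stripped.lm -debug 1 -ppl ngrams.txt`, where `ngrams.txt` holds one ngram per line. The sketch keeps the whole file in memory, so for an LM with tens of millions of ngrams a two-pass, streaming variant would be the safer choice.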

In message <45F01ED0.2030305 at ADDRESS HIDDEN> you wrote:
> Hello SRILM users,
>
> I have a question on the use of the SRILM toolkit for LM manipulation.
>
> The language model in the ARPA format gives conditional probabilities,
> e.g. p(wd3|wd1, wd2).
> Can I compute the joint probability p(wd1, wd2, wd3) using any utility?
>
> I have a heavy LM with (ngram 1=50002, ngram 2=29077135, ngram 3=40083381).
>
>
> Any help would be greatly appreciated.
> Thanks,
> joel.
>
>
> arpa format:
> p(wd3|wd1,wd2) = if(trigram exists)             p_3(wd1,wd2,wd3)
>                 else if(bigram wd1,wd2 exists) bo_wt_2(wd1,wd2)*p(wd3|wd2)
>                 else                           p(wd3|wd2)
>
> p(wd2|wd1)= if(bigram exists) p_2(wd1,wd2)
>             else              bo_wt_1(wd1)*p_1(wd2)
>
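
The backoff rule quoted above, together with the chain rule, can be sketched in a few lines. This is an illustration only, not SRILM code: the `lm` dictionary (word tuples mapped to a log10 probability and a log10 backoff weight) and the names `cond_logprob` and `joint_logprob` are assumptions made for the example. Because everything is in the log domain, the products in the quoted rule become sums, and a missing backoff weight counts as 0.0.

```python
def cond_logprob(lm, word, context):
    """log10 p(word | context) under ARPA-style backoff.

    `lm` is a hypothetical dict: tuple of words -> (logprob, backoff weight).
    """
    ngram = context + (word,)
    if ngram in lm:
        return lm[ngram][0]
    if not context:
        return float("-inf")  # unseen unigram: no probability mass (OOV)
    # Back off: add the context's backoff weight (0.0 if the context is
    # not listed), then retry with the oldest context word dropped.
    bow = lm[context][1] if context in lm else 0.0
    return bow + cond_logprob(lm, word, context[1:])

def joint_logprob(lm, words, order=3):
    """log10 p(words) by the chain rule, with no <s> or </s> terms."""
    total = 0.0
    for i, w in enumerate(words):
        context = tuple(words[max(0, i - order + 1):i])
        total += cond_logprob(lm, w, context)
    return total

# Toy model: two unigrams and one bigram.
toy_lm = {
    ("a",): (-1.0, -0.5),
    ("b",): (-0.5, -0.25),
    ("a", "b"): (-0.25, 0.0),
}
print(joint_logprob(toy_lm, ["a", "b"]))  # log10 p(a) + log10 p(b|a)
```

The first word is scored with its unigram probability, which is the same behaviour the reply obtains from `ngram` by removing the <s> ngrams from the LM.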

©2006 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation. Privacy policy

Last modified Dec 02, 2008