cross-entropy with OOV
From: "Sergey Protasov" <svp at ADDRESS HIDDEN>
Date: Fri, 2 Nov 2007 13:45:45 +0300
Dear experts,
I need to compute entropy with OOV words...
For example:
If the training corpus contains dict_size_train_corpora different words,
then for the test corpus (per word):
entr2 = entr1 + stats.numOOVs * log2(dict_size_train_corpora) / num_words_test_corpora
where entr1 = log2(ppl1)
But in the C++ code (TextStats.cc) I don't know how to get dict_size_train_corpora
to compute this.
Here dict_size_train_corpora = the number of unigrams in the training corpus.
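To make the formula concrete, here is a minimal sketch of it as a standalone C++ function. This is not SRILM code: the function name and its parameters are hypothetical, and the training vocabulary size is simply passed in by the caller, since how to obtain it inside TextStats.cc is exactly the open question.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of SRILM): per-word entropy of the test
// corpus with a uniform penalty of log2(trainVocabSize) bits per OOV word.
//   ppl1           - perplexity computed without the OOV words
//   numOOVs        - number of OOV words in the test corpus
//   numWordsTest   - total number of words in the test corpus
//   trainVocabSize - number of different words (unigrams) in the training corpus
double entropyWithOOVPenalty(double ppl1,
                             std::size_t numOOVs,
                             std::size_t numWordsTest,
                             std::size_t trainVocabSize)
{
    assert(ppl1 > 0.0);
    assert(numWordsTest > 0);
    assert(trainVocabSize > 0);

    // entr1 = log2(ppl1)
    const double entr1 = std::log2(ppl1);

    // numOOVs * log2(dict_size_train_corpora) / num_words_test_corpora
    const double penalty = static_cast<double>(numOOVs)
                         * std::log2(static_cast<double>(trainVocabSize))
                         / static_cast<double>(numWordsTest);

    return entr1 + penalty;
}
```

With no OOV words the penalty term is zero and the result reduces to entr1.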
Can anybody help?
Thanks in advance!