Understanding the lattice decoding code behind Kaldi recognition output

Updated: 2022-07-17 16:33:51

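Background

Kaldi's recognition output is closely tied to the lattice. "Lattice" is usually rendered in Chinese as 晶格, and a lattice is a compact representation of alternative recognition paths. 锐英源软件 used lattices while optimizing its recognition-output code. Because the thread below is about lattices, it is reproduced here to deepen understanding and to help readers get to grips with Kaldi. Thanks again to the Kaldi open-source team. If you find this useful, please follow; more Kaldi articles are on the way, and they are easier to understand when read together.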

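Main text

We are currently using lattice-mbr-decode to create a confusion network, and we are wondering whether lattice-mbr-decode considers the full lattice file or prunes out certain alternatives that are considered when using lattice-oracle. Furthermore, is there an easier way to get word-level confidence scores for the oracle hypothesis than somehow parsing the sausage stats? Something like lattice-to-ctm-conf, which only considers the best path? Lastly, I noticed a difference between the times produced by lattice-to-ctm-conf and lattice-mbr-decode: the times correspond to a non-integer number of frames. Is this due to averaging the frame count over the considered alternatives?

When I run the following commands: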

# Try to save the oracle path in 0000000006590022-VC787776.oraclelat
cat development_set.reference_transcription_kaldi_format | sed 's:::g' | scripts/sym2int.pl --ignore-first-field ./graph/words.txt | ~/halef-cassandra/kaldi-070615/src/latbin/lattice-oracle --write-lattices=ark,t:0000000006590022-VC787776.oraclelat --word-symbol-table=./graph/words.txt ark:./lats/0000000006590022-VC787776.lat ark:- ark,t:tmp.txt

~/halef-cassandra/kaldi-070615/src/latbin/lattice-to-ctm-conf --acoustic-scale=0.1 --decode-mbr=false ark:./lats/0000000006590022-VC787776.lat ark:0000000006590022-VC787776.oraclelat tmp.ctm

I get an empty tmp.ctm file.

When I try to store the oracle path as a binary archive (ark:0000000006590022-VC787776.oraclelat), I get:

WARNING (lattice-to-ctm-conf:Read():util/kaldi-holder-inl.h:255) BasicVectorHolder::Read, could not interpret line: ▒▒~

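Also, I cannot extract the oracle path this way from lattices aligned with lattice-align-word-lexicon.

Am I doing something wrong?

I think I am using the wrong format for <1best-rspecifier> in lattice-to-ctm-conf.

What you need to give lattice-to-ctm-conf is tmp.txt, i.e. the output written by lattice-oracle.

Here is a sample of my output. It looks like it works as expected: every word with confidence 1 in the best path also appears with confidence 1 in the oracle. This is perfect. Thank you.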

1best.ctm:
0000000006590022-VC787776 1 0.48 1.00 9758 0.48
0000000006590022-VC787776 1 1.49 0.39 22646 0.93
0000000006590022-VC787776 1 2.05 0.10 20757 0.56
0000000006590022-VC787776 1 2.15 0.30 9145 0.49
0000000006590022-VC787776 1 2.44 0.34 20886 0.93
0000000006590022-VC787776 1 2.78 0.41 20743 1.00
0000000006590022-VC787776 1 3.26 0.44 21627 1.00
0000000006590022-VC787776 1 4.22 0.48 21574 0.85
0000000006590022-VC787776 1 4.80 0.32 21574 0.88

oracle.ctm:
0000000006590022-VC787776 1 1.06 0.67 22714 0.01
0000000006590022-VC787776 1 1.82 0.21 20757 0.56
0000000006590022-VC787776 1 2.12 0.25 9145 0.49
0000000006590022-VC787776 1 2.44 0.34 20886 0.93
0000000006590022-VC787776 1 2.78 0.41 20743 1.00
0000000006590022-VC787776 1 3.26 0.44 21627 1.00
0000000006590022-VC787776 1 4.21 0.48 21574 0.84
0000000006590022-VC787776 1 4.79 0.32 21574 0.89
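The observation about confidence-1 words can be checked mechanically. The following is only a minimal Python sketch, not part of Kaldi: it assumes each CTM line has the six fields shown in the listings above (utterance, channel, start, duration, word ID, confidence), and the names `CtmEntry`, `parse_ctm` and `fully_confident` are made up for this illustration.

```python
from typing import List, NamedTuple, Set, Tuple


class CtmEntry(NamedTuple):
    utt: str
    channel: str
    start: float     # seconds
    duration: float  # seconds
    word: str        # word ID, kept as text
    conf: float


def parse_ctm(text: str) -> List[CtmEntry]:
    """Parse CTM lines of the form: utt channel start duration word conf."""
    entries = []
    for line in text.strip().splitlines():
        utt, channel, start, dur, word, conf = line.split()
        entries.append(
            CtmEntry(utt, channel, float(start), float(dur), word, float(conf))
        )
    return entries


def fully_confident(entries: List[CtmEntry]) -> Set[Tuple[float, str]]:
    """Return (start, word) pairs whose confidence is 1.0."""
    return {(e.start, e.word) for e in entries if e.conf >= 1.0}


UTT = "0000000006590022-VC787776"

# Values copied from the two listings in the thread.
BEST_ROWS = """
0.48 1.00 9758 0.48
1.49 0.39 22646 0.93
2.05 0.10 20757 0.56
2.15 0.30 9145 0.49
2.44 0.34 20886 0.93
2.78 0.41 20743 1.00
3.26 0.44 21627 1.00
4.22 0.48 21574 0.85
4.80 0.32 21574 0.88
"""
ORACLE_ROWS = """
1.06 0.67 22714 0.01
1.82 0.21 20757 0.56
2.12 0.25 9145 0.49
2.44 0.34 20886 0.93
2.78 0.41 20743 1.00
3.26 0.44 21627 1.00
4.21 0.48 21574 0.84
4.79 0.32 21574 0.89
"""


def with_prefix(rows: str) -> str:
    """Re-attach the utterance ID and channel to each shortened row."""
    return "\n".join(f"{UTT} 1 {row}" for row in rows.strip().splitlines())


best = parse_ctm(with_prefix(BEST_ROWS))
oracle = parse_ctm(with_prefix(ORACLE_ROWS))

# True when every confidence-1 word of the best path is also
# a confidence-1 word (same start time) in the oracle path.
print(fully_confident(best) <= fully_confident(oracle))
```

Matching on the pair (start time, word ID) is a design choice of this sketch; it works here because the fully confident words have identical start times in both listings.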

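It's OK, there is no real difference between help and discuss.

We are currently using lattice-mbr-decode to create a confusion network, and we are wondering whether lattice-mbr-decode considers the full lattice file or prunes out certain alternatives that are considered when using lattice-oracle.

lattice-mbr-decode does consider the full file, but very low-probability paths make very little difference to the result. Also, some of the scripts that use lattice-mbr-decode may precede it with lattice-prune, in which case not all of the paths will be considered (but this will hardly affect the results).

Furthermore, is there an easier way to get word-level confidence scores for the oracle hypothesis than somehow parsing the sausage stats? Something like lattice-to-ctm-conf, which only considers the best path?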

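Vassil, do you have time to help with this? What he is asking for is possible, but it would require a little coding. In sausages.cc there is a block that starts with this:
// Now set R_ to one best in the FST.
It would be possible to add a new constructor to the MinimumBayesRisk class that takes an FST argument representing the path you want to compute confidences against. It would compute the integer vector R_ as the word sequence in the supplied FST, using the function GetLinearSymbolSequence, instead of as the best path of the supplied lattice. (Note: you would usually want to set do_mbr_ to false to prevent R_ from being updated, but this could be made optional for greater flexibility.)

The program lattice-to-ctm-conf could then be modified to take an optional second argument (i.e. a 3-argument usage), a vector of linear lattices (<1best-lattice-rspecifier>), in which case it would use this other constructor. We should warn users that in the 3-argument form it usually makes sense to set --decode-mbr to false; otherwise the supplied 1-best will simply be used as a starting point for optimization.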
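To make the idea concrete, here is a toy Python sketch of what "computing confidences against a supplied word sequence" amounts to once the sausage stats exist. It is not Kaldi code. It assumes the bins have already been aligned one-to-one with the supplied sequence (in Kaldi that alignment would be the job of the MBR computation itself), and the function name, the word IDs and the posteriors are all invented for illustration.

```python
from typing import Dict, List

EPSILON = 0  # assumption for this sketch: ID 0 means "no word" in a bin


def confidences_for_sequence(
    words: List[int], sausage: List[Dict[int, float]]
) -> List[float]:
    """Look up the posterior of each supplied word in its aligned bin.

    A word that does not occur in its bin gets confidence 0.0.
    """
    if len(words) != len(sausage):
        raise ValueError("expected exactly one sausage bin per word")
    return [bin_.get(word, 0.0) for word, bin_ in zip(words, sausage)]


# Made-up sausage stats: one dict per bin, mapping word ID -> posterior.
sausage = [
    {101: 0.70, 102: 0.25, EPSILON: 0.05},
    {201: 0.40, 202: 0.60},
    {301: 1.00},
]

# Hypothetical supplied sequence (e.g. an oracle path), held fixed,
# which mirrors running with --decode-mbr=false.
oracle_path = [102, 201, 301]
oracle_conf = confidences_for_sequence(oracle_path, sausage)
print(oracle_conf)
```

The point of the sketch is only that the supplied sequence is never changed: each of its words is scored by its own posterior, even where another word in the same bin is more probable.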

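Lastly, I noticed a difference between the times produced by lattice-to-ctm-conf and lattice-mbr-decode: the times correspond to a non-integer number of frames. Is this due to averaging the frame count over the considered alternatives?

Yes, it is. It is explained in the paper "Minimum Bayes Risk decoding and system combination based on a recursion for edit distance", Haihua Xu, Daniel Povey, Lidia Mangu and Jie Zhu, Computer Speech and Language, 2011.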
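A small numeric illustration of why averaging produces fractional frames, written as a Python sketch. It is a simplification rather than Kaldi's actual computation: the function name, the frame indices, the posteriors and the 0.01 s frame shift are all assumptions chosen for the example.

```python
from typing import List, Tuple

FRAME_SHIFT = 0.01  # seconds per frame; assumed value for this example


def expected_frame(alternatives: List[Tuple[int, float]]) -> float:
    """Posterior-weighted mean of integer frame indices.

    Each alternative is (frame_index, posterior). The weights are
    renormalised so they do not have to sum to exactly 1.
    """
    total = sum(post for _, post in alternatives)
    if total <= 0:
        raise ValueError("posteriors must sum to a positive value")
    return sum(frame * post for frame, post in alternatives) / total


# Two competing paths place the same word at different start frames.
starts = [(105, 0.6), (107, 0.4)]

mean_start = expected_frame(starts)        # roughly 105.8, not a whole frame
mean_start_sec = mean_start * FRAME_SHIFT  # the time a CTM would report
print(mean_start, mean_start_sec)
```

Every individual path starts the word on a whole frame, but the weighted mean over paths generally does not land on one, which is the effect asked about above.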

Just a follow-up on this: I realized that it doesn't make sense to make the input a linear FST; it is more convenient to make it a std::vector.

Original English

It's OK, there is no real difference between help and discuss.

We are currently using lattice-mbr-decode to create a confusion network and are wondering whether lattice-mbr-decode itself considers the full lattice file or itself prunes out certain alternatives which are considered when using lattice-oracle.

lattice-mbr-decode does consider the full file, but very
low-probability paths will make very little difference to the result.
Also, some of the scripts that use lattice-mbr-decode may precede it
with lattice-prune, in which case not all of the paths will be
considered (but this will hardly affect the results).

Furthermore, is there an easier way to get the word-level confidence scores for the oracle hypothesis than to somehow parse the sausage stats? Something like lattice-to-ctm-conf which only considers the best path?

Vassil, do you have time to help with this? What he is asking for is
possible but would require a little coding.
In sausages.cc, there is a block that starts with this:
// Now set R_ to one best in the FST.
It would be possible to add a new constructor to the MinimumBayesRisk
class with an FST argument representing the best path you want to
compute confidences against. (note: you would usually want to set
do_mbr_ to false to prevent updating R_, but this could be made
optional for greater flexibility), and it would compute the
integer-vector R_ as the word sequence in the provided FST argument
using the function GetLinearSymbolSequence, instead of as the best
path of the provided lattice. The program lattice-to-ctm-conf could
be modified so it takes an optional second argument (i.e. a 3-argument
usage) where it would take a vector of linear lattices
(<1best-lattice-rspecifier>), and in that case it would use this other
constructor. We should warn users that in the 3-argument form it will
usually make sense to set --decode-mbr to false, otherwise it will
simply use the provided 1-best as a starting point for optimization.

Lastly, I noticed a difference between the times produced by lattice-to-ctm-conf and lattice-mbr-decode. The times produce a non-integer number of frames. Is this due to averaging the frame count over the considered alternatives?

Yes, it is. It's explained in the paper,
"Minimum Bayes Risk decoding and system combination based on a
recursion for edit distance", Haihua Xu, Daniel Povey, Lidia Mangu and
Jie Zhu, Computer Speech and Language, 2011.