cross.table {word.alignment} | R Documentation |
Constructing Cross Tables of the Source Language Words vs the Target Language Words of Sentence Pairs
Description
This function creates cross tables of the source language words versus the target language words of sentence pairs, either as a gold standard or as the alignment matrix of another software. For a gold standard, the created cross table is filled in by an expert, who enters '1' for Sure alignments and '2' for Possible alignments in the cell where a source word and a target word cross. For the alignment results of another software, the user enters '1' in the cell of each aligned source and target word pair.
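As a rough illustration of what a filled-in cross table might look like, the following base R sketch builds one by hand. The sentence pair, the row/column orientation (source words as rows, target words as columns), and the chosen links are all assumptions made for this example; they are not taken from the package or its data.

```r
# Hypothetical sentence pair (not taken from the package data).
source_words <- c("das", "ist", "ein", "Haus")
target_words <- c("this", "is", "a", "house")

# Empty cross table: rows are source words, columns are target words.
# (This orientation is an assumption made for illustration.)
ct <- matrix(0L, nrow = length(source_words), ncol = length(target_words),
             dimnames = list(source_words, target_words))

# An annotator marks Sure alignments with 1 and Possible alignments with 2.
ct["das", "this"] <- 1L
ct["ist", "is"] <- 1L
ct["ein", "a"] <- 1L
ct["Haus", "house"] <- 1L
ct["das", "is"] <- 2L  # a Possible link, purely for illustration

# Row/column indices of the cells marked Sure and Possible.
sure <- which(ct == 1L, arr.ind = TRUE)
possible <- which(ct == 2L, arr.ind = TRUE)

print(ct)
```

Keeping the word labels in the dimnames means each cell can be addressed by the words themselves, which is convenient when a table is filled in manually.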
It works with two formats:
In the first format, it constructs a cross table of the source language words versus the target language words for a given sentence pair. After each table is filled in as described above, sentence by sentence, it builds a list of the cross tables and saves the list as "file.align.RData".
In the second format, it creates an Excel file with n sheets. Each sheet contains the cross table of the words of the two languages for one sentence pair. The file is saved as "file.align.xlsx" and is then to be filled in as described above.
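The idea behind the first format, a list of cross tables stored in an RData file, can be sketched with base R alone. This is only an approximation under stated assumptions: the object name `cross_tables`, the example words, and the temporary file path are hypothetical, and the name of the object that the package itself stores in its file is not given on this page.

```r
# Two hypothetical filled-in cross tables
# (rows: source words, columns: target words).
ct1 <- matrix(c(1L, 0L,
                0L, 1L), nrow = 2, byrow = TRUE,
              dimnames = list(c("guten", "Morgen"), c("good", "morning")))
ct2 <- matrix(c(1L, 2L), nrow = 1,
              dimnames = list("danke", c("thank", "you")))

# Collect them in a list, one element per sentence pair.
# The object name `cross_tables` is an assumption for this sketch.
cross_tables <- list(ct1, ct2)

# Save to a temporary file, then load it back into a fresh environment.
rdata_file <- file.path(tempdir(), "alignment.RData")
save(cross_tables, file = rdata_file)

env <- new.env()
loaded_names <- load(rdata_file, envir = env)
restored <- get(loaded_names[1], envir = env)
```

Loading into a separate environment avoids overwriting objects of the same name in the workspace, and `load()` returns the names of the restored objects, so the stored name does not have to be known in advance.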
Usage
cross.table( ...,
null.tokens = TRUE,
out.format = c('rdata','excel'),
file.align = 'alignment')
Arguments
... |
Further arguments to be passed to |
null.tokens |
logical. If |
out.format |
a character string with two options. For |
file.align |
the output file name. |
Value
An RData object saved as "file.align.RData" or an Excel file saved as "file.align.xlsx".
Note
If you do not have the non-ASCII problem, you can set out.format to 'rdata'.
If you set out.format to 'excel', two points need attention. First, before using the created Excel file with the evaluation function, convert it into the required R format with the excel2rdata function. Second, there is occasionally a problem with the 'openxlsx' package, which this function uses; on Windows it might be solved by running 'installr::install.rtools()'.
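One way to act on this note before choosing out.format is to check whether the words contain non-ASCII characters and whether 'openxlsx' is installed. The sketch below is a hedged illustration using base R only; the example words and the decision rule are assumptions, not the logic used inside cross.table.

```r
# Hypothetical words from a sentence pair; the second is written with
# Unicode escapes so the snippet does not depend on the file encoding.
words <- c("house", "\u043a\u044a\u0449\u0430")

# iconv() returns NA for strings that cannot be converted to ASCII.
has_non_ascii <- is.na(iconv(words, from = "UTF-8", to = "ASCII"))

# requireNamespace() reports whether 'openxlsx' is installed,
# without attaching it to the search path.
excel_available <- requireNamespace("openxlsx", quietly = TRUE)

# One possible rule of thumb (an assumption, not the package's logic):
# prefer 'excel' only when non-ASCII words are present and 'openxlsx' is usable.
suggested_format <- if (any(has_non_ascii) && excel_available) "excel" else "rdata"
```

The resulting `suggested_format` value could then be passed as the out.format argument.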
Author(s)
Neda Daneshgar and Majid Sarmad.
References
Holmqvist M., Ahrenberg L. (2011), "A Gold Standard for English-Swedish Word Alignment.", NODALIDA 2011 Conference Proceedings, 106-113.
Och F., Ney H. (2003), "A Systematic Comparison of Various Statistical Alignment Models.", 2003 Association for Computational Linguistics, J03-1002, 29(1).
See Also
Examples
## Not run:
cross.table('http://www.um.ac.ir/~sarmad/word.a/euro.bg',
'http://www.um.ac.ir/~sarmad/word.a/euro.en',
n = 10, encode.sorc = 'UTF-8')
cross.table('http://www.um.ac.ir/~sarmad/word.a/euro.bg',
'http://www.um.ac.ir/~sarmad/word.a/euro.en',
n = 5, encode.sorc = 'UTF-8', out.format = 'excel')
## End(Not run)