udpipe_download_model {udpipe} | R Documentation |
Download an UDPipe model provided by the UDPipe community for a specific language of choice
Description
Ready-made models for 65 languages trained on 101 treebanks from https://universaldependencies.org/ are provided to you.
Some of these models were provided by the UDPipe community. Other models were build using this R package.
You can either download these models manually in order to use it for annotation purposes
or use udpipe_download_model
to download these models for a specific language of choice. You have the following options:
Usage
udpipe_download_model(
language = c("afrikaans-afribooms", "ancient_greek-perseus", "ancient_greek-proiel",
"arabic-padt", "armenian-armtdp", "basque-bdt", "belarusian-hse", "bulgarian-btb",
"buryat-bdt", "catalan-ancora", "chinese-gsd", "chinese-gsdsimp",
"classical_chinese-kyoto", "coptic-scriptorium", "croatian-set", "czech-cac",
"czech-cltt", "czech-fictree", "czech-pdt", "danish-ddt", "dutch-alpino",
"dutch-lassysmall", "english-ewt", "english-gum", "english-lines", "english-partut",
"estonian-edt", "estonian-ewt", "finnish-ftb", "finnish-tdt", "french-gsd",
"french-partut", "french-sequoia", "french-spoken", "galician-ctg",
"galician-treegal", "german-gsd", "german-hdt", "gothic-proiel", "greek-gdt",
"hebrew-htb", "hindi-hdtb", "hungarian-szeged", "indonesian-gsd", "irish-idt",
"italian-isdt", "italian-partut", "italian-postwita", "italian-twittiro",
"italian-vit", "japanese-gsd", "kazakh-ktb", "korean-gsd", "korean-kaist",
"kurmanji-mg", "latin-ittb", "latin-perseus", "latin-proiel", "latvian-lvtb",
"lithuanian-alksnis", "lithuanian-hse", "maltese-mudt", "marathi-ufal",
"north_sami-giella", "norwegian-bokmaal", "norwegian-nynorsk",
"norwegian-nynorsklia", "old_church_slavonic-proiel", "old_french-srcmf",
"old_russian-torot", "persian-seraji", "polish-lfg", "polish-pdb", "polish-sz",
"portuguese-bosque", "portuguese-br", "portuguese-gsd", "romanian-nonstandard",
"romanian-rrt", "russian-gsd", "russian-syntagrus", "russian-taiga", "sanskrit-ufal",
"scottish_gaelic-arcosg", "serbian-set", "slovak-snk", "slovenian-ssj",
"slovenian-sst", "spanish-ancora", "spanish-gsd", "swedish-lines",
"swedish-talbanken", "tamil-ttb", "telugu-mtg", "turkish-imst", "ukrainian-iu",
"upper_sorbian-ufal", "urdu-udtb", "uyghur-udt", "vietnamese-vtb", "wolof-wtb"),
model_dir = getwd(),
udpipe_model_repo = c("jwijffels/udpipe.models.ud.2.5",
"jwijffels/udpipe.models.ud.2.4", "jwijffels/udpipe.models.ud.2.3",
"jwijffels/udpipe.models.ud.2.0", "jwijffels/udpipe.models.conll18.baseline",
"bnosac/udpipe.models.ud"),
overwrite = TRUE,
...
)
Arguments
language |
a character string with a Universal Dependencies treebank which was used to build the model. Possible values are: Each language should have a treebank extension (e.g. english-ewt, russian-syntagrus, dutch-alpino, ...). If you do not provide a treebank extension (e.g. only english, russian, dutch), the function will use the default treebank of that language as was used in Universal Dependencies up to version 2.1. |
model_dir |
a path where the model will be downloaded to. Defaults to the current working directory |
udpipe_model_repo |
location where the models will be downloaded from.
Either 'jwijffels/udpipe.models.ud.2.5', 'jwijffels/udpipe.models.ud.2.4', 'jwijffels/udpipe.models.ud.2.3', 'jwijffels/udpipe.models.ud.2.0', 'jwijffels/udpipe.models.conll18.baseline' or 'bnosac/udpipe.models.ud'.
See the Details section for further information on which languages are available in each of these repositories. |
overwrite |
logical indicating to overwrite the file if the file was already downloaded. Defaults to |
... |
currently not used |
Details
The function allows you to download the following language models based on your setting of argument udpipe_model_repo
:
'jwijffels/udpipe.models.ud.2.5': https://github.com/jwijffels/udpipe.models.ud.2.5
UDPipe models constructed on data from Universal Dependencies 2.5
languages-treebanks: afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, catalan-ancora, chinese-gsd, chinese-gsdsimp, classical_chinese-kyoto, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, estonian-ewt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, german-hdt, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, italian-twittiro, italian-vit, japanese-gsd, korean-gsd, korean-kaist, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-alksnis, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, old_russian-torot, persian-seraji, polish-lfg, polish-pdb, portuguese-bosque, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, scottish_gaelic-arcosg, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, urdu-udtb, uyghur-udt, vietnamese-vtb, wolof-wtb
license: CC-BY-SA-NC
'jwijffels/udpipe.models.ud.2.4': https://github.com/jwijffels/udpipe.models.ud.2.4
UDPipe models constructed on data from Universal Dependencies 2.4
languages-treebanks: afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, catalan-ancora, chinese-gsd, classical_chinese-kyoto, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, estonian-ewt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, italian-vit, japanese-gsd, korean-gsd, korean-kaist, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-alksnis, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, old_russian-torot, persian-seraji, polish-lfg, polish-pdb, portuguese-bosque, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, urdu-udtb, uyghur-udt, vietnamese-vtb, wolof-wtb
license: CC-BY-SA-NC
'jwijffels/udpipe.models.ud.2.3': https://github.com/jwijffels/udpipe.models.ud.2.3
UDPipe models constructed on data from Universal Dependencies 2.3
languages-treebanks: afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, catalan-ancora, chinese-gsd, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, japanese-gsd, korean-gsd, korean-kaist, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, persian-seraji, polish-lfg, polish-sz, portuguese-bosque, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, urdu-udtb, uyghur-udt, vietnamese-vtb
license: CC-BY-SA-NC
'jwijffels/udpipe.models.ud.2.0': https://github.com/jwijffels/udpipe.models.ud.2.0
UDPipe models constructed on data from Universal Dependencies 2.0
languages-treebanks: ancient_greek-proiel, ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt, czech, danish, dutch-lassysmall, dutch, english-lines, english-partut, english, estonian, finnish-ftb, finnish, french-partut, french-sequoia, french, galician-treegal, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal, norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br, portuguese, romanian, russian-syntagrus, russian, sanskrit, slovak, slovenian-sst, slovenian, spanish-ancora, spanish, swedish-lines, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese
license: CC-BY-SA-NC
'jwijffels/udpipe.models.conll18.baseline': https://github.com/jwijffels/udpipe.models.conll18.baseline
UDPipe models constructed on data from Universal Dependencies 2.2
languages-treebanks: afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, bulgarian-btb, buryat-bdt, catalan-ancora, chinese-gsd, croatian-set, czech-cac, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, estonian-edt, finnish-ftb, finnish-tdt, french-gsd, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-postwita, japanese-gsd, kazakh-ktb, korean-gsd, korean-kaist, kurmanji-mg, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, mixed, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, persian-seraji, polish-lfg, polish-sz, portuguese-bosque, romanian-rrt, russian-syntagrus, russian-taiga, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, swedish-lines, swedish-talbanken, turkish-imst, ukrainian-iu, upper_sorbian-ufal, urdu-udtb, uyghur-udt, vietnamese-vtb
license: CC-BY-SA-NC
'bnosac/udpipe.models.ud': https://github.com/bnosac/udpipe.models.ud
UDPipe models constructed on data from Universal Dependencies 2.1
This repository contains models build with this R package on open data from Universal Dependencies 2.1 which allows for commercial usage. The license of these models is mostly CC-BY-SA. Visit that github repository for details on the licenses of the language of your choice. And contact www.bnosac.be if you need support on these models or require models tuned to your needs.
languages-treebanks: afrikaans, croatian, czech-cac, dutch, english, finnish, french-sequoia, irish, norwegian-bokmaal, persian, polish, portuguese, romanian, serbian, slovak, spanish-ancora, swedish
license: license is treebank-specific but mainly CC-BY-SA and GPL-3 and LGPL-LR
If you need to train models yourself for commercial purposes or if you want to improve models, you can easily do this with
udpipe_train
which is explained in detail in the package vignette.
Note that when you download these models, you comply to the license of your specific language model.
Value
A data.frame with 1 row and the following columns:
language: The language as provided by the input parameter
language
file_model: The path to the file on disk where the model was downloaded to
url: The URL where the model was downloaded from
download_failed: A logical indicating if the download has failed or not due to internet connectivity issues
download_message: A character string with the error message in case the downloading of the model failed
References
https://ufal.mff.cuni.cz/udpipe, https://github.com/jwijffels/udpipe.models.ud.2.5, https://github.com/jwijffels/udpipe.models.ud.2.4, https://github.com/jwijffels/udpipe.models.ud.2.3, https://github.com/jwijffels/udpipe.models.conll18.baseline https://github.com/jwijffels/udpipe.models.ud.2.0, https://github.com/bnosac/udpipe.models.ud
See Also
Examples
## Not run:
x <- udpipe_download_model(language = "dutch-alpino")
x <- udpipe_download_model(language = "dutch-lassysmall")
x <- udpipe_download_model(language = "russian")
x <- udpipe_download_model(language = "french")
x <- udpipe_download_model(language = "english-partut")
x <- udpipe_download_model(language = "english-ewt")
x <- udpipe_download_model(language = "german-gsd")
x <- udpipe_download_model(language = "spanish-gsd")
x <- udpipe_download_model(language = "spanish-gsd", overwrite = FALSE)
x <- udpipe_download_model(language = "dutch-alpino",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.5")
x <- udpipe_download_model(language = "dutch-alpino",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.4")
x <- udpipe_download_model(language = "dutch-alpino",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.3")
x <- udpipe_download_model(language = "dutch-alpino",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.0")
x <- udpipe_download_model(language = "english", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "dutch", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "afrikaans", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "spanish-ancora",
udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "dutch-ud-2.1-20180111.udpipe",
udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "english",
udpipe_model_repo = "jwijffels/udpipe.models.conll18.baseline")
## End(Not run)
x <- udpipe_download_model(language = "sanskrit",
udpipe_model_repo = "jwijffels/udpipe.models.ud.2.0",
model_dir = tempdir())
x
## cleanup for CRAN
if(file.exists(x$file_model)) file.remove(x$file_model)