udpipe_download_model {udpipe}    R Documentation

Download a UDPipe model provided by the UDPipe community for a specific language of choice

Description

Ready-made models for 65 languages trained on 101 treebanks from https://universaldependencies.org/ are available. Some of these models were provided by the UDPipe community; others were built using this R package. You can either download these models manually and use them for annotation, or use udpipe_download_model to download the model for a specific language of choice. The repositories you can download from are listed under the udpipe_model_repo argument below.
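A minimal sketch of the typical workflow. It assumes the udpipe package is installed and internet access is available: download a model into a temporary folder, load it with udpipe_load_model and annotate a sentence with udpipe_annotate. The Dutch sentence and the use of tempdir() are illustrative choices, not requirements.

```r
library(udpipe)

# Download the Dutch Alpino model into a temporary folder
dl <- udpipe_download_model(language = "dutch-alpino", model_dir = tempdir())

# Only continue if the model file actually landed on disk
if (file.exists(dl$file_model)) {
  ud_model <- udpipe_load_model(file = dl$file_model)
  anno <- udpipe_annotate(ud_model, x = "Dit is een eenvoudige zin.")
  anno <- as.data.frame(anno)
  # One row per token, with its lemma, universal POS tag and dependency relation
  print(anno[, c("token", "lemma", "upos", "dep_rel")])
}
```

The file.exists() guard mirrors the cleanup step in the Examples: if the download did not succeed, the code skips loading instead of failing on a missing file.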

Usage

udpipe_download_model(
  language = c("afrikaans-afribooms", "ancient_greek-perseus", "ancient_greek-proiel",
    "arabic-padt", "armenian-armtdp", "basque-bdt", "belarusian-hse", "bulgarian-btb",
    "buryat-bdt", "catalan-ancora", "chinese-gsd", "chinese-gsdsimp",
    "classical_chinese-kyoto", "coptic-scriptorium", "croatian-set", "czech-cac",
    "czech-cltt", "czech-fictree", "czech-pdt", "danish-ddt", "dutch-alpino",
    "dutch-lassysmall", "english-ewt", "english-gum", "english-lines", "english-partut",
    "estonian-edt", "estonian-ewt", "finnish-ftb", "finnish-tdt", "french-gsd",
    "french-partut", "french-sequoia", "french-spoken", "galician-ctg",
    "galician-treegal", "german-gsd", "german-hdt", "gothic-proiel", "greek-gdt",
    "hebrew-htb", "hindi-hdtb", "hungarian-szeged", "indonesian-gsd", "irish-idt",
    "italian-isdt", "italian-partut", "italian-postwita", "italian-twittiro",
    "italian-vit", "japanese-gsd", "kazakh-ktb", "korean-gsd", "korean-kaist",
    "kurmanji-mg", "latin-ittb", "latin-perseus", "latin-proiel", "latvian-lvtb",
    "lithuanian-alksnis", "lithuanian-hse", "maltese-mudt", "marathi-ufal",
    "north_sami-giella", "norwegian-bokmaal", "norwegian-nynorsk",
    "norwegian-nynorsklia", "old_church_slavonic-proiel", "old_french-srcmf",
    "old_russian-torot", "persian-seraji", "polish-lfg", "polish-pdb", "polish-sz",
    "portuguese-bosque", "portuguese-br", "portuguese-gsd", "romanian-nonstandard",
    "romanian-rrt", "russian-gsd", "russian-syntagrus", "russian-taiga", "sanskrit-ufal",
    "scottish_gaelic-arcosg", "serbian-set", "slovak-snk", "slovenian-ssj",
    "slovenian-sst", "spanish-ancora", "spanish-gsd", "swedish-lines",
    "swedish-talbanken", "tamil-ttb", "telugu-mtg", "turkish-imst", "ukrainian-iu",
    "upper_sorbian-ufal", "urdu-udtb", "uyghur-udt", "vietnamese-vtb", "wolof-wtb"),
  model_dir = getwd(),
  udpipe_model_repo = c("jwijffels/udpipe.models.ud.2.5",
    "jwijffels/udpipe.models.ud.2.4", "jwijffels/udpipe.models.ud.2.3",
    "jwijffels/udpipe.models.ud.2.0", "jwijffels/udpipe.models.conll18.baseline",
    "bnosac/udpipe.models.ud"),
  overwrite = TRUE,
  ...
)

Arguments

language

a character string naming the Universal Dependencies treebank on which the model was built. Possible values are:
afrikaans-afribooms, ancient_greek-perseus, ancient_greek-proiel, arabic-padt, armenian-armtdp, basque-bdt, belarusian-hse, bulgarian-btb, buryat-bdt, catalan-ancora, chinese-gsd, chinese-gsdsimp, classical_chinese-kyoto, coptic-scriptorium, croatian-set, czech-cac, czech-cltt, czech-fictree, czech-pdt, danish-ddt, dutch-alpino, dutch-lassysmall, english-ewt, english-gum, english-lines, english-partut, estonian-edt, estonian-ewt, finnish-ftb, finnish-tdt, french-gsd, french-partut, french-sequoia, french-spoken, galician-ctg, galician-treegal, german-gsd, german-hdt, gothic-proiel, greek-gdt, hebrew-htb, hindi-hdtb, hungarian-szeged, indonesian-gsd, irish-idt, italian-isdt, italian-partut, italian-postwita, italian-twittiro, italian-vit, japanese-gsd, kazakh-ktb, korean-gsd, korean-kaist, kurmanji-mg, latin-ittb, latin-perseus, latin-proiel, latvian-lvtb, lithuanian-alksnis, lithuanian-hse, maltese-mudt, marathi-ufal, north_sami-giella, norwegian-bokmaal, norwegian-nynorsk, norwegian-nynorsklia, old_church_slavonic-proiel, old_french-srcmf, old_russian-torot, persian-seraji, polish-lfg, polish-pdb, polish-sz, portuguese-bosque, portuguese-br, portuguese-gsd, romanian-nonstandard, romanian-rrt, russian-gsd, russian-syntagrus, russian-taiga, sanskrit-ufal, scottish_gaelic-arcosg, serbian-set, slovak-snk, slovenian-ssj, slovenian-sst, spanish-ancora, spanish-gsd, swedish-lines, swedish-talbanken, tamil-ttb, telugu-mtg, turkish-imst, ukrainian-iu, upper_sorbian-ufal, urdu-udtb, uyghur-udt, vietnamese-vtb, wolof-wtb

Each language should have a treebank extension (e.g. english-ewt, russian-syntagrus, dutch-alpino, ...). If you do not provide a treebank extension (e.g. only english, russian, dutch), the function uses the default treebank that language had in Universal Dependencies up to version 2.1.
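Values for language follow the pattern language-treebank, where the language part itself uses underscores (ancient_greek, north_sami) so the first hyphen separates it from the treebank. As a sketch of that naming convention only, the hypothetical helper split_language below (not part of udpipe) splits such values; a value without a hyphen, for which udpipe_download_model falls back to the default treebank, gets a treebank of NA.

```r
# Hypothetical helper, not a udpipe function: split "language-treebank" values.
split_language <- function(x) {
  has_treebank <- grepl("-", x, fixed = TRUE)
  data.frame(
    # Everything before the first hyphen is the language
    language = sub("-.*$", "", x),
    # Everything after the first hyphen is the treebank, NA if there is none
    treebank = ifelse(has_treebank, sub("^[^-]*-", "", x), NA_character_),
    stringsAsFactors = FALSE
  )
}

split_language(c("english-ewt", "ancient_greek-perseus", "russian"))
```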

model_dir

the directory the model will be downloaded to. Defaults to the current working directory.

udpipe_model_repo

location where the models will be downloaded from. Either 'jwijffels/udpipe.models.ud.2.5', 'jwijffels/udpipe.models.ud.2.4', 'jwijffels/udpipe.models.ud.2.3', 'jwijffels/udpipe.models.ud.2.0', 'jwijffels/udpipe.models.conll18.baseline' or 'bnosac/udpipe.models.ud'.
Defaults to 'jwijffels/udpipe.models.ud.2.5'.

  • 'bnosac/udpipe.models.ud' contains models built on Universal Dependencies 2.1 data, mainly released under the CC-BY-SA license, with some models released under the GPL-3 and LGPL-LR licenses

  • 'jwijffels/udpipe.models.ud.2.5' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.5 data

  • 'jwijffels/udpipe.models.ud.2.4' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.4 data

  • 'jwijffels/udpipe.models.ud.2.3' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.3 data

  • 'jwijffels/udpipe.models.ud.2.0' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.0 data

  • 'jwijffels/udpipe.models.conll18.baseline' contains models released under the CC-BY-NC-SA license constructed on Universal Dependencies 2.2 data for the CoNLL 2018 shared task
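Because each repository carries its own license, it can help to keep that mapping next to your code. A minimal sketch that transcribes the list of repositories and licenses above into a lookup table; repo_licenses and model_license are hypothetical names, not udpipe functions, and the table must be kept in sync by hand.

```r
# Licenses per repository, transcribed by hand from the udpipe_model_repo list.
# repo_licenses and model_license are hypothetical, not part of udpipe.
repo_licenses <- c(
  "jwijffels/udpipe.models.ud.2.5"           = "CC-BY-NC-SA",
  "jwijffels/udpipe.models.ud.2.4"           = "CC-BY-NC-SA",
  "jwijffels/udpipe.models.ud.2.3"           = "CC-BY-NC-SA",
  "jwijffels/udpipe.models.ud.2.0"           = "CC-BY-NC-SA",
  "jwijffels/udpipe.models.conll18.baseline" = "CC-BY-NC-SA",
  "bnosac/udpipe.models.ud"                  = "mainly CC-BY-SA, some GPL-3 or LGPL-LR"
)

# Look up the license of a single repository; errors on an unknown repository
model_license <- function(repo) {
  repo <- match.arg(repo, names(repo_licenses))
  unname(repo_licenses[repo])
}

model_license("jwijffels/udpipe.models.ud.2.5")

# Repositories whose license has no NonCommercial (NC) clause
names(repo_licenses)[!grepl("NC", repo_licenses)]
```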

See the Details section for further information on which languages are available in each of these repositories.

overwrite

logical indicating whether to overwrite the model file if it was already downloaded. Defaults to TRUE, in which case the model is always downloaded and any existing file is overwritten. If set to FALSE, the model is only downloaded if it does not yet exist in the model_dir folder.
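A minimal sketch of using overwrite = FALSE as a simple download cache, assuming the udpipe package is installed. The folder name udpipe-models under tempdir() is an arbitrary choice that keeps the sketch free of side effects; for a cache that survives R sessions, a folder such as tools::R_user_dir("udpipe", "cache") (available from R 4.0.0) could be used instead.

```r
library(udpipe)

# Any writable folder works; this one lives under tempdir()
model_dir <- file.path(tempdir(), "udpipe-models")
dir.create(model_dir, recursive = TRUE, showWarnings = FALSE)

# First call: the file is not on disk yet, so it is downloaded
first <- udpipe_download_model(language = "english-ewt",
                               model_dir = model_dir, overwrite = FALSE)

# Second call: the file already exists in model_dir, so no new download happens
second <- udpipe_download_model(language = "english-ewt",
                                model_dir = model_dir, overwrite = FALSE)

# Both calls refer to the same file on disk
identical(first$file_model, second$file_model)
```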

...

currently not used

Details

The language models available for download depend on your setting of argument udpipe_model_repo.

Note that by downloading these models, you agree to comply with the license of your specific language model.

Value

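A data.frame with 1 row and the following columns:

  • language: the language as provided by the input argument language

  • file_model: the path to the file on disk where the model was downloaded to

  • url: the URL from which the model was downloaded

  • download_failed: logical indicating whether the download failed, e.g. due to internet connectivity issues

  • download_message: a character string with the error message in case downloading the model failed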

References

https://ufal.mff.cuni.cz/udpipe, https://github.com/jwijffels/udpipe.models.ud.2.5, https://github.com/jwijffels/udpipe.models.ud.2.4, https://github.com/jwijffels/udpipe.models.ud.2.3, https://github.com/jwijffels/udpipe.models.conll18.baseline, https://github.com/jwijffels/udpipe.models.ud.2.0, https://github.com/bnosac/udpipe.models.ud

See Also

udpipe_load_model

Examples

## Not run: 
x <- udpipe_download_model(language = "dutch-alpino")
x <- udpipe_download_model(language = "dutch-lassysmall")
x <- udpipe_download_model(language = "russian")
x <- udpipe_download_model(language = "french")
x <- udpipe_download_model(language = "english-partut")
x <- udpipe_download_model(language = "english-ewt")
x <- udpipe_download_model(language = "german-gsd")
x <- udpipe_download_model(language = "spanish-gsd")
x <- udpipe_download_model(language = "spanish-gsd", overwrite = FALSE)

x <- udpipe_download_model(language = "dutch-alpino", 
                           udpipe_model_repo = "jwijffels/udpipe.models.ud.2.5")
x <- udpipe_download_model(language = "dutch-alpino", 
                           udpipe_model_repo = "jwijffels/udpipe.models.ud.2.4")
x <- udpipe_download_model(language = "dutch-alpino", 
                           udpipe_model_repo = "jwijffels/udpipe.models.ud.2.3")
x <- udpipe_download_model(language = "dutch-alpino", 
                           udpipe_model_repo = "jwijffels/udpipe.models.ud.2.0")
x <- udpipe_download_model(language = "english", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "dutch", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "afrikaans", udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "spanish-ancora", 
                           udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "dutch-ud-2.1-20180111.udpipe", 
                           udpipe_model_repo = "bnosac/udpipe.models.ud")
x <- udpipe_download_model(language = "english", 
                           udpipe_model_repo = "jwijffels/udpipe.models.conll18.baseline")

## End(Not run)

x <- udpipe_download_model(language = "sanskrit", 
                           udpipe_model_repo = "jwijffels/udpipe.models.ud.2.0", 
                           model_dir = tempdir())
x
## cleanup for CRAN
if(file.exists(x$file_model)) file.remove(x$file_model)

[Package udpipe version 0.8.11 Index]