Finna training corpora

The dataset contains TF-IDF data matrices intended for machine learning use. The matrices were generated from document corpora built from metadata extracted from the Finna.fi service in 2019 via its open API. Corpora are available in Finnish, Swedish and English.
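TF-IDF weights each term by how often it occurs in a document, scaled down by how many documents in the corpus contain it. The following is a minimal Python sketch of the common tf × log(N/df) weighting on a toy corpus. It only illustrates what a TF-IDF matrix is. The tokenization, weighting variant and normalization actually used to build these matrices are not documented here, so treat any resemblance as an assumption. The function name `tfidf_matrix` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] is the TF-IDF weight
    of vocabulary[j] in docs[i].

    Hypothetical illustration: whitespace tokenization, raw term counts as tf,
    and idf = log(N / df). Not necessarily the scheme used for this dataset.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df[term]) for term in vocabulary}
    matrix = []
    for tokens in tokenized:
        tf = Counter(tokens)  # missing terms count as 0
        matrix.append([tf[term] * idf[term] for term in vocabulary])
    return vocabulary, matrix


# Toy corpus of three short "documents".
docs = ["kirja historia", "kirja musiikki", "musiikki musiikki"]
vocabulary, matrix = tfidf_matrix(docs)
print(vocabulary)
for row in matrix:
    print([round(weight, 3) for weight in row])
```

Each row is one document and each column one vocabulary term. A term occurring in every document would get an idf of 0 and therefore contribute nothing, which is why TF-IDF emphasizes terms that distinguish documents from one another.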

Data resources

Additional Info

Collection: Open Data
Maintainer: CSC – IT Center for Science Ltd.
Maintainer email:
  1. analytics@csc.fi
Links to additional information:
  1. https://github.com/NatLibFi/Annif-corpora/tree/master/training/2019
Update frequency:
Last modified: 26.02.2021
Created on: 24.02.2021