Data resources

License

Creative Commons CCZero 1.0

Finna-koulutuskorpukset

Data matrix for training XMTC machine learning models (TF-IDF) with Snowball lemmatisation (sv)

A data matrix for training XMTC machine learning models (TF-IDF features) with Snowball lemmatisation, based on a Swedish corpus. The textual data follows the Bag-of-Words feature file format of The Extreme Classification Repository (http://manikvarma.org/downloads/XC/XMLRepository.html).

The first line is formatted as:

total_documents number_of_features number_of_labels

All other lines represent one document per line:

label1,label2,...,labelk ft1:ft1_val ft2:ft2_val ft3:ft3_val .. ftd:ftd_val

That is, each line holds a comma-separated list of labels followed by all non-zero components of the TF-IDF vector, each given as component_number:value.
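
The format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name `parse_xc` and the in-memory sample are hypothetical, and it assumes that labels and component numbers are integers and that a document without labels starts with a space.

```python
import io


def parse_xc(stream):
    """Parse Bag-of-Words text in the format described above.

    Returns the header triple, a list of label lists, and a list of
    {component_number: value} dicts (one entry per document).
    """
    # First line: total_documents number_of_features number_of_labels
    n_docs, n_features, n_labels = (int(x) for x in stream.readline().split())

    labels, features = [], []
    for line in stream:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first space only: labels on the left, features on the right.
        # Assumption: an unlabeled document has an empty left part.
        label_part, _, feature_part = line.partition(" ")
        doc_labels = [int(tok) for tok in label_part.split(",") if tok]
        doc_features = {}
        for token in feature_part.split():
            index, value = token.split(":")
            doc_features[int(index)] = float(value)
        labels.append(doc_labels)
        features.append(doc_features)
    return (n_docs, n_features, n_labels), labels, features


# Hypothetical two-document sample, not taken from the actual file.
sample = "2 5 3\n0,2 1:0.5 4:0.25\n1 0:1.0\n"
shape, labels, features = parse_xc(io.StringIO(sample))
```

For the real file, pass an open text file handle instead of the `io.StringIO` sample. The per-document dicts can then be converted to a sparse matrix with the library of your choice.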

Additional information

Format	TXT
File size	108998787 bytes
Data status	Current version
Temporal Coverage	-
Data last updated	24 February 2021
Metadata last updated	16 April 2025
Created	24 February 2021
SHA256	6057f52a303c618afed0145ce0cd04171dcbdcb82d16e09861a1b8c89ce6b2d6
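
A downloaded copy can be checked against the SHA256 value listed above. The snippet below is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_checksum` are hypothetical, and the local file path is left to the caller.

```python
import hashlib

# Checksum as published in the metadata table above.
EXPECTED_SHA256 = "6057f52a303c618afed0145ce0cd04171dcbdcb82d16e09861a1b8c89ce6b2d6"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """True if the file at `path` has the SHA256 listed for this resource."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

Reading in chunks matters here because the file is roughly 100 MB and does not need to be held in memory at once.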