Data resources

License

Creative Commons CCZero 1.0

Stats

Download counts: last 30 days: 0; last 12 months: 0; all time: 0
Page visits: last 30 days: 0; last 12 months: 2; all time: 8

Finna-koulutuskorpukset

Data matrix for training XMTC machine learning models (TF-IDF) without lemmatisation (en)

Data matrix for training XMTC machine learning models (TF-IDF) without lemmatisation, based on an English corpus. The file follows the Bag-of-Words feature file format of The Extreme Classification Repository (http://manikvarma.org/downloads/XC/XMLRepository.html).

The first line is formatted as:

total_documents number_of_features number_of_labels

All other lines represent one document per line:

label1,label2,...,labelk ft1:ft1_val ft2:ft2_val ft3:ft3_val ... ftd:ftd_val

i.e., a comma-separated list of labels, followed by all non-zero components of the TF-IDF vector, each given as component_number:value.
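The layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function names (read_header, parse_document, iter_documents) are hypothetical, labels are kept as plain strings because the description does not state their type, and a line whose first token contains a colon is assumed to be a document without labels.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple

# A parsed document: (labels, {component_number: tf_idf_value})
Document = Tuple[List[str], Dict[int, float]]


def read_header(stream: TextIO) -> Tuple[int, int, int]:
    """Read the first line: total_documents number_of_features number_of_labels."""
    total_documents, number_of_features, number_of_labels = (
        int(field) for field in stream.readline().split()
    )
    return total_documents, number_of_features, number_of_labels


def parse_document(line: str) -> Document:
    """Parse one 'label1,label2 ft1:ft1_val ft2:ft2_val' line."""
    line = line.strip()
    first, _, rest = line.partition(" ")
    if ":" in first:
        # Assumption: no label list present, the whole line is features.
        label_part, feature_part = "", line
    else:
        label_part, feature_part = first, rest
    labels = [label for label in label_part.split(",") if label]
    features: Dict[int, float] = {}
    for item in feature_part.split():
        component, _, value = item.partition(":")
        features[int(component)] = float(value)
    return labels, features


def iter_documents(stream: TextIO) -> Iterator[Document]:
    """Yield documents one at a time, so a large file need not fit in memory."""
    for line in stream:
        if line.strip():
            yield parse_document(line)


# Tiny made-up sample in the same layout (not taken from the real file).
sample = io.StringIO("2 5 3\n0,2 1:0.5 4:0.25\n1 0:1.0\n")
header = read_header(sample)
documents = list(iter_documents(sample))
```

For the real file, the same functions could be applied to a handle from open(path, encoding="utf-8"), where path is wherever the downloaded TXT file was saved. Iterating lazily rather than building a list is worth considering given the file size listed under Additional information.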

Additional information

Format	TXT
File size	348097263 bytes
Data status	Current version
Temporal Coverage	-
Data last updated	24 February 2021
Metadata last updated	24 February 2021
Created	24 February 2021
SHA256	20b230dbde0e317775d21acbb0c802e863ab1e5fa61df0683fbc89c3ad3b3098