License

Creative Commons CCZero 1.0

Theses of the University of Jyväskylä (Jyväskylän yliopiston opinnäytetöitä)

Data matrix for training XMTC machine learning models (TF-IDF) with TNPP lemmatisation (fi)

Data matrix for training extreme multi-label text classification (XMTC) machine learning models, with TF-IDF features computed from a Finnish corpus lemmatised with TNPP (Turku Neural Parser Pipeline). The data follows the Bag-of-Words feature file format of The Extreme Classification Repository (http://manikvarma.org/downloads/XC/XMLRepository.html).

The first line is formatted as:

total_documents number_of_features number_of_labels
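As a minimal sketch, assuming the header is a single line of three whitespace-separated integers in the order shown above, it can be read with plain Python. The function name `parse_header` and the example values are hypothetical, not taken from this dataset.

```python
def parse_header(line: str) -> tuple[int, int, int]:
    """Split the header into (total_documents, number_of_features, number_of_labels).

    Raises ValueError if the line does not hold exactly three integers.
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 header fields, got {len(fields)}")
    total_documents, number_of_features, number_of_labels = map(int, fields)
    return total_documents, number_of_features, number_of_labels


# Hypothetical header values, for illustration only:
# parse_header("100 5000 300\n") returns (100, 5000, 300)
# With a local copy of the file (placeholder name):
# with open("data.txt", encoding="utf-8") as f:
#     n_docs, n_features, n_labels = parse_header(f.readline())
```

The explicit field count check makes a malformed or missing header fail loudly instead of being silently misread.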

Each subsequent line represents one document:

label1,label2,...,labelk ft1:ft1_val ft2:ft2_val ft3:ft3_val ... ftd:ftd_val

i.e., a comma-separated list of labels followed by all non-zero components of the document's TF-IDF vector, each given as component_number:value.
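A minimal Python sketch for parsing one document line follows. It assumes, as in the Extreme Classification Repository format, that labels and feature indices are integers, and it treats a line whose first token already contains `:` as a document with no labels. That convention and the name `parse_document` are illustrative assumptions, not part of this resource's documentation.

```python
def parse_document(line: str) -> tuple[list[int], dict[int, float]]:
    """Parse 'label1,...,labelk ft1:val1 ... ftd:vald' into (labels, features).

    features maps each non-zero TF-IDF component number to its value.
    """
    tokens = line.split()
    labels: list[int] = []
    # The label field never contains ':'. If the first token does, it is
    # already a feature, which we assume means the document has no labels.
    if tokens and ":" not in tokens[0]:
        labels = [int(label) for label in tokens[0].split(",")]
        tokens = tokens[1:]
    features: dict[int, float] = {}
    for token in tokens:
        index, _, value = token.partition(":")
        features[int(index)] = float(value)
    return labels, features


# Hypothetical line, for illustration only:
# parse_document("3,17 0:0.25 42:0.5") returns ([3, 17], {0: 0.25, 42: 0.5})
```

Storing features in a dict keeps the vector sparse, matching the format, which lists only non-zero components.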

Additional information

Format	TXT
File size	43872371 bytes (approx. 43.9 MB)
Data status	Current version
Temporal Coverage	01.01.2010 - 31.12.2017
Data last updated	24 February 2021
Metadata last updated	24 February 2021
Created	24 February 2021
SHA256	91e79f0bb2e19d294ca3f2052bdd29c0f0bb044f8d70e4e7c7ffecec1a7a4619
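The SHA256 value above can be used to check that a downloaded copy is intact. A minimal sketch using Python's standard `hashlib` module follows. The function names are hypothetical, and the file name in the usage comment is a placeholder, not the resource's actual file name.

```python
import hashlib

# Checksum published in this resource's metadata (SHA256 field).
EXPECTED_SHA256 = "91e79f0bb2e19d294ca3f2052bdd29c0f0bb044f8d70e4e7c7ffecec1a7a4619"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """Return True if the file at `path` matches the published checksum."""
    return sha256_of_file(path) == EXPECTED_SHA256.lower()


# Usage with a local copy (placeholder file name):
# print(verify_download("downloaded_resource.txt"))
```

Reading in chunks keeps memory use constant, which matters little for a file of this size but makes the helper reusable for larger downloads.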