Theses, Information and Communication Technology, metadata
Thesis in Finnish metadata 2017-2022
This dataset was gathered from the 478 bachelor’s theses written for the Finnish-taught ICT (Information and Communication Technology) degree program and published between 2017 and 2022. Most of the theses can be found in Theseus (https://www.theseus.fi), the joint database of Finnish universities of applied sciences. Students could choose to write their thesis in Finnish or in English (135 theses).
The data was extracted from the theses’ PDF files by a Python script into two Excel files, one for the theses written in Finnish and one for those written in English, which were then cleaned and converted into CSV and JSON formats. During extraction, about 100 theses whose metadata could not be read in full were discarded, most of them for not following the given template, others for providing Finnish metadata for an English thesis or for not mentioning the author’s name. Rows marked as “restricted” (8 in Fi, 2 in En), “does not follow the template” (2 in En) or “not in theseus” (17 in Fi, 1 in En) were removed by hand, as were a few rows with obvious logical errors (e.g. more keyword appearances than words in the thesis). Dots at the end of keywords and spaces in the middle of words were removed, and minor typos were corrected.
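Two of the cleaning rules described above can be sketched in Python. This is only an illustration, not the script that was actually used: the helper names `clean_keyword` and `has_logical_error` are hypothetical, the sample rows are invented, and storing keywords as a mapping from keyword to occurrence count is an assumption.

```python
def clean_keyword(keyword: str) -> str:
    """Remove surrounding whitespace and any dots at the end of a keyword."""
    return keyword.strip().rstrip(".").strip()


def has_logical_error(row: dict) -> bool:
    """Flag rows with more keyword occurrences than words in the thesis."""
    return row["Total Occurrences"] > row["Total Word Count"]


# Invented rows for illustration; the keyword-to-count mapping is an assumption.
rows = [
    {"Keywords": {"testing.": 12}, "Total Occurrences": 12, "Total Word Count": 8000},
    {"Keywords": {"cloud": 9000}, "Total Occurrences": 9000, "Total Word Count": 7000},
]

# Drop rows that fail the logical check, then tidy the keywords of the rest.
cleaned = [
    {**row, "Keywords": {clean_keyword(k): n for k, n in row["Keywords"].items()}}
    for row in rows
    if not has_logical_error(row)
]
```

Here the second row is dropped because it reports more keyword occurrences than words, and the trailing dot is removed from the keyword in the first row.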
The word count covers only the body of the thesis, excluding the abstract and appendices. Ten pages were provided by the given template. The supervisor IDs match those in the dataset for the English-taught ICT degree program.
The dataset contains the following fields:
Total References – Total number of references
Printed References – Number of printed references
Internet References – Number of references from the internet
Weak References – Number of weak references (Wikipedia, Reddit, blogs, YouTube)
Pages – Number of pages
Total Word Count – Number of words
Study Credits – Number of study credits at the time of graduation
Study Entitlement Days – Length of study entitlement measured in days
Grade – Thesis grade (1-5, 1 is the lowest passing grade and 5 the highest)
Supervisor ID – ID of the supervisor
Keywords – Keywords and the number of times they occur in the thesis
Total Occurrences – Total number of keyword occurrences
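A minimal sketch of checking that each record carries the fields listed above. It assumes the JSON file is an array of objects whose keys are spelled exactly like the field names; the actual layout of the file is not described here. The sample records and the helper name `missing_fields` are invented for illustration.

```python
import json

# Field names as listed in the description; that they are also the JSON keys
# is an assumption.
EXPECTED_FIELDS = [
    "Total References",
    "Printed References",
    "Internet References",
    "Weak References",
    "Pages",
    "Total Word Count",
    "Study Credits",
    "Study Entitlement Days",
    "Grade",
    "Supervisor ID",
    "Keywords",
    "Total Occurrences",
]


def missing_fields(record: dict) -> list:
    """Return the expected field names that are absent from a record."""
    return [field for field in EXPECTED_FIELDS if field not in record]


# Invented sample standing in for the real file contents.
SAMPLE_JSON = """
[
  {"Total References": 20, "Printed References": 5, "Internet References": 15,
   "Weak References": 1, "Pages": 45, "Total Word Count": 8000,
   "Study Credits": 240, "Study Entitlement Days": 1460, "Grade": 3,
   "Supervisor ID": 7, "Keywords": {"testing": 12}, "Total Occurrences": 12},
  {"Pages": 30, "Grade": 4}
]
"""

records = json.loads(SAMPLE_JSON)

# Map each record index to the list of fields it lacks.
report = {index: missing_fields(record) for index, record in enumerate(records)}
```

With the real file, `records` would instead come from `json.load` on the opened file, provided the array-of-objects assumption holds.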
Theses produced as group work are evaluated separately for each author, so their metadata can appear multiple times in the dataset.
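Repeated rows from group work can matter when computing averages per thesis. The sketch below groups rows that share the same thesis-level values. Which fields stay identical across co-authors is an assumption (grade and study credits are left out because they could differ per student), and the function name `possible_group_theses` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Fields assumed to describe the thesis itself rather than the student.
THESIS_LEVEL_FIELDS = (
    "Total References",
    "Printed References",
    "Internet References",
    "Weak References",
    "Pages",
    "Total Word Count",
    "Total Occurrences",
)


def possible_group_theses(records: list) -> list:
    """Return lists of row indices whose thesis-level fields are identical."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(record.get(field) for field in THESIS_LEVEL_FIELDS)
        groups[key].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


# Invented rows: the first and third share all thesis-level values.
sample = [
    {"Total References": 20, "Pages": 45, "Total Word Count": 8000, "Grade": 3},
    {"Total References": 31, "Pages": 60, "Total Word Count": 11000, "Grade": 5},
    {"Total References": 20, "Pages": 45, "Total Word Count": 8000, "Grade": 4},
]

duplicates = possible_group_theses(sample)
```

Matching values alone do not prove that two rows belong to the same thesis, so any groups found this way are candidates to inspect rather than confirmed duplicates.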
Additional information
Field | Value |
---|---|
Format | JSON |
File size | 151593 |
Temporal Coverage | - |
Data last updated | 27 February 2024 |
Metadata last updated | 22 March 2024 |
Created | 27 February 2024 |
SHA256 | 7d84785f4fa810e32b2bf58bf71822214b46800a0fa777576158756d977f5320 |
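
The SHA256 value above can be used to check that a downloaded copy of the file is intact. A minimal sketch using Python's standard `hashlib` module follows; the function names and the file name in the usage note are hypothetical.

```python
import hashlib

# Checksum as published in the table above.
EXPECTED_SHA256 = "7d84785f4fa810e32b2bf58bf71822214b46800a0fa777576158756d977f5320"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str) -> bool:
    """Return True if the file's digest equals the published SHA256 value."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

For example, `matches_published_checksum("theses_fi.json")` would return `True` only for an unmodified copy, where `theses_fi.json` stands for whatever name the downloaded file was saved under.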