Title: | Dataset and Tools to Research the Riddle of Literary Quality |
---|---|
Description: | Dataset and functions to explore quality of literary novels. The package is a part of the Riddle of Literary Quality project, and it contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data. For more details, see: Eder M, van Zundert J, Lensink S, van Dalen-Oskam K (2022). Replicating The Riddle of Literary Quality: The litRiddle package for R. In _Digital Humanities 2022: Conference Abstracts_, 636-637. |
Authors: | Maciej Eder [aut, cre], Joris van Zundert [aut], Karina van Dalen-Oskam [aut], Saskia Lensink [aut] |
Maintainer: | Maciej Eder <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2025-01-09 05:26:29 UTC |
Source: | https://github.com/cran/litRiddle |
The package contains the data of a reader survey about fiction in Dutch, a description of the novels the readers rated, and the results of stylistic measurements of the novels. The package also contains functions to combine, analyze, and visualize these data.
We will be grateful if you cite the package in your publications. To get up-to-date citation information, type citation("litRiddle").
The package litRiddle presents data generated in the project The Riddle of Literary Quality (2012–2019) in which a team of digital humanists aimed to find out if books that readers considered to be highly literary have a different set of values for stylistic features than books the same readers did not consider to be very literary.
The package contains five data sets:
The reviews gathered from a hired representative panel of citizens of the Netherlands and in a large online survey called The National Reader Survey (2013). Type help(reviews) for details.
The motivations that reviewers give for a subset or all of their ratings are provided as plain text and as POS tagged data. Type help(motivations) for details.
Data about the reviewers: age, gender, zipcode, average number of books read per year etc. Type help(respondents) for details.
A list of the 401 books that the survey respondents evaluated, with metadata such as author, title, publisher, gender of the author, and (for translations) the original language, as well as a number of stylometric measurements such as the average sentence length. Type help(books) for details.
For each of the 401 books, the relative frequencies of 5000 most frequent words are provided (due to copyright issues the books themselves cannot be made available). Type help(frequencies) for details.
To learn more about the functions provided to analyze the above datasets, type explain() in your R console.
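A first session with the package might look like the sketch below. It only uses functions and datasets named in this manual; the requireNamespace() guard is an addition so that the snippet does nothing when the package is not installed.

```r
# Check whether litRiddle is installed before trying to use it.
pkg.available <- requireNamespace("litRiddle", quietly = TRUE)

if (pkg.available) {
  library(litRiddle)

  # Load one of the bundled datasets.
  data(books)

  # List the column names of the datasets, then read their descriptions.
  str(get.columns())
  explain("books")

  # Ask which dataset a given column belongs to.
  find.dataset("book.id")
}
```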
Maciej Eder, Joris van Zundert, Karina van Dalen-Oskam, Saskia Lensink
Information in Dutch about the package can be found at https://karinavdo.github.io/RaadselLiteratuur/02_07_data_en_R_package.html
Information in English at https://github.com/karinavdo/LitRiddleData/blob/master/README.md
Karina van Dalen-Oskam (2023). The Riddle of Literary Quality: A Computational Approach. Amsterdam University Press.
Karina van Dalen-Oskam (2021). Het raadsel literatuur. Is literaire kwaliteit meetbaar? Amsterdam University Press.
Maciej Eder, Saskia Lensink, Joris van Zundert, Karina van Dalen-Oskam (2022). Replicating The Riddle of Literary Quality: The litRiddle package for R, in: Digital Humanities 2022 Conference Abstracts. The University of Tokyo, Japan, 25–29 July 2022, p. 636-637 https://dh2022.dhii.asia/dh2022bookofabsts.pdf
Karina van Dalen-Oskam (2015). The Riddle of Literary Quality. Op zoek naar conventies van literariteit. "Vooys: tijdschrift voor letteren" 32(3): 25-33, https://literaryquality.huygens.knaw.nl/?p=537#more-537
Corina Koolen, Karina van Dalen-Oskam, Andreas van Cranenburgh, Erica Nagelhout (2020). Literary quality in the eye of the Dutch reader: The National Reader Survey. "Poetics" 79: 101439, doi:10.1016/j.poetic.2020.101439
More publications from the project: see https://literaryquality.huygens.knaw.nl/?page_id=588
books, reviews, respondents, explain, make.table
Measurements (including word count, number of sentences, number of paragraphs, average sentence length, etc.) of 401 novels in Dutch.
data(books)
This is a dataframe containing numerical, ordinal and lexical data (as well as metadata) for 401 novels. To see which variables are provided, type get.columns(). To learn more about what the column names really mean, type explain("books").
Karina van Dalen-Oskam, Joris van Zundert
The dataset is a part of The Riddle of Literary Quality Project.
get.columns, explain, reviews, respondents, frequencies, motivations
data(books)
print(books)
summary(books)
Function to combine all information of the survey, reviews, and books into one big dataframe. The user can specify whether or not they want to also load the freqTable with the frequency counts of the word n-grams of the books.
combine.all(load.freq.table = FALSE)
load.freq.table |
specify whether or not you want to add the |
In order to identify (possible) correlations between particular reviews (e.g. the scores given by the reviewers) and metadata about the reviewers themselves, it is usually required, or at least convenient, to combine two or more datasets into one large table.
A data frame combining the two (optionally three) datasets: books, respondents, and reviews.
Saskia Lensink, Maciej Eder
https://literaryquality.huygens.knaw.nl/
reviews, respondents, motivations, books
# combine and load all data from the books, respondents and reviews into
# a new dataframe (tibble format)
combine.all(load.freq.table = FALSE)

# combine and load all data from the books, respondents and reviews into
# a new dataframe (tibble format), and additionally also load the frequency
# table of all word 1grams of the corpus used.
combine.all(load.freq.table = TRUE)
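The idea of combining the datasets into one large table can be illustrated with base R alone. The sketch below is not the implementation of combine.all(); it uses invented toy data and assumes that reviews are linked to respondents through respondent.id and to books through book.id.

```r
# Toy stand-ins for the three datasets (all values are invented).
toy.reviews <- data.frame(
  respondent.id = c(1, 1, 2),
  book.id = c(10, 11, 10),
  literariness.read = c(6, 3, 5)
)
toy.respondents <- data.frame(respondent.id = c(1, 2), age.resp = c(34, 58))
toy.books <- data.frame(book.id = c(10, 11),
                        gender.author = c("female", "male"))

# Attach respondent metadata, then book metadata, keeping every review.
combined <- merge(toy.reviews, toy.respondents,
                  by = "respondent.id", all.x = TRUE)
combined <- merge(combined, toy.books, by = "book.id", all.x = TRUE)

# Scores and reviewer metadata now sit in the same row,
# so they can be compared directly.
cor(combined$age.resp, combined$literariness.read)
```

With the real data, the table returned by combine.all() plays the role of `combined` here.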
Function that lists a short explanation of what the different column names refer to and what their levels consist of.
explain(dataset = "")
dataset |
specify whether or not you want to add the |
In the current version, the option dataset = TRUE is not fully implemented.
A character vector being a description of the dataset.
Saskia Lensink, Maciej Eder
https://literaryquality.huygens.knaw.nl/
reviews, respondents, motivations, books
explain("books")
explain("reviews")
explain("respondents")
Return the name of the dataset where a column can be found.
find.dataset(name = NULL)
name |
specify the name of the variable you want to find. |
The function returns the name of the data table containing a given column name.
A character vector containing names of relevant datasets.
Saskia Lensink, Maciej Eder
https://literaryquality.huygens.knaw.nl/
reviews, respondents, motivations, books
find.dataset("book.id")
find.dataset("age.resp")
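Conceptually, this lookup amounts to searching a list of column names per dataset. The following sketch shows that idea on a hand-made list; the helper lookup.dataset() and the toy column lists are hypothetical and are not part of the package.

```r
# A toy list of column names per dataset (a small invented subset).
toy.columns <- list(
  books = c("book.id", "gender.author"),
  respondents = c("respondent.id", "age.resp", "gender.resp"),
  reviews = c("respondent.id", "book.id", "literariness.read")
)

# Hypothetical helper: return the names of all datasets containing `name`.
lookup.dataset <- function(name, columns) {
  hits <- vapply(columns, function(cols) name %in% cols, logical(1))
  names(columns)[hits]
}

lookup.dataset("book.id", toy.columns)   # "books" "reviews"
lookup.dataset("age.resp", toy.columns)  # "respondents"
```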
Word frequencies (5000 most frequent words) of 401 novels in Dutch.
data(frequencies)
This is a dataframe containing numerical values for word frequencies of the 5000 most frequent words (in descending order of frequency) of 401 literary novels in Dutch. The table contains relative frequencies, meaning that the original word occurrences from a book were divided by the total number of words of the book in question. The measurements were obtained using the R package stylo, and were later rounded to the 5th digit. To learn more about the novels themselves, type help(books).
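The arithmetic behind a relative frequency can be shown on a handful of tokens. This is a base R sketch of the calculation described above (count divided by total number of words, rounded to five digits), not the stylo procedure that produced the dataset; the token vector is invented.

```r
# A tiny invented "book" of eight tokens.
tokens <- c("de", "man", "en", "de", "vrouw", "en", "de", "hond")

# Count each word and sort from most to least frequent.
counts <- sort(table(tokens), decreasing = TRUE)

# Divide by the total number of tokens and round to five digits.
rel.freq <- round(as.numeric(counts) / length(tokens), 5)
names(rel.freq) <- names(counts)

rel.freq
```

Here "de" occurs 3 times out of 8 tokens, so its relative frequency is 0.375.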
Karina van Dalen-Oskam, Maciej Eder
The dataset is a part of The Riddle of Literary Quality Project.
get.columns, explain, books, reviews, respondents, motivations
data(frequencies)
print(frequencies)
summary(frequencies)
The function creates a list of all the column names from all three datasets, i.e. reviews, respondents, books.
get.columns()
This simple function works best when combined with explain, which provides a detailed description of particular variables. Type help(explain) for more details.
A list with three elements: books, respondents, and reviews, each containing the names of supported variables.
Saskia Lensink, Maciej Eder
https://literaryquality.huygens.knaw.nl/
reviews, respondents, books, motivations, explain
A function to make a table of frequency counts for one variable, and to plot a histogram of the results.
make.table(table.of = NULL, plot = TRUE, xlab = table.of, ylab = "count", title = table.of, barcolor = "grey", barfill = "darkgrey")
table.of |
which variable will be chosen? If not sure what variables are there, try typing |
plot |
do you want a plot to be plotted? Default: |
xlab |
name of the X axis |
ylab |
name of the Y axis |
title |
title of the plot |
barcolor |
outline color of the content |
barfill |
color used to fill the bars |
A basic way to show the distribution of an indicated variable from the litRiddle package. It provides the values, but also a simple histogram.
A character vector containing one chosen variable, optionally followed by a plot.
Saskia Lensink, Maciej Eder
https://literaryquality.huygens.knaw.nl/
make.table(table.of = "age.resp")
make.table(table.of = "age.resp", xlab = "age respondent",
           ylab = "number of people",
           title = "Distribution of respondent age",
           barcolor = "red", barfill = "white")
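For readers who want to see what a frequency table with an accompanying bar plot involves, here is a base R sketch on invented ages. It is an illustration of the general technique only; how make.table() builds its table and plot internally is not documented here.

```r
# Invented respondent ages.
age.resp <- c(23, 35, 35, 41, 41, 41, 58)

# Frequency counts: how many respondents have each age.
counts <- table(age.resp)
print(counts)

# A simple bar plot of the counts, with outline and fill colors
# matching the defaults listed for make.table().
barplot(counts, xlab = "age respondent", ylab = "count",
        main = "age.resp", border = "grey", col = "darkgrey")
```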
A function to make a table of frequency counts for two variables, and to plot a histogram of the results.
make.table2(table.of = NULL, split = NULL, plot = TRUE, xlab = table.of, ylab = "counts", title = table.of, barcolor = "grey", barfill = "darkgrey")
table.of |
which variable will be chosen? If not sure what variables are there, try typing |
split |
the variable that will be used to split the data: see the Examples section below for, well, some examples. |
plot |
do you want a plot to be plotted? Default: |
xlab |
name of the X axis |
ylab |
name of the Y axis |
title |
title of the plot |
barcolor |
outline color of the content |
barfill |
color used to fill the bars |
Unlike make.table, this function provides a comparison of two variables at a time, or to be more precise: a distribution of an indicated variable when subdivided into two or more groups. The function provides the values themselves, but also a final histogram.
A character vector containing one chosen variable, optionally followed by a plot.
Saskia Lensink, Maciej Eder
https://literaryquality.huygens.knaw.nl/
make.table2(table.of = "age.resp", split = "gender.resp")
make.table2(table.of = "literariness.read", split = "gender.author")

# Note that you can only provide an argument to the 'split' variable
# that has fewer than 31 unique values, to avoid uninterpretable outputs:
make.table2(table.of = "age.resp", split = "zipcode")

# You can also adjust the x label, y label, title, and colors.
make.table2(table.of = "age.resp", split = "gender.resp",
            xlab = "age respondent", ylab = "number of people",
            barcolor = "purple", barfill = "yellow")
make.table2(table.of = "literariness.read", split = "gender.author",
            xlab = "Overall literariness scores", ylab = "number of people",
            barcolor = "black", barfill = "darkred")
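The split logic, including the limit of fewer than 31 unique values mentioned in the examples, can be sketched in base R. The data below are invented, and the check is a plain reimplementation of the stated rule rather than the code make.table2() actually runs.

```r
# Invented respondents with an age and a gender.
toy <- data.frame(
  age.resp = c(23, 35, 35, 41, 41, 58),
  gender.resp = c("female", "male", "female", "female", "male", "male")
)

split.var <- "gender.resp"
n.groups <- length(unique(toy[[split.var]]))

# Only split when the grouping variable has fewer than 31 unique values.
if (n.groups < 31) {
  # Rows are the groups, columns are the values of the counted variable.
  counts <- table(toy[[split.var]], toy$age.resp)
  print(counts)

  # Grouped bars: one bar per group for each age.
  barplot(counts, beside = TRUE, legend.text = TRUE,
          xlab = "age respondent", ylab = "counts")
}
```

A variable such as zipcode would typically exceed the limit, which is why the corresponding example above is flagged.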
Reviewers' motivations for their scores (if provided by the respondents) from the survey called The National Reader Survey (2013).
data(motivations)
This is a dataframe that lists all tokens from all motivations, together with lemma and POS tag information. To see which variables are provided, type get.columns(). To learn more about what the column names really mean, type explain("motivations").
Karina van Dalen-Oskam, Joris van Zundert
The dataset is a part of The Riddle of Literary Quality Project.
get.columns, explain, books, frequencies, respondents, reviews
data(motivations)
head(motivations, n = 30)
summary(motivations)
Convenience function that produces a 'view' of the token table motivations with one (plain text) sentence of each motivation per row, listing motivation.id, book.id, respondent.id, sentence.id, and sentence.
motivations.sentences()
None
A data table containing all sentences of all given motivations and IDs related to respondents and books.
Joris van Zundert, Saskia Lensink, Maciej Eder
https://literaryquality.huygens.knaw.nl/
motivations.text, reviews, respondents, books
# to create a data frame with one sentence per motivation per row for all motivations:
mots <- motivations.sentences()
head(mots, n = 10)
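Turning a token table into one sentence per row is essentially a group-and-paste operation. The sketch below shows this with base R on an invented token table; the column name `token` is an assumption for illustration, and the real function may work differently.

```r
# Invented token table: one row per token, with IDs as in the 'view'.
toy.tokens <- data.frame(
  motivation.id = c(1, 1, 1, 1, 1),
  sentence.id = c(1, 1, 1, 2, 2),
  token = c("Mooi", "geschreven", ".", "Aanrader", "!")
)

# Paste the tokens of each sentence back together, in their original order.
toy.sentences <- aggregate(token ~ motivation.id + sentence.id,
                           data = toy.tokens,
                           FUN = paste, collapse = " ")

# Rename the pasted column to match the 'sentence' column described above.
names(toy.sentences)[names(toy.sentences) == "token"] <- "sentence"

toy.sentences
```

Grouping by motivation.id alone, instead of by motivation and sentence, would give the full text per motivation in the same way.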
Convenience function that produces a 'view' of the token table motivations with the full text of a motivation for each motivation, listing motivation.id, book.id, respondent.id, and text.
motivations.text()
None
A data table containing motivations and IDs related to respondents and books.
Joris van Zundert, Saskia Lensink, Maciej Eder
https://literaryquality.huygens.knaw.nl/
motivations.sentences, reviews, respondents, books
# to create a data frame with the full (plain) text of all motivations:
mots <- motivations.text()
head(mots, n = 10)
Function that transforms the survey responses into ordered factors.
Levels quality.read and quality.notread: "very bad", "bad", "a bit bad", "neutral", "a bit good", "good", "very good", "NA".
Levels literariness.read and literariness.notread: "absolutely not literary", "non-literary", "not very literary", "between literary and non-literary", "a bit literary", "literary", "very literary", "NA".
Levels statements 4/12: "completely disagree", "disagree", "neutral", "agree", "completely agree", "NA".
order.responses(bookratings.or.readingbehavior = NULL)
bookratings.or.readingbehavior |
Use either |
A data table containing relevant variables.
Saskia Lensink, Maciej Eder
https://literaryquality.huygens.knaw.nl/
reviews, respondents, motivations, books
# to create a data frame with ordered factor levels of the questions
# on reading behavior:
dat.reviews <- order.responses("readingbehavior")
str(dat.reviews)

# to create a data frame with ordered factor levels of the book ratings:
dat.ratings <- order.responses("bookratings")
str(dat.ratings)
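What an ordered factor buys you can be seen on a few invented answers. The sketch uses the quality levels listed in the description above and base R's factor(); it illustrates the data type, not the internals of order.responses().

```r
# Quality levels in ascending order, as listed in the description.
quality.levels <- c("very bad", "bad", "a bit bad", "neutral",
                    "a bit good", "good", "very good", "NA")

# Invented survey answers.
answers <- c("good", "neutral", "very bad", "good")

# An ordered factor knows that "neutral" ranks below "good".
quality <- factor(answers, levels = quality.levels, ordered = TRUE)

quality < "good"   # FALSE TRUE TRUE FALSE
max(quality)       # the highest rating given: good
table(quality)     # counts reported in level order, not alphabetically
```

Without the ordering, comparisons such as `<` and `max()` would not be defined for these labels.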
The information about the reviewers that participated in the survey called The National Reader Survey (2013).
data(respondents)
This is a dataframe containing numerical, ordinal and textual data about the 13541 reviewers that scored 401 novels. To see which variables are provided, type get.columns(). To learn more about what the column names really mean, type explain("respondents").
Karina van Dalen-Oskam, Joris van Zundert
The dataset is a part of The Riddle of Literary Quality Project.
get.columns, explain, books, reviews, frequencies, motivations
data(respondents)
print(respondents)
summary(respondents)
Reviewers' scores from the survey called The National Reader Survey (2013).
data(reviews)
This is a dataframe containing numerical, ordinal and textual data for thousands of individual reviews (and the reviewers' scores) for 401 novels. To see which variables are provided, type get.columns(). To learn more about what the column names really mean, type explain("reviews").
Karina van Dalen-Oskam, Joris van Zundert
The dataset is a part of The Riddle of Literary Quality Project.
get.columns, explain, books, frequencies, respondents, motivations
data(reviews)
print(reviews)
summary(reviews)