- Version bump associated with publication of vignette in the Journal of Statistical Software.
- Alignment with released `quanteda` v3.0.
- Features (or docvars) with names `"id"`, `"sentence_id"`, `"date"`, `"word_count"` or `"texts"` will not be accepted even when `numeric`, to avoid duplicate column names down the line. A clear error message is issued to alert users.
- Replacement of `order()` calls on `data.frame`s where needed to avoid CRAN complaints.
- Some small documentation fixes.
- Some documentation fixes.
- Release of a `pkgdown` website.
- Fixed a bug in the `sento_corpus()` function that did not always order the input correctly by date.
- Fixed two minor bugs in `summary.sento_measures()`: the first one prevented printing of document-level weighting schemes, and the second one did not remove `NA`s when averaging over correlations.
- Small bug fix in yearly aggregation (it did not account for the fact that `1970-01-01` is considered day zero).
- Dropped the horizontal zero line automatically added in the `plot.sento_measures()` function, as it distorts graphs of time series with values far away from zero.
- Stopped exporting all defunct functions to clean up the namespace.
- The function `print.sento_corpus()` now shows when the corpus is multilingual.
- Alignment with released `quanteda` v2.0.
- New function: `print.sento_corpus()`.
- Package update followed by release of a substantial update of the vignette (see https://doi.org/10.2139/ssrn.3067734).
- Changed some `warning()` calls to `message()` calls to be kinder to the user.
- Altered internal code to comply with the `corpus` object from `quanteda` >= v2.0.
- Dropped all `"TF"`-inspired weights for within-document aggregation except for `"TFIDF"`, and made this option return the same sentiment scores as would be obtained using the `quanteda` package (see the example on https://sentometrics-research.com/sentometrics/articles/examples/sentiment.html).
- Fixed a memory allocation issue in the `compute_sentiment()` function.
- New functions: `as.data.table.sento_corpus()`, `as.data.frame.sento_corpus()`, and `as.data.frame.sento_measures()`.
- Embedded a small workaround in `plot.attributions()` to guarantee the same plotting behaviour after an update of the `ggplot2` package gave buggy output for the `geom_area()` layer.
- For overall consistency, integrated `measures_global()` into the `aggregate.sento_measures()` function, adding a `do.global` argument to enact it.
- Slightly changed the clusters-based sentence-level sentiment computation (different weighting of adversative conjunctions).
- Clarified the documentation for the `peakdates()` and `peakdocs()` functions.
- Moved the Shiny application made available in the previous package update (i.e., the `sento_app()` function) to a separate sole-purpose package, `sentometrics.app` (see https://github.com/sborms/sentometrics.app).
- Moved the `data.table` package from Depends to Imports (see Rdatatable/data.table#3076).
- The `merge.sentiment()` function no longer changes the input sentiment objects by reference, and the merging was modified to give, for instance, a simple column binding of sentiment methods when all else is equal.
- Correct pass-through of the default `how` argument in the `compute_sentiment()` function.
- Added a few adversative conjunctions to all word lists in `list_valence_shifters`.
- Added a `do.normalize` option to the `weights_beta()` and `weights_exponential()` functions.
- Added a `do.inverse` option to the `weights_exponential()` function and an associated `do.inverseExp` argument in the `ctr_agg()` function.
- Modified some names of options for within-document or within-sentence aggregation (i.e., across tokens): `"squareRootCounts"` into `"proportionalSquareRoot"`, `"invertedExponential"` into `"inverseExponential"`, and `"invertedUShaped"` into `"inverseUShaped"`.
- Corrected the numerator (number of documents or sentences instead of token frequency) in all weighting schemes involving the inverse document frequency (IDF).
- Aligned all formulas concerning the exponential weighting curves.
- The `compute_sentiment()` function can now also do a sentence-level calculation using the bigrams valence shifting approach.
- Fixed a small bug that prevented using different valence shifters lists for a multi-language sentiment calculation.
- New functions: `measures_update()`, `subset.sento_measures()`, `as.sentiment()`, `as.sento_measures()`, `as.data.table.sentiment()`, `corpus_summarize()`, `sento_app()`, and `aggregate.sento_measures()`.
- Made all deprecated functions defunct, as well as the functions replaced by the new functions (wiping the slate clean...).
- Handled a reverse dependency issue raised by the `quanteda` developers regarding their new corpus object.
- Renamed the classes of the objects returned by any `sento_xyz()` function to the name of that function (e.g., the `sento_measures()` function now gives a `sento_measures` object instead of a `sentomeasures` object).
- Fixed a small bug in the `aggregate.sento_measures()` (previously `measures_merge()`) function so that it takes the mean instead of the sum in a particular case.
- Added many more within- and across-document weighting schemes (see the `get_hows()` function for an overview).
- Added the flexibility to do an explicit sentence-by-sentence sentiment computation (see the `do.sentence` argument in the `compute_sentiment()` function).
- Added the flexibility to create a multi-language `sento_corpus` object to do a multi-language sentiment computation (applying different lexicons to texts written in different languages).
- Expanded the `compute_sentiment()` function to also take `tm` `SimpleCorpus` and `VCorpus` objects.
- Added the `tm` and `NLP` packages to Suggests.
- New function: `peakdates()`.
- Modified the purpose of the `peakdocs()` function and added a `peakdates()` function to properly handle the entire functionality of extracting peaks.
- A series of documentation fixes.
- New functions: `sentiment_bind()` and `to_sentiment()`.
- Defined replacement (of lexicons and names) for a `sentolexicons` object.
- Properly handled `lag = 1` in the `ctr_agg()` function, and set weights to 1 by default for `n = 1` in the `weights_beta()` function.
- Solved a single failing test for an older R version (3.4.4).
- Removed the `abind` package from Imports.
- Removed the `zoo` package from Imports by replacing the single occurrence of the `zoo::na.locf()` function with the `fill_NAs()` helper function (written in `Rcpp`).
- Extended the `quanteda::docvars()` replacement method to a `sentocorpus` object.
- Modified information criterion estimators for edge cases to avoid them turning negative.
- Dropped the `"x"` output element from a `sentomodel` object (for large samples, it became too memory-consuming).
- Dropped the `"howWithin"` output element from a `sentomeasures` object, and simplified a `sentiment` object into a `data.table` directly instead of a `list`.
- Expanded the `do.shrinkage.x` argument in the `ctr_model()` function to a vector argument.
- Added a `do.lags` argument to the `attributions()` function, to be able to circumvent the most time-consuming part of the computation.
- Imposed a check in the `sento_measures()` function on the uniqueness of the names within and across the lexicons, features and time weighting schemes.
- Solved a bug in the `measures_merge()` function that made full merging impossible.
- The `n` argument in the `peakdocs()` function can now also be specified as a quantile.
- Minor modifications to resolve a few CRAN issues.
- Set the default value of the `nCore` argument in the `compute_sentiment()` and `ctr_agg()` functions to 1.
- Classed the output of the `compute_sentiment.sentocorpus()` function as a `sentiment` object, and changed the `aggregate()` function into `aggregate.sentiment()`.
- New functions: `weights_beta()`, `get_dates()`, `get_dimensions()`, `get_measures()`, and `get_loss_data()`.
- Renamed the following functions: `to_global()` to `measures_global()`, `perform_agg()` to `aggregate()`, `almons()` to `weights_almon()`, `exponentials()` to `weights_exponential()`, `setup_lexicons()` to `sento_lexicons()`, `retrieve_attributions()` to `attributions()`, and `plot_attributions()` to `plot.attributions()`.
- Made the `ctr_merge()` function defunct, so that all merge parameters have to be passed on directly to the `measures_merge()` function.
- Expanded the use of the `center` and `scale` arguments in the `scale()` function.
- Added the `dateBefore` and `dateAfter` arguments to the `measures_fill()` function, and dropped the `NA` option of its `fill` argument.
- Added a `"beta"` time aggregation option (see the associated `weights_beta()` function).
- Corrected the update of the `"attribWeights"` element of the output `sentomeasures` object in the required `measures_xyz()` functions.
- Added a new attribution dimension (`"lags"`) to the `attributions()` function, and corrected some edge cases.
- Made a slight correction to the information criterion estimators.
- Added a `lambdas` argument to the `ctr_model()` function, directly passed on to the `glmnet::glmnet()` function if used.
- Omitted the `do.combine` argument in the `measures_delete()` and `measures_select()` functions to simplify.
- Expanded the set of unit tests, included a coverage badge, and added `covr` to Suggests.
- Reimplementation (and improved documentation) of the sentiment calculation in the `compute_sentiment()` function, by writing part of the code in `Rcpp` relying on `RcppParallel` (added to Imports); there are now three approaches to computing sentiment (unigrams, bigrams and clusters).
- Replaced the `dfm` argument in the `compute_sentiment()` and `ctr_agg()` functions with a `tokens` argument, and altered the input and behaviour of the `nCore` argument in these same two functions.
- Switched from the `quanteda` package to the `stringi` package for more direct tokenization.
- Trimmed the `list_lexicons` and `list_valence_shifters` built-in word lists by keeping only unigrams, and included the same trimming procedure in the `sento_lexicons()` function.
- Added a column type `"t"` to the `list_valence_shifters` built-in word list, and reset values of the `"y"` column from 2 to 1.8 and from 0.5 to 0.2.
- Updated the `epu` built-in dataset with the newest available series, up to July 2018.
- Corrected the word 'sparesly' to 'sparsely' in `list_valence_shifters[["en"]]`.
- Further shortened the project page to the bare essence.
- Omitted the statement printed ('Compute sentiment... Done.') in the `compute_sentiment()` function.
- Slightly modified the `print()` generic for a `sentomeasures` object.
- Dropped the `"tf-idf"` option for within-document aggregation in the `ctr_agg()` function.
- The `sento_lexicons()` function outputs a `sentolexicons` object, which the `compute_sentiment()` function specifically requires as an input; a `sentolexicons` object also includes a `"["` class-preserving extractor function.
- The `attributions()` function outputs an `attributions` object; the `plot_attributions()` function is therefore replaced by the `plot()` generic.
- Made the `perform_MCS()` function defunct, but the output of the `get_loss_data()` function can easily be used as an input to the `MCSprocedure()` function from the `MCS` package (discarded from Imports).
- Moved the `parallel` and `doParallel` packages to Suggests, as they are only needed (if enacted) in the `sento_model()` function.
- Slightly modified the appearance of plotting functions, to drop `ggthemes` from Imports.
- New functions: `measures_delete()`, `nmeasures()`, `nobs()`, and `to_sentocorpus()`.
- Renamed the following functions: any `xyz_measures()` to `measures_xyz()`, and `extract_peakdocs()` to `peakdocs()`.
- Dropped the `do.normalizeAlm` argument in the `ctr_agg()` function (but kept it in the `almons()` function).
- Inverted the order of rows in the output of the `almons()` function to be consistent with the Ardia et al. (IJF, 2019) paper.
- Renamed `lexicons` to `list_lexicons`, and `valence` to `list_valence_shifters`.
- The `stats` element of a `sentomeasures` object is now also updated in `measures_fill()`.
- Changed `"_eng"` to `"_en"` in the `list_lexicons` and `list_valence_shifters` objects, to be in accordance with two-letter ISO language naming.
- Changed the `"valence_language"` naming to `"language"` in the `list_valence_shifters` object.
- The `compute_sentiment()` function now also accepts a `quanteda` `corpus` object and a `character` vector.
- The `add_features()` function now also accepts a `quanteda` `corpus` object.
- Added an `nCore` argument to the `compute_sentiment()`, `ctr_agg()`, and `ctr_model()` functions to allow for (more straightforward) parallelized computations, and omitted the `do.parallel` argument in the `ctr_model()` function.
- Added a `do.difference` argument to the `ctr_model()` function and expanded the use of the already existing `oos` argument.
- Brought `ggplot2` and `foreach` to Imports.
- Faster `to_global()`.
- Set `tolower = FALSE` in the `quanteda::dfm()` constructor in `compute_sentiment()`.
- Changed the `intercept` argument in `ctr_model()` to `do.intercept` for consistency.
- Proper checks on values of feature columns in `sento_corpus()` and `add_features()`.
- New functions: `diff()`, `extract_peakdocs()`, and `subset_measures()`.
- Modified R Depends from 3.4.2 to 3.3.0, and omitted import of `sentimentr`.
- Word count per document now determined based on a separate tokenization.
- Improved valence shifters search (modified `incluce_valence()` helper function).
- New option added for within-document aggregation (`"proportionalPol"`).
- Now correct pass-through of the `dfm` argument in `ctr_agg()`.
- Simplified `select_measures()`, but expanded the `toSelect` argument.
- Calculation in `to_global()` changed (see vignette).
- Improved `add_features()`: regex and non-binary (between 0 and 1) values allowed.
- All texts and lexicons are now automatically converted to lowercase for the sentiment calculation.
- (Re)translation of built-in lexicons and valence word lists.
- Small documentation clarifications and fixes.
- New vignette and run_vignette.R script.
- Shortened project page (no code example anymore).
- First public release.
- Google Summer of Code 2017 "release" (unstable).