NEWS.md

sentometrics 1.0.0

  • Version bump associated with publication of vignette in the Journal of Statistical Software.

sentometrics 0.8.4

  • Alignment with released quanteda v3.0.

sentometrics 0.8.3

  • Features (or docvars) with names "id", "sentence_id", "date", "word_count" or "texts" will not be accepted even when numeric, to avoid duplicate column names down the line. A clear error message is issued to alert users.
  • Replacement of order() calls on data.frames where needed to avoid CRAN complaints.
  • Some small documentation fixes.
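
The reserved-name rule in the first item above can be illustrated with a minimal base R sketch. The helper `check_feature_names()` is hypothetical and written only for this illustration; it is not the package's internal check, and the error wording is an assumption.

```r
# Hypothetical helper (not part of sentometrics): reject feature names that
# would collide with column names the package reserves for its own use.
reserved <- c("id", "sentence_id", "date", "word_count", "texts")

check_feature_names <- function(features) {
  clash <- intersect(features, reserved)
  if (length(clash) > 0) {
    stop("Reserved feature names: ", paste(clash, collapse = ", "))
  }
  invisible(TRUE)
}

# Non-reserved names pass silently.
ok <- check_feature_names(c("economy", "politics"))

# A reserved name triggers an error; capture its message for inspection.
res <- tryCatch(
  check_feature_names(c("economy", "date")),
  error = function(e) conditionMessage(e)
)
```

Checking names up front, rather than after merging features with the corpus columns, is one way to surface the clash before duplicate columns can arise.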

sentometrics 0.8.2

  • Some documentation fixes.
  • Release of a pkgdown website.
  • Fixed bug in sento_corpus() function that did not always order input correctly by date.
  • Fixed two minor bugs in summary.sento_measures(); the first one prevented printing of document-level weighting schemes, the second one did not remove NAs when averaging over correlations.
  • Small bug fix in yearly aggregation (it did not account for the fact that 1970-01-01 is considered day zero).
  • Dropped horizontal 0-line automatically added in the plot.sento_measures() function as it distorts graphs of time series with values far away from zero.
  • Stopped exporting all defunct functions to clean up namespace.
  • The function print.sento_corpus() now shows when the corpus is multi-lingual.
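
The yearly aggregation fix above relates to how R stores dates: a Date is the number of days since 1970-01-01, so that date itself has the numeric value 0. The sketch below only demonstrates this base R behaviour; it does not reproduce the package's aggregation code.

```r
# Base R stores a Date as the number of days since 1970-01-01.
dates <- as.Date(c("1969-12-31", "1970-01-01", "1970-01-02"))
dayNumbers <- as.numeric(dates)   # -1, 0, 1

# One way to obtain the year without doing arithmetic on the day count.
years <- as.integer(format(dates, "%Y"))
```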

sentometrics 0.8.1

  • Alignment with released quanteda v2.0.

sentometrics 0.8.0

  • New function: print.sento_corpus().
  • Package update followed by release of a substantial update of the vignette (see https://doi.org/10.2139/ssrn.3067734).
  • Changed some warning() calls to message() calls to be more kind to the user.
  • Altered internal code to comply with the corpus object from quanteda >= v2.0.
  • Dropped all "TF"-inspired weights for within-document aggregation except for "TFIDF", and made this option return the same sentiment scores as one would obtain when using the quanteda package (see the example on https://sentometrics-research.com/sentometrics/articles/examples/sentiment.html).

sentometrics 0.7.6

  • Fixed memory allocation issue in the compute_sentiment() function.

sentometrics 0.7.5

  • New functions: as.data.table.sento_corpus(), as.data.frame.sento_corpus(), and as.data.frame.sento_measures().
  • Embedded a small workaround in plot.attributions() to guarantee the same plotting behaviour after an update of the ggplot2 package that gave buggy output for the geom_area() layer.
  • For overall consistency, integrated measures_global() into the aggregate.sento_measures() function, adding a do.global argument to enable it.
  • Slightly changed the clusters-based sentence-level sentiment computation (different weighting of adversative conjunctions).
  • Clarified the documentation for the peakdates() and peakdocs() functions.
  • Moved the Shiny application made available in the previous package update (i.e., the sento_app() function) to a separate, sole-purpose package, sentometrics.app (see https://github.com/sborms/sentometrics.app).
  • Moved the data.table package from Depends to Imports (see Rdatatable/data.table#3076).
  • The merge.sentiment() function no longer changes input sentiment objects by reference, and the merging now gives, for instance, a simple column binding of sentiment methods when all else is equal.
  • Correct pass-through of default how argument in the compute_sentiment() function.
  • Added a few adversative conjunctions to all word lists in list_valence_shifters.
  • Added a do.normalize option to the weights_beta() and weights_exponential() functions.
  • Added a do.inverse option to the weights_exponential() function and associated do.inverseExp argument in the ctr_agg() function.
  • Modified some names of options for within-document or within-sentence aggregation (i.e., across tokens): "squareRootCounts" into "proportionalSquareRoot", "invertedExponential" into "inverseExponential", and "invertedUShaped" into "inverseUShaped".
  • Corrected the numerator (number of documents or sentences instead of token frequency) in all weighting schemes involving the inverse document frequency (IDF).
  • Aligned all formulas concerning the exponential weighting curves.
  • The compute_sentiment() function can now also do a sentence-level calculation using the bigrams valence shifting approach.
  • Fixed a small bug that prevented the use of different valence shifter lists in a multi-language sentiment calculation.
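
The IDF correction listed above can be made concrete with a small base R sketch. It assumes the common textbook form idf = log(N / df), where N is the number of documents (or sentences) and df is the number of them containing the term. The exact variant the package uses (logarithm base, smoothing) is not stated here, and the toy documents are made up.

```r
# Toy corpus: three tokenized documents (made-up data).
docs <- list(
  c("good", "market", "good"),
  c("bad", "market"),
  c("good", "news")
)

nDocs <- length(docs)                  # numerator: number of documents
vocab <- sort(unique(unlist(docs)))

# Document frequency: in how many documents does each term occur?
docFreq <- vapply(vocab, function(w) {
  sum(vapply(docs, function(d) w %in% d, logical(1)))
}, integer(1))

# Assumed textbook form of the inverse document frequency.
idf <- log(nDocs / docFreq)
```

Note that "good" appears three times in total but in only two documents; using the document count in the numerator keeps the ratio N / df at or above 1, so the logarithm cannot turn negative.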

sentometrics 0.7.0

  • New functions: measures_update(), subset.sento_measures(), as.sentiment(), as.sento_measures(), as.data.table.sentiment(), corpus_summarize(), sento_app(), and aggregate.sento_measures().
  • Made defunct all deprecated functions as well as the functions replaced by the new functions (wiping the slate clean...).
  • Handled reverse dependency issue raised by quanteda developers regarding their new corpus object.
  • Renamed the class objects coming from any sento_xyz() function into the name of the function (e.g., the sento_measures() function now gives a sento_measures object instead of a sentomeasures object).
  • Fixed a small bug in the aggregate.sento_measures() (previously measures_merge()) function to take the mean instead of the sum in a particular case.
  • Added many more within- and across-document weighting schemes (see the get_hows() function for an overview).
  • Added the flexibility to do an explicit sentence-by-sentence sentiment computation (see do.sentence argument in the compute_sentiment() function).
  • Added the flexibility to create a multi-language sento_corpus object to do a multi-language sentiment computation (applying different lexicons to texts written in different languages).
  • Expanded the compute_sentiment() function to also take tm SimpleCorpus and VCorpus objects.
  • Added the tm and NLP packages to Suggests.

sentometrics 0.5.6

  • New function: peakdates().
  • Modified the purpose of the peakdocs() function and added a peakdates() function to properly handle the entire functionality of extracting peaks.
  • A series of documentation fixes.

sentometrics 0.5.5

  • New functions: sentiment_bind(), and to_sentiment().
  • Defined replacement (of lexicons and names) for a sentolexicons object.
  • Properly handled lag = 1 in the ctr_agg() function, and set weights to 1 by default for n = 1 in the weights_beta() function.
  • Solved single failing test for older R version (3.4.4).
  • Removed the abind package from Imports.
  • Removed the zoo package from Imports, by replacing the single occurrence of the zoo::na.locf() function by the fill_NAs() helper function (written in Rcpp).
  • Extended the quanteda::docvars() replacement method to a sentocorpus object.
  • Modified information criterion estimators for edge cases to avoid them turning negative.
  • Dropped the "x" output element from a sentomodel object (for large samples, this became too memory consuming).
  • Dropped the "howWithin" output element from a sentomeasures object, and simplified a sentiment object into a data.table directly instead of a list.
  • Expanded the do.shrinkage.x argument in the ctr_model() function to a vector argument.
  • Added a do.lags argument to the attributions() function, to be able to circumvent the most time-consuming part of the computation.
  • Imposed a check in the sento_measures() function on the uniqueness of the names within and across the lexicons, features and time weighting schemes.
  • Solved a bug in the measures_merge() function that made full merging not possible.
  • The n argument in the peakdocs() function can now also be specified as a quantile.
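
Replacing zoo::na.locf(), as mentioned above, amounts to carrying the last observed value forward over missing entries. The base R sketch below shows that idea only. The function name `locf()` is hypothetical, it is not the package's Rcpp fill_NAs() helper, and leaving leading NAs untouched is a simplifying assumption.

```r
# Hypothetical base R helper: last observation carried forward.
locf <- function(x) {
  observed <- !is.na(x)
  # Count of non-missing values seen so far (0 if none yet); this doubles as
  # an index into the vector of observed values.
  idx <- cumsum(observed)
  # Prepend NA so that positions before the first observation stay missing.
  c(NA, x[observed])[idx + 1]
}

filled <- locf(c(NA, 1, NA, NA, 4, NA))
```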

sentometrics 0.5.1

  • Minor modifications to resolve a few CRAN issues.
  • Set default value of nCore argument in the compute_sentiment() and ctr_agg() functions to 1.
  • Classed the output of the compute_sentiment.sentocorpus() function as a sentiment object, and modified the aggregate() function to aggregate.sentiment().

sentometrics 0.5.0

  • New functions: weights_beta(), get_dates(), get_dimensions(), get_measures(), and get_loss_data().
  • Renamed the following functions: to_global() to measures_global(), perform_agg() to aggregate(), almons() to weights_almon(), exponentials() to weights_exponential(), setup_lexicons() to sento_lexicons(), retrieve_attributions() to attributions(), plot_attributions() to plot.attributions().
  • Made the ctr_merge() function defunct, so that all merge parameters have to be passed on directly to the measures_merge() function.
  • Expanded the use of the center and scale arguments in the scale() function.
  • Added the dateBefore and dateAfter arguments to the measures_fill() function, and dropped NA option of its fill argument.
  • Added a "beta" time aggregation option (see associated weights_beta() function).
  • Corrected update of "attribWeights" element of output sentomeasures object in required measures_xyz() functions.
  • Added a new attribution dimension ("lags") to the attributions() function, and corrected some edge cases.
  • Made a slight correction to the information criterion estimators.
  • Added a lambdas argument to the ctr_model() function, directly passed on to the glmnet::glmnet() function if used.
  • Omitted do.combine argument in measures_delete() and measures_select() functions to simplify.
  • Expanded set of unit tests, included a coverage badge, and added covr to Suggests.
  • Reimplementation (and improved documentation) of the sentiment calculation in the compute_sentiment() function, by writing part of the code in Rcpp relying on RcppParallel (added to Imports); there are now three approaches to computing sentiment (unigrams, bigrams and clusters).
  • Replaced the dfm argument in the compute_sentiment() and ctr_agg() functions by a tokens argument, and altered the input and behaviour of the nCore argument in these same two functions.
  • Switched from the quanteda package to the stringi package for more direct tokenization.
  • Trimmed the list_lexicons and list_valence_shifters built-in word lists by keeping only unigrams, and included same trimming procedure in the sento_lexicons() function.
  • Added a column type "t" to the list_valence_shifters built-in word list, and reset values of the "y" column from 2 to 1.8 and from 0.5 to 0.2.
  • Updated the epu built-in dataset with the newest available series, up to July 2018.
  • Corrected the word 'sparesly' to 'sparsely' in list_valence_shifters[["en"]].
  • Further shortened project page to the bare essence.
  • Omitted statement printed ('Compute sentiment... Done.') in the compute_sentiment() function.
  • Slightly modified print() generic for a sentomeasures object.
  • Dropped the "tf-idf" option for within-document aggregation in the ctr_agg() function.
  • The sento_lexicons() function outputs a sentolexicons object, which the compute_sentiment() function specifically requires as an input; a sentolexicons object also includes a "[" class-preserving extractor function.
  • The attributions() function outputs an attributions object; the plot_attributions() function is therefore replaced by the plot() generic.
  • Made the perform_MCS() function defunct, but the output of the get_loss_data() function can easily be used as an input to the MCSprocedure() function from the MCS package (discarded from Imports).
  • Moved the parallel and doParallel packages to Suggests, as only needed (if enacted) in the sento_model() function.
  • Slightly modified the appearance of plotting functions, to drop ggthemes from Imports.

sentometrics 0.4.0

  • New functions: measures_delete(), nmeasures(), nobs(), and to_sentocorpus().
  • Renamed the following functions: any xyz_measures() to measures_xyz(), extract_peakdocs() to peakdocs().
  • Dropped do.normalizeAlm argument in the ctr_agg() function (but kept in the almons() function).
  • Inverted order of rows in output of the almons() function to be consistent with Ardia et al. (IJF, 2019) paper.
  • Renamed lexicons to list_lexicons, and valence to list_valence_shifters.
  • The stats element of a sentomeasures object is now also updated in measures_fill().
  • Changed "_eng" to "_en" in list_lexicons and list_valence_shifters objects, to be in accordance with two-letter ISO language naming.
  • Changed "valence_language" naming to "language" in list_valence_shifters object.
  • The compute_sentiment() function now also accepts a quanteda corpus object and a character vector.
  • The add_features() function now also accepts a quanteda corpus object.
  • Added an nCore argument to the compute_sentiment(), ctr_agg(), and ctr_model() functions to allow for (more straightforward) parallelized computations, and omitted the do.parallel argument in the ctr_model() function.
  • Added a do.difference argument to the ctr_model() function and expanded the use of the already existing oos argument.
  • Brought ggplot2 and foreach to Imports.

sentometrics 0.3.5

  • Faster to_global().
  • Set tolower = FALSE of quanteda::dfm() constructor in compute_sentiment().
  • Changed intercept argument in ctr_model() to do.intercept for consistency.
  • Proper checks on values of feature columns in sento_corpus() and add_features().

sentometrics 0.3.0

  • New functions: diff(), extract_peakdocs(), and subset_measures().
  • Modified R Depends from 3.4.2 to 3.3.0, and omitted import of sentimentr.
  • Word count per document now determined based on a separate tokenization.
  • Improved valence shifters search (modified include_valence() helper function).
  • New option added for within-document aggregation ("proportionalPol").
  • Now correct pass-through of dfm argument in ctr_agg().
  • Simplified select_measures(), but toSelect argument expanded.
  • Calculation in to_global() changed (see vignette).
  • Improved add_features(): regex and non-binary (between 0 and 1) allowed.
  • All texts and lexicons now automatically to lowercase for sentiment calculation.
  • (Re)translation of built-in lexicons and valence word lists.
  • Small documentation clarifications and fixes.
  • New vignette and run_vignette.R script.
  • Shortened project page (no code example anymore).

sentometrics 0.2.0

  • First public release.

sentometrics 0.1.0

  • Google Summer of Code 2017 "release" (unstable).