EmojiSentR: An R Package for Integrated Text and Emoji Sentiment Analysis
DOI: https://doi.org/10.35566/jbds/tonghl
Keywords: Emoji, Sentiment analysis, Text mining, R package, PDF extraction
Abstract
Although emojis are used by 92% of the world's online population, appear in over one quarter of social media posts, and carry affect beyond words, most sentiment analysis packages ignore emoji semantics. Our newly developed R package, EmojiSentR, addresses this gap by merging emoji valence with word-level sentiment in a tidy R workflow. The package bundles an internal lexicon derived from the Novak-1200 dataset (1,200 glyphs × 8 emotions) and a pipeline function, sentiment_analysis(), that (1) extracts emojis, (2) cleans the residual text, (3) scores the emoji and text channels separately, (4) applies the simple negation handling of VADER (Valence Aware Dictionary and sEntiment Reasoner), and (5) returns a weighted composite score. Unicode-cleanup and Poppler-based PDF-to-text utilities further broaden the range of supported data sources. In an illustrative example on a corpus of 17,996 English tweets, EmojiSentR changed the predicted polarity of 20.7% of messages while adding only 769.4 ms of computation time per 1,000 posts; for instance, polarity reversals appeared in posts with sarcastic laughter and uncertainty cues. EmojiSentR makes it simple to treat emojis as first-class sentiment carriers within R. Its modular design supports user-tunable weights, transparent lexicon updates, and planned multilingual extensions, offering researchers a reproducible tool for analyzing emoji-rich text and PDF-based data sources.
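The five pipeline steps listed above can be illustrated with a small base-R sketch. This is not EmojiSentR's implementation or API. The function toy_composite_sentiment(), the three-emoji and six-word lexicons (whose valence values are invented), the emoji weight w_emoji = 0.6, and the three-token negation window are all hypothetical choices made for illustration. Emoji detection is simplified to code points in the ranges U+1F300 to U+1FAFF and U+2600 to U+27BF. The negation rule only flips a word's sign, whereas VADER itself scales negated words by a constant factor.

```r
# Hypothetical sketch of a five-step emoji + text sentiment pipeline in base R.
# NOT the EmojiSentR implementation; all lexicon values below are invented.
toy_emoji_lex <- c(0.4, 0.8, -0.8)
names(toy_emoji_lex) <- c("\U0001F602",  # face with tears of joy
                          "\U0001F60D",  # smiling face with heart-shaped eyes
                          "\U0001F621")  # pouting face
toy_word_lex <- c(love = 1, good = 1, great = 1, hate = -1, bad = -1, awful = -1)
toy_negators <- c("not", "no", "never", "isn't", "don't")

toy_composite_sentiment <- function(x, w_emoji = 0.6, window = 3) {
  stopifnot(is.character(x), length(x) == 1)
  cps <- utf8ToInt(enc2utf8(x))  # Unicode code points of the post

  # (1) Extract emojis: simplified to two code-point ranges.
  is_emoji <- (cps >= 0x1F300 & cps <= 0x1FAFF) | (cps >= 0x2600 & cps <= 0x27BF)
  is_joiner <- cps %in% c(0x200D, 0xFE0F)  # zero-width joiner, variation selector-16
  emojis <- intToUtf8(cps[is_emoji], multiple = TRUE)

  # (2) Clean the residual text: drop emojis and joiners, lowercase,
  # keep only letters and apostrophes, then tokenize on spaces.
  text <- tolower(intToUtf8(cps[!is_emoji & !is_joiner]))
  tokens <- strsplit(gsub("[^a-z']+", " ", text), " ")[[1]]
  tokens <- tokens[nzchar(tokens)]

  # (3a) Emoji channel: mean valence of the emojis found in the toy lexicon.
  idx <- match(emojis, names(toy_emoji_lex))
  emoji_hits <- toy_emoji_lex[idx[!is.na(idx)]]
  emoji_score <- if (length(emoji_hits) > 0) mean(emoji_hits) else 0

  # (3b) + (4) Word channel with a simplified negation rule: flip a word's
  # sign if a negator occurs among the `window` preceding tokens.
  word_scores <- numeric(0)
  for (i in seq_along(tokens)) {
    if (!tokens[i] %in% names(toy_word_lex)) next
    s <- toy_word_lex[[tokens[i]]]
    prev <- if (i > 1) tokens[max(1, i - window):(i - 1)] else character(0)
    if (any(prev %in% toy_negators)) s <- -s
    word_scores <- c(word_scores, s)
  }
  text_score <- if (length(word_scores) > 0) mean(word_scores) else 0

  # (5) Weighted composite; fall back to one channel when the other is empty.
  composite <- if (length(emoji_hits) == 0) {
    text_score
  } else if (length(word_scores) == 0) {
    emoji_score
  } else {
    w_emoji * emoji_score + (1 - w_emoji) * text_score
  }
  list(emojis = emojis, tokens = tokens, emoji_score = emoji_score,
       text_score = text_score, composite = composite)
}

res <- toy_composite_sentiment("I love Mondays \U0001F621")
res$text_score  # positive: "love" is the only scored word
res$composite   # negative: the pouting-face emoji outweighs the text
```

With these toy values, "I love Mondays" followed by a pouting-face emoji gets a positive text-only score but a negative composite. This is a small analogue of the polarity reversals described above. To score a character vector, apply the function element-wise, for example `vapply(x, function(s) toy_composite_sentiment(s)$composite, numeric(1))`.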
References
Adobe Inc. (2022). Future of creativity: 2022 global emoji trend report. (Retrieved from https://www.adobe.com/)
Aggarwal, C. C. (2015). Data mining: The textbook. Cham: Springer. doi: https://doi.org/10.1007/978-3-319-14142-8
Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1(1), 17–35. doi: https://doi.org/10.1214/07-aoas114
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022. doi: https://doi.org/10.7551/mitpress/1120.003.0082
Boia, M., Faltings, B., Musat, C., & Pu, P. (2013). A study of user assessment of the emoji sentiment lexicon. Proceedings of the International Conference on Social Informatics, 24–33.
Broni, K. (2023). What’s new on the 10th annual world emoji day. (Retrieved from https://blog.emojipedia.org/)
Gagolewski, M. (2024). stringi: Fast and portable character string processing in R [Computer software manual]. Retrieved from https://cran.r-project.org/package=stringi (R package version 1.8.4)
Hassani, S., Sabetzadeh, M., & Amyot, D. (2025). An empirical study on LLM-based classification of requirements-related provisions in food-safety regulations. Empirical Software Engineering, 30(3), 72. doi: https://doi.org/10.1007/s10664-025-10619-z
Hotho, A., Nürnberger, A., & Paaß, G. (2005). A brief survey of text mining. LDV Forum, 20(1), 19–62. doi: https://doi.org/10.21248/jlcl.20.2005.68
Humphreys, A., & Wang, R. J. H. (2018). Automated text analysis for consumer research. Journal of Consumer Research, 44(6), 1274–1306. doi: https://doi.org/10.1093/jcr/ucx104
Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 8, pp. 216–225). doi: https://doi.org/10.1609/icwsm.v8i1.14550
Itani, M., Roast, C., & Al-Khayatt, S. (2017). Developing resources for sentiment analysis of informal Arabic text in social media. Procedia Computer Science, 117, 129–136. doi: https://doi.org/10.1016/j.procs.2017.10.101
Jockers, M. L. (2020). syuzhet: Extract sentiment and plot arcs in novels [Computer software manual]. Retrieved from https://cran.r-project.org/package=syuzhet (R package version 1.0.6)
Khaiser, F. K., Saad, A., & Mason, C. (2023). Sentiment analysis of students’ feedback using text-based classification and NLP. Journal of Language and Communication, 10(1), 101–111. doi: https://doi.org/10.47836/jlc.10.01.06
Kozik, R., Kula, S., Choraś, M., & Woźniak, M. (2022). Technical solution to counter potential crime: Text analysis to detect fake news and disinformation. Journal of Computational Science, 60, 101576. doi: https://doi.org/10.1016/j.jocs.2022.101576
Kralj Novak, P., Smailović, J., Sluban, B., & Mozetič, I. (2015). Sentiment of emojis. PLOS ONE, 10(12), e0144296. doi: https://doi.org/10.1371/journal.pone.0144296
Liu, H., Tsang, S., Wood, A., & Tong, X. (2025). Longitudinal sentiment analysis with conversation textual data. Fudan Journal of the Humanities and Social Sciences, 18(1), 193–214. doi: https://doi.org/10.1007/s40647-024-00417-0
Machová, K., Szabóová, M., Paralič, J., & Mičko, J. (2023). Detection of emotion by text analysis using machine learning. Frontiers in Psychology, 14, 1190326. doi: https://doi.org/10.3389/fpsyg.2023.1190326
Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., & Gao, J. (2021). Deep learning-based text classification: A comprehensive review. ACM Computing Surveys, 54(3), 1–40. doi: https://doi.org/10.1145/3439726
Nandwani, P., & Verma, R. (2021). A review on sentiment analysis and emotion detection from text. Social Network Analysis and Mining, 11(1), 81. doi: https://doi.org/10.1007/s13278-021-00776-6
Rinker, T. W. (2022). sentimentr: Calculate text polarity sentiment [Computer software manual]. Retrieved from https://cran.r-project.org/package=sentimentr (R package version 2.9.0)
Rinker, T. W. (2023). textclean: Text cleaning tools [Computer software manual]. Retrieved from https://cran.r-project.org/package=textclean (R package version 0.9.3)
Roberts, M. E., Stewart, B. M., & Tingley, D. (2019). stm: An R package for structural topic models. Journal of Statistical Software, 91(2), 1–40. doi: https://doi.org/10.18637/jss.v091.i02
Rudis, B., & Robinson, D. (2024). emoji: Data and functions to work with emojis [Computer software manual]. Retrieved from https://cran.r-project.org/package=emoji (R package version 0.2.0)
Selivanov, D., Wang, Q., & Tang, Y. (2024). text2vec: Modern text mining framework for R [Computer software manual]. Retrieved from https://cran.r-project.org/package=text2vec (R package version 0.6)
Thakur, N. (2023). Sentiment and text analysis of public discourse on Twitter about COVID-19 and mpox. Big Data and Cognitive Computing, 7(2), 116. doi: https://doi.org/10.3390/bdcc7020116
Urbanek, S. (2023). utf8: Unicode text processing [Computer software manual]. Retrieved from https://cran.r-project.org/package=utf8 (R package version 1.2.3)
Wankhade, M., Rao, A. C. S., & Kulkarni, C. (2022). A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence Review, 55(7), 5731–5780. doi: https://doi.org/10.1007/s10462-022-10144-1
Westgate, M. J., Barton, P. S., Pierson, J. C., & Lindenmayer, D. B. (2015). Text analysis tools for identification of emerging topics and research gaps in conservation science. Conservation Biology, 29(6), 1606–1614. doi: https://doi.org/10.1111/cobi.12605
Wickham, H. (2024). stringr: Simple, consistent wrappers for common string operations [Computer software manual]. Retrieved from https://cran.r-project.org/package=stringr (R package version 1.6.3)
Wickham, H., Francois, R., & D’Agostino McGowan, L. (2022). emo: Easily insert emojis into R documents [Computer software manual]. Retrieved from https://github.com/hadley/emo (R package version 0.0.0.9000)
Yu, G. (2022). emojifont: Emoji and font awesome in graphics [Computer software manual]. Retrieved from https://cran.r-project.org/package=emojifont (R package version 0.5.6)