
Ranked short text classification using co-occurrence features and score functions

Written by:
Working Paper Number ADEP-WP-2024-06

Abstract

This article explores the use of co-occurrence features and score functions to perform ranked classification of short text. Unlike features based on word sequences, co-occurrence features are based on word combinations with no restrictions on word order or distance. Co-occurrence features are appropriate for short text because documents in this setting contain very few words. We consider a variation of the Vector Space Model called the “umbrella” vectorization that emphasizes textual details and reduces feature redundancy. We also propose a complementary score function based on a weighted average of the features’ class distributions in the corpus. For validation, the methods are applied to four short text datasets and compared to baseline classifiers. The proposed score function performs better than a modified BM25 classifier and achieves a level of accuracy similar to that of logistic regression.
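To make the two central ideas concrete, the following is a minimal Python sketch of order-free co-occurrence features and of a score function that averages the features' class distributions. It is an illustration under stated assumptions, not the paper's method: the function names, the limit of two words per combination, and the choice to weight each feature by its number of words are all hypothetical, and the "umbrella" vectorization is not reproduced here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_features(text, max_size=2):
    """Return unordered word combinations of size 1..max_size.

    Words are deduplicated and sorted first, so word order and
    distance in the original text do not affect the features.
    """
    words = sorted(set(text.lower().split()))
    feats = set()
    for k in range(1, max_size + 1):
        feats.update(combinations(words, k))
    return feats


def fit_class_distributions(docs, labels, max_size=2):
    """Count how often each feature occurs with each class in the corpus."""
    counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        for feat in cooccurrence_features(text, max_size):
            counts[feat][label] += 1
    return counts


def rank_classes(text, counts, max_size=2):
    """Rank classes by a weighted average of feature class distributions.

    Assumption (not from the paper): a feature's weight is its number
    of words, so more specific combinations count for more.
    """
    scores = Counter()
    total_weight = 0.0
    for feat in cooccurrence_features(text, max_size):
        if feat not in counts:
            continue  # feature never seen in the corpus
        n = sum(counts[feat].values())
        weight = len(feat)
        total_weight += weight
        for label, c in counts[feat].items():
            scores[label] += weight * c / n
    if total_weight == 0:
        return []
    ranked = [(label, s / total_weight) for label, s in scores.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Toy corpus, invented for illustration only.
docs = ["registered nurse", "nurse assistant",
        "truck driver", "delivery truck driver"]
labels = ["health", "health", "transport", "transport"]
counts = fit_class_distributions(docs, labels)
print(rank_classes("driver truck", counts))
```

Because the output is a ranked list of classes with scores rather than a single label, a caller can keep the top few candidates, which matches the ranked classification setting the abstract describes.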

Page Last Revised - July 2, 2024