Effects on Time and Quality of Short Text Clustering During Real-Time Presentations

Journal
IEEE Latin America Transactions
ISSN
1548-0992
Date Issued
2021
Author(s)
Lopez-Villarroel, M  
Ponce-Arias, H  
Abstract
Technologies for live presentations should consider users' capacity to manage large amounts of data in real time, particularly exchanges of short texts (e.g., phrases). This study examines the effects of text size on the execution time and clustering quality of text clustering algorithms applied to short, medium, and long texts, and examines whether short text clustering performs reasonably well for live presentations. We ran several simulations in which we varied the number of phrases (from 5 to 200) contained in each text type (long, medium, and short) and the number of generated clusters (from 2 to 10). The algorithms used were Snowball stemmers, TF-IDF, and K-means for clustering; the long, medium, and short texts were taken from Reuters, 20 Newsgroups, and an experimental data set, respectively. The first result showed that text size had a large effect on the algorithms' execution time, with the shortest average time for short texts and the longest average time for long texts. The second result showed that the number of phrases in each text type significantly predicted execution time, whereas the number of clusters generated by K-means did not. Inertia and purity were used to assess the quality of the generated clusters. Text size, number of phrases, and number of clusters predicted inertia, with the lowest inertia observed for short texts. Purity values for all text types were similar to previously reported results. Thus, clustering algorithms for short texts can confidently be used in real-time presentations. © 2003-2012 IEEE.
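The pipeline named in the abstract (stemming, TF-IDF weighting, K-means clustering, execution timing, and inertia as a quality measure) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation. The `crude_stem` function is a hypothetical suffix-stripping stand-in for a real Snowball stemmer, the IDF formula log(n/df) + 1 is one common variant among several, and the names `tokenize`, `tfidf`, `kmeans`, and `cluster_phrases` are illustrative assumptions.

```python
import math
import random
import re
import time
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Hypothetical stand-in for a Snowball stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tfidf(docs):
    """Return one L2-normalized TF-IDF vector (list of floats) per document."""
    stemmed = [[crude_stem(t) for t in tokenize(d)] for d in docs]
    vocab = sorted({t for doc in stemmed for t in doc})
    n = len(docs)
    # Document frequency: in how many documents each stem appears.
    df = Counter(t for doc in stemmed for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for doc in stemmed:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero for empty docs
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's K-means; returns (labels, inertia)."""
    rng = random.Random(seed)
    # k randomly chosen documents serve as the initial centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: sum of squared distances from each vector to its cluster centroid.
    inertia = sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, inertia


def cluster_phrases(phrases, k, seed=0):
    """Vectorize and cluster phrases, timing the whole pipeline."""
    start = time.perf_counter()
    labels, inertia = kmeans(tfidf(phrases), k, seed=seed)
    elapsed = time.perf_counter() - start
    return labels, inertia, elapsed


if __name__ == "__main__":
    phrases = ["apple pie", "apple pie", "stock market", "stock market"]
    labels, inertia, seconds = cluster_phrases(phrases, k=2)
    print(labels, inertia, f"{seconds:.6f}s")
```

In practice one would more likely use NLTK's `SnowballStemmer` together with scikit-learn's `TfidfVectorizer` and `KMeans` (whose `inertia_` attribute reports the same sum of squared distances); the hand-written version only makes each step and the timed region explicit.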
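Purity, the second quality measure named in the abstract, can be sketched as follows, assuming every item carries a known reference class (for example, its Reuters or 20 Newsgroups category). Each cluster is credited with the count of its most frequent class; the credits are summed and divided by the total number of items, so higher is better and 1.0 means every cluster contains a single class. The function name `purity` and its signature are illustrative, not taken from the paper.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Fraction of items that belong to the majority class of their cluster."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("cluster_labels and class_labels must have the same length")
    if not cluster_labels:
        raise ValueError("purity is undefined for an empty labeling")
    # Count reference classes inside each cluster.
    counts_by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        counts_by_cluster.setdefault(cluster, Counter())[cls] += 1
    # Credit each cluster with the size of its largest class.
    majority_total = sum(c.most_common(1)[0][1] for c in counts_by_cluster.values())
    return majority_total / len(cluster_labels)


print(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]))  # 5/6 ≈ 0.8333
```

Because each cluster counts only its majority class, purity rises trivially as the number of clusters grows (one item per cluster gives 1.0), which is why it is usually reported alongside another measure such as inertia.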