User Feedback Analysis Using AI

Bridging the Gap Between Users and Developers

Introduction

Back in 1750 BC, the first documented customer complaint was chiselled onto a clay tablet. The customer, Nanni, unimpressed by the service and the copper ore he received from the supplier, Ea-Nasir, made the unprecedented decision to give the business his feedback. We don't know how Ea-Nasir responded, but we do know how and why customer feedback has evolved over the past four thousand years, and it has come a long way since the days of carving your complaint into stone!


The culture of customer feedback has continued from 1750 BC to the present day. With advances in technology and globalization, users can now give feedback on software in many ways, from sending an email to posting a public review on the platform itself. A prominent example is Amazon, which is known for its strong focus on customer feedback: it combines customer reviews and ratings with comments to improve its products and to inform its development, marketing, pricing, and inventory management decisions.


One way of using this feedback is to supply developers with useful data, such as feature suggestions and bug reports. The main problem we face in the era of big data is that we do not know which comments are truly useful to developers, or how to build that bridge between users and developers.

Objectives

Based on that information, this project aims to investigate the use of embedding models, clustering algorithms, and Natural Language Processing (NLP), specifically text mining and information retrieval, to associate comments and descriptions with issues on versioning platforms, such as GitHub, in software development systems. The research will evaluate different techniques to identify the most effective ones for representing and matching textual information in programming contexts, with the aim of improving prioritization and problem-solving processes.
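As a minimal sketch of the matching idea, the snippet below pairs user comments with GitHub issue titles using TF-IDF vectors and cosine similarity. All texts are invented for illustration, and TF-IDF stands in for a learned embedding model (such as BERT), which would replace the vectorization step in the actual project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample data: user comments and GitHub issue titles.
comments = [
    "The app crashes when I upload a large photo",
    "Please add a dark mode option",
]
issues = [
    "Crash on image upload for files over 10 MB",
    "Feature request: dark theme support",
    "Login page loads slowly on mobile",
]

# Vectorize comments and issues in the same space (TF-IDF here as a
# lightweight stand-in for a learned embedding model).
vectorizer = TfidfVectorizer().fit(comments + issues)
comment_vecs = vectorizer.transform(comments)
issue_vecs = vectorizer.transform(issues)

# For each comment, find the most similar issue by cosine similarity.
sims = cosine_similarity(comment_vecs, issue_vecs)
for i, comment in enumerate(comments):
    best = sims[i].argmax()
    print(f"{comment!r} -> {issues[best]!r}")
```

Note that plain TF-IDF only matches on shared surface words; the motivation for embedding models is that they also capture matches like "crashes" versus "crash on image upload" through semantic similarity rather than exact token overlap.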

Concepts

Embeddings

Embeddings are a way to represent objects such as text, images, or audio as vectors in a continuous, high-dimensional space. In this space, similar items are located close to one another, allowing machine learning algorithms to understand semantic relationships. For example, similar texts will have similar vector representations and be placed closer together in the space.
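A toy illustration of "similar items are close together", using invented 3-dimensional vectors (real embedding models produce hundreds of dimensions; the numbers here are made up for the example):

```python
import numpy as np

# Invented 3-dimensional "embeddings" for three pieces of feedback.
vec_bug_report   = np.array([0.9, 0.1, 0.0])
vec_crash_review = np.array([0.8, 0.2, 0.1])
vec_feature_ask  = np.array([0.1, 0.9, 0.3])

def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction, near 0.0 unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Texts with related meaning sit closer in the space, so their
# cosine similarity is higher.
print(cosine(vec_bug_report, vec_crash_review))  # high
print(cosine(vec_bug_report, vec_feature_ask))   # low
```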


Clustering

Clustering is an unsupervised machine learning technique that groups similar data points into clusters based on patterns or features they share. In the context of text, clustering can help identify common topics or group together related comments, making it easier to find patterns or prioritize issues.
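A small sketch of this idea with scikit-learn's K-Means over TF-IDF vectors. The feedback comments are invented, and a real pipeline would cluster embedding vectors instead of TF-IDF ones:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical feedback comments covering two rough topics:
# crashes (first two) and a dark-mode request (last two).
feedback = [
    "The app crashes when uploading a photo",
    "App crash while uploading my photo",
    "Please add dark mode",
    "A dark mode theme would be great",
]

X = TfidfVectorizer().fit_transform(feedback)

# Group the comments into two clusters; n_init and random_state
# make this toy run deterministic.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # comments about the same topic share a label
```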

Methods

The first phase of the project will involve setting up a controlled testing environment or "playground" to experiment with various embedding models, clustering algorithms, and NLP techniques. The goal is to understand how each method behaves and how well it performs in representing and grouping textual data.


In the second phase, real-world data provided by X will be used to validate the initial tests. Some parts of the pipeline may require supervised fine-tuning or hyperparameter optimization, especially for clustering algorithms and NLP models that depend on specific configurations.
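One common configuration step of this kind is choosing the number of clusters k. The sketch below uses synthetic data from scikit-learn's make_blobs as a stand-in for embedded feedback vectors, and selects k by silhouette score:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data standing in for embedded feedback: three
# well-separated groups of points in 5 dimensions.
X, _ = make_blobs(n_samples=150, centers=3, n_features=5, random_state=42)

# Try several values of k and keep the one with the best silhouette
# score (higher means tighter, better-separated clusters).
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

The same loop generalizes to other hyperparameters, for example DBSCAN's eps, by scoring each candidate configuration on the validation data.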


Finally, all results will be analyzed to produce practical insights and recommendations on which techniques are most effective for extracting meaningful, developer-relevant information from user feedback.

Schedule

May:
  • Research and review related literature (NLP, embeddings, clustering)
  • Define scope and choose initial tools/technologies
  • Set up development environment ("playground")
June:
  • Implement and test different embedding models (e.g., Word2Vec, BERT)
  • Explore preprocessing techniques for text data
July–August:
  • Apply and compare clustering algorithms (e.g., K-Means, DBSCAN)
  • Evaluate combinations of embeddings + clustering
September:
  • Final evaluation of techniques and summarize results
  • Start writing the final report
October:
  • Finish writing and reviewing the TCC
  • Prepare presentation or defense materials

References

[1] Häring, M.; Stanik, C.; Maalej, W. (2021). Automatically Matching Bug Reports With Related App Reviews.


[2] Jurafsky, Dan; Martin, James H. (2021). Speech and Language Processing (3rd ed.).
