Enhancing IT Support with NLP-driven Cherwell Issue Analysis

In this post, I'll be looking at the descriptions of around 27,000 tech issues and identifying their topic areas for future analysis. Natural Language Processing (NLP) techniques, such as the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm (see formula on the left), make it possible to extract meaningful insights from large amounts of unstructured text. Using them, I can identify the most common topics in tech issue descriptions, which can be invaluable for guiding future research and improvements in tech support services.
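For reference, a standard textbook form of TF-IDF for a term t in document d, within a corpus of N documents, is given below. Here tf(t, d) is the number of times t occurs in d and df(t) is the number of documents containing t. This may differ in detail from the formula in the original graphic; for example, scikit-learn's TfidfVectorizer defaults to the smoothed IDF on the last line and then L2-normalises each document's vector.

```latex
\begin{aligned}
\mathrm{tfidf}(t, d) &= \mathrm{tf}(t, d) \cdot \mathrm{idf}(t) \\
\mathrm{idf}(t) &= \ln \frac{N}{\mathrm{df}(t)} \\
\mathrm{idf}_{\mathrm{smooth}}(t) &= \ln \frac{1 + N}{1 + \mathrm{df}(t)} + 1
\end{aligned}
```

A term scores highly when it is frequent within one description but rare across the corpus, which is why it works as a rough topic label.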

Data Preprocessing:

Before diving into the analysis, it's essential to preprocess the text data to improve the quality of the results. We applied the following preprocessing steps to the 'Description' column of our dataset:

  1. Replace missing values with empty strings

  2. Convert all descriptions to strings

  3. Remove greetings, sign-offs, email addresses, email headers, and extra whitespace

  4. Remove any irrelevant information enclosed within parentheses

The clean_text function was applied to the 'Description' column to perform these preprocessing tasks.
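The post doesn't show clean_text itself, so the following is a hypothetical sketch of the four steps above using only Python's standard `re` module. Every pattern (which greetings, sign-offs and header fields count) is an assumption for illustration and would need tuning against real Cherwell descriptions.

```python
import math
import re

# Hypothetical patterns: the real clean_text rules are not published in this post.
# Email header lines such as "From: ..." or "Subject: ...".
EMAIL_HEADER_RE = re.compile(r"^[ \t]*(?:from|to|cc|bcc|sent|subject):[^\n]*",
                             re.IGNORECASE | re.MULTILINE)
# A greeting at the start of a line, plus up to two words, ending in "," / "!" or end of line.
GREETING_RE = re.compile(
    r"^\s*(?:hi|hello|hey|dear|good (?:morning|afternoon))\b(?:[ \t]+\w+){0,2}[ \t]*(?:[,!]|$)",
    re.IGNORECASE | re.MULTILINE)
# A line holding only a sign-off, and everything after it (assumed to be a signature).
SIGNOFF_RE = re.compile(
    r"^[ \t]*(?:kind regards|best regards|many thanks|regards|thanks|cheers)[ \t]*[,!.]?[ \t]*$.*",
    re.IGNORECASE | re.MULTILINE | re.DOTALL)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PARENS_RE = re.compile(r"\([^()]*\)")  # non-nested parentheses only
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(value):
    """Hypothetical re-creation of the preprocessing steps listed above."""
    # Steps 1-2: treat missing values (None or NaN) as empty strings, then coerce to str.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    text = str(value)
    # Step 3: strip email headers, greetings, sign-off blocks and email addresses.
    text = EMAIL_HEADER_RE.sub("", text)
    text = GREETING_RE.sub("", text)
    text = SIGNOFF_RE.sub("", text)
    text = EMAIL_RE.sub("", text)
    # Step 4: drop anything enclosed in (non-nested) parentheses.
    text = PARENS_RE.sub("", text)
    # Step 3 (cont.): collapse runs of whitespace and trim the ends.
    return WHITESPACE_RE.sub(" ", text).strip()
```

With pandas, this would typically be applied as `df["Description"] = df["Description"].apply(clean_text)` (where `df` is a hypothetical DataFrame name); because the function handles missing values itself, steps 1 and 2 need no separate `fillna` or `astype(str)` call.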

Topic Extraction with TF-IDF:

To identify the topics in each tech issue description, we used the TF-IDF algorithm. This approach assigns a weight to each term in a document based on its importance, which is determined by its frequency within that document and its rarity across all documents. By creating a TfidfVectorizer and fitting it to our preprocessed text corpus, we generated a sparse matrix of term weights. We then extracted the top term (word) for each document, considering it the topic for that specific description.
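To make the arithmetic visible, the top-term step can be sketched without scikit-learn. This is an illustrative re-implementation, not the code used for this analysis: it assumes TfidfVectorizer's default tokenisation (lowercase, tokens of two or more word characters), raw term counts and smoothed IDF, and the helper names `tokenize` and `top_terms` are hypothetical. With scikit-learn itself, the equivalent is roughly taking `argmax(axis=1)` of the matrix returned by `TfidfVectorizer().fit_transform(docs)` and looking the indices up in `get_feature_names_out()`.

```python
import math
import re
from collections import Counter

# Same pattern as TfidfVectorizer's default token_pattern.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase and split into tokens of two or more word characters."""
    return TOKEN_RE.findall(text.lower())


def top_terms(docs):
    """Return the highest-weighted TF-IDF term for each document (None if it has no tokens)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, as in TfidfVectorizer's default smooth_idf=True.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    topics = []
    for tokens in tokenized:
        if not tokens:
            topics.append(None)  # an all-zero row has no meaningful top term
            continue
        tf = Counter(tokens)  # raw counts, as with sublinear_tf=False
        scores = {t: count * idf[t] for t, count in tf.items()}
        # L2 normalisation rescales the whole row, so it cannot change which term is largest.
        # Iterating over sorted terms breaks ties alphabetically, keeping the result deterministic.
        topics.append(max(sorted(scores), key=scores.get))
    return topics
```

For example, `top_terms(["printer not printing", "printer jammed", "need staff account not student account", ""])` returns `["printing", "jammed", "account", None]`. In the third description, "account" outranks "staff" because it occurs twice, which hints at why single top terms can be a blunt topic label. Returning `None` for empty descriptions is a deliberate choice here: a plain argmax over an all-zero row would silently return the first vocabulary term.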

For the second visualisation, I created a resolution-time column, calculated as the last-modified datetime minus the created datetime, and assigned each row to the topic with the highest TF-IDF score. Among the most common topics (those with over 150 occurrences), the median resolution time varies considerably.
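The resolution-time step might look like the sketch below in plain Python. The row keys `topic`, `created` and `last_modified` are hypothetical stand-ins for the real Cherwell export columns, and reporting times in hours is an assumption; with pandas, the same idea would be a datetime subtraction followed by a `groupby` and `median`.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def median_resolution_by_topic(rows, min_count=150):
    """Median resolution time in hours for topics with more than min_count tickets.

    Each row is assumed to be a dict with hypothetical keys 'topic', 'created'
    and 'last_modified' (datetime objects); adjust to the real column names.
    """
    hours_by_topic = defaultdict(list)
    for row in rows:
        # Resolution time approximated as last-modified minus created.
        delta = row["last_modified"] - row["created"]
        hours_by_topic[row["topic"]].append(delta.total_seconds() / 3600)
    return {
        topic: median(hours)
        for topic, hours in hours_by_topic.items()
        if len(hours) > min_count  # keep only topics with "over min_count" occurrences
    }


if __name__ == "__main__":
    # Toy rows for illustration only (not real ticket data).
    example = [
        {"topic": "staff", "created": datetime(2023, 1, 1, 9), "last_modified": datetime(2023, 1, 2, 9)},
        {"topic": "staff", "created": datetime(2023, 1, 1, 9), "last_modified": datetime(2023, 1, 1, 21)},
    ]
    print(median_resolution_by_topic(example, min_count=1))  # {'staff': 18.0}
```

The median is used rather than the mean because a handful of tickets left open for weeks would otherwise dominate each topic's figure.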

Given the limited descriptions in the dataset, nearly all of the topic keywords make sense, but they certainly aren't the most insightful. For instance, judging by some descriptions that score highly in the "staff" column of the TF-IDF matrix, many users appear to include the word "staff" to emphasise that they need a staff account rather than a student account. These account changes can be time-consuming, so it's reasonable that they take a while to resolve.

In conclusion, I've analysed around 27,000 tech issue descriptions using NLP techniques like TF-IDF to identify common topics and their resolution times. The findings offer valuable insights, but there's still room for improvement in both the dataset and the extraction methods. This is still a work in progress, and I'm hoping to keep developing it to better understand tech support issues and enhance the services provided.
