Client:
INFO664
Project Duration:
3 months
Role(s):
Coder, Editor, Analyst
Team:
Independent Project
Tools Used:
- Python
- OpenRefine
- ChatGPT API
- R
Skills:
- Python: NumPy, Matplotlib, pandas
- R
- API scraping
- Data Cleaning
Learning Outcome:
Technology: Choose and employ appropriate tools for data collection, manipulation, analysis, visualization, and storage
On February 1st, 2023, Glossier relaunched the product that made their brand: the Glossier Balm Dot Com. With new packaging and a new formula, the relaunch drew no shortage of comments from Glossier’s cult following. With many saying they missed the old formula, I wanted to see what else was being said and how exactly customers felt about Glossier’s new product.
This project dives into the reviews left on Glossier’s site to see what was said most often about the new formulation, and whether any positives came out of the situation.

Methodology
Tools Used
Python: Python is a popular programming language that can be used for web scraping and data analysis. For this project, the Python packages used were pandas, Matplotlib, and cloudscraper.
OpenRefine: An open-source data cleaning tool that can handle large amounts of data and compile datasets.
ChatGPT API: The ChatGPT API was used to perform sentiment analysis on the gathered reviews.
R: R is another very useful data analysis tool. For this project, R was used to create word clouds, as they were clearer to read than those created through Python.
GitHub: GitHub is a great tool for storing code projects and making version control easy. This project was saved and stored on GitHub for easy access and replication.
The project began with finding the right data source. While I initially wanted to use Twitter, there were some privacy issues connecting to Twitter’s API. (Note: this project was done prior to Twitter’s renaming.) I then decided to scrape the reviews from the Glossier website, which were conveniently available through a Harper’s Bazaar public API. Using Python, the reviews were scraped and compiled into multiple CSV files.
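The paging-and-saving part of a scrape like this can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the function names, the `review_date` and `review_text` column names, and the offset/limit paging scheme are all assumptions, and the actual HTTP request is left to a `fetch_page` callable supplied by the caller.

```python
import csv
from typing import Callable, Iterable, Iterator, List


def iter_reviews(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 100
) -> Iterator[dict]:
    """Yield reviews page by page until a short or empty page comes back.

    fetch_page(offset, limit) is a hypothetical callable that returns one
    page of review dicts from whatever API is being scraped.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # last page reached
            break
        offset += len(page)


def write_reviews_csv(reviews: Iterable[dict], path: str) -> int:
    """Write the date and text of each review to a CSV; return the row count."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Column names are assumed; fields not listed here are ignored.
        writer = csv.DictWriter(
            f, fieldnames=["review_date", "review_text"], extrasaction="ignore"
        )
        writer.writeheader()
        for review in reviews:
            writer.writerow(review)
            count += 1
    return count
```

In practice, `fetch_page` would wrap a GET request to the reviews endpoint and return the list of reviews from the JSON response. Keeping it separate from the paging loop makes the loop easy to test without touching the network.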
After gathering the data, it was time to clean each CSV and compile them into one master CSV file that would hold all Glossier review data. Using OpenRefine, I loaded each dataset, removed unnecessary columns, and created a dataset that held the review text and review date.
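The cleaning itself was done in OpenRefine, but for anyone who prefers to stay in Python, the same merge-and-trim step could look something like the sketch below. It uses only the standard library, and the column names (`review_date`, `review_text`) are assumed for illustration rather than taken from the real files.

```python
import csv
from typing import Iterable, Sequence


def combine_csvs(
    paths: Iterable[str],
    out_path: str,
    keep: Sequence[str] = ("review_date", "review_text"),
    text_col: str = "review_text",
) -> int:
    """Merge several review CSVs into one, keeping only the `keep` columns.

    Rows whose text column is missing or blank are skipped.
    Returns the number of rows written.
    """
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(keep), extrasaction="ignore")
        writer.writeheader()
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    if not (row.get(text_col) or "").strip():
                        continue  # nothing to analyze in an empty review
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call such as `combine_csvs(sorted(glob.glob("reviews_*.csv")), "master.csv")` would then produce the single master file, where the `reviews_*.csv` pattern is again just a placeholder.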
Once the dataset was cleaned, it was just a matter of plugging it into Python and R to create the visualizations needed for analysis.

Results
As expected, the overall consensus among the reviews was that Glossier’s new formula did not hold up to Glossier’s typical quality standards. Customers said it left their lips feeling drier and that the formulation felt cheaper than it used to. According to the sentiment analysis, reviews containing the phrase “new formula” had an average sentiment score of about -0.4074, and reviews containing the phrase “old formula” had a score of about -0.4388. The lower score for “old formula” was most likely due to users complaining about how much they miss the old formula. The words most commonly found in negative reviews were “dry”, “cheap”, “disappointed”, “terrible”, “awful”, and “horrible”.
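A phrase-level average like the ones above can be computed with a few lines of Python once every review has a sentiment score attached. The sketch below is only an illustration of that step: the function name is made up, the matching is a simple case-insensitive substring check, and it assumes each review is already paired with a numeric score.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple


def mean_sentiment_for_phrase(
    scored_reviews: Iterable[Tuple[str, float]], phrase: str
) -> Optional[float]:
    """Average the scores of reviews whose text contains `phrase`.

    scored_reviews holds (review_text, sentiment_score) pairs.
    Matching ignores case. Returns None if no review mentions the phrase.
    """
    needle = phrase.lower()
    scores = [score for text, score in scored_reviews if needle in text.lower()]
    return mean(scores) if scores else None
```

Calling it once per phrase, for example with “new formula” and then “old formula”, gives the kind of side-by-side comparison reported here.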
On the positive side, customers enjoyed the new angled, straight-to-lip applicator, and overall they liked the new packaging. The average sentiment score for reviews containing the phrase “new applicator” was around -0.1634, which, while still negative, is much better than the scores for reviews mentioning the new formula. The most common words in positive reviews were “love”, “applicator”, “moisturizing”, “hydrating”, “packaging”, and “perfect”. It is important to note that the dataset included reviews from before the product relaunch, since it covered all of 2023.
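Word lists like these come from counting how often each word appears across a group of reviews. One simple way to do that in Python is sketched below. The tokenizing rule and the short stopword list are assumptions made for the example, not the ones used for the project’s word clouds.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# A deliberately small, illustrative stopword list; a real analysis
# would likely use a fuller one.
STOPWORDS: Set[str] = {
    "the", "and", "this", "that", "was", "for", "but", "with", "not", "are", "its",
}


def top_words(
    texts: Iterable[str], n: int = 10, stopwords: Set[str] = STOPWORDS
) -> List[Tuple[str, int]]:
    """Return the n most frequent words across texts as (word, count) pairs.

    Words are lowercased; stopwords and words of one or two letters are dropped.
    """
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if len(t) > 2 and t not in stopwords)
    return counts.most_common(n)
```

Running it separately on the negative and the positive reviews gives two ranked lists that can be compared directly or fed into a word cloud.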
Below is the presentation for this project, which features the visualizations, top word counts, and a more detailed breakdown of the sentiment analysis. Overall, despite some initial difficulties with web scraping, this project came together really well and I am very happy with what I was able to uncover about one of my personal favorite brands, Glossier.
If you’d like to take a look at the code for this project, check out my GitHub: