Team Members: Natalie Olson, Sabrina Chan, Andrew Chiang
Role: Data Analyst
Tools: Python, GitHub
Timeframe: September 2024 - December 2024
Context: Project for Text Mining and Data Analytics
Going out to eat at a restaurant offers customers convenience, the chance to try new food, or a way to celebrate with friends and family. When discovering new restaurants, most people turn to Yelp or Google reviews to learn about the establishment. Reviews also give customers a platform for feedback on their experience. On Yelp, each review carries a rating from 1 to 5 stars (5 being the highest), and reviewers are encouraged to discuss “food, service, and ambiance” or add images to give others meaningful insight into the restaurant. These reviews can suggest dishes to try, offer general tips such as where to park or the best time to go, and sometimes describe negative experiences. Such experiences range from poor service to bad food and, in some cases, foodborne illness.
When it comes to food safety in restaurants, customers rely on local health authorities to regulate and enforce standards. This oversight takes the form of health scores, which come from a yearly check of the restaurant’s conditions for handling food safely (County of Santa Clara). Health authorities use this data to determine areas of concern and potential outbreaks. To assist health authorities, this project examines whether restaurant reviews can be used to identify areas or restaurants of concern. For example, if several restaurants in one area have reviews in which customers mention getting sick after eating chicken, health authorities could use that pattern to determine whether there is a more serious outbreak in the area.
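The chicken example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the keyword list, the `restaurant`/`area`/`text` field names, the `areas_of_concern` helper, and the sample reviews are all hypothetical and made up for demonstration.

```python
import re
from collections import defaultdict

# Hypothetical illness-related terms (an assumption for this sketch);
# a real list would need review by people with food-safety expertise.
ILLNESS_PATTERN = re.compile(
    r"\b(?:food poisoning|got sick|threw up|vomit\w*|nause\w*|diarrhea)\b",
    re.IGNORECASE,
)


def areas_of_concern(reviews, min_restaurants=2):
    """Return {area: sorted restaurant names} for every area in which at
    least `min_restaurants` distinct restaurants have a review that
    matches an illness-related term."""
    flagged = defaultdict(set)
    for review in reviews:
        if ILLNESS_PATTERN.search(review["text"]):
            flagged[review["area"]].add(review["restaurant"])
    return {
        area: sorted(names)
        for area, names in flagged.items()
        if len(names) >= min_restaurants
    }


# Made-up reviews, for illustration only.
sample_reviews = [
    {"restaurant": "Cafe A", "area": "95112",
     "text": "I got sick after eating the chicken here."},
    {"restaurant": "Diner B", "area": "95112",
     "text": "Food poisoning from the chicken sandwich."},
    {"restaurant": "Diner B", "area": "95112",
     "text": "Great fries and friendly service."},
    {"restaurant": "Grill C", "area": "95050",
     "text": "Threw up that night, never again."},
    {"restaurant": "Bistro D", "area": "95050",
     "text": "Lovely patio, parking was easy."},
]

concerns = areas_of_concern(sample_reviews)
print(concerns)
```

Keyword matching of this kind is only a rough first filter: it misses paraphrases and can flag unrelated uses of a term, so flagged areas would be candidates for closer inspection rather than confirmed outbreaks.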
How do review sentiment and star ratings vary across restaurants in different locations?
How have these patterns changed over time, and can we use them to identify restaurants of concern for health authorities?