At GitStart, we started experimenting with contributing to open-source repos like Cypress. I stumbled upon this comment while reading PRs:
![[sentiment_analysis_PR_comments4.png]]
Positive feedback is such a powerful source of motivation! If only someone could read all PR comments and flag the most positive ones to share with the team… Let’s build this “someone” 🤖! Plus, the required hacking session is a great excuse to catch up with my brother, who did a Ph.D. in the field.
### **Picking the right model**
What type of model should we search for? Two candidate approaches are supervised learning and off-the-shelf Natural Language Processing models.
Searching for supervised learning approaches surfaces this thesis, [*Affective Sentiment and Emotional Analysis of Pull Request Comments on GitHub*](https://uwspace.uwaterloo.ca/handle/10012/12728). Jackpot! Diving into the dataset closes this path, though. The model was trained on heavily asynchronous open-source communities where each comment is a mini-essay. Ten-plus lines per comment, rigorous step-by-step descriptions… those comments are strikingly different in virtually every respect from GitStart’s PR comments. And we don’t want to spend our day manually labeling data. Searching for NLP approaches surfaces the repositories below:
- [TextBlob](https://planspace.org/20150607-textblob_sentiment/)
- [Sentiment Analysis Action](https://github.com/rob-derosa/sentiment-analysis-action)
- [GitHub Sentiment Analysis](https://github.com/NAU-OSL/Github-Sentiment-Analysis)
- [Twitter Sentiment Analysis](https://github.com/oraziorillo/TwitterSentimentAnalysis)
Across all the models used in these repos, TextBlob seems best: it has the most stars (3x as many as OpenAI!) and seems to offer the best documentation. The model’s most helpful attribute is polarity, which describes the degree to which a text is negative or positive on a -1 to 1 scale.
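To make that concrete, here is a minimal sketch of reading the polarity attribute with TextBlob (the sample comment is made up for illustration):

```python
# Quick check of TextBlob's polarity attribute (pip install textblob).
from textblob import TextBlob

comment = "Great catch, thanks! This makes the tests much cleaner."
score = TextBlob(comment).sentiment.polarity  # float in [-1.0, 1.0]
print(f"polarity = {score:+.2f}")
```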
### Validating the approach manually
To avoid premature optimization, it’s best to start by manually validating the approach.
We fetch the last 3k comments from GitStart’s database (there are hundreds of thousands 👀) and feed them to TextBlob (a rough sketch of this scoring pass follows the results below). Then, we manually score a hundred PRs and compare the scores. We also look at the most extreme polarity scores to get a feel for false positives. Here are the 3 key results:
1. **Strong emotions come through in ~5% of PR comments** - in our manually labeled sample, 18% of comments featured some emotion, and in 5% of cases, the emotional intensity was strong enough to be meaningful. Example:
> 🙌 Yes definitely agree! Great catch, thanks JR
2. **The model works great for positive emotions.**
![[sentiment_analysis_PR_comments.png]]
3. **The model fails to accurately label negative emotions.**
- Any comment featuring a ```` ```suggestion ```` block is labeled as very negative, which introduces a lot of noise.
![[sentiment_analysis_PR_comments7.png]]
- Almost all comments with negative scores turn out to be false positives. Why? Lexicon-based models like TextBlob assign each word a polarity score and combine these word-level scores to compute the sentence’s overall polarity. As a result, a rather positive comment like “Wow, I would have expected this to go badly” gets labeled as negative because of “badly”. A path forward might be exploring more refined models like OpenAI’s.
![[sentiment_analysis_PR_comments6.png]]
- We were expecting this result. Most of engineers’ negativity is packaged in passive-aggressive messages, which are hard for machines to spot 🤖!
![[sentiment_analysis_PR_comments5.png]]
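For reference, here is roughly what the scoring pass mentioned above could look like. The column names, the DB helper, and the suggestion-block stripping are assumptions rather than the exact code we ran:

```python
# Rough sketch of the validation pass: score a batch of PR comments with
# TextBlob and surface the extremes for human review.
import re
from textblob import TextBlob

def score(body: str) -> float:
    # Stripping GitHub ```suggestion``` blocks mitigates the noise noted above.
    cleaned = re.sub(r"```suggestion.*?```", "", body, flags=re.DOTALL)
    return TextBlob(cleaned).sentiment.polarity

def extremes(comments: list[dict], n: int = 20) -> list[dict]:
    """Return the n most negative and n most positive comments."""
    ranked = sorted(comments, key=lambda c: score(c["body"]))
    return ranked[:n] + ranked[-n:]

# comments = fetch_last_comments(3000)  # hypothetical DB helper
# for c in extremes(comments):
#     print(f"{score(c['body']):+.2f}  {c['body'][:80]}")
```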
To conclude our manual testing: we either scope down to identifying only positive comments, or we find or tune a better model to make flagging negative comments meaningfully accurate. Release early, release often - we choose to scope down. The next step is to clarify what we want to build.
### **Spec’ing**
The final product could be a weekly Slack message showing positive comments, their polarity score, and their URL. Technically, we could build it in 3 blocks:
1. **scraper/fetcher** - dump all PR comments from a given repo into a database
2. **sentiment reader** - add a polarity score to each PR comment
3. **notifier** - casts and filters data to send selected comments to Slack
We already have the scraper and won’t be able to finish the notifier before dinner, so we decide to build the sentiment reader. To avoid messing up the database for this quick weekend project we might not maintain, we decide to build it as an external tool, which entails building an API.
### Replit - the awesomest way to set up an API ❤️
Replit is magical. You can create a repl, and… you’re done. It runs! It auto-installs all required packages and runs when you press `CMD + Enter`, like a Jupyter Notebook. This is HUGE!
If you’ve spent a large share of your life crafting arcane shell commands to explore the inner workings of dreaded compilers like Babel, it doesn’t feel like much… but for someone who’s starting out, this is the difference between dropping one’s first coding project because of a package import error and discovering the thrill of programming armies of machines.
And do you know how cheap it is? You can leave your Repl “always on” for 2 cents per day. This means you can have a dedicated resource for $7 per year!
![[sentiment_analysis_PR_comments3.png]]
In mere minutes, and for 2 cents per day, we found ourselves with a server we could query through an API 🔥. Send a PR comment, receive its polarity score. Replit is awesome.
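For the curious, here is a minimal sketch of what such an endpoint could look like (Flask + TextBlob; the route name and payload shape are assumptions, not GitStart’s actual service):

```python
# Minimal sentiment-reader API: POST a comment, get back its polarity.
from flask import Flask, jsonify, request
from textblob import TextBlob

app = Flask(__name__)

@app.route("/polarity", methods=["POST"])
def polarity():
    comment = request.get_json(force=True).get("comment", "")
    return jsonify({"comment": comment,
                    "polarity": TextBlob(comment).sentiment.polarity})

if __name__ == "__main__":
    # Replit exposes the server publicly; 0.0.0.0:8080 is a common setup.
    app.run(host="0.0.0.0", port=8080)
```

One `curl` call against the Repl’s URL with a JSON body is enough to try it out.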
### Next steps and ideas
Next steps:
1. add an endpoint that accepts a payload of thousands of comments
2. build the scraper/fetcher
3. build the notifier (Slack webhooks make it a breeze; see the sketch after this list)
4. add a cron job (likely on Render)
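The notifier itself could be as small as this (hypothetical sketch; the webhook URL, threshold, and comment fields are placeholders):

```python
# Post the most positive comments of the week to Slack via an incoming webhook.
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def notify(comments, threshold=0.7):
    """Send comments whose polarity exceeds `threshold` to Slack, best first."""
    positives = [c for c in comments if c["polarity"] >= threshold]
    for c in sorted(positives, key=lambda c: c["polarity"], reverse=True):
        text = f"{c['polarity']:+.2f}  {c['body']}\n{c['url']}"
        requests.post(SLACK_WEBHOOK_URL, json={"text": text})
```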
Another idea: what if we could run sentiment analysis in Slack? As a manager, I’ve nipped multiple drama episodes in the bud by noticing conversations were going astray and pinging people directly.
It looks like [this person](https://ruarfff.com/slack-sentiment/) and [this company](https://chatacuity.com/) have played with the concept, but they look at workspaces as aggregates, which I doubt provides actionable insights.
This would be a fun tool. So many fun projects to build, so little time!