Abstract
This paper presents an annotated dataset of student feedback from end-of-semester teaching reviews in a software engineering master’s program. Annotation was performed at the word-span level so that a single response with mixed content can carry more than one label. A combination of students and faculty annotated the data using labels that capture not only sentiment categories (POSITIVE, NEGATIVE) but also domain-specific categories that help in understanding and processing the content of student feedback: a SUGGESTION label that distinguishes constructive criticism from a purely negative response, a COMPARISON label for comparisons that do not express a clearly positive or negative sentiment in absolute terms, and a REDACT label for personal information about instructors or students that should be removed before wider data collection and dissemination. This paper is a pilot study in that it covers only one instructor’s feedback, drawn from a variety of courses over several years; however, we supplement these data with unofficial student feedback from online sources. Our primary contributions are the annotated dataset and preliminary machine learning results, including BERT, DistilBERT, and spaCy span categorization models.