Movie Review Sentiment Analysis
Sentiment analysis is a Big Data problem that seeks to determine the general attitude of a writer given some text they have written. For instance, we would
like to have a program that could look at the text “The film was a breath of fresh air” and realize that it was a positive statement while “It made me want to
poke out my eyeballs” is negative.
One algorithm that we can use for this is to assign a numeric value to any given word based on how positive or negative that word is and then score the
statement based on the values of the words. But, how do we come up with our word scores in the first place?
That’s the problem that we’ll solve in this assignment. You will write a program that reads a file containing movie reviews from the Rotten Tomatoes
(http://www.rottentomatoes.com/) website that have both a numeric score and text. You’ll use this to learn which words are positive and which are
negative. Then you’ll implement methods that can compute the average value of any sequence of words using this data.
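The averaging described above can be written out explicitly. This is a sketch of the arithmetic implied by the class descriptions below; the symbols are introduced here for illustration only. Let n_w be the number of times word w has been seen across all loaded reviews, r_i the score of the review containing its i-th occurrence, and w_1, …, w_m the words of a text t that begin with a letter:

$$
s(w) = \begin{cases} \dfrac{1}{n_w}\displaystyle\sum_{i=1}^{n_w} r_i & \text{if } n_w > 0 \\[1ex] 2.0 & \text{if } n_w = 0 \end{cases}
\qquad\qquad
S(t) = \frac{1}{m}\sum_{j=1}^{m} s(w_j)
$$

For a made-up example: a word seen once each in reviews scored 4, 3, and 4 has s(w) = 11/3 ≈ 3.67, and a two-word text made of that word plus a never-seen word scores (11/3 + 2.0)/2 = 17/6 ≈ 2.83.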
Download the scenario for this assignment, which contains the images you need: program5.zip
(https://canvas.vt.edu/courses/29807/files/1535803/download?wrap=1). Do not forget to unpack the zip file before you use it!
The scenario you download does not contain any classes, and will appear completely bare. The files it contains are all support files (image files and a data
file). You’ll have to create all the classes yourself.
The downloadable project for this assignment includes a data file containing the reviews you will use as the basis for this assignment. Note that each review
starts with a number 0 through 4 with the following meaning:
0 : negative
1 : somewhat negative
2 : neutral
3 : somewhat positive
4 : positive
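If it helps to see the scale as code, here is a tiny helper that maps a score from the data file to the label in the list above. The class ReviewScores and its method label() are hypothetical, invented for illustration; they are not one of the classes you are asked to create.

```java
// Hypothetical helper (not a required class): maps a 0-4 review score
// to the label used in the assignment's scale.
class ReviewScores {
    private static final String[] LABELS = {
        "negative", "somewhat negative", "neutral", "somewhat positive", "positive"
    };

    // Returns the label for a score, rejecting anything outside 0 through 4.
    static String label(int score) {
        if (score < 0 || score >= LABELS.length) {
            throw new IllegalArgumentException("Review scores must be 0 through 4: " + score);
        }
        return LABELS[score];
    }

    public static void main(String[] args) {
        System.out.println(label(3)); // somewhat positive
    }
}
```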
Classes You Create
For this assignment, you will create three classes: WordSentiment, SmileyFace, and SentimentAnalyzer. Each is described below.
WordSentiment
This class represents basic statistics about a single word that appears in reviews. The idea is that you will be loading reviews that consist of numeric
ratings together with a series of words (the body of the review). For each word, you’ll want to remember the review scores associated with it, along with how
frequently it occurs. Internally, the data associated with a word can be stored as two integers: a count of the number of times the word has been seen across
all reviews, together with a sum of all the review scores for all the reviews this word appears in. This class should provide the following methods:
WordSentiment() This default constructor simply initializes the internal data (all counts/accumulators should start at zero).
int getCount() Returns the number of times the word has occurred across all reviews.
Returns the sum of all the review scores for all the reviews this word appears in.
Returns the average sentiment score for this word, which is equal to the sum of review scores divided by the count of
occurrences of the word. If no occurrences have yet been recorded, return a neutral sentiment value of 2.0.
This method takes a review score as a parameter and “records” one occurrence of the word associated with the given review
score. You should call this method once each time the word is seen in a given review.
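A minimal sketch of WordSentiment under the description above, storing just the two integers. Only the constructor and getCount() are named in the text; getTotal(), getAverage(), and addScore() are hypothetical names standing in for the three methods whose signatures are not given here.

```java
// Sketch of WordSentiment. getTotal(), getAverage(), and addScore() are
// hypothetical names for the sum, average, and "record" methods.
class WordSentiment {
    private int count; // occurrences of the word across all reviews
    private int total; // sum of the review scores for those occurrences

    public WordSentiment() {
        count = 0;
        total = 0;
    }

    public int getCount() {
        return count;
    }

    public int getTotal() {
        return total;
    }

    public double getAverage() {
        if (count == 0) {
            return 2.0; // neutral when nothing has been recorded yet
        }
        // Cast before dividing so integer division does not truncate the result.
        return (double) total / count;
    }

    // Records one occurrence of the word in a review with the given score.
    public void addScore(int score) {
        count++;
        total += score;
    }

    public static void main(String[] args) {
        WordSentiment w = new WordSentiment();
        System.out.println(w.getAverage()); // 2.0
        w.addScore(4);
        w.addScore(3);
        w.addScore(4);
        System.out.println(w.getCount() + " " + w.getTotal() + " " + w.getAverage());
    }
}
```

The cast in getAverage() matters: without it, a total of 11 over 3 occurrences would yield 3.0 rather than about 3.67.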
SmileyFace
This class is a subclass of Actor that displays a smiley (or frowny) face representing a sentiment score between 0 and 4. Remember that actors can
change their images to affect what they look like on screen using the method setImage(). The method setImage() takes a string as a parameter, where
the string is the name of an image file to use as the actor’s image. The starter project for this assignment includes 5 images (with names “0.png” through
“4.png”) that represent the 5 different review scores. You can change an actor’s image with a call like this.setImage("0.png");
The SmileyFace class should provide the following methods:
SmileyFace() This default constructor simply initializes the internal data of the class, initializing the face’s “score” to 2.0 (neutral).
Returns the current review score that controls this actor’s look.
Changes the score that controls this actor’s look, including changing the image of this actor to reflect the score. The image
chosen for the actor should be based on the rounded integer value of the sentiment score (for example, a score of 1.9 should
result in a neutral image, not a negative image).
boolean isPositive() Returns true if the score this actor represents has a positive sentiment (a score that is greater than or equal to 2.5).
boolean isNegative() Returns true if the score this actor represents has a negative sentiment (a score that is less than 1.5).
boolean isNeutral() Returns true if the score this actor represents has a neutral sentiment (a score that is at least 1.5 but less than 2.5).
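The score logic behind SmileyFace can be tried out separately from the Actor superclass, which comes from the course framework and is omitted here so the sketch runs on its own. The class name FaceScore and the methods getScore(), setScore(), and imageName() are hypothetical; in the real class, setScore() would also call setImage() with the computed file name.

```java
// Sketch of SmileyFace's score logic without the Actor superclass.
// FaceScore, getScore(), setScore(), and imageName() are hypothetical names.
class FaceScore {
    private double score = 2.0; // starts neutral

    public double getScore() {
        return score;
    }

    public void setScore(double newScore) {
        score = newScore;
        // In SmileyFace this is where you would call this.setImage(imageName());
    }

    // Math.round picks the nearest integer, so 1.9 gives "2.png" (neutral).
    public String imageName() {
        return Math.round(score) + ".png";
    }

    public boolean isPositive() {
        return score >= 2.5;
    }

    public boolean isNegative() {
        return score < 1.5;
    }

    // Neutral is whatever is neither positive nor negative: 1.5 <= score < 2.5.
    public boolean isNeutral() {
        return !isPositive() && !isNegative();
    }

    public static void main(String[] args) {
        FaceScore face = new FaceScore();
        face.setScore(1.9);
        System.out.println(face.imageName() + " neutral=" + face.isNeutral());
    }
}
```

Because Math.round rounds halves upward, the 1.5 and 2.5 thresholds line up with the image choice: a score of 2.5 shows "3.png" and counts as positive, while 1.5 shows "2.png" and counts as neutral.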
11/28/2016 Program 5
SentimentAnalyzer
The SentimentAnalyzer is a subclass of World. You will need to devise your own strategy for the order in which you implement the methods in this class.
This world class contains the logic for loading reviews to compute word statistics and for computing the sentiment score for any provided string of text.
The primary data stored in this class should be a map from strings (words) to WordSentiment objects that represent the accumulated data about each word.
On screen, the world shows two objects: a SmileyFace actor that is used to illustrate the sentiment of the text you will be analyzing, and a TextShape (see
Chapter 9 (http://sofia.cs.vt.edu/cs1114ebooklet/chapter9.html) in our ebooklet) that shows the numeric sentiment score.
This class must provide the following methods:
SentimentAnalyzer() The default constructor should initialize the world using 72x72 pixel grid cells arranged in 5 rows by 8 columns. Your map will
initially be empty. The constructor should place the smiley face and text shape on the screen, vertically centered.
SmileyFace getFace() This getter returns the smiley face actor belonging to this object.
TextShape getText() This getter returns the text shape displayed in this world.
This method takes a scanner as a parameter and loads all the reviews from the input source connected to the scanner.
Remember that the reviews are arranged one per line, and each line begins with an integer representing the review’s score. You
should repeatedly process all of the words occurring on the remainder of the line using the same review score. Note: any words
that do not begin with a letter should be ignored (hint: recall that the Character.isLetter() method can help you).
loadReviews() This method is an overloaded version of loadReviews() that loads the reviews from the default data file (the one named
“movieReviews.txt”). Be careful not to duplicate code.
This method uses the reviews you have loaded (if any) to compute the sentiment score for the given word. If no sentiment data is
available for the given word (i.e., it was never seen in any reviews loaded so far), then return a neutral score: 2.0.
This method examines all the words in the given text and computes the average sentiment score over all of the words. This
average represents the sentiment score for the entire text. Just as with loadReviews(), any word that starts with a non-letter
should be ignored for the purposes of computing the sentiment score.
show(String text) This method computes the average sentiment score for the given text, and uses this score to set the smiley face, while also
placing the human-readable version of this score as the text in the displayed text shape.
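The non-graphical part of SentimentAnalyzer (loading reviews into the map and scoring words and text) can be sketched without the World, face, or text shape. This is one possible approach under stated assumptions, not the required design: ReviewModel, wordSentiment(), and textSentiment() are hypothetical names; a nested WordSentiment stands in for the class described earlier; words are used exactly as written with no case folding, since the text above does not mention it; opening "movieReviews.txt" relative to the working directory is an assumption about where the file lives; and treating text with no usable words as neutral is also an assumption.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Sketch of SentimentAnalyzer's logic without the graphics.
// ReviewModel, wordSentiment(), and textSentiment() are hypothetical names.
class ReviewModel {
    // Minimal stand-in for the WordSentiment class described earlier.
    static class WordSentiment {
        private int count;
        private int total;

        void addScore(int score) {
            count++;
            total += score;
        }

        double getAverage() {
            return count == 0 ? 2.0 : (double) total / count;
        }
    }

    private final Map<String, WordSentiment> words = new HashMap<>();

    // One review per line: a leading integer score, then the review's words.
    public void loadReviews(Scanner input) {
        while (input.hasNextLine()) {
            try (Scanner line = new Scanner(input.nextLine())) {
                if (!line.hasNextInt()) {
                    continue; // skip blank or malformed lines
                }
                int score = line.nextInt();
                while (line.hasNext()) {
                    String word = line.next();
                    // Ignore tokens such as "," or "!" that do not start with a letter.
                    if (Character.isLetter(word.charAt(0))) {
                        words.computeIfAbsent(word, w -> new WordSentiment()).addScore(score);
                    }
                }
            }
        }
    }

    // The overload delegates to the Scanner version instead of duplicating the loop.
    public void loadReviews() {
        try (Scanner input = new Scanner(new File("movieReviews.txt"))) {
            loadReviews(input);
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Average score for one word, or neutral 2.0 if it was never seen.
    public double wordSentiment(String word) {
        WordSentiment data = words.get(word);
        return data == null ? 2.0 : data.getAverage();
    }

    // Average of the word scores over all words that start with a letter.
    public double textSentiment(String text) {
        double sum = 0.0;
        int counted = 0;
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty() && Character.isLetter(word.charAt(0))) {
                sum += wordSentiment(word);
                counted++;
            }
        }
        // Assumption: text with no usable words is treated as neutral.
        return counted == 0 ? 2.0 : sum / counted;
    }

    public static void main(String[] args) {
        ReviewModel model = new ReviewModel();
        // Two made-up review lines, only to exercise the parsing.
        model.loadReviews(new Scanner("4 great fun !\n2 fun"));
        System.out.println(model.textSentiment("great fun"));
    }
}
```

Taking a Scanner as the parameter is what makes the logic easy to test: a Scanner built from a short string behaves like the data file, and the no-argument overload only has to open the real file and hand it over. In show(), the number returned by textSentiment() would feed both the face's score and the text shape.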