Living in a great city like Boston has a lot of perks, such as a ton of history and culture, amazing food, and five (5!) Superbowl championships. As a major metropolitan area, the city also collects their data and then makes it available to the public. This makes it easy to download in byte-sized sets. All puns aside, this is an extremely important tool for journalists, especially since a ton of it is free!
Today, we’ll be showing you how to run through this publicly available data and build charts out of it. For the sake of ease, we’re going to use data from the Mayor’s 24 Hour Hotline. Specifically, we’re focusing on the number of complaints, comments, and compliments the Mayor’s hotline has received from 2011 to 2017.
Cleaning Up Your Data
First things first, let’s download some data. Go to Boston’s Data Portal to get started. You can follow along in our slideshow below to try to troubleshoot any questions you may have.
This portal tracks all of the 311 Service Requests from 2011 to 2017, so we need to cut down on all the noise. Use the blue Filter button along the ribbon to isolate just comments, compliments, and complaints made to the Mayor’s hotline.
Select “add a new filter condition”, choose “SUBJECT” and “contains”, type in “mayor”, and then add a new filter condition. In the second filter, select “CASE_TITLE” and “contains”, and type “comment”, “compliment”, and “complaint”.
Now that you’ve isolated the comments, compliments, and complaints that have been called in to the Mayor’s hotline, export the data to a .csv file for Excel.
Open up your new .csv file and finish cleaning. If you get pound signs in the date column, just widen the column. We just want to focus on the year the call was made and whether it was a comment, compliment, or complaint, so we can get rid of a lot of the data. Instead of deleting it, create a new sheet and copy the following columns into it: “OPEN_DT” and “CASE_TITLE”.
For the sake of ease, switch OPEN_DT and CASE_TITLE, so that CASE_TITLE comes first. Now, let’s simplify the OPEN_DT column. Select OPEN_DT, click the “Data” tab along the ribbon and choose “Text to columns”. Choose Delimited. On the next screen, select “Tab”, “Space”, and “Other” where you should type in a “/”. Run that and you should get a table like the one in Figure 5.
Go ahead and remove columns C, D, F, and G; we just want the year the cases were filed. Also, label the years YEAR_FILED and save the file as an Excel file instead of an csv.
Now let’s start consolidating the types of calls made. You’ll see that the comments, compliments, and complaints are labeled as to which department they’re about. We can speed this up by using the “Find and Replace” function.
Once your table is totally clean, we’ll save it and switch over to Access.
Upload your Excel file to Access, specifying Sheet1. Name this sheet CityCalls. Under the “Create” tab on the ribbon, select “Design Query”. In the query, select “CityCalls” and add three Fields: CASE_TITLE, YEAR_FILED, and a second YEAR_FILED. Click on Totals in the ribbon, and set CASE_TITLE and one YEAR_FILED as Group By, setting the second YEAR_FILED as Count. Run that query. You can then export the resulting query to an Excel file.
With all that tedious cleaning taken care of, let’s move over to Plotly and build our chart!
Building the Graph in Plotly
To analyze the clean data from the City Calls, we’ll copy the total number for comments, complaints and compliments and move over to Plot.ly
Plot.ly is a free collaboration platform for data science. It allows users to share and publish data findings. It also provides users easy to navigate graphing tools.
Once an account is created there is a button in the upper right hand corner that says create. Hover over the button and select chart.
The different sections may look intimidating but they are easy to navigate. The left column allows you to customize the style of the graph and the middle column is where you select the chart type and set the X and Y axis of your chart. The right column is split in two, the upper portion is where you import the data, and the lower portion is where the graph will be displayed.
Once you’re on the screen to create a chart you can paste the data from the City Calls into the grid.
After you insert the clean data you’ll be able to start building your graph. Row A is the year, row B is the total for comments, row C is for complaints and row D is for compliments.
The middle column is where you select the type of graph you want to use to represent the data. You’ll click on the chart type and it will pull up a window that shows all the charts you can build. They have several options available for free from bar charts to 3D Scatter plots. For the purpose of this exercise we’ll create a line chart to show the different between the comments, complaints and compliments.
After you select the type of chart you want to build you’re going to have to set the X and Y axis. For our data, row A is the X-axis and rows B through D are the Y-axis. You’ll click on the X-axis and assign A to that field and for the Y-axis you’ll assign B. You can only assign one y-axis at a time. After you assign the X and Y a sample graph will pop up in the lower right column.
But we have more than one type of call to consider, so we need to add the others. Looking at Figure 14, you’ll notice in the upper right hand corner there is a button that says Trace. Click the button and you’ll add a subplot. You’ll add two more subplots for rows C and D, and set the y-axis for the respective rows.
Once you have all the plots the graph will show show each of them charted.The graph can be customized using the using the style tab on the left-hand side and the subhead Traces. In that tab you can adjust the colors for the lines on your graph and add points.
After you’ve adjust the style and want to add the names into the graph, you can do that by clicking into the graph. The colored lines found on the right side of the graph represent row B to D, and you can name each row, which is comments, complaints and compliments. To add a title you click on the space above graph that says “Click to enter Plot title” in light grey and add the name of your graph. The end result will be the graph shown in Figure 17.