San Francisco Crime Visualization
The purpose of this post is twofold:
- To complete the crime analytics visualization assignment from the Coursera course Communicating Data Science Results
- To try my hand at data visualization with ggplot2 and ggmap
The dataset provided by Coursera is a portion of the dataset from the Kaggle San Francisco Crime Challenge, so I may look into that Kaggle competition later as well.
The assignment description reads: “In this assignment, you will analyze criminal incident data from Seattle or San Francisco to visualize patterns and, if desired, contrast and compare patterns across the two cities. You will produce a blog-post-style visual narrative consisting of a series of visualizations interspersed with sufficient descriptive text to make a convincing argument. You will use real crime data from Summer 2014 for one or both of two US cities: Seattle and/or San Francisco.”
For the Coursera assignment, I will produce a map of San Francisco showing the most commonly committed crimes, and I will also look at which neighborhoods are most affected by particular kinds of crime.
After importing the dataset, we will take a subset of the data, since only the crime categories and the crime coordinates are needed for this visualization. For the visualization itself, we will use a popular R package called ggmap, which is designed for spatial visualization in combination with ggplot2. The map returned by the get_map command looks like this:
Next, we will put data points on the base map. Before doing that, though, let’s take a look at the data.
Rank  Category          Counts
1     LARCENY/THEFT     9466
2     OTHER OFFENSES    3567
3     NON-CRIMINAL      3023
4     ASSAULT           2882
5     VEHICLE THEFT     1966
6     WARRANTS          1782
7     DRUG/NARCOTIC     1345
8     SUSPICIOUS OCC    1300
9     MISSING PERSON    1266
10    SECONDARY CODES   442
..    ...               ...
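The post’s R code is not reproduced here, so as a rough illustration of the import, subset and count steps, here is a minimal Python sketch. It assumes the CSV has Category, X (longitude) and Y (latitude) columns; the sample rows are made up, and the helper names subset_incidents and category_counts are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for the incident CSV. The column names Category,
# X (longitude) and Y (latitude) are assumptions about the file layout.
SAMPLE_CSV = """IncidntNum,Category,Descript,X,Y
1,LARCENY/THEFT,PETTY THEFT,-122.4132,37.7834
2,ASSAULT,BATTERY,-122.4141,37.7852
3,LARCENY/THEFT,GRAND THEFT,-122.4068,37.7855
4,DRUG/NARCOTIC,POSSESSION,-122.4125,37.7818
5,ASSAULT,BATTERY,-122.4188,37.7523
6,LARCENY/THEFT,PETTY THEFT,-122.4655,37.7612
"""


def subset_incidents(csv_text):
    """Keep only each incident's category and coordinates (lon, lat)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Category"], float(row["X"]), float(row["Y"])) for row in reader]


def category_counts(rows):
    """Return (category, count) pairs, most common first."""
    return Counter(category for category, _lon, _lat in rows).most_common()


incidents = subset_incidents(SAMPLE_CSV)
for rank, (category, count) in enumerate(category_counts(incidents), start=1):
    print(f"{rank:<4}{category:<16}{count}")
```

Dropping the unused columns up front keeps every later step working on small (category, lon, lat) tuples. Applied to the full dataset, a count like this is what produces a ranked table such as the one above.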
It appears that larceny/theft is the most common crime in San Francisco, followed by “other offenses” and “non-criminal” incidents (I’m not sure what those categories cover). Assault and drug-related crimes are also quite common. Let’s put the top five categories on the map and see how they are distributed spatially.
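Before plotting, the data has to be narrowed to the top five categories. A minimal Python sketch of that filtering step follows; it is not the post’s R code, the (category, longitude, latitude) tuples are made up, and the helper names top_categories and keep_categories are hypothetical.

```python
from collections import Counter

# Made-up (category, longitude, latitude) tuples; in practice these would
# come from the subsetted incident data.
incidents = (
    [("LARCENY/THEFT", -122.4132, 37.7834)] * 6
    + [("OTHER OFFENSES", -122.4141, 37.7852)] * 5
    + [("NON-CRIMINAL", -122.4068, 37.7855)] * 4
    + [("ASSAULT", -122.4125, 37.7818)] * 3
    + [("VEHICLE THEFT", -122.4655, 37.7612)] * 2
    + [("ROBBERY", -122.4188, 37.7523)] * 1
)


def top_categories(rows, n=5):
    """Names of the n most frequent categories, most frequent first."""
    counts = Counter(category for category, _lon, _lat in rows)
    return [category for category, _count in counts.most_common(n)]


def keep_categories(rows, categories):
    """Keep only rows whose category is one of `categories`."""
    wanted = set(categories)  # set membership test is O(1) per row
    return [row for row in rows if row[0] in wanted]


top5 = top_categories(incidents)
top5_incidents = keep_categories(incidents, top5)
print(top5)
print(len(top5_incidents))  # 20
```

The filtered rows are what would then be drawn as points on the base map, colored by category.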
Because the data points cluster so densely, the coloring doesn’t work as well as expected, but we can still get a good sense of how the crimes are distributed spatially. The notorious Tenderloin neighborhood has the highest concentration of crimes, and that concentration extends into the Downtown area to its north. The number of crimes drops significantly after crossing Market St. to the south and US 101 to the west. Another interesting feature is that crimes decrease as we go east or north from the Tenderloin into the Nob Hill and Financial District neighborhoods, but increase again as we approach the waterfront.
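When overlapping points hide the pattern, counting incidents on a regular grid (the idea behind 2D-bin layers such as ggplot2’s geom_bin2d) makes “most concentrated” measurable. The Python sketch below is an illustration only: the 0.01-degree cell size (roughly 0.9 km east-west by 1.1 km north-south at San Francisco’s latitude), the sample points and the helper name bin_incidents are assumptions.

```python
import math
from collections import Counter

# Made-up (category, longitude, latitude) tuples for illustration.
incidents = [
    ("LARCENY/THEFT", -122.4132, 37.7834),
    ("ASSAULT", -122.4141, 37.7852),
    ("DRUG/NARCOTIC", -122.4125, 37.7818),
    ("VEHICLE THEFT", -122.4655, 37.7612),
    ("LARCENY/THEFT", -122.3954, 37.7745),
]


def bin_incidents(rows, cell_deg=0.01):
    """Count incidents per square cell of side `cell_deg` degrees.

    Cells are keyed by integer (column, row) indices; multiplying an index
    by cell_deg gives the cell's west or south edge.
    """
    grid = Counter()
    for _category, lon, lat in rows:
        grid[(math.floor(lon / cell_deg), math.floor(lat / cell_deg))] += 1
    return grid


grid = bin_incidents(incidents)
(col, row), count = grid.most_common(1)[0]
print(f"busiest cell: lon {col * 0.01:.2f}, lat {row * 0.01:.2f}, {count} incidents")
```

The cell size is a trade-off: smaller cells resolve individual blocks but leave many cells with few incidents, while larger cells merge neighborhoods together.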
Now let’s take a closer look at larceny/theft, the most common crime in San Francisco. Its spatial distribution closely matches that of the top five crimes combined. Apart from the Tenderloin, the areas with high theft counts are SOMA, the Western Addition, and Mission St.
Next, we’ll move away from the more common crimes and look at robbery, which is not among the 10 most frequently committed crimes in San Francisco. While the count is still higher in the Tenderloin than in other areas, robberies are actually fairly spread out across the city.
Drug-related crimes tell an entirely different story. Crimes involving drugs or narcotics are highly concentrated in the Tenderloin, north of Market St. and south of Geary St., while outside the Tenderloin they are quite sparse. My guess is that drug-related crimes are often organized by gangs, which tend to stay within a certain area, whereas robbery can be committed by any individual, which would make it more spread out.
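The contrast between concentrated drug crimes and spread-out robberies is read off the maps by eye. One simple way to put a number on it, sketched here in Python as an assumption rather than anything computed for this post, is the share of a category’s incidents that fall in its single busiest grid cell. The points are made up, and the helper name top_cell_share and the 0.01-degree cell size are hypothetical.

```python
import math
from collections import Counter


def top_cell_share(rows, category, cell_deg=0.01):
    """Fraction of one category's incidents that fall in its busiest cell.

    A value of 1.0 means every incident shares a single grid cell; lower
    values mean the incidents are spread over more cells.
    """
    grid = Counter(
        (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        for cat, lon, lat in rows
        if cat == category
    )
    total = sum(grid.values())
    if total == 0:
        return 0.0  # no incidents of this category
    return grid.most_common(1)[0][1] / total


# Made-up points: drug incidents clustered in one cell, robberies scattered.
incidents = [
    ("DRUG/NARCOTIC", -122.4132, 37.7834),
    ("DRUG/NARCOTIC", -122.4141, 37.7852),
    ("DRUG/NARCOTIC", -122.4125, 37.7818),
    ("DRUG/NARCOTIC", -122.4655, 37.7612),
    ("ROBBERY", -122.4132, 37.7834),
    ("ROBBERY", -122.4655, 37.7612),
    ("ROBBERY", -122.4188, 37.7523),
    ("ROBBERY", -122.3954, 37.7745),
]

print(top_cell_share(incidents, "DRUG/NARCOTIC"))  # 0.75
print(top_cell_share(incidents, "ROBBERY"))  # 0.25
```

A metric like this only describes how concentrated each category is; it says nothing about why, so the gang explanation above remains a guess. Its value also depends on the chosen cell size.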
So the clear takeaway from this post: avoid the Tenderloin when visiting the City by the Bay.
The R code I used was largely borrowed from Ben Hamner’s San Francisco Top Crimes Map script on Kaggle.