Clustering crime data which has {latitude, longitude, crime-type} tuples

classification, clustering, spatial

I have a data set which has thousands of rows of {latitude, longitude, crime-type} tuples.

Sample Data:

41.757366519   -87.642992854   THEFT
41.910469677   -87.585822373   ROBBERY
41.751270452   -87.690708662   BURGLARY
41.757366519   -87.642992854   THEFT
41.757366519   -87.642992854   THEFT
..             ..              ..
..             ..              ..
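Before any clustering, rows like the sample above need to be read into tuples. The sketch below assumes whitespace-separated fields and crime types that are a single token (a multi-word type label would need a different delimiter); `parse_rows` is a hypothetical helper, and the `..` placeholder rows are simply skipped.

```python
# Minimal sketch: read "latitude longitude crime-type" rows into tuples.
# Assumes whitespace-separated fields and single-token crime types.

def parse_rows(text):
    """Return a list of (lat, lon, crime_type) tuples, skipping unparsable rows."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # blank line or unexpected number of fields
        try:
            lat, lon = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # e.g. the ".." placeholder rows
        records.append((lat, lon, parts[2]))
    return records


sample = """\
41.757366519   -87.642992854   THEFT
41.910469677   -87.585822373   ROBBERY
41.751270452   -87.690708662   BURGLARY
41.757366519   -87.642992854   THEFT
41.757366519   -87.642992854   THEFT
..             ..              ..
"""

records = parse_rows(sample)
print(len(records))  # 5
```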

I am trying to cluster these based upon the crime types.

For example, if THEFT occurs with high frequency in some region of the data set, that region should show up as a cluster. I have tried clustering on the lat-long values alone, but the resulting clusters do not seem meaningful for this crime data set.
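One simple way to make "THEFT is frequent in this region" concrete is to bin incidents into a fixed grid and count each crime type per cell. This is a minimal sketch rather than a clustering algorithm proper: the 0.01-degree cell size and the `min_count` threshold are hypothetical values to tune, and away from the equator a degree of longitude covers less ground than a degree of latitude, so the cells are not square.

```python
import math
from collections import Counter

CELL_DEG = 0.01  # hypothetical cell size: about 1.1 km north-south

def cell_of(lat, lon, size=CELL_DEG):
    """Map a point to the integer (row, col) index of its grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))

def hot_cells_by_type(records, min_count):
    """For each crime type, list the cells where it occurs at least min_count times."""
    counts = Counter((cell_of(lat, lon), crime) for lat, lon, crime in records)
    hot = {}
    for (cell, crime), n in counts.items():
        if n >= min_count:
            hot.setdefault(crime, []).append((cell, n))
    return hot

records = [
    (41.757366519, -87.642992854, "THEFT"),
    (41.910469677, -87.585822373, "ROBBERY"),
    (41.751270452, -87.690708662, "BURGLARY"),
    (41.757366519, -87.642992854, "THEFT"),
    (41.757366519, -87.642992854, "THEFT"),
]
hot = hot_cells_by_type(records, min_count=3)
print(sorted(hot))  # ['THEFT']
```

Only THEFT reaches the threshold here, because its three incidents share one cell.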

I am fairly new to data mining and am gradually finding my way around.

How can I cluster the data using the latitude and longitude values such that the clusters are related to each other through the crime type? Is there any tool that can take the lat-long data and cluster it on a crime-type basis? Otherwise, I can write a script myself once I understand how this can be done.

Also, has anyone had previous experience with crime data mining? In what other ways can I find interesting patterns in a crime data set?

Best Answer

I know of people who spatially cluster individual crime types; see the CrimeStat documentation for a number of applied examples. I don't see much utility in trying to separate clusters by crime type, though. Many places are crime generalists, such as a busy commercial area, which will have many assaults, robberies, and thefts. These overlapping hot spots would be difficult to separate with any supervised clustering technique.
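As a sketch of clustering each crime type separately, the code below groups incidents by type and runs DBSCAN on each group using great-circle distance. DBSCAN is a swapped-in, commonly used density-based method and is not taken from CrimeStat; the 200 m radius (`eps_m`) and `min_pts=3` are hypothetical parameters, and the O(n²) neighbour search only suits small inputs.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_pts):
    """Plain O(n^2) DBSCAN; returns one cluster label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point joins as a border point
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return labels

def clusters_per_type(records, eps_m=200.0, min_pts=3):
    """Run DBSCAN separately on the incidents of each crime type."""
    by_type = defaultdict(list)
    for lat, lon, crime in records:
        by_type[crime].append((lat, lon))
    return {crime: dbscan(pts, eps_m, min_pts) for crime, pts in by_type.items()}

records = [
    (41.757366519, -87.642992854, "THEFT"),
    (41.910469677, -87.585822373, "ROBBERY"),
    (41.751270452, -87.690708662, "BURGLARY"),
    (41.757366519, -87.642992854, "THEFT"),
    (41.757366519, -87.642992854, "THEFT"),
]
print(clusters_per_type(records))
# {'THEFT': [0, 0, 0], 'ROBBERY': [-1], 'BURGLARY': [-1]}
```

For thousands of rows, a library implementation backed by a spatial index (for example scikit-learn's `DBSCAN` with `metric="haversine"`, which expects coordinates in radians) would be much faster.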

About the only crime type for which I would expect this to be feasible is residential burglary; its hot spots tend to differ from the areas of elevated crime that arise because more people are walking around and interacting.
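One way to check whether burglary hot spots really differ from general crime hot spots is a location quotient: the share of burglaries within a grid cell divided by the share of burglaries overall, so values above 1 mark cells where burglary is over-represented relative to all crime. The location quotient and the 0.01-degree grid are assumptions added for illustration, not something the answer prescribes; with few incidents per cell the ratio is noisy.

```python
import math
from collections import Counter

CELL_DEG = 0.01  # hypothetical grid size in degrees

def cell_of(lat, lon):
    """Integer (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def location_quotients(records, crime_type):
    """Per cell: (share of crime_type in the cell) / (share of crime_type overall)."""
    if not records:
        return {}
    cell_total = Counter(cell_of(lat, lon) for lat, lon, _ in records)
    cell_type = Counter(cell_of(lat, lon) for lat, lon, c in records if c == crime_type)
    overall_share = sum(cell_type.values()) / len(records)
    if overall_share == 0:
        return {}  # crime_type never occurs, so the ratio is undefined
    # Counter returns 0 for cells with no incidents of crime_type.
    return {cell: (cell_type[cell] / n) / overall_share for cell, n in cell_total.items()}

records = [
    (41.757366519, -87.642992854, "THEFT"),
    (41.910469677, -87.585822373, "ROBBERY"),
    (41.751270452, -87.690708662, "BURGLARY"),
    (41.757366519, -87.642992854, "THEFT"),
    (41.757366519, -87.642992854, "THEFT"),
]
lq = location_quotients(records, "BURGLARY")
for cell, value in sorted(lq.items(), key=lambda kv: -kv[1]):
    print(cell, round(value, 2))
```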

I can see some utility in such a project, though. A hot spot with many different crime types and a hot spot with only one crime type may require different strategies from the police department. Telling them apart might call for unsupervised classification instead.
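To illustrate the difference between a generalist hot spot and a single-type one, the sketch computes each hot spot's crime-type mix and its Shannon entropy (0 bits when only one type occurs, higher for a more even mix). The hot spot names, their contents, and the 1-bit cut-off are hypothetical; in practice the mix proportions could serve as features for an unsupervised method such as k-means instead of a fixed threshold.

```python
import math
from collections import Counter

def crime_mix(crimes):
    """Proportion of each crime type among the incidents in one hot spot."""
    counts = Counter(crimes)
    total = sum(counts.values())
    return {crime: n / total for crime, n in counts.items()}

def shannon_entropy(mix):
    """Entropy in bits: 0 for a single crime type, log2(k) for an even mix of k types."""
    return -sum(p * math.log2(p) for p in mix.values() if p > 0)

GENERALIST_BITS = 1.0  # hypothetical cut-off between "specialist" and "generalist"

# Hypothetical hot spots, each given as the crime types of its incidents.
hot_spots = {
    "commercial_strip": ["THEFT", "ROBBERY", "ASSAULT"] * 2,
    "residential_block": ["BURGLARY"] * 5 + ["THEFT"],
}

for name, crimes in hot_spots.items():
    h = shannon_entropy(crime_mix(crimes))
    label = "generalist" if h >= GENERALIST_BITS else "specialist"
    print(name, round(h, 2), label)
# commercial_strip 1.58 generalist
# residential_block 0.65 specialist
```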