Category Archives for "Data Science"

Hume’s Turkey: A Tale of Thanksgiving.

Like most turkeys, Hume's turkey lives on a farm. From this turkey's perspective life is good. Of course, there are bad days: the weather might turn cold, the older turkeys might box him out from the best seed and grass, a fox might get into the coop. Life is good because of the farmer. The farmer loves his turkeys.

The farmer keeps the coop, which protects the turkey from the weather. The farmer defends the turkey from the fox. The farmer makes sure the turkey gets enough to eat. The turkey is happy because the farmer loves him.

The turkey lives his life, without concern for the future. The turkey has no way of knowing, based on experience, that he will soon be dinner.

We put ourselves at risk when looking to the past as a predictor of what will be. That is the problem with knowledge gained from induction; that is the turkey's dilemma.

Bertrand Russell is credited with the idea of the turkey's dilemma, which highlights the holes in David Hume's arguments about knowledge gained by induction.

Adapted from The Black Swan, by Nassim Taleb.

Difference in Difference Estimation [Notes]


Difference in Difference estimation is a linear regression methodology used to analyse the effect of some event in time (the treatment) by comparing results over time. The idea is to compare data before and after the treatment. If the treatment was effective then you will find material differences in the outcomes for the target variable, or Y.

Before and After. This is a basic comparison of means for the time periods before and after treatment. For this specification we assume the target (Y) in the post-treatment period is equal to the target (Y) in the pre-treatment period in the absence of treatment. Thus, any change is attributable to the treatment.

Difference in Difference. The natural extension of the before and after analysis is to include a control group for comparison. The difference in difference specification allows us to do this. We can compare Y as we did for the before and after analysis. We can also compare Y between treated and untreated groups. To complete a difference in difference specification we use two dummy variables that partition the sample into four groups. The first dummy variable, treatment, partitions the sample in two halves based on their treatment status. The second dummy variable, post, partitions the halves in quarters based on the time period. We then interact treatment and post (Post*Treatment); the coefficient on Post*Treatment estimates how much more Y changed for the treated group than for the control group between the two periods.

The Model:

Y = B0 + B1*Post + B2*Treatment + B3*Post*Treatment.

  • Y is the target variable we are interested in estimating.
  • Post is a binary dummy variable indicating whether an observation is in the post-treatment period.
  • Treatment is a binary dummy variable indicating whether an observation received treatment.
  • B3, the coefficient on Post*Treatment, is the difference in difference estimate.
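As a rough illustration of the model above: when Post and Treatment are the only regressors, the least-squares coefficients can be read straight off the four group means, so no regression library is needed. The sketch below uses made-up numbers, and the function name did_estimate is an assumption for illustration only.

```python
from statistics import mean

def did_estimate(rows):
    """rows: iterable of (y, post, treatment), with post and treatment in {0, 1}.

    Returns (B0, B1, B2, B3) for Y = B0 + B1*Post + B2*Treatment + B3*Post*Treatment.
    With only these two dummies and their interaction, the least-squares
    coefficients are simple differences of the four group means.
    """
    cells = {(p, t): [] for p in (0, 1) for t in (0, 1)}
    for y, post, treatment in rows:
        cells[(post, treatment)].append(y)
    m = {key: mean(values) for key, values in cells.items()}

    b0 = m[(0, 0)]                                          # control group, before
    b1 = m[(1, 0)] - m[(0, 0)]                              # time trend in the control group
    b2 = m[(0, 1)] - m[(0, 0)]                              # baseline gap between groups
    b3 = (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])  # difference in difference
    return b0, b1, b2, b3

# Hypothetical data: (y, post, treatment)
rows = [
    (10.0, 0, 0), (12.0, 0, 0),   # control, before
    (12.0, 1, 0), (14.0, 1, 0),   # control, after
    (20.0, 0, 1), (22.0, 0, 1),   # treated, before
    (27.0, 1, 1), (29.0, 1, 1),   # treated, after
]
print(*did_estimate(rows))  # 11.0 2.0 10.0 5.0
```

Here the treated group rose by 7 while the control group rose by 2, so B3 is 5. A regression package would also report a standard error for B3, which this sketch does not attempt.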

Machine Learning Under the Hood: Separating Signal from the Noise.

All data is a combination of signal and noise. Signal represents valuable consistent relationships that we want to learn. Noise is the random correlations and stuff that will not occur again in the future. The combination of signal and noise takes on familiar patterns or shapes that we can use to build a model.

Models can consider varying degrees of signal and noise. On one end of the spectrum is the non-model, which disregards signal & noise. Consider this common upsell question.

“Would you like fries with that?” This approach requires no model. It disregards signal and noise, rigidly canvassing everyone regardless of who they are or what they ordered:

“I’ll have a salad, hold the dressing. And a bottled water.”

“Would you like fries with that?”

“Carbs? Uh, no thank you.”

On the other end of the spectrum are flexible solutions that consider signal and noise. The predictions gathered are influenced by every piece of data, including outliers which can (and do) skew outcomes. These solutions can be more damaging than using no model at all. Such models lose predictive ability for new data because they are too tightly bound to their original training data. This is called overfitting.
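Overfitting is easier to see with a toy example. The sketch below is contrived and fully deterministic (the data, the alternating "noise", and the function names are all assumptions for illustration): the signal is y = 2x, a "memorizer" model repeats whatever it saw in training, and a straight line only captures the trend.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_memorizer(xs, ys):
    """Maximally flexible model: repeat the y of the nearest training point."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict

def mse(model, xs, ys):
    """Mean squared prediction error."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = list(range(10))
noise = [1 if x % 2 == 0 else -1 for x in xs]       # stand-in for random noise
train_y = [2 * x + e for x, e in zip(xs, noise)]    # signal + noise
test_y = [2 * x - e for x, e in zip(xs, noise)]     # same signal, different noise

line = fit_line(xs, train_y)
memo = fit_memorizer(xs, train_y)

print("memorizer train/test error:", mse(memo, xs, train_y), mse(memo, xs, test_y))
print("line      train/test error:", mse(line, xs, train_y), mse(line, xs, test_y))
```

The memorizer is perfect on the data it was trained on and noticeably worse than the plain line on the new data, because it learned the noise along with the signal.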

This post is a continuation of my last post where I went over how machine learning fits into the scope of data science. This post goes a step further to talk about how we use machine learning to separate signal from the noise. There are many machine learning algorithms. Think of them as tools in a tool box. Data scientists use these pre-built algorithms to tease models from their data sets.

A handful of these tools are based on classical statistical methods which makes them easy to interpret. If the model is being used to aid a human to make decisions it’s a good idea to develop the model with these classical methods:

  • Linear regression
  • Logistic regression
  • K-means clustering
  • K-nearest neighbors
  • Hierarchical clustering
  • Naive Bayes

There are more options if there is no human involvement in the decision process; think of Netflix making a recommendation: no one reviews the recommendation before you get it. If there is no human element, or if there is a high tolerance for opaque black box methods, there is an additional group of modern machine learning algorithms:

  • Random Forest
  • Hidden Markov Models
  • Support Vector Regression
  • Artificial Neural Networks
  • Apriori Algorithms

Discussing each of these is outside the scope of this post. But if you are interested in learning more, make sure to subscribe to my email list for data savvy professionals and get a copy of “Bull Doze Thru Bull.”

Take away. If you are evaluating a machine learning project, a great question to ask is which algorithms were used. LASSO & Random Forest are as close to out of the box, all purpose tools as you will find, so they are quite common. The classical methods are a conservative choice. The modern machine learning methods are really black box solutions, which means the team probably tried all of them and went with the tool that performed the best in testing.

What’s the difference between business intelligence and business analytics?

So . . . are you a BI guy?

I get that often and the answer is yes, yes I am. In actuality the answer is, “well . . . sort of. Maybe. It’s more machine learning. At least some people think so.” The quick answer is that Business Intelligence is an evolving industry. If I get into it there is a variety of follow up questions that usually start: “What’s the difference between …”

What’s the difference between business intelligence and machine learning? Even if you google these terms it’s hard to find a good definition. For sure you will find definitions, but not meaning and context for a never ending list of terms in a jargon rich field.

Quoting a Google employee, “Everything at the company is driven by machine learning.” What does that mean exactly? Is that big data? What about data mining? How does that fit in? Is this all just fancy jargon for old school econometrics and statistics?

In the next 3 minutes I am going to take on the job of getting you up to speed on what all these terms mean in relation to each other. It isn’t enough to have a list of definitions; you need to understand context. That is what I will give you here. Context.

What Business intelligence is . . . and isn’t.

When you think about Business Intelligence you might confuse it for Business Analytics. Business Intelligence runs the business. Business Analytics changes the business. Intelligence directs process. Analytics directs strategy. Intelligence focuses understanding for today. Analytics focuses planning for tomorrow.

BI is real time access to data. Reporting. BI identifies current problems, solutions, and enables informed decision making. Business Analytics explores data: statistical analysis, quantitative analysis, data mining, predictive modelling among other technologies and techniques to identify trends and make predictions. But the two areas are merging as evidenced by these headlines:

  • 5 Ways Machine Learning Can Make Your BI Better
  • Machine Learning: The Real Business Intelligence
  • Machine Learning: The Future of Business Intelligence.
  • Big Data & BI Trends 2017: Machine Learning, data lakes, and Hadoop Vs Spark

How Does Machine Learning Relate to Business Analytics? What is it?

I’ve heard machine learning described as the brains behind AI. Machine learning is the subfield of computer science that gives "computers the ability to learn without being explicitly programmed." I think of Machine Learning as a collection of pre-built algorithms for building models to predict future outcomes. Business analytics is about using those models on the execution side, putting insight into context and making things happen. In my last post I talked about the difference between data science and data savvy. Business analytics requires data savvy while machine learning is a component of data science.

Data Science?

Data Science deals with structured and unstructured data. In principle, everything that relates to data cleaning, preparation and analysis lies within the scope of Data Science. Data science is interdisciplinary requiring training in statistics, computer science, and industry. Solo practitioners with specialization in all three areas are rare so it is common to have data science teams: a data savvy manager, an econometrician, & a developer trained in machine learning.

Traditional Research. If you know anything about analytics (or statistics) you are probably familiar with regression: “ordinary least squares”. If not I highly recommend reading the book, Freakonomics. Regression is a mathematical way of drawing a straight line that most closely fits a scatterplot of data.

Regression is the basis for econometrics, which is squarely found in the arena of traditional research. As you can see on the Venn diagram, traditional research blends classical statistics with industry knowledge. The emphasis of traditional econometrics is to use statistical tools to determine causal relationships in data. An econometrician wants to be able to tell why something is happening in the data. They want to tell a story about why you see correlations. And they do that using different variations of the regression technique.

Software development. We all have apps that make our lives easier & more entertaining. A relative few are lucky enough to earn a living developing and/or supporting software, SaaS (Software as a Service). Traditional software development makes processes more efficient. Most of development exists around this. This field requires both Computer Science (coders) as well as industry knowledge. This space is marked by partnerships between clients and their SaaS providers. 

Developers will spend a lot of time and resources understanding their client's existing process to build solutions around industry best practices.

Machine Learning. Which brings us back to machine learning, which is probably not as familiar as software development or traditional research. When a developer uses machine learning, what does that look like? It starts with a dataset. As is the case with traditional research, the first step is to prepare the data for analysis. Data prep (also called munging or data wrangling) is the most time intensive step.

The second step is to separate the data into two parts. Roughly two thirds to 70% of the data is used for training the model; the remaining 30% to one third is saved for testing the model. A machine learning modeler has a variety of tools at their disposal to build a model of relationships based on the training data. The modeler will then make predictions about the test data based on this model. The more accurate the predictions, the better the model.
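The split step might look something like this in plain Python. It is a minimal sketch: the function name, the 70% default, and the fixed seed are assumptions chosen for illustration, not a prescribed recipe.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))                     # stand-in for 100 prepared observations
train, test = train_test_split(rows)
print(len(train), len(test))                # 70 30
```

Shuffling before cutting matters: if the rows arrive sorted (by date, by customer, by outcome), an unshuffled split would train and test on systematically different data.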


At this point you should have a clear idea of what data science is: a blending of machine learning, traditional research, and software development to create predictive models. In contrast, BI focuses on dashboards and reporting for the here and now. BI focuses on process. DS focuses on strategy. Data science requires a variety of advanced skill sets, which makes data science teams quite common (and full stack data scientists quite rare). Business analytics, on the other hand, requires data savvy: a survey level understanding of data science topics with the purpose of putting these models to good use by executing on business goals.

Among the topics a data savvy professional should be familiar with are machine learning and the strengths and limitations of its more common algorithms. If you are interested in learning more, make sure to subscribe to my email list for data savvy professionals and get a copy of “Bull Doze Thru Bull.” In my next post we are going to explore these topics and get under the hood with machine learning.

Data Savvy Managers Have 6 Skills Tech StartUps Look For.

McKinsey says there will be a shortage of data skills in 2018. McKinsey predicts a shortfall in meeting the demand for 1.5 Million Data Savvy Managers. Savvy managers can make use of data on the execution side, putting insight into context and making things happen.

A major hurdle to iterating and improving strategic data driven decision making is people. Data analytics is pretty straightforward; i.e. math is just that, math. It's people (humans) that are the problem. Which means people (could that be you?) are the solution. Data science relies heavily on statistical computing. Scripts and math. Algorithms. If (1) you start with good data and (2) you have a competent data scientist conduct and interpret the analysis, you still need (3) to put those results into context; make something happen. Someone has to do it! Teams (doers) need to execute on insights.

Here are six skills tech startups are looking for in a data savvy manager:

Listen. Understand the problems your team, senior, & mid-level managers are facing.

Ask great questions. Frame the problem into a set of questions that, if answered, direct action. Understand (& communicate) that decisions must be made once these questions are answered.

Understand data science. Take a survey level course on data science. LinkedIn Learning offers a course that you can get through in an afternoon. When you understand the process you can ask actionable questions that lend themselves to be answered with a data model.

Evaluate alternatives. Data often suggests multiple approaches; assemble the right team that can prioritize them.

Acknowledge and mitigate bias. Team members have (and use) inherent bias. Teams that manage GroupThink will naturally make better evaluations.

Catalyze change. Communicate and empower decisions throughout the organization. Build the architecture needed for changes to take place.

These six skills are crucial to developing processes that:

(1) generate meaningful questions

(2) pose those questions effectively

(3) build understanding around data driven decisions

(4) create a culture that can implement those decisions.

Data Science requires rare (specialist) qualities:

(1) an ability to take unstructured data and find order, meaning, and value.

(2) Deep analytical talent.

Data Savvy doesn't.

To be a generalist, a data savvy manager, you don't need them. Data savvy doesn't require you to be a math expert.


Excel – Clean, Trim, and other useful formulas for cleaning up text (data).


When working with data it is often necessary to ‘clean’ the data first so that you can work with it. Often formatting is distracting or inconsistent, there may be extra spaces between words, and on occasion there may be special characters that are not visible but get in the way of working with data. Sometimes data is not in a format that you can use. This data needs to be prepped for use. Sometimes this is called data processing; we refer to it as polishing data. Two common formulas for cleaning data include:

=clean()  Removes non-printable special characters from your cell values.

=trim()  Removes extra spaces between words and at the ends of your cell value.
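If you ever need the same clean-up outside of Excel, here is a rough Python analogue of the two formulas. The function names excel_clean and excel_trim are made up for this sketch, and it only mimics the behaviour described above: dropping non-printable control characters (codes 0 to 31) and collapsing extra ordinary spaces.

```python
def excel_clean(text):
    """Rough analogue of =CLEAN(): drop control characters with codes 0 to 31."""
    return "".join(ch for ch in text if ord(ch) >= 32)

def excel_trim(text):
    """Rough analogue of =TRIM(): strip the ends and leave single spaces between words."""
    return " ".join(part for part in text.split(" ") if part)

raw = "  Acme\x07   Corp \n"        # stray bell character, extra spaces, trailing newline
polished = excel_trim(excel_clean(raw))
print(repr(polished))               # 'Acme Corp'
```

As in a spreadsheet, the two are usually nested, the equivalent of =TRIM(CLEAN(A1)): clean first, then trim whatever spaces are left.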

This video goes into how to use these two functions.

Murder: Crime Displacement over Time and Space

In May of 2004 a coordinated search of Inwood Hill Park ended when police discovered the badly decomposed remains of Sarah Fox. Fox had disappeared days before, shortly after she left her apartment for an afternoon jog. In the days that followed, the neighborhood experienced a heightened level of caution, police presence in the area increased, and media and news organizations held a constant vigil across the street from Fox’s apartment complex. It is a romantic suggestion that a neighborhood will come together in tragedy, albeit temporarily. However, anecdotal evidence does suggest that under such circumstances the criminal element will lie low and bide its time until the heightened level of caution and police presence subsides. While this makes intuitive sense, is there evidence to support such a conclusion? This paper is the result of a simple question: in the short term, is a neighborhood safer after a murder? Is there a causal relationship between murder and common street crime?


The literature suggests two theories on criminal behavior: contagion effect and displacement effect. These effects provide a theoretical explanation as to why the traditional choice model of constrained optimization does not fully explain the observable variability in crime across space and time. Under the constrained optimization model, criminals seek to maximize utility given a fixed constraint such as civil and criminal penalties. Criminals will commit crimes when the expected benefits exceed the expected costs (Becker 1968). Chang indicates that criminals look for easy opportunities; burglars prefer homes, large commercial buildings, or buildings with alleyway access over more closely guarded buildings (2011). This suggests that criminals consider costs and benefits and look to optimize benefits under risk constraints. However, neighborhood demographics alone do not account for the variability in crime across time and space. Glaeser states, “Positive covariance across agents’ decisions about crime is the only explanation for variance in crime rates higher than the variance predicted by differences in local conditions” (Glaeser 1996, 508). The gap in explanatory power suggests that there are other significant forces beyond the traditional choice model.


Narayan, Nielsen, and Smyth’s work in the American Journal of Economics and Sociology indicates that there is a natural rate of crime that remains impervious to efforts to reduce the crime rate in the long run (2010). They call short-run variations from the long-run natural crime rate structural breaks. Contagion and displacement effects are the principal sources of these deviations from the natural rate of crime in a given time and space.


Glaeser explains contagion theory as peers’ influence overshadowing the criminal’s innate disposition towards crime (1996). Peers influence a criminal’s preferences by reducing stigma or increasing social distinction. Through misinformation, peers distort a criminal’s frame of reference and understanding of the costs and benefits. Peers can relax constraints by assisting directly or providing some technology or special knowledge that decreases a criminal’s likelihood of being caught (Glaeser 1996). Contagion effects involve copycats and retaliatory crimes as well (Jacob 2007). When peers’ actions influence a criminal’s decision to act there is a multiplier effect; the neighborhood’s actual crime rate will increase beyond a predicted crime rate based solely on neighborhood demographics (Glaeser 1996).


Hesseling explains the displacement effect as an inelastic demand for crime: “offenders are flexible and can commit a variety of crimes; and the opportunity structure offers unlimited alternative targets” (Hesseling 1994, 220). Criminals will not abandon their criminal pursuit ad infinitum when encountering an obstacle. Instead, they will seek another opportunity to commit a crime with a similar cost benefit analysis. Criminals alter their course of action in order to bypass “conditions unfavorable to the offender’s usual mode of operating” (Gabor 1990, 66 in Harding 1994). The crime is displaced to “other times, places, methods, targets or offense” (Repetto 1976 in Hesseling 1994, 198). Individuals have a specific desired outcome for committing a crime. Hesseling’s detailed overview of related literature supports the rational choice aspect of displacement theory.


Displacement can occur in a variety of ways, making it difficult to measure and account for. Researchers examine the effect case by case in discussions about motivations and strategy with criminals or in measuring the statistical impact of a policy on crime reduction in a myriad of times and venues (Hesseling 1994). We employ the latter methodology.


We assume that murder, our quasi-experimental treatment, is particularly newsworthy and sufficiently shocking for the neighborhood to react strongly. We predict that we should see an increase in police presence and general wariness among the local population. Inhabitants of the neighborhood will be more cautious and act so as to limit their vulnerabilities for becoming victims themselves. We expect this to have a negative effect on street crime. Criminals will restrain illicit activities until such time where the community and police force are not as alert. Criminals will displace crime into a different area or into a later time period. Murder should have a displacement effect on street crime in the short run.  


Oakland, the eighth largest city in California, is rife with crime. It is home to 390,724 people. Since 2007, there have been 874 murders and 105,589 crimes committed; that is roughly a crime for every four people. Given the high level of crime, it will be the source for our experiment.


The Oakland Crimespotting project compiles data on crimes from the City of Oakland’s CrimeWatch program, which publicizes data from daily police reports. Crimespotting combines crime data and mapping in a convenient manner. The data comes in a format with a single observation for each crime. Each observation includes variables that identify the type of crime, location (longitude and latitude), and date when it occurred. This data is current to the day, and it extends back through 2007. Crimespotting divides the data into thirteen categories: aggravated assault, murder, robbery, simple assault, disturbing the peace, narcotics, alcohol, prostitution, theft, vehicle theft, vandalism, burglary, and arson. However, Crimespotting does not include data on weather conditions. It is plausible that weather would influence crime. Weather Underground offers historic data on the temperature and level of precipitation.


We aggregate the data because it is too granular to work with in its raw state. We sum the observations across zip codes and by date in order to obtain the total amount of crime for each day in each zip code (or neighborhood). We identify the dates of all murders and subsequently calculate the average amount of crime that occurs in the following week as well as for the week prior.
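The aggregation step can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code used for the paper: the zip code, dates, and function names are hypothetical, each raw record is reduced to a (zip code, date) pair, and the day of the murder itself is left out of both weekly windows.

```python
from collections import Counter
from datetime import date, timedelta

def daily_counts(crimes):
    """crimes: iterable of (zip_code, date) pairs, one per reported crime."""
    return Counter(crimes)

def weekly_average(counts, zip_code, start, days=7):
    """Average daily crime count over `days` consecutive days beginning at `start`."""
    total = sum(counts[(zip_code, start + timedelta(days=i))] for i in range(days))
    return total / days

def before_after(counts, zip_code, murder_date):
    """Average daily crime in the week before and the week after a murder."""
    before = weekly_average(counts, zip_code, murder_date - timedelta(days=7))
    after = weekly_average(counts, zip_code, murder_date + timedelta(days=1))
    return before, after

# Hypothetical records: two crimes a day in the week before, one a day in the week after
murder_day = date(2010, 6, 8)
crimes = []
for i in range(1, 8):
    crimes += [("94601", murder_day - timedelta(days=i))] * 2
    crimes += [("94601", murder_day + timedelta(days=i))]

counts = daily_counts(crimes)
print(before_after(counts, "94601", murder_day))  # (2.0, 1.0)
```

Days with no reported crime simply count as zero, since a Counter returns 0 for keys it has never seen.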


We apply the same process for those neighborhoods that did not experience a murder in order to determine the crime rates in neighborhoods in which a murder did not occur. From this second group we select neighborhoods that are comparable in terms of their crime rates. In order to ensure that we match similar neighborhoods, we restrict the data to a range of two standard deviations from the mean number of crimes committed per day in a zip code with a murder. To determine these cutoff points, we identify the average neighborhood crime rate conditional on there being a murder, which is 7.5. The lowest average crime rate was 2.8, and the highest was 15 crimes a day. We drop from our data those neighborhood time period combinations that have an average of fewer than 4.5 crimes a day and neighborhoods that have greater than, on average, 10.5 crimes a day. Such a restriction would eliminate neighborhoods on both ends of the crime spectrum. After all, murders in an extremely crime-ridden area would likely not elicit the same dramatic response as murders in a more moderate neighborhood. Additionally, neighborhoods with historically low crime rates will not boast substantial enough crime rates to generate any responsible understanding of causal effects.
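A minimal sketch of this restriction follows. The standard deviation of 1.5 is not stated in the text; it is an assumption implied by cutoffs of 4.5 and 10.5 sitting two standard deviations either side of the 7.5 mean. The neighborhood labels and function names are hypothetical.

```python
def cutoffs(mean_rate, sd, k=2):
    """Lower and upper bounds k standard deviations either side of the mean."""
    return mean_rate - k * sd, mean_rate + k * sd

def keep_comparable(averages, low, high):
    """Keep neighborhood/period combinations whose average daily crime is within bounds."""
    return {key: rate for key, rate in averages.items() if low <= rate <= high}

low, high = cutoffs(7.5, 1.5)       # sd = 1.5 is implied by the cutoffs, not reported
print(low, high)                    # 4.5 10.5

# Hypothetical (neighborhood, period) -> average crimes per day
averages = {("A", 1): 2.8, ("B", 1): 7.5, ("C", 1): 15.0, ("D", 1): 10.5}
print(sorted(keep_comparable(averages, low, high)))  # [('B', 1), ('D', 1)]
```

The bounds are treated as inclusive here, matching the wording that drops only averages below 4.5 or above 10.5.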


Our model uses two specifications to determine if street crime in Oakland has any sensitivity to murder: before and after, and difference in difference. Both the before and after and difference in difference specifications assume the counterfactual that the crime rate will remain unchanged in the absence of a murder in the area in which a murder took place. The difference in difference also assumes that the general crime rates in both the treatment and non-treatment areas are similar; the standard errors for both groups are statistically equivalent. If the standard errors are different or if we detect contagion or displacement effects, the specification cannot be substantiated. We check these assumptions with an OLS specification with a lagged variable for when a murder occurs to see its effect on crime.


insert equation 1


Preliminarily, we employ the before and after approach and look at only the neighborhoods that have murders.  We compare the crime rates before and after each murder.  This is a basic comparison of the average crime rates in the time periods before and after each murder. For this specification we assume that the crime rate in the post-treatment period is equal to the pre-treatment crime rate in the absence of a murder.  Without any shocks or intervening variables, crime rate should maintain its historical trend.  Thus, any change in the crime rate is attributable to the murder.


The natural extension of the before and after analysis is to include a control group for comparison.  The difference in difference specification allows us to do this.  We can compare the average crime rates before and after each murder.  We can also compare crime rates between  neighborhoods with murders and those without.  To complete a difference in difference specification we use two dummy variables that partition the sample into four groups.  The first dummy variable, treatment, partitions the sample in two halves based on their treatment status.  The second dummy variable, post, partitions the halves in quarters based on the time period.


insert equation 2


We identify neighborhoods in Oakland that match in terms of crime rates and geographic size. We assume that demographic and macroeconomic variables (economic prosperity, level of unemployment, ethnicity, and association with a given crime group) are held constant across the two neighborhoods during the week-long intervals before and after a murder. This is a necessary evil given the lack of data available on such indicators on a scale smaller than the city level.


We control for several variables. We assume that crime rates vary based on the date and whether school is on holiday; so, we control for the date a crime was committed. Additionally, we control for the particular neighborhood (zip code). Existing literature indicates that crime rates are highly correlated to drug use. We generate a dummy variable for the first seven days after the first of the month. Government distributes entitlements on the first of the month; therefore, drug addicts will likely spend this disposable income on drugs and spend the next couple of days in a drug-induced stupor. Crime should decrease over that short time period. We control for weather, temperature, and precipitation. It is likely that on hot, dry days crime will increase. However, on cooler or rainy days criminals are more likely to stay off the streets.
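The first of the month control could be coded along these lines. The exact window is an assumption: this sketch treats calendar days 1 through 7 as the "first week", which may differ by a day from the definition used in the paper, and the function name is hypothetical.

```python
from datetime import date

def first_week_dummy(day):
    """1 if the date falls in days 1 to 7 of its month (assumed window), else 0."""
    return 1 if day.day <= 7 else 0

for d in (date(2010, 6, 1), date(2010, 6, 7), date(2010, 6, 8)):
    print(d.isoformat(), first_week_dummy(d))
```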


insert equation 3


Our results indicate that there is no causal relationship between murder and street crimes. (* Indicates a value that is not statistically significant.) Graph 1 depicts the average daily crime rate leading up to and after 40 murders in our sample. The red vertical line is the day the murders happen. The black line represents total crime and the other lines represent corresponding crime categories: red for violent crime, green for crimes relating to property, and blue for miscellaneous crimes. This is visual evidence that murder does not have a causal effect on crime.





The before and after approach reveals that there is not sufficient preliminary evidence to support our hypothesis. A murder in Oakland decreases other street crimes by one twentieth of a crime, .05*. The average week in an Oakland neighborhood sees 8.58 crimes a day.  During the week following a murder, in a given neighborhood street crime drops to 8.53* crimes a day.


The difference in difference specification does not provide evidence to support our hypothesis. Murder has a coefficient of 1.3757 (.3008), suggesting that murder, as a treatment, is actually positively correlated with crime rates. The post-treatment time frame variable, in contrast, has a coefficient close to zero (.0324* in Table 1). The interaction term between these two variables, which gives the difference in difference effect, is .1805* (.4250). While ever so slightly positive, this correlation is statistically insignificant, with a relatively large standard error and a particularly small t-statistic (.425). Indeed, only the treatment and the temperature control have t-statistics large enough to prove significant (4.574 and 5.280 respectively).


We note that the precipitation, first of the month dummy, and multiple homicides variables border on statistical significance. With a t-statistic of -1.806 and a coefficient of -.1802, multiple murders (or a weighted impact of murder) has a negative correlation to crime. On a rainy day, one can expect a small blip in crime rates. Additionally, with a coefficient of .5262 (.2781), the first of the month dummy variable is positively correlated to crime rates. T-statistics less than 2 render the results dubious. See Table 1 for more details.


Table 1

Variable            Estimated Coefficient    Standard Error    T-value
Murder                           1.375720          0.300775      4.574
Post                             0.032416          0.301222      0.108
MurderPost                       0.180461          0.425041      0.425
Temperature                      0.076992          0.014583      5.280
Precipitation                    0.010179          0.005694      1.788
First Week                       0.526237          0.278059      1.893
Multiple Murders                -0.180162          0.099745     -1.806
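As a quick check on Table 1, each t-value is the coefficient divided by its standard error, and the text treats an absolute t-value below 2 as dubious. The sketch below recomputes the column from the reported coefficients and standard errors; the variable names and the exact threshold of 2 as a hard cutoff are assumptions made for illustration.

```python
# (estimated coefficient, standard error) as reported in Table 1
table1 = {
    "Murder": (1.375720, 0.300775),
    "Post": (0.032416, 0.301222),
    "MurderPost": (0.180461, 0.425041),
    "Temperature": (0.076992, 0.014583),
    "Precipitation": (0.010179, 0.005694),
    "First Week": (0.526237, 0.278059),
    "Multiple Murders": (-0.180162, 0.099745),
}

def t_value(coefficient, standard_error):
    """t-statistic for the null hypothesis that the true coefficient is zero."""
    return coefficient / standard_error

t_values = {name: t_value(c, se) for name, (c, se) in table1.items()}
significant = [name for name, t in t_values.items() if abs(t) >= 2]

for name, t in t_values.items():
    print(f"{name:17s} t = {t:7.3f}")
print("significant:", significant)   # significant: ['Murder', 'Temperature']
```

Only Murder and Temperature clear the threshold, which matches the reading of the table given above.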


To check for the robustness of our model, we replace murder with another high profile crime: arson. Like murder, arson is positively correlated to high rates of crime with a coefficient of 3.598. When interacting with the after treatment variable, arson appears to have a very slight negative effect on crime rates (-.049)*.


Previously, we looked at crime as a whole; however, to analyse subtle effects we look at how murder impacts categories of crime. Murder is strongly correlated with violent crimes; however, it is more weakly correlated with property crimes and what we refer to as miscellaneous crimes: disturbing the peace, prostitution, alcohol, and narcotics. The difference in differences mechanism reveals a similar trend, albeit a statistically insignificant one. Murder has a small negative effect on future instances of miscellaneous and violent crimes; however, there was a very small positive correlation with property crimes. See Table 2 for specific figures.


Table 2

| Types of crime | Estimated Coefficient of Treatment | Estimated Coefficient of Post | Estimated Coefficient of TreatmentPost |
| --- | --- | --- | --- |
| Violent | 1.315547 (0.125213) | 0.030097* (0.125213) | -0.137175* |
| Property | 0.865536 (0.154010) | 0.041623* (0.154010) | 0.170580* |
| Miscellaneous | 0.809375 (0.095011) | 0.018875* (0.094612) | -0.011316* (0.133721) |
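
For readers who want to see the shape of the specification behind Table 2, here is a minimal sketch of a difference in differences regression in Python with NumPy. It is not our original estimation code: the data are synthetic, the function name `did_ols` and the simulated effect sizes are assumptions made for illustration, and the controls from Table 1 (temperature, precipitation, and so on) are left out.

```python
import numpy as np


def did_ols(y, treat, post):
    """OLS of y on [1, treat, post, treat*post].

    Returns (coefficients, standard errors, t-values). The last
    coefficient is the difference in differences estimate.
    """
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=float)
    post = np.asarray(post, dtype=float)
    X = np.column_stack([np.ones_like(y), treat, post, treat * post])

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]          # degrees of freedom
    sigma2 = resid @ resid / dof       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se


# Synthetic data (hypothetical, NOT the Oakland data).
rng = np.random.default_rng(0)
n = 400
treat = rng.integers(0, 2, n)          # 1 if the observation is treated
post = rng.integers(0, 2, n)           # 1 if the observation is post-treatment
y = 2.0 + 1.3 * treat + 0.1 * post - 0.5 * treat * post + rng.normal(0, 1, n)

coef, se, t = did_ols(y, treat, post)
for name, c, s, tv in zip(["Intercept", "Treatment", "Post", "TreatmentPost"], coef, se, t):
    print(f"{name:<14} {c:9.4f} ({s:.4f})  t = {tv:7.3f}")
```

Because the model is saturated in the two dummies, the TreatmentPost coefficient equals the change in mean Y for the treated group minus the change in mean Y for the untreated group, which is why it is read as the treatment effect.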


If our hypothesis were to hold, murder should have a more dramatic effect on crime rates in the short term (a two-day time frame) than in the long term (a 30-day time frame). However, both time frames yield similar results: -.3686* and -.30036* respectively. While the long-term effect is slightly smaller, the difference is insignificant. Moreover, with t-values of -.626 and -.738 respectively, neither estimate is statistically significant.


Our initial question has also gone unanswered due to lack of evidence. Our simple question was not adequate to illuminate such a complex issue. We found that murder is strongly correlated with crime; however, our results on the effects of murder on post-murder crime rates proved statistically insignificant and extremely small. The process was valuable in helping us form better ideas about pursuing this area of study.


Fundamentally, we need to ask a different question.  We learned that crime is a complex social issue with a lot of variability.  Looking at a short time frame was not beneficial in reducing the impact of this variability.  In light of our research into contagion and displacement effects, had we more time we would revisit our empirical specification.  To assume away contagion and displacement effects on the basis that they were unobserved is ill-advised. To rectify this we would want to explore more substantial time elements beyond the simple time frame of before and after.  Similarly, we would want to look at more subtle geographic variations in crime rates. This would preclude the difference in difference model in favor of another option.  


Similarly, the lack of controls for neighborhood demographic data concerns us. We could not find such data at the neighborhood level. We could have created a variable, day, that counts each day's position relative to a murder, similar to Graph 1. We could then use this variable in a fixed effect specification. Additionally, to test more effectively for geographic displacement, we would want to look at geographic units smaller than the zip code level. However, with the data and tools accessible to us, this was impossible. Finally, we should expand the study to include a broader range of cities. Any result based solely on Oakland data is not generalizable to the nation or to criminology as a whole.
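
One way the day variable described above might be built is sketched below in Python. This is only an illustration of the idea, not something we implemented: the function name `relative_day`, the example dates, and the rule for breaking ties between two equally distant murders (we take the day as falling after the earlier murder) are all assumptions.

```python
from datetime import date


def relative_day(day, murder_days):
    """Signed distance in days from `day` to the nearest murder.

    Negative values are days before a murder, zero is the day of a
    murder, positive values are days after. Ties between two equally
    distant murders resolve to the positive (after) value, which is an
    arbitrary choice made for this sketch.
    """
    if not murder_days:
        raise ValueError("at least one murder date is required")
    offsets = [(day - m).days for m in murder_days]
    return min(offsets, key=lambda d: (abs(d), d < 0))


# Hypothetical murder dates, for illustration only.
murders = [date(2011, 1, 10), date(2011, 1, 20)]

for d in [date(2011, 1, 8), date(2011, 1, 10), date(2011, 1, 13), date(2011, 1, 18)]:
    print(d.isoformat(), relative_day(d, murders))
```

Each distinct value of the variable could then enter a regression as its own dummy, which is the fixed effect use suggested above.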


Our theoretical framework is based on the assumption that murder is a sufficiently heinous crime to elicit a strong response and therefore to induce a displacement effect in the short run. Ultimately, however, our experiment suggests that murder does not have a strong displacement effect. It simultaneously suggests that murder does not induce a contagion effect.




Becker, Gary S. 1968. “Crime and Punishment: An Economic Approach.” Journal of Political Economy 76: 169–217.

Chang, Dongkuk. 2011. “Social Crimes or Spatial Crime? Exploring the Effects of Social, Economical, and Spatial Factors on Burglary Rates.” Environment & Behavior 43, no. 1: 26-52.

Gabor, T. 1990. “Crime Displacement and Situational Prevention: Toward the Development of Some Principles.” Canadian Journal of Criminology 32: 41-73.

Glaeser, Edward L., Bruce Sacerdote, and José A. Scheinkman. 1996. “Crime and Social Interactions.” The Quarterly Journal of Economics 111, no. 2: 507-548.

Hesseling, René B.P. 1994. “Displacement: A Review of the Empirical Literature.” Research and Documentation Centre, Ministry of Justice, The Netherlands. (accessed 7 April 2011)

Jacob, Brian, Lars Lefgren, and Enrico Moretti. 2007. “The Dynamics of Criminal Behavior: Evidence from Weather Shocks.” Journal of Human Resources 42, no. 3: 489-537.

Narayan, Paresh Kumar, Ingrid Nielsen, and Russell Smyth. 2010. “Is There a Natural Rate of Crime?” American Journal of Economics and Sociology 69, no. 2: 759-782.

Reppetto, Thomas A. 1976. “Crime Prevention and the Displacement Phenomenon.” Crime & Delinquency 22: 166-177.

Tompson, Lisa, and Michael Townsley. 2010. “(Looking) Back to the Future: Using Space–time Patterns to Better Predict the Location of Street Crime.” International Journal of Police Science & Management 12, no. 1: 23-40.


The Economics of Economics Blogs

The Gist:  If you are a professor, the takeaway is that you want to have a department blog that you contribute to unless you are a prolific publisher with a horde of minions to do your bidding.

If you are a consumer of economic literature, you are not alone. Here is a list of econ blogs to start you off (google them).

Development Impact

Chris Blattman

NYT’s Economix

Marginal Revolution

Paul Krugman

The Statistic: Blogging about a paper causes a large increase in the number of abstract views and downloads in the same month: an average impact of an extra 70-95 abstract views in the case of Aid Watch and Blattman, 135 for Economix, 300 for Marginal Revolution, and 450-470 for Freakonomics and Krugman.

The Article: The Economics of Economics Blogs:

Last week, the World Bank blog Development Impact wrote about the influence of economics blogs on downloads of research papers. It included Freakonomics, as well as 5 other blogs: Aid Watch, Chris Blattman, NYT’s Economix, Marginal Revolution, and Paul Krugman. Using stats from Research Papers in Economics, it found spikes after blogs cover a paper. For us, they found that when we blogged a paper, there were an additional 450-470 abstract views and downloads that month. Check out their cool graph:

(Courtesy of Berk Ozler and David McKenzie)

This is part of a series Development Impact is doing on economics blogs. Part Two is on whether a blog increases the blogger’s profile and whether that affects policy. Part Three, just posted on Sunday, measures the causal impact of econ blogs by “using a variety of data sources and empirical techniques, we feel we have provided quantitative evidence that economic blogs are doing more than just providing a new source of procrastination for writers and readers.”

So there you go, from the World Bank itself: changing the world one abstract at a time.

Research methodology for a Literature Review

This is my methodology / algorithm for completing a literature review on any given topic:


  1. I start with what is on Google Videos.
  2. I follow up with a visit to Wikipedia and a Google Search; it is a good idea to include “methodology” or “research methodology” in your search.
  3. Check out the ‘Annual Reviews’ Database at your local library website.
  4. Finally, I gather articles from the appropriate EBSCO-hosted journals and other sources.


I found the following video was helpful:

How to Conduct a Literature Review by Barton Poulson




[Caveat: My process is tailored to my learning preferences (auditory / kinesthetic).]