Tag Archives for "Data Science"

Hume’s Turkey: A Tale of Thanksgiving.

Like most turkeys, Hume's turkey lives on a farm. From this turkey's perspective, life is good. Of course, there are bad days: the weather might turn cold, the older turkeys might box him out from the best seed and grass, a fox might get into the coop. Life is good because of the farmer. The farmer loves his turkeys.

The farmer keeps the coop, which protects the turkey from the weather. The farmer defends the turkey from the fox. The farmer makes sure the turkey gets enough to eat. The turkey is happy because the farmer loves him.

The turkey lives his life without concern for the future. The turkey has no way of knowing, based on experience, that he will soon be dinner.

We put ourselves at risk when we look to the past as a predictor of what will be. That is the problem with knowledge gained from induction; that is the turkey's dilemma.

Bertrand Russell is credited with the idea of the turkey's dilemma, which highlights the holes in David Hume's arguments about knowledge gained by induction.

Adapted from The Black Swan, by Nassim Taleb.

How Great Leaders Inspire Action

Often I get a question that reveals a fundamental mistake we all make every day.

This is my answer.

It gets asked a lot. Maybe. Maybe it's just that to a hammer everything is a nail.

This is my hammer.

You might be leading a team, crafting the bullet points on your resume, making a recommendation on Facebook. What do you say to persuade others to take action?

What do you say?

Jamie Dimon puts it like this, "Leadership is relentless storytelling. We all forget. We all need to be reminded of our purpose." So,

What's your story?

The Mistake That Is Costing You

See if you catch yourself doing this the next time you have to write a performance evaluation. Do you focus on what you did? Take a look at your resume. What is the focus? Is it on what you did? Think of the last good movie you saw or book you read. In the comments below, write me a recommendation for why I should see or read it.

Are you tempted to give away the plot? You (and just about everyone else) are focused too much on the what.

You need to start with Why. Simon Sinek wrote a whole book on why. But you can get away with just watching the TED talk.

Anyone (and everyone) can tell you what they do.

Some can talk about how. This is 90% of what marketers will focus on: differentiating value proposition, proprietary process, secret sauce, or USP (unique selling proposition).

Very few people will talk about Why, Purpose, Cause, Belief. Why do you get out of bed in the morning? Why should anyone care? People don't buy what you do; they buy why you do it. And the what serves as the proof of what you believe.

The goal is to do business with people who believe what you believe. It is not making money; that is a result. Start with Why, then follow it up with How and What.

How to Start with Why

I bought Start With Why hoping to learn how to start with Why. It's a good read. I've read and re-read it. The book comes up short at the end, where Sinek upsells a $100 course to help you find your why. I opted to figure it out on my own. It took a little over a year to nail down a repeatable process. But it was free! If you are interested in learning how I did it, keep reading.

Articulating your Why is difficult. It's innately fuzzy. Your goal is to bring it into sharp focus. That might take some time. Just keep coming back to it.

My first attempts were heavily influenced by the Strengths Finder 2.0 Assessment. I highly recommend this exercise. It's free when you purchase the book, Strengths Finder 2.0. This is a great tool if you are looking for an objective measure of your innate strengths. Here is the catch: even as I explored my strengths, I knew that strengths are not the same as purpose. Strengths answer the question How, not Why. And that is valuable. Your strengths set you apart from the crowd; they are your USP, your secret sauce. So the exercise is worth taking the time to do. It's just not the final destination.

It's ironic that the question Simon Sinek leaves unanswered in his TED talk is answered by another TED talk: "How to know your life purpose in 5 minutes" by Adam Leipzig.

I'm sure you have heard of the idea of an elevator pitch. Adam goes through a series of questions so you can articulate yours in under five minutes: your own elevator pitch that starts with why. You can check out mine on my LinkedIn Profile. Or my Instagram Bio. Even the intro of my Facebook page.

But that is not going to be enough. It isn't going to weather the storm. And the storms will come. Believe that.

What's It All For?

What's it all for? Tour bus, studio, and the fans

What's it all for? Chicks and whips with twenty inch rims

What's it all for? To feed my family and my friends

What's it all for? To change your world the best I can

~ "What's It All For," Bazaar Royale

The final piece came after I read Mark Manson's The Subtle Art of Not Giving A F*ck. I think he is a terrific writer. He utilizes a strong voice (and language, consider yourself warned).

I could not have come across his book at a better time; it was actually one of the most awful times in my life. I was suffering from shouldy values, one in particular, which I'll call "big house nice car." The house wasn't even that big, just bigger than mine. The car wasn't that nice, just nice to have. And somehow this justification made it less shallow. Have you ever heard someone say,

"When I'm done with school . . . "

"When I get that internship . . . "

"When I graduate . . . "

"When I get a job  . . . "

"When I get a better paying job that is actually in my field . . . "

"Once I make X per year . . .

. . . then I'll be happy / set / successful."

Have you ever heard those words? Was it you saying them? It was definitely me. These are dangerous words. They are a symptom of shouldy values.

The cure for shouldy values is to be mindful of great values. Your great values. Where I sit, people outsource this type of thinking to religion. Don't. For sure this is a great pool to draw from, but do your own homework. It's actually a fun exercise.

The Cure For Shouldy Values

You are going to fill out a bracket for your values. Think March Madness, but with values.

I've started you off with a pretty good list of values. Go through this list or feel free to google "list of values" yourself. Write down the ones that resonate with you on Post-it notes, one per note. Don't hold back; if a word catches your eye, write it on a Post-it.

Don't skip this next step, because if you do you will need to start over. You should have a pile of Post-it notes now. Divide them into categories: Personal Attributes, Activities, Relationships, Things, Skills. It's easiest to do this with Post-it notes. You might value certain categories more than others. It will become clear why this matters in the last step.

Finally, you are going to fill out a bracket and hold a tournament (think March Madness) for each category. Each value is going to compete against another value. Consider a pair of Post-it notes. If you can only have one, which do you choose to keep? The keeper advances to the next round. Do this until you have a clear winner for each category.

The categories are important because you might value one category over another. I found this was the case with me. Knowledge is one of my top values, but it is a thing, and of all the categories, I value things the least. Knowledge ended up getting killed by values in other categories. I saw it dropping down the ranks and knew something needed to change.

Which brings me to the final step. The bracket is designed to spur your own intuition. It's not designed to be an objective measure. This isn't Highlander; there can be more than one. I actually created a top ten list for each category. You might even need different categories; if you do, I'd like to know about it. Really. Tell me about it.

Tell Me Your Story.

Jamie Dimon once said, "Leadership is relentless storytelling. We all forget. We all need to be reminded of our purpose." So,

Tell me your story.

Now that you have done all of this work,

what story is yours to tell?

It's a story about you, your tribe, and what you do for them.

It starts with what you believe, what you value. But it can't just be about you. Your story is about what you do to help your tribe reach their ideal, achieve their goals, or overcome their trials. You need to be really clear about who you are, what you believe, and how you stand out, so you can communicate it clearly to your tribe. How else will they know? If they share your values, your Why will resonate with them. This is your contribution, what you do to serve them.

Difference in Difference Estimation [Notes]

Difference in Difference estimation is a linear regression methodology used to analyse the effect of some event in time (the treatment) by comparing outcomes over time. The idea is to compare data before and after the treatment. If the treatment was effective, you will find material differences in the target variable, Y, after treatment.

Before and After. This is a basic comparison of means for the time periods before and after treatment. For this specification we assume the target (Y) in the post-treatment period is equal to the target (Y) in the pre-treatment period in the absence of treatment. Thus, any change is attributable to the treatment.
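To make the before and after comparison concrete, here is a minimal Python sketch. The daily outcomes are made up for illustration and `before_after_effect` is a hypothetical helper name; under this specification the entire difference in means is attributed to the treatment.

```python
from statistics import mean


def before_after_effect(pre, post):
    """Mean outcome after treatment minus mean outcome before treatment.

    Assumes Y would have stayed at its pre-treatment level without the
    treatment, so the whole change is attributed to the treatment.
    """
    return mean(post) - mean(pre)


# Hypothetical daily outcomes for the treated unit(s).
pre = [8, 9, 7, 8, 9, 8, 7]    # mean 8
post = [7, 8, 6, 8, 7, 7, 6]   # mean 7
print(before_after_effect(pre, post))  # -1
```

Nothing here separates the treatment from anything else that changed over the same period, which is the weakness a control group addresses.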

The natural extension of the before and after analysis is to include a control group for comparison. The difference in difference specification allows us to do this. We can compare Y before and after treatment, as we did for the before and after analysis. We can also compare Y between treated and untreated groups. To complete a difference in difference specification we use two dummy variables that partition the sample into four groups. The first dummy variable, Treatment, partitions the sample into two halves based on treatment status. The second dummy variable, Post, splits each half into pre- and post-treatment periods, giving four quarters. We then interact Treatment and Post (Post*Treatment); the coefficient on Post*Treatment estimates the treatment effect: how much more (or less) Y changed for the treated group than for the untreated group.

The Model:

Y = B0 + B1*Post + B2*Treatment + B3*Post*Treatment + e

  • Y is the target variable we are interested in estimating.
  • Post is a binary dummy variable indicating whether an observation is in the post-treatment period.
  • Treatment is a binary dummy variable indicating whether an observation received treatment or not.
  • B3, the coefficient on Post*Treatment, is the difference in difference estimate of the treatment effect.
  • e is the error term.
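A minimal Python sketch of this model on hypothetical data. Because the model has one parameter per group (it is saturated), its OLS coefficients can be read directly off the four group means, and B3 is the change for the treated group minus the change for the control group. The helper name `did_coefficients` and the (y, post, treatment) row layout are assumptions for illustration.

```python
from statistics import mean


def did_coefficients(rows):
    """Return (B0, B1, B2, B3) for Y = B0 + B1*Post + B2*Treatment + B3*Post*Treatment.

    `rows` is a list of (y, post, treatment) tuples where post and treatment
    are 0/1 dummies. With one parameter per group the model is saturated, so
    the OLS estimates are simple combinations of the four group means.
    """
    def cell_mean(post, treatment):
        return mean(y for y, p, t in rows if p == post and t == treatment)

    control_pre, control_post = cell_mean(0, 0), cell_mean(1, 0)
    treated_pre, treated_post = cell_mean(0, 1), cell_mean(1, 1)

    b0 = control_pre                    # control group, pre-treatment
    b1 = control_post - control_pre     # change over time in the control group
    b2 = treated_pre - control_pre      # gap between groups before treatment
    b3 = (treated_post - treated_pre) - (control_post - control_pre)  # the DiD estimate
    return b0, b1, b2, b3


# Hypothetical data: (y, post, treatment)
rows = [
    (10, 0, 0), (12, 0, 0),  # control, pre:  mean 11
    (13, 1, 0), (15, 1, 0),  # control, post: mean 14
    (20, 0, 1), (22, 0, 1),  # treated, pre:  mean 21
    (28, 1, 1), (30, 1, 1),  # treated, post: mean 29
]
print(did_coefficients(rows))  # (11, 3, 10, 5)
```

The treated group rose by 8 while the control group rose by 3, so B3 = 5. A regression package would report the same four estimates along with standard errors, which this sketch does not compute.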

Data Savvy Managers Have 6 Skills Tech StartUps Look For.

McKinsey says there will be a shortage of data skills in 2018. McKinsey predicts a shortfall of 1.5 million data savvy managers needed to meet demand. Savvy managers can make use of data on the execution side, putting insight into context and making things happen.

A major hurdle to iterating on and improving strategic, data driven decision making is people. Data analytics is pretty straightforward; math is just that, math. It's people (humans) who are the problem. Which means people (could that be you?) are the solution. Data science relies heavily on statistical computing: scripts and math, algorithms. If (1) you start with good data and (2) you have a competent data scientist conduct and interpret the analysis, you still need (3) to put those results into context and make something happen. Someone has to act! Teams (doers) need to execute on insights.

Here are six skills tech startups are looking for in a data savvy manager:

Listen. Understand the problems your team, senior, & mid-level managers are facing.

Ask great questions. Frame the problem into a set of questions that, if answered, direct action. Understand (& communicate) that decisions must be made once these questions are answered.

Understand data science. Take a survey level course on data science. LinkedIn Learning offers a course that you can get through in an afternoon. When you understand the process, you can ask actionable questions that lend themselves to being answered with a data model.

Evaluate alternatives. Data often suggests multiple approaches; assemble the right team that can prioritize them.

Acknowledge and mitigate bias. Team members have (and use) inherent biases. Teams that manage groupthink will naturally make better evaluations.

Catalyze change. Communicate and empower decisions throughout the organization. Build the architecture needed for change to take place.

These six skills are crucial to developing processes that:

(1) generate meaningful questions

(2) pose those questions effectively

(3) build understanding around data driven decisions

(4) create a culture that can implement those decisions.

Data Science requires rare (specialist) qualities:

(1) an ability to take unstructured data and find order, meaning, and value.

(2) Deep analytical talent.

Being data savvy doesn't.

A data savvy manager is a generalist. Being data savvy doesn't require you to be a math expert.

learn more @ www.assume-wisely.com/data-savvy-manager

Why Your Salary Will Always Be Below Average

By Rho Lall

Have you been on Glassdoor lately? Maybe you’ve tried the Payscale salary calculator? Is your take home pay below average? Chances are it is. Try the calculator and find out.

It’s ok . . . I’ll wait.

Did you check it out? Is that surprising? Before you start planning how to bring this up with your boss you might want to take a second look at that number. I'll explain.

Income follows a power law distribution.

There are two issues with this number. First, you will run into trouble if you look for averages where there aren't any. Income follows a power law distribution.

What’s that?

If you have heard of Pareto’s 80/20 rule, that is a power law distribution. For income, 80% of the income is earned by 20% of people. Don’t take that literally.
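To see the 80/20 pattern, the sketch below simulates incomes from a Pareto distribution with Python's `random.paretovariate`. A shape parameter of about 1.16 (log 5 / log 4) is the value that produces an 80/20 split in theory; the simulated share lands near, but not exactly on, 80% because the tail is so heavy. The sample size, seed, and `top_share` helper are arbitrary choices for illustration.

```python
import random


def top_share(incomes, top_fraction=0.2):
    """Share of total income earned by the top `top_fraction` of earners."""
    ranked = sorted(incomes, reverse=True)
    k = int(len(ranked) * top_fraction)
    return sum(ranked[:k]) / sum(ranked)


random.seed(42)
# Simulated incomes, in arbitrary units (every draw is at least 1).
incomes = [random.paretovariate(1.16) for _ in range(100_000)]
print(f"Top 20% share of income: {top_share(incomes):.0%}")
```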

If we plot income (see image above), we see a small number of people (in green) earning a disproportionately large amount of money relative to everyone else (in yellow).

If you take the average of a set of incomes (or of any power law distribution), your average will wildly misrepresent the truth. It's going to underestimate a small number of people and overestimate the majority. The average (in blue) makes it seem like higher incomes are more common than they actually are. Case in point:

Bill Gates walks into a bar and everyone inside becomes a millionaire . . .
 . . . on average.
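The joke works because a mean is pulled toward extreme values while a median barely moves. A small Python illustration with made-up incomes; the billion-dollar figure is a stand-in, not anyone's actual income.

```python
from statistics import mean, median

# Hypothetical annual incomes of ten regulars at a neighborhood bar.
bar = [38_000, 42_000, 45_000, 47_000, 50_000, 52_000, 55_000, 61_000, 70_000, 90_000]
print(mean(bar), median(bar))  # 55000 51000.0

# One extremely high earner walks in (an illustrative figure).
bar.append(1_000_000_000)
print(mean(bar) > 1_000_000, median(bar))  # True 52000
```

One newcomer turns everyone into a millionaire on average, while the median moves from $51,000 to $52,000.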

So when you or someone else pulls up a report on Glassdoor and circles the average salary, it is likely not telling the whole story.

But, you might ask, what if Bill Gates doesn’t walk into the bar? What if in this bar we only have locals who all work the same job? I like where your head's at. You might be onto something. But no, you’re not.

Income follows a power law distribution even on a localized scale; it's just less noticeable. Let's look at SaaS Implementation Consultants in Provo, UT (see right). The average is $50,800. But look at the range. The low is $39K and the high is $78K. There are a few highly paid individuals driving the average up, and most consultants probably earn less than $50K. In full disclosure, I don’t know. But the point is, neither do you.

The average is not representative of this sample. Let alone the salaries that were not reported.

Implementation consultants in Provo, UT earn $50,800 on average.

Average is not the same as usual and customary.

Here is the second issue. What do you think of when I say average? When we talk averages, most people assume it's a mean. Most people would agree that average and mean are synonymous. That is not the case. An average doesn't have to be a mean. You can google the definition: a number expressing the central or typical value in a set of data, in particular:

the mode, median, or mean

When you read about an average, you could be reading about one of three different measurements. It's easy to be misled. The government reports median income. Median is the middle number: 50% earn above the median, 50% below. But what if I want to know what salary is usual and customary? What do most people make? This is the mode. If you want to get a sense of where the bulk of salaries sits, before the long tail of the power law distribution, the mode would work best. It will tell you what the most common salary is. That could be useful.

The lesson:

Don’t hang your hat on average salary. First, averages don’t fit the data very well. You can take the average, that doesn’t mean you should. Second, when you see an average take steps to learn what kind of average it is. Personally, I find the bookends, the high and low values of a range, to be more useful.
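Here is a sketch of the report this lesson suggests, computing all three averages plus the bookends with Python's `statistics` module. The salaries are hypothetical, in thousands of dollars, and rounded so that a mode exists.

```python
from statistics import mean, median, mode

# Hypothetical salaries in $K, rounded to the nearest thousand.
salaries = [42, 45, 45, 45, 48, 50, 52, 55, 60, 78, 120]

summary = {
    "mean": round(mean(salaries), 1),  # pulled up by the two highest earners
    "median": median(salaries),        # half earn more, half earn less
    "mode": mode(salaries),            # the most common salary
    "low": min(salaries),              # bookends: the low and high
    "high": max(salaries),             # values of the range
}
print(summary)  # {'mean': 58.2, 'median': 50, 'mode': 45, 'low': 42, 'high': 120}
```

Three "averages" of the same data give $58.2K, $50K, and $45K, which is why it pays to ask which one a report is using.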

Do you want to learn more? If you are a SaaS professional who struggles with aligning your team and getting to the truth, then you have come to the right place. Find out how to use averages, bookends, and other KPIs to make better use of your data so you can . . .

Confront The Deluge of Information.

Perfect for people that want to become leaders! You don’t have to be an expert math person to be data literate - Download the FREE report.

Why would you want to learn to “Bull Doze Through Bull Sh*t”?

  • Would you benefit from deeper knowledge of your data? Probably.
  • Do statistics and data analysis intimidate you? They intimidate most people.
  • Do you want to be able to make use of all the data you have access to, so that you can make better business decisions? Of course you do!

Stop letting your fear of “number crunching” keep you from learning what is actually true. Sign up for my newsletter, and download my FREE Report on making sense of data without becoming a math expert!

Excel – Clean, Trim, and Other Useful Formulas for Cleaning Up Text (Data).

When working with data, it is often necessary to ‘clean’ the data first so that you can work with it. Often formatting is distracting or inconsistent, there may be extra spaces between words, and on occasion there may be special characters that are not visible but get in the way of working with the data. Sometimes data is not in a format that you can use. This data needs to be prepped for use. Sometimes this is called data processing; we refer to it as polishing data. Two common formulas for cleaning data are:

=clean()  Removes non-printing characters (the first 32 ASCII codes, 0 through 31) from your cell values.

=trim()  Removes spaces at the start and end of your cell value and reduces runs of spaces between words to a single space.

This video goes into how to use these two functions.
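If you prep data outside Excel, here is a rough Python analogue of the two functions. These are approximations of Excel's documented behavior, not Excel's own implementation: `excel_clean` drops ASCII codes 0 through 31, and `excel_trim` only touches the ordinary space character, as TRIM does (so non-breaking spaces survive both). Both function names are hypothetical.

```python
import re


def excel_clean(text: str) -> str:
    """Approximate Excel CLEAN: drop the non-printing ASCII characters 0-31."""
    return "".join(ch for ch in text if ord(ch) >= 32)


def excel_trim(text: str) -> str:
    """Approximate Excel TRIM: strip leading/trailing spaces and collapse
    runs of spaces between words to one space (space character only)."""
    return re.sub(r" {2,}", " ", text.strip(" "))


# A cell value with a vertical tab, a bell character, and extra spaces.
raw = " Quarterly\x0b  Sales  Report\x07 "
print(repr(excel_trim(excel_clean(raw))))  # 'Quarterly Sales Report'
```

As in a spreadsheet, CLEAN runs first so that removing hidden characters cannot leave behind new runs of spaces for TRIM to miss.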

Murder: Crime Displacement over Time and Space

In May of 2004, a coordinated search of Inwood Hill Park ended when police discovered the badly decomposed remains of Sarah Fox. Fox had disappeared days before, shortly after she left her apartment for an afternoon jog. In the days that followed, the neighborhood experienced a heightened level of caution: police presence in the area increased, and media and news organizations held a constant vigil across the street from Fox’s apartment complex. It is a romantic suggestion that a neighborhood will come together in tragedy, albeit temporarily. However, anecdotal evidence does suggest that under such circumstances the criminal element will lie low and bide its time until the heightened level of caution and police presence subsides. While this makes intuitive sense, is there evidence to support such a conclusion? This paper is the result of a simple question: in the short term, is a neighborhood safer after a murder? Is there a causal relationship between murder and common street crime?


The literature suggests two theories on criminal behavior: the contagion effect and the displacement effect. These effects provide a theoretical explanation as to why the traditional choice model of constrained optimization does not fully explain the observable variability in crime across space and time. Under the constrained optimization model, criminals seek to maximize utility given a fixed constraint such as civil and criminal penalties. Criminals will commit crimes when the expected benefits exceed the expected costs (Becker 1968). Chang indicates that criminals look for easy opportunities; burglars prefer homes, large commercial buildings, or buildings with alleyway access over more closely guarded buildings (2011). This suggests that criminals consider costs and benefits and look to optimize benefits under risk constraints. However, neighborhood demographics alone do not account for the variability in crime across time and space. Glaeser states, “Positive covariance across agents’ decisions about crime is the only explanation for variance in crime rates higher than the variance predicted by differences in local conditions” (Glaeser 1996, 508). The gap in explanatory power suggests that there are other significant forces beyond the traditional choice model.


Narayan, Nielsen, and Smyth’s work in the American Journal of Economics and Sociology indicates that there is a natural rate of crime that remains impervious to efforts to reduce the crime rate in the long run (2010). They call short-run variations from the long-run natural crime rate structural breaks. Contagion and displacement effects are the principal sources of these deviations from the natural rate of crime in a given time and space.


Glaeser explains contagion theory as peers’ influence overshadowing the criminal’s innate disposition towards crime (1996). Peers influence a criminal’s preferences by reducing stigma or increasing social distinction. Through misinformation, peers distort a criminal’s frame of reference and understanding of the costs and benefits. Peers can relax constraints by assisting directly or providing some technology or special knowledge that decreases a criminal’s likelihood of being caught (Glaeser 1996). Contagion effects involve copycats and retaliatory crimes as well (Jacob 2007). When peers’ actions influence a criminal’s decision to act there is a multiplier effect; the neighborhood’s actual crime rate will increase beyond a predicted crime rate based solely on neighborhood demographics (Glaeser 1996).


Hesseling explains the displacement effect as an inelastic demand for crime: “offenders are flexible and can commit a variety of crimes; and the opportunity structure offers unlimited alternative targets” (Hesseling 1994, 220). Criminals will not simply abandon their criminal pursuit when they encounter an obstacle. Instead, they will seek another opportunity to commit a crime with a similar balance of costs and benefits. Criminals alter their course of action in order to bypass “conditions unfavorable to the offender’s usual mode of operating” (Gabor 1990, 66 in Harding 1994). The crime is displaced to “other times, places, methods, targets or offense” (Repetto 1976 in Hesseling 1994, 198). Individuals have a specific desired outcome for committing a crime. Hesseling’s detailed overview of related literature supports the rational choice aspect of displacement theory.


Displacement can occur in a variety of ways, making it difficult to measure and account for. Researchers examine the effect case by case, either in discussions about motivations and strategy with criminals or by measuring the statistical impact of a policy on crime reduction across a myriad of times and venues (Hesseling 1994). We employ the latter methodology.


We assume that murder, our quasi-experimental treatment, is particularly newsworthy and sufficiently shocking for the neighborhood to react strongly. We predict an increase in police presence and general wariness among the local population. Inhabitants of the neighborhood will be more cautious and act to limit their vulnerability to becoming victims themselves. We expect this to have a negative effect on street crime. Criminals will restrain illicit activities until the community and police force are not as alert. Criminals will displace crime into a different area or into a later time period. Murder should have a displacement effect on street crime in the short run.


Oakland, the eighth largest city in California, is rife with crime. It is home to 390,724 people. Since 2007, there have been 874 murders and 105,589 crimes committed; that is roughly one crime for every four people (390,724 / 105,589 ≈ 3.7). Given the high level of crime, Oakland will be the source for our experiment.


The Oakland Crimespotting project compiles data on crimes from the City of Oakland’s CrimeWatch program, which publicizes data from daily police reports. Crimespotting combines crime data and mapping in a convenient manner. The data comes in a format with a single observation for each crime. Each observation includes variables that identify the type of crime, the location (longitude and latitude), and the date when it occurred. This data is current to the day, and it extends back through 2007. Crimespotting divides the data into thirteen categories: aggravated assault, murder, robbery, simple assault, disturbing the peace, narcotics, alcohol, prostitution, theft, vehicle theft, vandalism, burglary, and arson. However, Crimespotting does not include data on weather conditions. It is plausible that weather would influence crime. Weather Underground offers historic data on temperature and precipitation.


We aggregate the data because it is too granular to work with in its raw state. We sum the observations by zip code and date in order to obtain the total amount of crime for each day in each zip code (or neighborhood). We identify the dates of all murders and then calculate the average daily crime in the week following each murder as well as in the week prior.
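A minimal sketch of this aggregation step in Python, using only the standard library. The (zip_code, day, crime_type) record layout is an assumption (the raw Crimespotting data carries longitude and latitude, so mapping each incident to a zip code is taken as already done), as are the helper names and the choice to exclude the murder day itself from both weeks.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(records):
    """Total reported crimes for each (zip code, day) pair.

    `records` holds one (zip_code, day, crime_type) tuple per crime; this layout
    is hypothetical and assumes incidents were already mapped to zip codes.
    """
    return Counter((zip_code, day) for zip_code, day, _crime_type in records)


def week_before_after(counts, zip_code, murder_day, window=7):
    """Average daily crime in the `window` days before and after a murder.

    The murder day itself is left out of both windows (an assumption), and a
    day with no reported crime counts as zero (Counter returns 0 for it).
    """
    offsets = range(1, window + 1)
    before = sum(counts[(zip_code, murder_day - timedelta(days=d))] for d in offsets)
    after = sum(counts[(zip_code, murder_day + timedelta(days=d))] for d in offsets)
    return before / window, after / window


# Hypothetical example: two thefts a day before the murder, one robbery a day after.
murder_day = date(2011, 6, 15)
records = [("94601", murder_day - timedelta(days=d), "theft")
           for d in range(1, 8) for _ in range(2)]
records += [("94601", murder_day + timedelta(days=d), "robbery") for d in range(1, 8)]
print(week_before_after(daily_counts(records), "94601", murder_day))  # (2.0, 1.0)
```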


We apply the same process to neighborhoods that did not experience a murder in order to determine the crime rates in neighborhoods in which a murder did not occur. From this second group we select neighborhoods that are comparable in terms of their crime rates. To ensure that we match similar neighborhoods, we restrict the data to a range of two standard deviations from the mean number of crimes committed per day in a zip code with a murder. To determine these cutoff points, we identify the average neighborhood crime rate conditional on there being a murder, which is 7.5. The lowest average crime rate was 2.8, and the highest was 15 crimes a day. We drop from our data those neighborhood-time period combinations that average fewer than 4.5 or more than 10.5 crimes a day. This restriction eliminates neighborhoods on both ends of the crime spectrum. After all, murders in an extremely crime-ridden area would likely not elicit the same dramatic response as murders in a more moderate neighborhood. Additionally, neighborhoods with historically low crime rates do not have enough crime to support a reliable estimate of causal effects.
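The cutoff rule can be sketched as a simple filter. The center of 7.5 comes from the text; the standard deviation of 1.5 is an inference from the stated cutoffs of 4.5 and 10.5, and the neighborhood labels and rates (apart from the 2.8 and 15 extremes) are hypothetical.

```python
def comparable(neighborhood_rates, center, sd, k=2):
    """Keep neighborhood-periods whose average daily crime lies within
    k standard deviations of `center` (bounds inclusive)."""
    low, high = center - k * sd, center + k * sd
    return {key: rate for key, rate in neighborhood_rates.items() if low <= rate <= high}


# Center 7.5 is from the text; sd = 1.5 is inferred from the 4.5 and 10.5 cutoffs.
rates = {"A": 2.8, "B": 5.1, "C": 7.9, "D": 10.2, "E": 15.0}
print(comparable(rates, center=7.5, sd=1.5))  # {'B': 5.1, 'C': 7.9, 'D': 10.2}
```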


Our model uses two specifications to determine whether street crime in Oakland has any sensitivity to murder: before and after, and difference in difference. Both the before and after and difference in difference specifications assume the counterfactual that the crime rate will remain unchanged in the absence of a murder in the area in which a murder took place. The difference in difference also assumes that the general crime rates in both the treatment and non-treatment areas are similar; the standard errors for both groups are statistically equivalent. If the standard errors are different, or if we detect contagion or displacement effects, the specification is untenable. We check these assumptions with an OLS specification with a lagged variable for when a murder occurs to see its effect on crime.


insert equation 1


Preliminarily, we employ the before and after approach and look at only the neighborhoods that have murders.  We compare the crime rates before and after each murder.  This is a basic comparison of the average crime rates in the time periods before and after each murder. For this specification we assume that the crime rate in the post-treatment period is equal to the pre-treatment crime rate in the absence of a murder.  Without any shocks or intervening variables, crime rate should maintain its historical trend.  Thus, any change in the crime rate is attributable to the murder.


The natural extension of the before and after analysis is to include a control group for comparison. The difference in difference specification allows us to do this. We can compare the average crime rates before and after each murder. We can also compare crime rates between neighborhoods with murders and those without. To complete a difference in difference specification, we use two dummy variables that partition the sample into four groups. The first dummy variable, treatment, partitions the sample into two halves based on treatment status. The second dummy variable, post, splits each half into pre- and post-murder periods, giving four quarters.


insert equation 2


We identify neighborhoods in Oakland that match in terms of crime rates and geographic size. We assume that demographic and macroeconomic variables (economic prosperity, level of unemployment, ethnicity, and association with a given crime group) are held constant across the two neighborhoods during the week-long intervals before and after a murder. This is a necessary evil given the lack of data available on such indicators on a scale smaller than the city level.


We control for several variables. We assume that crime rates vary based on the date and whether school is on holiday, so we control for the date a crime was committed. Additionally, we control for the particular neighborhood (zip code). Existing literature indicates that crime rates are highly correlated with drug use. We generate a dummy variable for the first seven days after the first of the month. Government distributes entitlements on the first of the month; therefore, drug addicts will likely spend this disposable income on drugs and spend the next couple of days in a drug-induced stupor. Crime should decrease over that short time period. We control for weather: temperature and precipitation. It is likely that crime will increase on hot, dry days. However, on cooler or rainy days criminals are more likely to stay off the streets.
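The first-of-the-month dummy might be generated as below. This assumes the first week means days 1 through 7 of the calendar month; the text leaves the exact window open, and `first_week` is a hypothetical name.

```python
from datetime import date


def first_week(day: date) -> int:
    """1 if `day` falls on day-of-month 1 through 7, else 0 (assumed window)."""
    return int(day.day <= 7)


print([first_week(date(2011, 6, d)) for d in (1, 7, 8, 30)])  # [1, 1, 0, 0]
```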


insert equation 3


Our results do not indicate a causal relationship between murder and street crime. (* indicates a value that is not statistically significant.) Graph 1 depicts the average daily crime rate leading up to and after 40 murders in our sample. The red vertical line marks the day the murders happened. The black line represents total crime, and the other lines represent crime categories: red for violent crime, green for crimes relating to property, and blue for miscellaneous crimes. This is visual evidence consistent with murder having no causal effect on crime.





The before and after approach reveals that there is not sufficient preliminary evidence to support our hypothesis. A murder in Oakland decreases other street crime by one twentieth of a crime per day (.05*), a change that is not statistically significant. The average week in an Oakland neighborhood sees 8.58 crimes a day. During the week following a murder, street crime in a given neighborhood drops to 8.53* crimes a day.


The difference-in-differences specification does not provide evidence to support our hypothesis either. Murder has a coefficient of 1.3757 (0.3008), suggesting that murder, as a treatment, is positively correlated with crime rates. The post-treatment time-frame variable, by contrast, shows only a small and statistically insignificant correlation with crime rates (0.0324*). The interaction term between these two variables, which captures the difference-in-differences effect, is 0.1805* (0.4250). While slightly positive, this estimate is statistically insignificant, with a relatively large standard error and a particularly small t-statistic (0.425). Indeed, only the treatment and the temperature control have t-statistics large enough to be statistically significant (4.574 and 5.280 respectively).
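The difference-in-differences logic can be sketched with group means. This is a minimal illustration with hypothetical counts and a hypothetical function name. Our regression also includes controls (temperature, precipitation, the first-week dummy, and multiple murders), so this bare version would not reproduce the 0.1805 estimate.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences estimate from daily crime counts.

    Change in the murder neighborhood minus the change in the matched
    comparison neighborhood. With no other controls, this equals the OLS
    coefficient on the Murder x Post interaction.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical week-long windows of daily crime counts.
treated_pre = [10, 9, 11, 10, 9, 10, 11]     # mean 10
treated_post = [11, 10, 12, 11, 10, 11, 12]  # mean 11
control_pre = [8, 7, 9, 8, 7, 8, 9]          # mean 8
control_post = [8, 9, 8, 9, 8, 8, 6]         # mean 8
estimate = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(estimate)
```

Here the murder neighborhood gains one crime a day while the comparison neighborhood is flat, so the estimate is (11 - 10) - (8 - 8) = 1. Subtracting the comparison neighborhood's change is what removes citywide shocks common to both areas.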


We note that the precipitation, first-of-the-month dummy, and multiple-homicides variables border on statistical significance. With a coefficient of -0.1802 and a t-statistic of -1.806, multiple murders (a weighted measure of murder's impact) are negatively correlated with crime. On a rainy day, one can expect a small blip in crime rates. With a coefficient of 0.5262 (0.2781), the first-of-the-month dummy is positively correlated with crime rates. However, all three t-statistics fall below 2 (roughly the 1.96 cutoff for significance at the 5 percent level), which renders these results dubious. See Table 1 for more details.


Table 1

Variable Estimated Coefficient Standard Error T-value
Murder 1.375720 0.300775 4.574
Post 0.032416 0.301222 0.108
MurderPost 0.180461 0.425041 0.425
Temperature 0.076992 0.014583 5.280
Precipitation 0.010179 0.005694 1.788
First Week 0.526237 0.278059 1.893
Multiple Murders -0.180162 0.099745 -1.806
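Each t-value in Table 1 is the estimated coefficient divided by its standard error. Statistical significance is judged by comparing that ratio, not the standard error itself, with a critical value. The sketch below recomputes the t-values from the table. The 1.96 cutoff (two-sided 5 percent level under a large-sample normal approximation) is an assumed threshold, not one stated in our regression output.

```python
# Coefficient and standard-error pairs copied from Table 1.
TABLE_1 = {
    "Murder": (1.375720, 0.300775),
    "Post": (0.032416, 0.301222),
    "MurderPost": (0.180461, 0.425041),
    "Temperature": (0.076992, 0.014583),
    "Precipitation": (0.010179, 0.005694),
    "First Week": (0.526237, 0.278059),
    "Multiple Murders": (-0.180162, 0.099745),
}

# Assumed cutoff: two-sided 5% level, large-sample normal approximation.
CRITICAL_VALUE = 1.96


def t_statistic(coefficient, standard_error):
    """t-value = estimated coefficient / standard error."""
    return coefficient / standard_error


for name, (coef, se) in TABLE_1.items():
    t = t_statistic(coef, se)
    verdict = "significant" if abs(t) >= CRITICAL_VALUE else "not significant"
    print(f"{name:17s} t = {t:7.3f}  {verdict} at the 5% level")
```

Only Murder and Temperature clear the cutoff. Precipitation, First Week, and Multiple Murders sit just below it, which is the pattern of variables bordering on significance.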


To check the robustness of our model, we replace murder with another high-profile crime: arson. Like murder, arson is positively correlated with high crime rates, with a coefficient of 3.598. When interacted with the post-treatment variable, arson appears to have a very slight negative effect on crime rates (-0.049*).


So far we have looked at crime as a whole; to analyze subtler effects, we now look at how murder affects individual categories of crime. Murder is strongly correlated with violent crime but more weakly correlated with property crime and with what we refer to as miscellaneous crime: disturbing the peace, prostitution, alcohol, and narcotics offenses. The difference-in-differences specification reveals a similar pattern, albeit a statistically insignificant one. Murder has a small negative effect on subsequent miscellaneous and violent crime, but a very small positive effect on property crime. See Table 2 for specific figures.


Table 2

Type of crime Treatment Post TreatmentPost
Violent 1.315547 (0.125213) 0.030097* (0.125213) -0.137175*
Property 0.865536 (0.154010) 0.041623* (0.154010) 0.170580*
Miscellaneous 0.809375 (0.095011) 0.018875* (0.094612) -0.011316* (0.133721)

Estimated coefficients of Treatment, Post, and TreatmentPost; standard errors in parentheses.


If our hypothesis were to hold, murder should have a more dramatic effect on crime rates in the short term (a two-day time frame) than in the long term (a 30-day time frame). However, both time frames yield similar estimates, -0.3686* and -0.30036* respectively. The long-term effect is slightly smaller, but the difference is negligible, and neither estimate is statistically significant: the t-values are -0.626 and -0.738 respectively.


Our initial question also remains unanswered for lack of evidence; a simple question was not adequate to illuminate such a complex issue. We found that murder is strongly correlated with crime; however, our estimates of the effect of murder on post-murder crime rates proved extremely small and statistically insignificant. The process was nonetheless valuable in helping us form better ideas about how to pursue this area of study.


Fundamentally, we need to ask a different question. We learned that crime is a complex social issue with a great deal of variability, and looking at a short time frame did not reduce the impact of this variability. In light of our research into contagion and displacement effects, had we more time we would revisit our empirical specification. Assuming away contagion and displacement effects on the grounds that they were unobserved is ill-advised. To rectify this, we would explore richer time dynamics than a simple before-and-after comparison. Similarly, we would look at subtler geographic variations in crime rates. This would rule out the difference-in-differences model in favor of another specification.


Similarly, the lack of controls for neighborhood demographics concerns us; we could not find such data at the neighborhood level. We could have created a variable, day, that records each day's position relative to a murder, similar to Graph 1, and then used this variable in a fixed-effects specification. Additionally, to test more effectively for geographic displacement, we would want to look at geographic units smaller than the zip code, but with the data and tools available to us, this was impossible. Finally, we should expand the study to include a broader range of cities: results based solely on Oakland data cannot be generalized to the nation or to criminology as a whole.
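The proposed day variable could be built by subtracting the murder date from each observation date and then averaging crime counts by relative day, the kind of profile plotted in Graph 1. This is a minimal sketch: the function names and the input format (a list of murder dates paired with date-to-count dictionaries) are hypothetical, and the fixed-effects regression that would use the variable is not shown.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def relative_day(crime_date, murder_date):
    """Hypothetical `day` variable: negative before the murder, 0 on the
    day of the murder, positive after it."""
    return (crime_date - murder_date).days


def average_by_relative_day(events, window=7):
    """Average daily crime count at each relative day in [-window, window].

    `events` is a list of (murder_date, daily_counts) pairs, where
    `daily_counts` maps a date to the crime count in that murder's
    neighborhood (an assumed input format).
    """
    buckets = defaultdict(list)
    for murder_date, daily_counts in events:
        for day, count in daily_counts.items():
            k = relative_day(day, murder_date)
            if -window <= k <= window:
                buckets[k].append(count)
    return {k: mean(counts) for k, counts in sorted(buckets.items())}


# Two hypothetical murders with five days of counts around each.
m1, m2 = date(2010, 5, 10), date(2010, 6, 20)
counts1 = {m1 + timedelta(days=k): 8 + k % 2 for k in range(-2, 3)}
counts2 = {m2 + timedelta(days=k): 10 for k in range(-2, 3)}
profile = average_by_relative_day([(m1, counts1), (m2, counts2)], window=2)
print(profile)  # {-2: 9, -1: 9.5, 0: 9, 1: 9.5, 2: 9}
```

Pooling by relative day rather than by calendar date is what lets murders on different dates share one event-time axis.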


Our theoretical framework rests on the assumption that murder is a sufficiently heinous crime to elicit a strong response and therefore to induce a displacement effect in the short run. Ultimately, however, our experiment suggests that murder does not have a strong displacement effect. It also suggests that murder does not induce a contagion effect.




Becker, Gary S. 1968. “Crime and Punishment: An Economic Approach.” Journal of Political Economy 76: 169-217.

Chang, Dongkuk. 2011. “Social Crimes or Spatial Crime? Exploring the Effects of Social, Economical, and Spatial Factors on Burglary Rates.” Environment & Behavior 43, no. 1: 26-52.

Gabor, T. 1990. “Crime Displacement and Situational Prevention: Toward the Development of Some Principles.” Canadian Journal of Criminology 32: 41-73.

Glaeser, Edward L., Bruce Sacerdote, and José A. Scheinkman. 1996. “Crime and Social Interactions.” The Quarterly Journal of Economics 111, no. 2: 507-548.

Hesseling, René B.P. 1994. “Displacement: A Review of the Empirical Literature.” Research and Documentation Centre, Ministry of Justice, The Netherlands. http://www.popcenter.org/library/crimeprevention/volume_03/07_hesseling.pdf (accessed 7 April 2011).

Jacob, Brian, Lars Lefgren, and Enrico Moretti. 2007. “The Dynamics of Criminal Behavior: Evidence from Weather Shocks.” Journal of Human Resources 42, no. 3: 489-537.

Narayan, Paresh Kumar, Ingrid Nielsen, and Russell Smyth. 2010. “Is There a Natural Rate of Crime?” American Journal of Economics and Sociology 69, no. 2: 759-782.

Reppetto, Thomas A. 1976. “Crime Prevention and the Displacement Phenomenon.” Crime & Delinquency 22: 166-177.

Tompson, Lisa, and Michael Townsley. 2010. “(Looking) Back to the Future: Using Space–time Patterns to Better Predict the Location of Street Crime.” International Journal of Police Science & Management 12, no. 1: 23-40.


How to Learn (Not Just) a Language Quickly

The Gist. I had learned so much German by using one idea: errors. I was happy to make them. I teach it in all my science and engineering courses as the method of lumping (for example, Chapter 3 of Street-Fighting Mathematics, which is a freely licensed book). This method is based on the idea that in order for our minds to manage the complexity of the world, we must throw away information skillfully — by discarding the least useful information first. The motto: “When the going gets tough, the tough… lower their standards.”

The Article. How to Learn (Not Just) a Language Quickly

My Inflation-Adjusted Two Cents. I am happy to make mistakes, most of the time. It remains a painful process. When an iron, orange-hot in emotional flames, is pressed to your soft tissue, it is hard to learn anything beyond “NEVER DO THAT AGAIN.” At that point, mental horsepower is irrelevant; emotional intelligence matters.

The Economics of Economics Blogs

The Gist: If you are a professor, the takeaway is that you want a department blog that you contribute to, unless you are a prolific publisher with a horde of minions to do your bidding.

If you are a consumer of economic literature, you are not alone. Here is a list of econ blogs to start you off (google it).

Development Impact


Chris Blattman

NYT’s Economix

Marginal Revolution

Paul Krugman

The Statistic: Blogging about a paper causes a large increase in the number of abstract views and downloads in the same month: an average impact of an extra 70-95 abstract views in the case of Aid Watch and Blattman, 135 for Economix, 300 for Marginal Revolution, and 450-470 for Freakonomics and Krugman.

The Article: The Economics of Economics Blogs:

Last week, the World Bank blog Development Impact wrote about the influence of economics blogs on downloads of research papers. It included Freakonomics.com, as well as 5 other blogs — Aid Watch, Chris Blattman, NYT’s Economix, Marginal Revolution, and Paul Krugman. Using stats from Research Papers in Economics, it found spikes after blogs cover a paper. For us, they found that when we blogged a paper, there were an additional 450-470 abstract views and downloads that month. Check out their cool graph:

(Courtesy of Berk Ozler and David McKenzie)

This is part of a series Development Impact is doing on economics blogs. Part Two is on whether a blog increases the blogger’s profile and whether that affects policy. Part Three, just posted on Sunday, measures the causal impact of econ blogs: “using a variety of data sources and empirical techniques, we feel we have provided quantitative evidence that economic blogs are doing more than just providing a new source of procrastination for writers and readers.”

So there you go, from the World Bank itself. Freakonomics.com, changing the world one abstract at a time.

Research methodology for a Literature Review

This is my methodology / algorithm for completing a literature review on any given topic:


  1. I start with what is on Google Videos.
  2. I follow up with a visit to Wikipedia and a Google Search; it is a good idea to include “methodology” or “research methodology” in your search.
  3. Check out the ‘Annual Reviews’ database on your local library’s website.
  4. Finally, I gather articles from the appropriate EBSCO-hosted journals and other sources.


I found the following video helpful:

How to Conduct a Literature Review by Barton Poulson




[Caveat: My process is tailored to my learning preferences (auditory / kinesthetic).]