Double Dubs at Systematic HR has a post today asking whether Google's much-ballyhooed algorithm for recruitment is really old fish wrapped in new paper. I've studied a lot more statistics than most people, and it has made me more skeptical than anything else of these types of tools.

Statistical analyses are kind of like recipes: the more variables you put in the mix, the more opportunities there are to mess things up. If you show me a regression with more than 2 or 3 variables and can't explain what heteroskedasticity* (for instance) means, I'm going to start getting skeptical about your forecasting technique. Howard's comment on the post linked above, with its example of "hundreds of variables," makes me wince, especially if this was in the 80s, when computing time was much dearer. Today you can do in a few minutes what might have taken hours, so you're more likely to try lots of scenarios and models before accepting the conclusions.
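To see why "hundreds of variables" is a worry, here is a minimal sketch using only made-up random numbers. It assumes 50 hypothetical employees, an outcome that is pure noise, and 300 candidate variables that are also pure noise; the function names are mine, not anything from the linked post. None of the variables has any real relationship to the outcome, yet the best-looking one among hundreds can still appear meaningful.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def chance_correlations(n_people=50, n_variables=300, seed=0):
    """Correlate a random 'performance' score with purely random variables.

    Everything here is noise by construction, so any correlation
    that shows up is an accident of the sample.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    results = []
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_people)]
        results.append(abs(pearson(noise, outcome)))
    return results


correlations = chance_correlations()
best_of_3 = max(correlations[:3])    # what you find if you only try 3 variables
best_of_300 = max(correlations)      # what you find if you try all 300

print(f"strongest correlation among   3 noise variables: {best_of_3:.2f}")
print(f"strongest correlation among 300 noise variables: {best_of_300:.2f}")
```

The more variables you try, the stronger the best accidental correlation gets, which is the recipe problem in miniature.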

The real problem, as always in this game, is the choice of variables. A successful retail cashier, for instance, can probably be defined very well in terms of integrity, accuracy, diligence, and attitude. If you generally show up on time, don't steal, treat customers nicely when you're having a bad day, and count change carefully, then you're doing great. With a population of hundreds of thousands of workers to survey, a good predictive assessment is very attainable.

As jobs become more specialized in terms of skills and knowledge, it becomes harder to build a representative sample pool from which reliable conclusions can be drawn. While large companies theoretically have an advantage here, my experience has been that organizational opacity increases with size, and determining what and who generates value becomes harder.

Evidence for this comes from the widely shared sense of futility around the traditional performance review process. Forget about recruiting--many companies still have a very hard time quantifying the value of an employee just crossing her one-year anniversary, let alone where she will be in 12, 24, or 36 months. Perhaps once a company has that part down reasonably well, it can start thinking about predicting success among people it barely knows.

* Can't help yourself? Here goes: Heteroskedasticity is actually more complicated to spell than it is to understand at a basic level. A great example is looking at education versus income, as seen on this chart. Going from a HS diploma to a 4-year degree is huge, but the incremental improvement at each step beyond is less obvious (especially relative to the time/money investment). Most notable is that a Ph.D. (which takes 4-8 years) is worth less than a professional degree, which requires between 2 and 4.

The reason for this is that as you move farther up the education ladder, the effects on career become more complicated. While a Bachelor's degree opens up a world of opportunities, most of which are better-paying, many of the people earning higher degrees are doing so not to make more money, but in order to access jobs (like a university professorship) which are rewarding in completely different terms, and in many cases actually worse-paying. As a result, the spread of incomes around the average widens as education increases, so the errors of a simple model are not the same size across the whole range. This is "heteroskedastic error" in a nutshell.
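For anyone who prefers to see it in numbers, here is a rough sketch of what heteroskedasticity looks like. Every figure in it is invented for illustration: the schooling levels, the income formula, and the assumption that the spread grows by a fixed amount per extra year are all hypothetical and are not taken from the chart.

```python
import random
import statistics

# Hypothetical schooling levels in years; labels are illustrative only.
LEVELS = {"HS diploma": 12, "Bachelor's": 16, "Master's": 18, "Doctorate": 21}


def simulate_incomes(years, n_people, rng):
    """Draw made-up incomes (in thousands) for one schooling level.

    Assumption: both the average income and the spread around that
    average grow with years of schooling. The growing spread is the
    heteroskedastic part.
    """
    mean = 30.0 + 4.0 * (years - 12)
    spread = 5.0 + 1.5 * (years - 12)
    return [rng.gauss(mean, spread) for _ in range(n_people)]


rng = random.Random(1)
spread_by_level = {}
for label, years in LEVELS.items():
    incomes = simulate_incomes(years, n_people=2000, rng=rng)
    spread_by_level[label] = statistics.pstdev(incomes)
    print(f"{label:12s} mean={statistics.mean(incomes):6.1f}  "
          f"spread={spread_by_level[label]:5.1f}")
```

If the spread column were roughly the same on every row, the data would be homoskedastic and a plain regression's error estimates could be taken at face value.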

In order to accurately assess the effect of years of education on income, we need to account for this variation, either by focusing on one type of job or by using a more sophisticated modeling approach that allows the spread to change. As it happens, work on a closely related problem (modeling variance that changes over time) earned the economist Robert Engle a Nobel about five years ago.
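One textbook way to handle uneven spread is weighted least squares, which simply trusts the noisy observations less. To be clear, this is not Engle's method, just the simplest member of the family, and the sketch below leans on a big assumption: that we know exactly how the noise grows with schooling (the hypothetical `noise_sd` function). In real data you would have to estimate that. All numbers are invented.

```python
import random
import statistics


def weighted_slope(xs, ys, weights):
    """Slope of the straight line that minimizes weighted squared error.

    With all weights equal this is ordinary least squares.
    """
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    num = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    den = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    return num / den


def noise_sd(x):
    """Assumed (hypothetical) noise level: grows with years past high school."""
    return 1.0 + 2.0 * x


def compare_estimators(n_samples=300, n_people=100, true_slope=4.0, seed=2):
    """Fit many simulated samples with and without weights."""
    rng = random.Random(seed)
    ols, wls = [], []
    for _ in range(n_samples):
        xs = [rng.uniform(0, 9) for _ in range(n_people)]
        ys = [30.0 + true_slope * x + rng.gauss(0, noise_sd(x)) for x in xs]
        ols.append(weighted_slope(xs, ys, [1.0] * n_people))
        # Weight each person by 1 / variance, so noisy points count less.
        wls.append(weighted_slope(xs, ys, [1.0 / noise_sd(x) ** 2 for x in xs]))
    return ols, wls


ols_slopes, wls_slopes = compare_estimators()
print(f"unweighted: mean slope {statistics.mean(ols_slopes):.2f}, "
      f"spread {statistics.pstdev(ols_slopes):.2f}")
print(f"weighted:   mean slope {statistics.mean(wls_slopes):.2f}, "
      f"spread {statistics.pstdev(wls_slopes):.2f}")
```

Both approaches land near the true slope on average, but the weighted estimates bounce around less from sample to sample, which is the practical payoff of modeling the uneven spread instead of ignoring it.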