The correlation between personal stats and winning

Understanding the concept of correlation with LeBron James

Hunter Carver
6 min read · Jul 7, 2021
Photo by JC Gellidon on Unsplash

LeBron James has proven over his career that he can carry any franchise to the NBA Finals. Whether it was Cleveland, Miami, or Los Angeles, every team that LeBron has joined has won a championship. It’s hard to define exactly what part of LeBron’s well-rounded game helps elevate his teams to championship level. He can do it all: passing, rebounding, and scoring.

LeBron is undoubtedly one of the most versatile players ever to play basketball, but what part of his game helped his teams win the most? Was it his incredible passing ability, his scoring prowess, or his willingness to grab boards that helped his teams win over the years?

Today, we are going to answer these questions by observing the correlation between LeBron’s assists, rebounds, and points and his team’s overall win percentage. Correlation is defined as any statistical relationship between two variables. A statistical relationship between variables simply means that when one of the variables changes, we expect the other variable in the relationship to change as well. When most people discuss correlation, they are referring to linear correlation, which describes how closely the relationship between the variables follows a straight line and can be either positive or negative. A positive correlation is a statistical relationship where the variables move in the same direction: when one increases, the other tends to increase, and when one decreases, the other tends to decrease. A negative correlation is a statistical relationship where the variables move in opposite directions: when one increases, the other tends to decrease.

In the table above, we have listed LeBron’s season averages during his first two seasons with the Los Angeles Lakers, along with the changes between them. We can see that as James’s average assists per game increased, so did the Lakers’ total wins. This is an example of a positive correlation because his assists and the Lakers’ wins rose together from one season to the next. Another example of a positive relationship in this table is between LeBron’s average points and rebounds. Though both of these stats decreased, this is still a positive correlation because both variables moved in the same direction.

A negative correlation that can be seen in the table is between LeBron’s average points per game and the number of wins the Lakers had in a season. This is the opposite of the relationship many casual NBA fans would probably assume between scoring and winning. Based on this negative correlation, one could assume that if LeBron scored fewer points, the Lakers would win more games. However, that would be assuming that correlation is causation, and that’s not always the case.

There are many more correlations we could find in this table, but a simple rule for telling whether a correlation is positive or negative is to look at how the variables changed between samples. In the table above, we used the color red to indicate that a variable decreased between samples and the color green to indicate that it increased. Under this rule, two variables with a positive correlation show the same type of change between samples. In the examples above, we showed two positive relationships: one between two red variables (both decreased in value) and one between two green variables (both increased in value). Two variables with a negative correlation show opposite types of change between samples. We saw this above, as our negative correlation was between a green variable (increased in value) and a red variable (decreased in value).
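The red-and-green rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above: the function names `directions` and `quick_rule` are hypothetical, and the numbers are made-up placeholders, not LeBron’s real stats.

```python
def directions(values):
    """Return +1 (green), -1 (red), or 0 (no change) for each change between samples."""
    signs = []
    for prev, curr in zip(values, values[1:]):
        diff = curr - prev
        signs.append((diff > 0) - (diff < 0))
    return signs


def quick_rule(xs, ys):
    """Classify two stats using only the direction of each change between samples."""
    products = [dx * dy for dx, dy in zip(directions(xs), directions(ys))]
    if not products:
        return "inconclusive"  # fewer than two samples, nothing to compare
    if all(p > 0 for p in products):
        return "positive"      # every change has the same color
    if all(p < 0 for p in products):
        return "negative"      # every change has the opposite color
    return "inconclusive"      # mixed pattern, the rule cannot decide


# Made-up placeholder numbers for illustration only (NOT real season averages).
assists = [5.0, 6.0]
points = [20.0, 18.0]
wins = [30, 40]

print(quick_rule(assists, wins))  # both go up
print(quick_rule(points, wins))   # one goes down, the other goes up

# With more samples the pattern can be mixed, and the rule gives no answer.
rebounds_3 = [7.0, 8.0, 7.5]
wins_3 = [40, 38, 36]
print(quick_rule(rebounds_3, wins_3))
```

The rule only looks at the direction of each change, so it says nothing about how strong a relationship is.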

The above table only deals with two of LeBron’s 20 seasons in the NBA. This makes it easy to make statements about which stats correlate to his team winning, but as we review more seasons, this task becomes more difficult.

In the table above, we have listed LeBron’s season averages during his time playing for the Miami Heat. By applying the trick of observing the type of changes between samples of two variables, we can see that LeBron’s average points once again had a negative correlation to team wins, while his assists had a positive correlation.

This trick doesn’t work for rebounds or turnovers because each stat has a row that doesn’t fit the pattern needed for a positive or negative correlation. We can see that in 2011, LeBron’s average rebounds increased but the total games won by the Heat decreased from the previous season. On its own, that change suggests a negative correlation. Over the following two seasons, however, rebounds and team wins moved in the same direction, which suggests a positive correlation. This makes it hard to tell whether there is truly a positive or negative correlation between rebounds and team wins.

In this scenario, we need to find the correlation coefficient to confirm the type of statistical relationship between our variables. Correlation coefficients are numbers between -1 and 1. They tell us the type and strength of the correlation between two variables. A correlation coefficient below 0 indicates a negative correlation, whereas a coefficient above 0 indicates a positive correlation. A correlation coefficient of exactly 0 indicates that there is no linear correlation between the variables.

A correlation coefficient can also tell us the strength of the statistical relationship between variables. The closer the coefficient is to -1 or 1, the stronger the relationship, and the closer it is to 0, the weaker. A coefficient of -1 indicates a perfect negative linear relationship: every increase in one variable in the sample comes with a proportional decrease in the other. A coefficient of 1 indicates a perfect positive linear relationship: every increase or decrease in one variable comes with a proportional increase or decrease in the other.

Example of Data with Correlation Coefficient of 1

For example, let us say we have a player who never misses a shot, and every shot he takes is worth the same number of points. The correlation coefficient between his field goals attempted and his points scored would be 1, because every additional shot adds the same number of points to his total.

Code that uses the pearsonr function from SciPy

Real-world data rarely has a perfect correlation between variables. That’s why it’s important to calculate the correlation coefficient to understand the type and strength of relationships in your data. There are multiple ways to calculate the correlation between data. The most common approach is Pearson’s Correlation Coefficient Formula. Doing the math to find Pearson’s coefficients can be a tedious task and is best done by writing a quick program to do it for you.
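As a rough sketch of such a quick program, the function below applies Pearson’s formula directly: the sum of the products of each variable’s deviations from its mean, divided by the square root of the product of the two sums of squared deviations. The name `pearson_r` and the sample numbers are assumptions made for illustration. In practice, SciPy’s `scipy.stats.pearsonr` returns the same coefficient along with a p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how the two variables vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much each variable varies on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("the coefficient is undefined when a variable never changes")
    return co_variation / math.sqrt(spread_x * spread_y)


# Made-up placeholder values, not taken from the article's tables.
rebounds = [7.0, 7.5, 8.0, 6.5]
wins = [50, 44, 60, 52]

print(f"r = {pearson_r(rebounds, wins):+.2f}")
```

Unlike the color trick, this gives a single number even when the season-to-season changes point in different directions.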

Now that we understand correlation and its coefficients, let’s get back to the questions at hand: What part of LeBron’s game helped his team win the most? Was the skill that positioned LeBron’s teams to succeed the same on each team?

Table showing the stat that had the highest positive correlation with team wins

It’s only fitting that LeBron’s value to a team cannot be defined by one stat. While his amazing scoring ability may have been the most important factor in winning games throughout his career, he has also tailored his game to fit what each team needed to win in the chase for championships. But after all this reading, I must leave you with the biggest caveat that comes with correlation. Sadly, correlation is not causation.
