Income, obesity, and heart disease in US states

The figure below combines data on median income by state (bottom-left and top-right), as well as a plot of heart disease death rates against percentage of population with body mass index (BMI) greater than 30. The data are recent, and have been provided by CNN.com and creativeclass.com, respectively.


Heart disease deaths and obesity are strongly associated with each other, and both are inversely associated with median income. US states with lower median income tend to have higher rates of obesity and heart disease deaths.

The reasons are probably many, complex, and closely interconnected. Low income is usually associated with high rates of stress, depression, smoking, alcoholism, and poor nutrition. Compounding the problem, these are normally associated with consumption of cheap, addictive, highly refined foods.

Interestingly, this is primarily an urban phenomenon. If you were to use hunter-gatherers as your data sources, you would probably see the opposite relationship. For example, non-westernized hunter-gatherers have no income (at least not in the “normal” sense), but typically have a lower incidence of obesity and heart disease than mildly westernized ones. The latter have some income.

Tragically, the first few generations of fully westernized hunter-gatherers usually find themselves in the worst possible spot.

Low nonexercise activity thermogenesis: Uncooperative genes or comfy furniture?

The degree of nonexercise activity thermogenesis (NEAT) seems to be a major factor influencing the amount of fat gained or lost by an individual. It also seems to be strongly influenced by genetics, because NEAT is largely due to involuntary activities like fidgeting.

But why should this be?

The degree to which different individuals will develop diseases of civilization in response to consumption of refined carbohydrate-rich foods can also be seen as influenced by genetics. After all, there are many people who eat those foods and are thin and healthy, and that appears to be in part a family trait. But whether we consume those products or not is largely within our control.

So, it is quite possible that NEAT is influenced by genetics, but the fact that NEAT is low in so many people should be a red flag, in the same way that the fact that so many people who eat refined carbohydrate-rich foods are obese should be a red flag. Moreover, modern isolated hunter-gatherers tend to have low levels of body fat. Given the importance of NEAT for body fat regulation, it is not unreasonable to assume that NEAT is elevated in hunter-gatherers, compared to modern urbanites. Hunter-gatherers live more like our Paleolithic ancestors than modern urbanites.

True genetic diseases, caused by recent harmful mutations, are usually rare. If low NEAT were truly a genetic “disease”, those with low NEAT should be a small minority. That is not the case. It is more likely that the low NEAT that we see in modern urbanites is due to a maladaptation of our Stone Age body to modern life, in the same way that our Stone Age body is maladapted to the consumption of foods rich in refined grains and seeds.

What could have increased NEAT among our Paleolithic ancestors, and among modern isolated hunter-gatherers?

One thing that comes to mind is lack of comfortable furniture, particularly comfortable chairs (photo below from: prlog.org). It is quite possible that our Paleolithic ancestors invented some rudimentary forms of furniture, but they would have been much less comfortable than modern furniture used in most offices and homes. The padding of comfy office chairs is not very easy to replicate with stones, leaves, wood, or even animal hides. You need engineering to design it; you need industry to produce that kind of thing.


I have been doing a little experiment with myself, where I do things that force me to sit tall and stand while working in my office, instead of sitting back and “relaxing”. Things like putting a pillow on the chair so that I cannot rest my back on it, or placing my computer on an elevated surface so that I am forced to work while standing up. I tend to move a lot more when I do those things, and the movement is largely involuntary. These are small but constant movements, a bit like fidgeting. (It would be interesting to tape myself and actually quantify the amount of movement.)

It seems that one can induce an increase in NEAT, which is largely due to involuntary activities, by doing some voluntary things like placing a pillow on a chair or working while standing up.

Is it possible that the unnaturalness of comfy furniture, and particularly of comfy chairs, is contributing (together with other factors) not only to making us fat but also to giving us low-back problems?

Both obesity and low-back problems are widespread among modern urbanites. Yet, from an evolutionary perspective, they should not be. They likely impaired survival success among our ancestors, and thus impaired their reproductive success. Evolution “gets angry” at these things; over time it wipes them out. In my reading of studies of hunter-gatherers, I don’t recall a single instance in which obesity and low-back problems were described as being widespread.

Strong causation can exist without any correlation: The strange case of the chain smokers, and a note about diet

Researchers like to study samples of data and look for associations between variables. Often those associations are represented in the form of correlation coefficients, which go from -1 to 1. Another popular measure of association is the path coefficient, which usually has a narrower range of variation. What many researchers seem to forget is that the associations they find depend heavily on the sample they are looking at, and on the ranges of variation of the variables being analyzed.
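For readers who want to see what a correlation coefficient actually computes, here is a minimal Python sketch of the Pearson correlation, written directly from the textbook formula. The function name pearson_r is just an illustrative choice. When one variable does not vary at all, the coefficient is undefined, and the sketch returns NaN in that case.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A variable with zero variation has no defined correlation.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```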

A forgotten warning: Causation without correlation

Often those who conduct multivariate statistical analyses on data are unaware of certain limitations. Many times this is due to lack of familiarity with statistical tests. One warning we do see a lot though is: Correlation does not imply causation. This is, of course, absolutely true. If you take my weight from 1 to 20 years of age, and the price of gasoline in the US during that period, you will find that they are highly correlated. But common sense tells me that there is no causation whatsoever between these two variables.

So correlation does not imply causation, all right, but there is another warning that is rarely seen: There can be strong causation without any correlation. Of course this can lead to even more bizarre conclusions than the “correlation does not imply causation” problem. If there is strong causation between variables B and Y, and it is not showing as a correlation, another variable A may “jump in” and “steal” that “unused correlation”, so to speak.

The chain smokers “study”

To illustrate this point, let us consider the following fictitious case, a study of “100 cities”. The study focuses on the effect of smoking and genes on lung cancer mortality. Smoking significantly increases the chances of dying from lung cancer; it is a very strong causative factor. Here are a few more details. Between 35 and 40 percent of the population are chain smokers. And there is a genotype (a set of genes), found in a small percentage of the population (around 7 percent), which is protective against lung cancer. All of those who are chain smokers die from lung cancer unless they die from other causes (e.g., accidents). Dying from other causes is a lot more common among those who have the protective genotype.

(I created this fictitious data with these associations in mind, using equations. I also added uncorrelated error into the equations, to make the data look a bit more realistic. For example, random deaths occurring early in life would reduce slightly any numeric association between chain smoking and cancer deaths in the sample of 100 cities.)

The table below shows part of the data, and gives an idea of the distribution of percentage of smokers (Smokers), percentage with the protective genotype (Pgenotype), and percentage of lung cancer deaths (MLCancer). Each row corresponds to a city. The rest of the data, up to row 100, has a similar distribution.


The graphs below show the distribution of lung cancer deaths against: (a) the percentage of smokers, at the top; and (b) the percentage with the protective genotype, at the bottom. Correlations are shown at the top of each graph. (They can vary from -1 to 1. The closer they are to -1 or 1, the stronger the association, negative or positive, between the variables.) The correlation between lung cancer deaths and percentage of smokers is slightly negative and statistically insignificant (-0.087). The correlation between lung cancer deaths and percentage with the protective genotype is negative, strong, and statistically significant (-0.613).
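Whether a correlation is statistically significant depends on both its size and the sample size. The post does not say which test was used, so this sketch assumes one common check: the t test for a Pearson correlation, t = r·√(n − 2)/√(1 − r²), with n − 2 degrees of freedom. Applied to the two correlations above with n = 100 cities, it gives t values of about -0.86 and -7.68. That is consistent with the verdicts in the text.

```python
import math


def correlation_t(r, n):
    """t statistic for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


N_CITIES = 100
T_CRITICAL = 1.984  # approximate two-tailed 5% critical value for 98 degrees of freedom

for label, r in [("Smokers", -0.087), ("Pgenotype", -0.613)]:
    t = correlation_t(r, N_CITIES)
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{label}: r = {r:+.3f}, t = {t:+.2f}, {verdict} at the 0.05 level")
```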


Even though smoking significantly increases the chances of dying from lung cancer, the correlations tell us otherwise. The correlations tell us that smoking does not seem to cause lung cancer deaths, and that having the protective genotype seems to significantly decrease cancer deaths. Why?

If there is no variation, there is no correlation

The reason is that the “researchers” collected data only about chain smokers. That is, the variable “Smokers” includes only chain smokers. If this were not a fictitious case, focusing the study on chain smokers could be seen as a clever strategy employed by researchers funded by tobacco companies. The researchers could say something like this: “We focused our analysis on those most likely to develop lung cancer.” Or, this could have been the result of plain stupidity when designing the research project.

By restricting their study to chain smokers, the researchers dramatically reduced the variability in one particular variable: the extent to which the study participants smoked. Without variation, there can be no correlation. No matter what statistical test or software is used, no significant association will be found between lung cancer deaths and percentage of smokers based on this dataset. No matter what statistical test or software is used, a significant and strong association will be found between lung cancer deaths and percentage with the protective genotype.

Of course, this could lead to a very misleading conclusion. Smoking does not cause lung cancer; the real cause is genetic.

A note about diet

Consider the analogy between smoking and consumption of a particular food, and you will probably see what this means for the analysis of observational data regarding dietary choices and disease. This applies to almost any observational study, including the China Study. (Studies employing experimental control manipulations would presumably ensure enough variation in the variables studied.) In the China Study, data from dozens of counties were collected. One may find a significant association between consumption of food A and disease Y.

There may be a much stronger association between food B and disease Y, but that association may not show up in statistical analyses at all, simply because there is little variation in the data regarding consumption of food B. For example, all those sampled may have eaten food B; about the same amount. Or none. Or somewhere in between, within a rather small range of variation.
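A simple precaution before reading too much into a missing association is to check how much each dietary variable actually varies in the sample. The sketch below is one hypothetical way to do that. It flags variables whose coefficient of variation (standard deviation divided by the mean) falls under a cutoff. The 0.10 cutoff, the function name, and the intake numbers are illustrative assumptions. The check also assumes positive values, such as grams per day.

```python
import statistics


def low_variation(columns, min_cv=0.10):
    """Names of variables whose coefficient of variation (SD / mean) is below min_cv.

    Assumes positive-valued data such as intakes in g/day. The default cutoff
    is an arbitrary illustrative choice, not a standard threshold.
    """
    flagged = []
    for name, values in columns.items():
        cv = statistics.stdev(values) / statistics.fmean(values)
        if cv < min_cv:
            flagged.append(name)
    return flagged


# Hypothetical intakes (g/day) across five counties.
intake = {
    "food_A": [20, 80, 150, 40, 210],
    "food_B": [300, 305, 298, 302, 301],
}
print(low_variation(intake))  # ['food_B']
```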

Statistical illiteracy, bad choices, and taxation

Statistics is a “necessary evil”. It allows us to generalize from small samples to the larger populations they come from when we study any possible causal association. By doing so, one can find out whether an observed effect really applies to a larger percentage of the population, or is actually restricted to a small group of individuals. The problem is that we humans are very bad at inferring actual associations from simply looking at large tables with numbers. We need statistical tests for that.

However, ignorance about basic statistical phenomena, such as the one described here, can be costly. A group of people may eliminate food A from their diet based on coefficients of association resulting from what seem to be very clever analyses, replacing it with food B. The problem is that food B may be equally harmful, or even more harmful. And that effect may not show up in statistical analyses unless there is enough variation in the consumption of food B.

Readers of this blog may wonder why we explicitly use terms like “suggests” when we refer to a relationship that is suggested by a significant coefficient of association (e.g., a linear correlation). This is why, among other reasons.

One does not have to be a mathematician to understand basic statistical concepts. And doing so can be very helpful in one’s life in general, not only in diet and lifestyle decisions, but even in simple choices, such as what to bet on. We are always betting on something. For example, any investment is essentially a bet. Some outcomes are much more probable than others.

Once I had an interesting conversation with a high-level officer of a state government. I was part of a consulting team working on an information technology project. We were talking about the state lottery, which was a big source of revenue for the state, comparing it with state taxes. He told me something to this effect:

Our lottery is essentially a tax on the statistically illiterate.
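The officer's remark is easy to check with a little arithmetic. The sketch below computes the expected value of a $1 ticket in a hypothetical pick-6-of-49 lottery that pays only a $5 million jackpot. Real lotteries also have smaller prizes, shared jackpots, and taxes, all of which this ignores.

```python
from fractions import Fraction
from math import comb


def expected_value_per_ticket(jackpot, ticket_price, numbers=49, picks=6):
    """Expected net gain of one ticket in a hypothetical pick-6 lottery
    that pays only a jackpot (ignoring smaller prizes, shared jackpots, taxes)."""
    p_win = Fraction(1, comb(numbers, picks))
    return p_win * jackpot - ticket_price


ev = expected_value_per_ticket(jackpot=5_000_000, ticket_price=1)
print(comb(49, 6))          # 13983816 possible combinations
print(round(float(ev), 2))  # -0.64: on average, about 64 cents lost per $1 ticket
```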

The China Study II: Wheat flour, rice, and cardiovascular disease

In my last post on the China Study II, I analyzed the effect of total and HDL cholesterol on mortality from all cardiovascular diseases. The main conclusion was that total and HDL cholesterol were protective. Total and HDL cholesterol usually increase with intake of animal foods, and particularly of animal fat. The lowest mortality from all cardiovascular diseases was in the highest total cholesterol range, 172.5 to 180; and the highest mortality in the lowest total cholesterol range, 120 to 127.5. The difference was quite large; the mortality in the lowest range was approximately 3.3 times higher than in the highest.

This post focuses on the intake of two main plant foods, namely wheat flour and rice, and their relationships with mortality from all cardiovascular diseases. After many exploratory multivariate analyses, wheat flour and rice emerged as the plant foods with the strongest associations with mortality from all cardiovascular diseases. Moreover, wheat flour and rice have a strong and inverse relationship with each other, which suggests a “consumption divide”. Since the data is from China in the late 1980s, it is likely that consumption of wheat flour is even higher now. As you’ll see, this picture is alarming.

The main model and results

All of the results reported here are from analyses conducted using WarpPLS. Below is the model with the main results of the analyses. The arrows explore associations between variables, which are shown within ovals. The meaning of each variable is the following: SexM1F2 = sex, with 1 assigned to males and 2 to females; MVASC = mortality from all cardiovascular diseases (ages 35-69); TKCAL = total calorie intake per day; WHTFLOUR = wheat flour intake (g/day); and RICE = rice intake (g/day).


The variables to the left of MVASC are the main predictors of interest in the model. The one to the right is a control variable: SexM1F2. The path coefficients (indicated as beta coefficients) reflect the strength of the relationships. A negative beta means that the relationship is negative; i.e., an increase in a variable is associated with a decrease in the variable that it points to. The P values indicate the statistical significance of the relationship. A P lower than 0.05 is conventionally taken to indicate a significant relationship. Roughly, this means that a relationship this strong would be unlikely to appear by chance alone if there were no real association.
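To see what "controlling for" another variable does to a coefficient, consider the simplest linear case: one outcome and two standardized predictors. The standardized partial regression coefficients can then be computed from the three pairwise correlations. WarpPLS uses its own algorithm, which also handles nonlinear relationships. This is therefore only a linear approximation of what its path coefficients represent, and the correlations in the example are hypothetical.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for Y regressed on two predictors X1 and X2.

    r_y1, r_y2: correlations of each predictor with Y; r_12: correlation
    between the predictors. Linear relationships are assumed.
    """
    denom = 1.0 - r_12 ** 2
    if denom == 0:
        raise ValueError("predictors are perfectly collinear")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical correlations: both predictors look protective on their own,
# and they are correlated with each other.
b1, b2 = standardized_betas(r_y1=-0.30, r_y2=-0.20, r_12=0.50)
print(f"beta1 = {b1:.3f}, beta2 = {b2:.3f}")  # beta1 = -0.267, beta2 = -0.067
```

Here X2's correlation of -0.20 with Y shrinks to a coefficient of about -0.07 once X1 is controlled for, while X1 keeps most of its association. With uncorrelated predictors (r_12 = 0), the betas would simply equal the correlations.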

In summary, the model above seems to be telling us that:

- As rice intake increases, wheat flour intake decreases significantly (beta=-0.84; P<0.01). This relationship would be the same if the arrow pointed in the opposite direction. It suggests that there is a sharp divide between rice-consuming and wheat flour-consuming regions.

- As wheat flour intake increases, mortality from all cardiovascular diseases increases significantly (beta=0.32; P<0.01). This is after controlling for the effects of rice and total calorie intake. That is, wheat flour seems to have some inherent properties that make it bad for one’s health, even if one doesn’t consume that many calories.

- As rice intake increases, mortality from all cardiovascular diseases decreases significantly (beta=-0.24; P<0.01). This is after controlling for the effects of wheat flour and total calorie intake. That is, this effect is not entirely due to rice being consumed in place of wheat flour. Still, as you’ll see later in this post, this relationship is nonlinear. Excessive rice intake does not seem to be very good for one’s health either.

- Increases in wheat flour and rice intake are significantly associated with increases in total calorie intake (betas=0.25, 0.33; P<0.01). This may be due to wheat flour and rice intake: (a) being themselves, in terms of their own caloric content, main contributors to the total calorie intake; or (b) causing an increase in calorie intake from other sources. The former is more likely, given the effect below.

- The effect of total calorie intake on mortality from all cardiovascular diseases is insignificant when we control for the effects of rice and wheat flour intakes (beta=0.08; P=0.35). This suggests that neither wheat flour nor rice exerts an effect on mortality from all cardiovascular diseases by increasing total calorie intake from other food sources.

- Being female is significantly associated with a reduction in mortality from all cardiovascular diseases (beta=-0.24; P=0.01). This is to be expected. In other words, men are women with a few design flaws, so to speak. (This situation reverses itself a bit after menopause.)
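In linear path analysis, an indirect effect through a mediating variable is usually estimated as the product of the path coefficients along the way. Assuming that rule applies reasonably well here, the sketch below plugs in the betas reported above. It estimates how much of the wheat flour and rice effects on MVASC could flow through total calorie intake (TKCAL). Both indirect effects come out tiny (0.02 and about 0.026) next to the direct effects (0.32 and -0.24). This is in line with the interpretation above.

```python
# Path coefficients (betas) reported in the model above.
paths = {
    ("WHTFLOUR", "TKCAL"): 0.25,
    ("RICE", "TKCAL"): 0.33,
    ("TKCAL", "MVASC"): 0.08,
    ("WHTFLOUR", "MVASC"): 0.32,
    ("RICE", "MVASC"): -0.24,
}


def indirect_effect(source, mediator, target):
    """Product-of-paths estimate of the effect of source on target via mediator."""
    return paths[(source, mediator)] * paths[(mediator, target)]


for food in ("WHTFLOUR", "RICE"):
    direct = paths[(food, "MVASC")]
    via_calories = indirect_effect(food, "TKCAL", "MVASC")
    print(f"{food}: direct = {direct:+.2f}, via TKCAL = {via_calories:+.3f}")
```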

Wheat flour displaces rice

The graph below shows the shape of the association between wheat flour intake (WHTFLOUR) and rice intake (RICE). The values are provided in standardized format; e.g., 0 is the mean (a.k.a. average), 1 is one standard deviation above the mean, and so on. The curve is the best-fitting U curve obtained by the software. It actually has the shape of an exponential decay curve, which can be seen as a section of a U curve. This suggests that wheat flour consumption has strongly displaced rice consumption in several regions in China, and also that wherever rice consumption is high wheat flour consumption tends to be low.
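Standardized values (z-scores) are obtained by subtracting the mean and dividing by the standard deviation. A minimal sketch follows. The intake values are hypothetical. Whether the software uses the sample or the population standard deviation is also an assumption here; with larger samples the difference is negligible.

```python
import statistics


def standardize(values):
    """Return z-scores: 0 is the mean, 1 is one standard deviation above it."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    return [(v - mean) / sd for v in values]


wheat_g_per_day = [50, 120, 300, 450, 700]  # hypothetical county intakes
z = standardize(wheat_g_per_day)
print([round(v, 2) for v in z])
```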


As wheat flour intake goes up, so does cardiovascular disease mortality

The graphs below show the shapes of the association between wheat flour intake (WHTFLOUR) and mortality from all cardiovascular diseases (MVASC). In the first graph, the values are provided in standardized format; e.g., 0 is the mean (or average), 1 is one standard deviation above the mean, and so on. In the second graph, the values are provided in unstandardized format and organized in terciles (three intervals of equal width).



The curve in the first graph is the best-fitting U curve obtained by the software. It is a quasi-linear relationship. The higher the consumption of wheat flour in a county, the higher seems to be the mortality from all cardiovascular diseases. The second graph suggests that mortality in the third tercile, which represents a consumption of wheat flour of 501 to 751 g/day (a lot!), is 69 percent higher than mortality in the first tercile (0 to 251 g/day).
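The second graph groups counties into three intervals of equal width along the wheat flour axis and compares mean mortality across them. The ranges given in the text (0 to 251 and 501 to 751 g/day) suggest equal-width intervals. The sketch below shows that kind of grouping and the percent-difference calculation. The data points are made up, so its 60 percent figure has nothing to do with the 69 percent reported above. Note that "terciles" often means three groups with equal numbers of observations; the equal-width version is used here because that is what the ranges in the text suggest.

```python
def equal_width_groups(x, y, k=3):
    """Split the range of x into k equal-width intervals and return
    (lower bound, upper bound, mean of y) for each interval."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / k
    if width == 0:
        raise ValueError("x has no variation")
    groups = [[] for _ in range(k)]
    for xi, yi in zip(x, y):
        idx = min(int((xi - lo) / width), k - 1)  # the maximum goes in the last group
        groups[idx].append(yi)
    return [
        (lo + i * width, lo + (i + 1) * width, sum(g) / len(g) if g else None)
        for i, g in enumerate(groups)
    ]


def percent_higher(a, b):
    """How much higher a is than b, in percent."""
    return 100.0 * (a - b) / b


# Hypothetical county data: wheat flour intake (g/day) and mortality.
wheat = [0, 100, 200, 300, 400, 500, 600, 700, 750]
mortality = [100, 110, 90, 120, 130, 150, 160, 170, 160]

groups = equal_width_groups(wheat, mortality)
for low, high, mean_mort in groups:
    print(f"{low:.0f} to {high:.0f} g/day: mean mortality {mean_mort:.1f}")
print(f"Third vs first: {percent_higher(groups[2][2], groups[0][2]):.0f} percent higher")
```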

Rice seems to be protective, as long as intake is not too high

The graphs below show the shapes of the association between rice intake (RICE) and mortality from all cardiovascular diseases (MVASC). In the first graph, the values are provided in standardized format. In the second graph, the values are provided in unstandardized format and organized in terciles.



Here the relationship is more complex. The lowest mortality is clearly in the second tercile (206 to 412 g/day). There is a lot of variation in the first tercile, as suggested by the first graph with the U curve. (Remember, as rice intake goes down, wheat flour intake tends to go up.) The U curve here looks similar to the exponential decay curve shown earlier in the post, for the relationship between rice and wheat flour intake.

In fact, the shape of the association between rice intake and mortality from all cardiovascular diseases looks a bit like an “echo” of the shape of the relationship between rice and wheat flour intake. Here is what is creepy. This echo looks somewhat like the first curve (between rice and wheat flour intake), but with wheat flour intake replaced by “death” (i.e., mortality from all cardiovascular diseases).

What does this all mean?

- Wheat flour displacing rice does not look like a good thing. Wheat flour intake seems to have strongly displaced rice intake in the counties where it is heavily consumed. Generally speaking, that does not seem to have been a good thing. It looks like this is generally associated with increased mortality from all cardiovascular diseases.

- High glycemic index food consumption does not seem to be the problem here. Wheat flour and rice have very similar glycemic indices (but generally not glycemic loads; see below). Both lead to blood glucose and insulin spikes. Yet, rice consumption seems protective when it is not excessive. This is true in part (but not entirely) because it largely displaces wheat flour. Moreover, neither rice nor wheat flour consumption seems to be significantly associated with cardiovascular disease via an increase in total calorie consumption. This is a bit of a blow to the theory that high glycemic carbohydrates necessarily cause obesity, diabetes, and eventually cardiovascular disease.

- The problem with wheat flour is … hard to pinpoint, based on the results summarized here. Maybe it is the fact that it is an ultra-refined carbohydrate-rich food; less refined forms of wheat could be healthier. In fact, the glycemic loads of less refined carbohydrate-rich foods tend to be much lower than those of more refined ones. (Also, boiled brown rice has a glycemic load that is about one third that of whole wheat bread, whereas the glycemic indices are about the same.) Maybe the problem is wheat flour's gluten content. Maybe it is a combination of various factors, including these.
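Glycemic load (GL) combines the glycemic index (GI) with the amount of available carbohydrate in a serving: GL = GI × carbohydrate (g) / 100. The sketch below uses hypothetical values, not measured ones. It shows how two foods with the same GI can differ threefold in GL when one serving carries three times as much carbohydrate.

```python
def glycemic_load(glycemic_index, available_carbs_g):
    """Glycemic load of one serving: GI x grams of available carbohydrate / 100."""
    return glycemic_index * available_carbs_g / 100.0


# Hypothetical foods with the same GI but different carbohydrate per serving.
food_x = glycemic_load(glycemic_index=55, available_carbs_g=20)
food_y = glycemic_load(glycemic_index=55, available_carbs_g=60)
print(food_x, food_y, food_y / food_x)  # 11.0 33.0 3.0
```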

Reference

Kock, N. (2010). WarpPLS 1.0 User Manual. Laredo, Texas: ScriptWarp Systems.

Acknowledgment and notes

- Many thanks are due to Dr. Campbell and his collaborators for collecting and compiling the data used in this analysis. The data is from this site, created by those researchers to disseminate their work in connection with a study often referred to as the “China Study II”. It has already been analyzed by other bloggers. Notable analyses have been conducted by Ricardo at Canibais e Reis, Stan at Heretic, and Denise at Raw Food SOS.

- The path coefficients (indicated as beta coefficients) reflect the strength of the relationships. They are a bit like standard bivariate (Pearson) correlation coefficients, except that they take into consideration multivariate relationships (they control for competing effects on each variable). Whenever nonlinear relationships were modeled, the path coefficients were automatically corrected by the software to account for nonlinearity.

- The software used here identifies non-cyclical and mono-cyclical relationships such as logarithmic, exponential, and hyperbolic decay relationships. Once a relationship is identified, data values are corrected and coefficients calculated. This is not the same as log-transforming data prior to analysis, which is widely used but only works if the underlying relationship is logarithmic. Otherwise, log-transforming data may distort the relationship even more than assuming that it is linear, which is what is done by most statistical software tools.

- The R-squared values reflect the proportion of explained variance for certain variables; the higher they are, the better the model fits the data. In complex and multifactorial phenomena such as health-related phenomena, many would consider an R-squared of 0.20 acceptable. Still, such an R-squared would mean that 80 percent of the variance for a particular variable is unexplained by the model.

- The P values have been calculated using a nonparametric technique, a form of resampling called jackknifing, which does not require the data to be normally distributed. This and other related techniques also tend to yield more reliable results for small samples, and samples with outliers (as long as the outliers are “good” data, and are not the result of measurement error).

- Two data points per county were used (one for males and one for females). This increased the sample size of the dataset without artificially reducing variance, which is desirable since the dataset is relatively small. This also allowed for the test of commonsense assumptions (e.g., the protective effects of being female). Such tests are always a good idea in a complex analysis, because violation of commonsense assumptions may suggest data collection or analysis error. On the other hand, it required the inclusion of a sex variable as a control variable in the analysis, which is no big deal.

- Since all the data was collected around the same time (late 1980s), this analysis assumes a somewhat static pattern of consumption of rice and wheat flour. Even if variations in consumption of a particular food do lead to variations in mortality, that effect will typically take years to manifest itself. This is a major limitation of this dataset and any related analyses.

- Mortality from schistosomiasis infection (MSCHIST) does not confound the results presented here. Only counties where no deaths from schistosomiasis infection were reported have been included in this analysis. Mortality from all cardiovascular diseases (MVASC) was measured using the variable M059 ALLVASCc (ages 35-69). See this post for other notes that apply here as well.

The China Study II: Cholesterol seems to protect against cardiovascular disease

First of all, many thanks are due to Dr. Campbell and his collaborators for collecting and compiling the data used in this analysis. This data is from this site, created by those researchers to disseminate the data from a study often referred to as the “China Study II”. It has already been analyzed by other bloggers. Notable analyses have been conducted by Ricardo at Canibais e Reis, Stan at Heretic, and Denise at Raw Food SOS.

The analyses in this post differ from those other analyses in various aspects. One of them is that data for males and females were used separately for each county, instead of the totals per county. That is, two data points per county were used (one for males and one for females). This increased the sample size of the dataset without artificially reducing variance (for more details, see “Notes” at the end of the post), which is desirable since the dataset is relatively small. This also allowed for the test of commonsense assumptions (e.g., the protective effects of being female). Such tests are always a good idea in a complex analysis, because violation of commonsense assumptions may suggest data collection or analysis error. On the other hand, it required the inclusion of a sex variable as a control variable in the analysis, which is no big deal.

The analysis was conducted using WarpPLS. Below is the model with the main results of the analysis. The arrows explore associations between variables, which are shown within ovals. The meaning of each variable is the following: SexM1F2 = sex, with 1 assigned to males and 2 to females; HDLCHOL = HDL cholesterol; TOTCHOL = total cholesterol; MSCHIST = mortality from schistosomiasis infection; and MVASC = mortality from all cardiovascular diseases.


The variables to the left of MVASC are the main predictors of interest in the model: HDLCHOL and TOTCHOL. The ones to the right are control variables: SexM1F2 and MSCHIST. The path coefficients (indicated as beta coefficients) reflect the strength of the relationships. A negative beta means that the relationship is negative; i.e., an increase in a variable is associated with a decrease in the variable that it points to. The P values indicate the statistical significance of the relationship. A P lower than 0.05 is conventionally taken to indicate a significant relationship. Roughly, this means that a relationship this strong would be unlikely to appear by chance alone if there were no real association.

In summary, this is what the model above is telling us:

- As HDL cholesterol increases, total cholesterol increases significantly (beta=0.48; P<0.01). This is to be expected, as HDL is a main component of total cholesterol, together with VLDL and LDL cholesterol.

- As total cholesterol increases, mortality from all cardiovascular diseases decreases significantly (beta=-0.25; P<0.01). This is to be expected if we assume that total cholesterol is in part an intervening variable between HDL cholesterol and mortality from all cardiovascular diseases. This assumption can be tested through a separate model (more below). Also, there is more to this story, as noted below.

- The effect of HDL cholesterol on mortality from all cardiovascular diseases is insignificant when we control for the effect of total cholesterol (beta=-0.08; P=0.26). This suggests that HDL’s protective role is subsumed by the variable total cholesterol, and also that it is possible that there is something else associated with total cholesterol that makes it protective. Otherwise the effect of total cholesterol might have been insignificant, and the effect of HDL cholesterol significant (the reverse of what we see here).

- Being female is significantly associated with a reduction in mortality from all cardiovascular diseases (beta=-0.16; P=0.01). This is to be expected. In other words, men are women with a few design flaws. (This situation reverses itself a bit after menopause.)

- Mortality from schistosomiasis infection is significantly and inversely associated with mortality from all cardiovascular diseases (beta=-0.28; P<0.01). This is probably due to those dying from schistosomiasis infection not being entered in the dataset as dying from cardiovascular diseases, and vice-versa.

Two other main components of total cholesterol, in addition to HDL cholesterol, are VLDL and LDL cholesterol. These are carried in particles, known as lipoproteins. VLDL cholesterol is usually represented as a fraction of triglycerides in cholesterol equations (e.g., the Friedewald and Iranian equations). It usually correlates inversely with HDL; that is, as HDL cholesterol increases, usually VLDL cholesterol decreases. Given this and the associations discussed above, it seems that LDL cholesterol is a good candidate for the possible “something else associated with total cholesterol that makes it protective”. But waidaminet! Is it possible that the demon particle, the LDL, serves any purpose other than giving us heart attacks?
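For reference, the Friedewald equation estimates LDL cholesterol from three values in mg/dL. It subtracts HDL and an estimate of VLDL cholesterol (triglycerides divided by 5) from total cholesterol. A minimal sketch follows. The input values are hypothetical, and the Iranian equation, which uses different coefficients, is not shown. The 400 mg/dL triglyceride cutoff reflects the commonly cited limit of the Friedewald equation.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL cholesterol (mg/dL) with the Friedewald equation.

    VLDL cholesterol is approximated as triglycerides / 5. The equation is
    generally considered unreliable when triglycerides exceed 400 mg/dL.
    """
    if triglycerides > 400:
        raise ValueError("Friedewald estimate not valid for TG > 400 mg/dL")
    vldl = triglycerides / 5.0
    return total_chol - hdl - vldl


print(friedewald_ldl(total_chol=170, hdl=50, triglycerides=100))  # 100.0
```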

The graph below shows the shape of the association between total cholesterol (TOTCHOL) and mortality from all cardiovascular diseases (MVASC). The values are provided in standardized format; e.g., 0 is the average, 1 is one standard deviation above the mean, and so on. The curve is the best-fitting S curve obtained by the software (an S curve is a slightly more complex curve than a U curve).


The graph below shows some of the data in unstandardized format, and organized differently. The data is grouped here in ranges of total cholesterol, which are shown on the horizontal axis. The lowest and highest ranges in the dataset are shown, to highlight the magnitude of the apparently protective effect. Here the two variables used to calculate mortality from all cardiovascular diseases (MVASC; see “Notes” at the end of this post) were added. Clearly the lowest mortality from all cardiovascular diseases is in the highest total cholesterol range, 172.5 to 180; and the highest mortality in the lowest total cholesterol range, 120 to 127.5. The difference is quite large; the mortality in the lowest range is approximately 3.3 times higher than in the highest.


The shape of the S-curve graph above suggests that there are other variables that are confounding the results a bit. Mortality from all cardiovascular diseases does seem to generally go down with increases in total cholesterol, but the smooth inflection point at the middle of the S-curve graph suggests a more complex variation pattern that may be influenced by other variables (e.g., smoking, dietary patterns, or even schistosomiasis infection; see “Notes” at the end of this post).

As mentioned before, total cholesterol is strongly influenced by HDL cholesterol, so below is the model with only HDL cholesterol (HDLCHOL) pointing at mortality from all cardiovascular diseases (MVASC), and the control variable sex (SexM1F2).


The graph above supports the assumption that HDL’s protective role is subsumed by the variable total cholesterol. When total cholesterol is removed from the model, as was done above, the protective effect of HDL cholesterol becomes significant (beta=-0.27; P<0.01). The control variable sex (SexM1F2) was retained even in this targeted HDL model because of the expected confounding effect of sex: females generally tend to have higher HDL cholesterol and less cardiovascular disease than males.

Below, in the “Notes” section (after the “Reference”) are several notes, some of which are quite technical. Providing them separately hopefully has made the discussion above a bit easier to follow. The notes also point at some limitations of the analysis. This data needs to be analyzed from different angles, using multiple models, so that firmer conclusions can be reached. Still, the overall picture that seems to be emerging is at odds with previous beliefs based on the same dataset.

What could be increasing the apparently protective HDL and total cholesterol in this dataset? High consumption of animal foods, particularly foods rich in saturated fat and cholesterol, is a strong candidate. Low consumption of vegetable oils rich in linoleic acid, and of foods rich in refined carbohydrates, is also a good candidate. Maybe it is a combination of these.

We need more analyses!

Reference:

Kock, N. (2010). WarpPLS 1.0 User Manual. Laredo, Texas: ScriptWarp Systems.


Notes:

- The path coefficients (indicated as beta coefficients) reflect the strength of the relationships; they are a bit like standard bivariate (or Pearson) correlation coefficients, except that they take multivariate relationships into consideration (they control for competing effects on each variable).

- The R-squared values reflect the percentage of explained variance for certain variables; the higher they are, the better the model fits the data. In complex and multifactorial phenomena such as health-related phenomena, many would consider an R-squared of 0.20 acceptable. Still, such an R-squared would mean that 80 percent of the variance for a particular variable is unexplained by the model.

- The P values have been calculated using a nonparametric technique, a form of resampling called jackknifing, which does not require the assumption that the data is normally distributed. This and other related techniques also tend to yield more reliable results for small samples, and for samples with outliers (as long as the outliers are “good” data, and are not the result of measurement error).

- Collinearity is an important consideration in models that analyze the effect of multiple predictors on one single variable. This is particularly true for multiple regression models, where there is a temptation to add many predictors to the model to see which ones come out as the “winners”. This often backfires, as collinearity can severely distort the results. Some multiple regression techniques, such as automated stepwise regression with backward elimination, are particularly vulnerable to this problem. Collinearity is not the same as correlation, and thus is defined and measured differently. Two predictor variables may be significantly correlated and still have low collinearity. A reasonably reliable measure of collinearity is the variance inflation factor. Collinearity was tested in this model, and was found to be low.

- An effort was made here to avoid multiple data points per county (even though these were available for some variables), because they could artificially reduce the variance for each variable, and potentially bias the results. The reason is that multiple answers from a single county would normally be somewhat correlated, with a higher degree of intra-county than inter-county correlation. The resulting bias would be difficult to control for via one or more control variables. With only two data points per county, one for males and the other for females, one can control for intra-county correlation by adding a “dummy” sex variable to the analysis, as a control variable. This was done here.

- Mortality from schistosomiasis infection (MSCHIST) is a variable that tends to affect the results in a way that makes them more difficult to make sense of. This is generally true of any infectious disease that significantly affects a population under study. The problem with infection is that people with otherwise good health or habits may get the infection, and people with bad health and habits may not. Since the human body uses cholesterol to fight disease, cholesterol levels may go up, giving the impression that they are going up for some other reason. Perhaps instead of controlling for its effect, as done here, it would have been better to remove from the analysis those counties with deaths from schistosomiasis infection. (See also this post, and this one.)

- Different parts of the data were collected at different times. It seems that the mortality data is for the period 1986-88, and the rest of the data is for 1989. This may have biased the results somewhat, even though the time lag is not that long, especially if there were changes in certain health trends from one period to the other. For example, major migrations from one county to another could have significantly affected the results.

- The following measures were used, taken from the same online dataset as the other measures: P002 HDLCHOL, for HDLCHOL; P001 TOTCHOL, for TOTCHOL; and M021 SCHISTOc, for MSCHIST.

- SexM1F2 is a “dummy” variable coded with 1 for males and 2 for females. As such, it essentially measures the “degree of femaleness” of the respondents. Being female is generally protective against cardiovascular disease, although this advantage diminishes somewhat after menopause.

- MVASC is a composite measure of the following two variables, provided as component measures of mortality from all cardiovascular diseases: M058 ALLVASCb (ages 0-34), and M059 ALLVASCc (ages 35-69). A couple of obvious problems: (a) they do not include data on people older than 69; and (b) they seem to capture a lot of diseases, including some that do not seem like typical cardiovascular diseases. A factor analysis was conducted, and the loadings and cross-loadings suggested good validity. Composite reliability was also good. So essentially MVASC is measured here as a “latent variable” with two “indicators”. Why do this? Because it reduces the biasing effects of incomplete data and measurement error (e.g., exclusion of folks older than 69). By the way, there is always some measurement error in any dataset.

- This note is related to measurement error in connection with the indicators for MVASC. There is something odd about the variables M058 ALLVASCb (ages 0-34), and M059 ALLVASCc (ages 35-69). According to the dataset, mortality from cardiovascular diseases for ages 0-34 is typically higher than for 35-69, for many counties. Given the good validity and reliability for MVASC as a latent variable, it is possible that the values for these two indicator variables were simply swapped by mistake.
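Several of the quantities described in these notes can be illustrated on a small scale. The sketch below is not the WarpPLS algorithm, which is based on partial least squares. Instead, it uses ordinary least squares with two standardized predictors as a stand-in to show (a) how a standardized beta can differ from a bivariate Pearson correlation once the other predictor is controlled for, (b) R-squared, (c) the variance inflation factor, (d) a leave-one-out jackknife standard error, from which a beta-to-standard-error ratio could be formed for significance testing, and (e) composite reliability computed from standardized indicator loadings. All function names and data are hypothetical.

```python
import math

def pearson(x, y):
    """Bivariate (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_two_predictors(x1, x2, y):
    """OLS of y on x1 and x2 in standardized form.

    Returns (beta1, beta2, r_squared, vif). With two predictors the normal
    equations can be solved in closed form from the correlations, and both
    predictors share the same VIF = 1 / (1 - r12**2).
    """
    r1y, r2y, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    denom = 1.0 - r12 ** 2
    beta1 = (r1y - r2y * r12) / denom
    beta2 = (r2y - r1y * r12) / denom
    r_squared = beta1 * r1y + beta2 * r2y
    vif = 1.0 / denom
    return beta1, beta2, r_squared, vif

def jackknife_se(x1, x2, y, index=0):
    """Leave-one-out jackknife standard error of beta1 (index=0) or beta2 (index=1)."""
    n = len(y)
    estimates = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        fit = fit_two_predictors([x1[j] for j in keep],
                                 [x2[j] for j in keep],
                                 [y[j] for j in keep])
        estimates.append(fit[index])
    centre = sum(estimates) / n
    return math.sqrt((n - 1) / n * sum((e - centre) ** 2 for e in estimates))

def composite_reliability(loadings):
    """Composite reliability from standardized indicator loadings."""
    total = sum(loadings)
    error = sum(1.0 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)

# Hypothetical data: y depends only on x1, yet x1 and x2 are correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.0 * v for v in x1]

b1, b2, r2, vif = fit_two_predictors(x1, x2, y)
print(round(pearson(x2, y), 3), round(b2, 3))  # x2 correlates with y, but its beta is about 0
print(round(r2, 3), round(vif, 2))
print(round(jackknife_se(x1, x2, y), 6))
print(round(composite_reliability([0.9, 0.85]), 3))
```

In this made-up example x2 has a sizable Pearson correlation with y only because it is correlated with x1. Once x1 is controlled for, its beta drops to about zero, which is the kind of difference the note on path coefficients describes.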

Low omega-6 to omega-3 ratio: Grain-fed meats or industrial vegetable oils?

Just a little note on the use of language. Clearly there is no such thing as grain-fed or grass-fed beef, because one does not feed beef anything. One feeds cattle grain or grass, and the resulting beef is then said to be “grain-fed” or “grass-fed”. It is a manner of speaking that facilitates discourse, which is why it is used here.

To compensate for this digression, let me show you a graph, which pretty much summarizes the “punch line” of this post. The graph below shows the omega-6 fat contents of 1 lb (454 g) of grain-fed beef and 1 tablespoon (roughly 14 g) of a typical industrial vegetable oil (safflower oil). As you can see, there is a lot more omega-6 in the much smaller amount of industrial vegetable oil. A gram-for-gram comparison would practically make the beef content bar disappear.


It has been estimated that our Paleolithic ancestors consumed a diet with an omega-6 to omega-3 ratio of about 1. While other estimates exist, the general consensus seems to be that that ratio was not much greater than 5. Western diets, in contrast, typically have omega-6 to omega-3 ratios of between 15 and 40. In some cases, the ratio is even higher.

Omega-6 fats are essential fats, meaning that they must be part of one’s diet. Fats make up about 60 percent of our brain. About 20 percent is made up of omega-6 and omega-3 fats. The primary omega-6 fat found in our brain is arachidonic acid, which is either synthesized by our body from linoleic acid obtained from plant foods, or obtained directly from animal foods such as meat and eggs. The predominant omega-3 fat found in our brain is docosahexaenoic acid (DHA), of which certain types of fish and algae are rich sources.

Inflammation is an important process in the human body, without which wounds would never heal. Incidentally, muscle gain would not occur without inflammation either. Strength training causes muscle damage and inflammation, after which recovery leads to muscle gain. Omega-6 fats play an important role in inflammation. Generally, they are pro-inflammatory.

Too much inflammation, particularly in a chronic fashion, is believed to be very detrimental to our health. A very high omega-6 to omega-3 ratio seems to cause excessive and chronic inflammation. The reason is that omega-3 fats are generally anti-inflammatory, counteracting the pro-inflammatory action of omega-6 fats. Over time, a very high omega-6 to omega-3 ratio is believed to cause a number of Western diseases. Among them are cardiovascular complications, cancer, and various autoimmune diseases.

So, should you worry about too much omega-6 from grain-fed meats?

If you think that the answer is “yes”, consider this. Apparently the (arguably) longest-living group in the world, the non-Westernized Okinawans, consume plenty of pork. Pork is a staple of their traditional diet. It is true that the average cut will have an omega-6 to omega-3 ratio of more than 7, which is not very favorable. Pork in general, whether grain-fed or not, is relatively high in omega-6 fats. As a side note, pork is not a good source of linoleic acid (found in plants), even though it is a rich source of arachidonic acid, the omega-6 fat synthesized from linoleic acid by various animals.

It is difficult to estimate the exact amounts of omega-6 and omega-3 fats from grain-fed cuts of meat; different sources provide different estimates. Here are some reasonable estimates based on various sources, including Nutritiondata.com. A typical 100 g portion of grain-fed pork should contain about 690 mg of omega-6 fats, and 120 mg of omega-3 fats. A typical 100 g portion of grain-fed beef should have about 234 mg of omega-6 fats, and 12 mg of omega-3 fats. It does not take that much omega-3 to counterbalance the omega-6 obtained from grain-fed pork or beef, even if one eats a lot of them. Two softgels of fish oil will normally contain about 720 mg of omega-3 fats (they will also come with 280 mg of omega-6 fats). Three sardines will have over 2 g of omega-3 fats, and less than 200 mg of omega-6 fats.
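Using the per-100 g estimates and the fish oil figures above, the omega-6 to omega-3 ratio of a combination of foods is simply total omega-6 divided by total omega-3. Taken on their own, these particular estimates give a ratio of 5.75 for pork and 19.5 for beef. The sketch below also assumes a hypothetical day with 300 g of grain-fed pork and two fish oil softgels. These serving sizes are illustrative, not recommendations, and the function name is hypothetical.

```python
# (omega-6 mg, omega-3 mg) per unit, using the estimates quoted above.
FOODS = {
    "grain-fed pork, 100 g": (690, 120),
    "grain-fed beef, 100 g": (234, 12),
    "fish oil, 2 softgels": (280, 720),
}

def n6_n3_ratio(servings):
    """servings: dict mapping food name -> number of units eaten."""
    n6 = sum(FOODS[food][0] * qty for food, qty in servings.items())
    n3 = sum(FOODS[food][1] * qty for food, qty in servings.items())
    return n6 / n3

print(round(n6_n3_ratio({"grain-fed pork, 100 g": 1}), 2))  # 5.75
print(round(n6_n3_ratio({"grain-fed beef, 100 g": 1}), 1))  # 19.5
# Hypothetical day: 300 g of pork plus two fish oil softgels.
print(round(n6_n3_ratio({"grain-fed pork, 100 g": 3, "fish oil, 2 softgels": 1}), 2))  # 2.18
```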

Industrial vegetable oils (made from, e.g., safflower seeds, soybeans, and sunflower seeds) are very, very rich sources of omega-6 fats, in the form of linoleic acid. There is a lot more omega-6 in them than in grain-fed meats. One tablespoon of safflower oil contains over 10 g of omega-6 fats, in the form of linoleic acid, and virtually zero omega-3 fats. About 2 kg (4.4 lbs) of grain-fed pork, or 5 kg (11 lbs) of grain-fed beef, will give you that much omega-6; but they will also come with omega-3.
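The meat-to-oil comparison can be computed directly. This sketch treats one tablespoon of safflower oil as 10 g of omega-6, which is a floor since the text says “over 10 g”, and uses the per-100 g meat estimates given earlier. With these particular figures, the amounts come to roughly 1.45 kg of pork and 4.3 kg of beef. That is somewhat less than the rounded figures in the text, but of the same order; the exact numbers depend on which source's estimates are used. The same figures also quantify the gram-for-gram comparison behind the graph. Constant and function names are hypothetical.

```python
SAFFLOWER_N6_MG_PER_TBSP = 10_000  # "over 10 g" per tablespoon; 10 g used as a floor
TBSP_GRAMS = 14                    # roughly 14 g of oil per tablespoon

# Omega-6 mg per 100 g of meat, from the estimates quoted earlier.
MEAT_N6_MG_PER_100G = {"grain-fed pork": 690, "grain-fed beef": 234}

def grams_of_meat_matching(n6_mg, meat):
    """Grams of the given meat that supply n6_mg of omega-6."""
    return n6_mg / MEAT_N6_MG_PER_100G[meat] * 100

for meat in MEAT_N6_MG_PER_100G:
    kg = grams_of_meat_matching(SAFFLOWER_N6_MG_PER_TBSP, meat) / 1000
    print(f"{meat}: {kg:.2f} kg")

# Gram-for-gram: omega-6 per gram of oil vs. per gram of grain-fed beef.
oil_per_g = SAFFLOWER_N6_MG_PER_TBSP / TBSP_GRAMS
beef_per_g = MEAT_N6_MG_PER_100G["grain-fed beef"] / 100
print(f"oil has about {oil_per_g / beef_per_g:.0f} times more omega-6 per gram than beef")
```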

How much fish oil does one need to neutralize 10 g of pure omega-6 fats? A lot! And there is a problem. Excessive fish oil consumption may be toxic to the liver.
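To put a number on “a lot”, one can use the softgel figures given earlier. Two softgels with about 720 mg of omega-3 and 280 mg of omega-6 work out to 360 mg and 140 mg per softgel. From these, one can solve for the number of softgels that bring 10 g of omega-6 down to a target ratio. This sketch treats “neutralize” as a 1:1 ratio, in line with the Paleolithic estimate mentioned earlier, but the target is a parameter. It also ignores omega-3 and omega-6 from the rest of the diet. The function name is hypothetical.

```python
import math

N6_PER_SOFTGEL_MG = 140  # 280 mg per two softgels, as quoted above
N3_PER_SOFTGEL_MG = 360  # 720 mg per two softgels

def softgels_for_ratio(n6_mg, target_ratio):
    """Smallest whole number of softgels giving omega-6:omega-3 <= target_ratio.

    Solves (n6_mg + 140*k) / (360*k) <= target_ratio for k. Each softgel also
    adds omega-6, so targets at or below 140/360 cannot be reached.
    """
    per_softgel = N3_PER_SOFTGEL_MG * target_ratio - N6_PER_SOFTGEL_MG
    if per_softgel <= 0:
        raise ValueError("target ratio cannot be reached with fish oil alone")
    return math.ceil(n6_mg / per_softgel)

# 10 g of omega-6 from industrial vegetable oil, brought down to a 1:1 ratio.
print(softgels_for_ratio(10_000, 1))  # 46
```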

If you cook with industrial vegetable oils rich in linoleic acid (this excludes olive and coconut oils), or eat out a lot in restaurants that use them (the vast majority), you will probably be consuming significantly more than 10 g of omega-6 fats per day. The likely negative health effects of eating grain-fed meats pale in comparison with the likely negative health effects of this much omega-6 fat from industrial vegetable oils.

You should reduce, as much as possible, your consumption of industrial vegetable oils rich in linoleic acid and of products made with them (e.g., margarine). Keep in mind that industrial vegetable oils are in many, many industrialized foods; even canned sardines, if they are canned with soybean oil.

It is also advisable to couple this with moderate consumption of fish rich in omega-3, such as sardines and salmon. (See this post for a sardine recipe.) Taking large doses of fish oil every day may not be such a good idea.

Should you also consume only grass-fed meat? Do it if you can. But, if you cannot, maybe you shouldn’t worry too much about it. This also applies to eggs, dairy, and other animal products.

References:

Elliott, W.H., & Elliott, D.C. (2009). Biochemistry and molecular biology. New York, NY: Oxford University Press.

Ramsden, C.E., Faurot, K.R., Carrera-Bastos, P., Cordain, L., De Lorgeril, M., & Sperling (2009). Dietary fat quality and coronary heart disease prevention: A unified theory based on evolutionary, historical, global, and modern perspectives. Current Treatment Options in Cardiovascular Medicine, 11(4), 289-301.

Schmidt, M.A. (1997). Smart fats: How dietary fats and oils affect mental, physical and emotional intelligence. Berkeley, CA: North Atlantic Books.

How to lose fat and gain muscle at the same time? Strength training plus a mild caloric deficit

Ballor et al. (1996) conducted a classic and interesting study on body composition changes induced by aerobic and strength training. This study gets cited a lot, but apparently for the wrong reasons. One of these reasons can be gleaned from this sentence in the abstract:

    “During the exercise training period, the aerobic training group … had a significant … reduction in body weight … as compared with the [strength] training group …”

That is, one of the key conclusions of this study was that aerobic training was more effective than strength training as far as weight loss is concerned. (The authors refer to the strength training group as the “weight training group”.)

Prior to starting the exercise programs, the 18 participants had lost a significant amount of weight through dieting, for a period of 11 weeks. The authors do not provide details on the diet, other than that it was based on “healthy” food choices. What this means exactly I am not sure, but my guess is that it was probably not particularly high or low in carbs/fat, included a reasonable amount of protein, and led to a caloric deficit.

The participants were older adults (mean age of 61; range, 56 to 70), who were also obese (mean body fat of 45 percent), but otherwise healthy. They managed to lose an average of 9 kg (about 20 lbs) during that 11-week period.

Following the weight loss period, the participants were randomly assigned to either a 12-week aerobic training (four men, five women) or weight training (four men, five women) exercise program. They exercised 3 days per week. These were whole-body workouts, with emphasis on compound (i.e., multiple-muscle) exercises. The figure below shows what actually happened with the participants.


As you can see, the strength training group (WT) gained about 1.5 kg of lean mass, lost 1.2 kg of fat, and thus gained some weight. The aerobic training group (AT) lost about 0.6 kg of lean mass and 1.8 kg of fat, and thus lost some weight.
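The net weight changes follow directly from the lean and fat mass changes just described: net change equals lean mass change plus fat mass change. The values below are the approximate figures given in the text, and the function name is hypothetical.

```python
# Approximate changes (kg) from the figure described above.
groups = {
    "strength training (WT)": {"lean": +1.5, "fat": -1.2},
    "aerobic training (AT)": {"lean": -0.6, "fat": -1.8},
}

def net_change(change):
    """Net body weight change = lean mass change + fat mass change."""
    return change["lean"] + change["fat"]

for name, change in groups.items():
    print(f"{name}: net weight change {net_change(change):+.1f} kg")
```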

Which group fared better? In terms of body composition changes, clearly the strength training group fared better. But my guess is that the participants in the strength training group did not like seeing their weight going up after losing a significant amount of weight through dieting. (An analysis of the possible psychological effects of this would be interesting; a discussion for another blog post.)

The changes in the aerobic training group were predictable, and were the result of compensatory adaptation. Their bodies changed to become better adapted to aerobic exercise, for which a lot of lean mass is a burden, as is a lot of fat mass.

So, essentially the participants in the strength training group lost fat and gained muscle at the same time. The authors say that the participants generally stuck with their weight-loss diet during the 12-week exercise period, but not in a very strict way. It is reasonable to conclude that this induced a mild caloric deficit in the participants.

Exercise probably induced hunger, and possibly a caloric surplus on exercise days. If that happened, the caloric deficit must have occurred on non-exercise days. Without some caloric deficit there would not have been fat loss, as extra calories are stored as fat.
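The reasoning about surplus and deficit days can be made concrete with a weekly energy balance. All numbers below are hypothetical and not from the study: three exercise days at a 200 kcal surplus, four rest days at a 350 kcal deficit, and the common rule of thumb of roughly 7,700 kcal per kg of body fat, which is only an approximation. The numbers were chosen so the result lands near the 1.2 kg of fat lost by the strength training group. The sketch also ignores the energy cost of building lean mass. Function names are hypothetical.

```python
KCAL_PER_KG_FAT = 7700  # rough rule of thumb; an approximation, not a constant

def weekly_balance(exercise_days, surplus_kcal, rest_days, deficit_kcal):
    """Net weekly energy balance in kcal (negative means a deficit)."""
    return exercise_days * surplus_kcal - rest_days * deficit_kcal

def implied_fat_change_kg(weekly_kcal, weeks):
    """Fat mass change implied by a sustained weekly energy balance."""
    return weekly_kcal * weeks / KCAL_PER_KG_FAT

# Hypothetical pattern: surplus on 3 exercise days, deficit on 4 rest days.
week = weekly_balance(exercise_days=3, surplus_kcal=200, rest_days=4, deficit_kcal=350)
print(week)                                       # -800
print(round(implied_fat_change_kg(week, 12), 2))  # -1.25 (kg over 12 weeks)
```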

There are many self-help books and programs online whose main claim is to have a “revolutionary” prescription for concurrent fat loss and muscle gain – the “holy grail” of body composition change.

Well, it may be as simple as combining strength training with a mild caloric deficit, in the context of a nutritious diet focused on unprocessed foods.

Reference:

Ballor, D.L., Harvey-Berino, J.R., Ades, P.A., Cryan, J., & Calles-Escandon, J. (1996). Contrasting effects of resistance and aerobic training on body composition and metabolism after diet-induced weight loss. Metabolism, 45(2), 179-183.