a. Pregnancy status
Smoking status
Weight gain
both categorical variables
both quantitative variables
c. one categorical variable and one quantitative variable
The Statistical Abstract of the United States, prepared by the Census Bureau, provides the number of single-organ transplants for the year \(2010\), by organ. The next two exercises are based on the following table:
\[ \begin{array}{|c|c|} \hline \text{Heart} & 2333 \\ \hline \text{Lung} & 1770 \\ \hline \text{Liver} & 6291 \\ \hline \text{Kidney} & 16898 \\ \hline \text{Pancreas} & 350 \\ \hline \text{Intestine} & 151 \\ \hline \end{array} \]
a pie chart but not a bar graph
a bar graph but not a pie chart
c. either a pie chart or a bar graph
a. Nearly \(61\%\)
One-sixth (nearly \(17\%\))
This percent cannot be calculated from the information provided in the table
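The "nearly \(61\%\)" answer can be checked directly from the table. The sketch below assumes the exercise asks for the share of kidney transplants among all single-organ transplants listed; the variable names are illustrative only.

```python
# Counts copied from the transplant table above (single-organ transplants, 2010).
transplants = {
    "Heart": 2333,
    "Lung": 1770,
    "Liver": 6291,
    "Kidney": 16898,
    "Pancreas": 350,
    "Intestine": 151,
}

# Total across all organs, then each organ's share as a percent of that total.
total = sum(transplants.values())
shares = {organ: 100 * count / total for organ, count in transplants.items()}

for organ, pct in shares.items():
    print(f"{organ:<10}{pct:5.1f}%")
```

Because the six categories are parts of one whole (all single-organ transplants in the table), the shares add up to \(100\%\), which is also why either a pie chart or a bar graph can display these data.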
The graphic below shows the percent of adults in the world who are overweight or obese, by type of country of residence based on that country’s income level. The following two exercises are based on this figure:
b. a bar graph that cannot be made into one pie chart
b. The majority of adults who live in high-income countries are overweight or obese
Below is a histogram of the takeoff angles of \(54\) videotaped jumps of adult hedgehog fleas, Archaeophyllus erinacei. The following two exercises are based on this histogram:
\[\frac{10}{54} \approx 0.185 = 18.5\%\]
a. skewed to the right
roughly symmetric
skewed to the left
Researchers examined a new treatment for advanced ovarian cancer in a mouse model. They created a nanoparticle-based delivery system for a suicide gene therapy to be delivered directly to the tumor cells. The grafted tumors were injected either with the new treatment or with only some buffer solution to serve as a comparison. The following data give the fold increase in tumor size after two weeks in \(20\) mice. A \(1\) represents no change, and a \(2\) represents a doubling in volume of the tumor.
\[ \begin{array}{|c|} \hline \text{Buffer Solution}\\ \hline 9.1 \quad 8.1 \quad 7.8 \quad 7.0 \quad 6.8 \quad 5.4 \quad 5.4 \quad 4.1 \quad 3.8 \quad 3.3\\ \hline \end{array} \]
\[ \begin{array}{|c|} \hline \text{Nanoparticle-delivered gene therapy}\\ \hline 4.1 \quad 3.5 \quad 2.1 \quad 2.1 \quad 1.8 \quad 1.8 \quad 1.4 \quad 1.2 \quad 1.1 \quad 1.1\\ \hline \end{array} \]
The data for the buffer solution are approximately symmetric, widely spread, and centered well to the right of the gene therapy data, which are right-skewed, less spread out, and concentrated close to \(1\).
\[\text{Median Buffer}=6.1\]
\[\bar X_{\text{Buffer}}=6.08\]
\[\text{Median Trt}=1.8\]
\[\bar X_{\text{Trt}}=2.02\]
Judging by both the dotplots and the measures of center, the gene therapy has a clear effect relative to the buffer solution. The gene therapy data are much less spread out and concentrated almost entirely near \(1\).
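The medians and means for the two groups can be reproduced with Python's standard `statistics` module. This is a minimal sketch: the lists are typed in from the two tables above, and the `summary` dictionary and the use of the range as a crude spread measure are illustrative choices, not part of the original solution.

```python
from statistics import mean, median

# Fold increase in tumor size after two weeks, copied from the tables above.
buffer = [9.1, 8.1, 7.8, 7.0, 6.8, 5.4, 5.4, 4.1, 3.8, 3.3]
therapy = [4.1, 3.5, 2.1, 2.1, 1.8, 1.8, 1.4, 1.2, 1.1, 1.1]

# Center (median, mean) and a simple spread measure (range) for each group.
summary = {
    name: {
        "median": median(values),
        "mean": mean(values),
        "range": max(values) - min(values),
    }
    for name, values in {"buffer": buffer, "therapy": therapy}.items()
}

for name, stats in summary.items():
    print(
        f"{name:<8} median={stats['median']:.2f} "
        f"mean={stats['mean']:.2f} range={stats['range']:.1f}"
    )
```

In the gene therapy group the mean sits above the median, which is what a right-skewed distribution tends to produce. In the buffer group the two are nearly equal, consistent with rough symmetry.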
Spider silk is the strongest known material, natural or man-made, on a weight basis. A study examined the mechanical properties of spider silk using 21 female golden orb weavers, Nephila clavipes. Here are data on silk yield stress, which represents the amount of force per unit area needed to reach permanent deformation of the silk strand. The data are expressed in megapascals (MPa):
\[ \begin{array}{|c|c|c|c|c|c|c|} \hline 164.0 & 478.7 & 251.3 & 351.7 & 173.0 & 448.9 & 300.6\\ \hline 362.0 & 272.4 & 740.2 & 329.0 & 327.2 & 270.5 & 332.1\\ \hline 288.8 & 176.1 & 282.2 & 236.1 & 358.2 & 270.5 & 290.7\\ \hline \end{array} \]
\[\bar x=\frac{1}{n}\sum_{i=1}^n x_i = \frac{1}{21}(164.0 + 478.7 + \cdots + 290.7)=319.2476\]
\[\text{Median} = 290.7\]
Mean \(>\) Median. Because the mean exceeds the median, the data are skewed to the right, with most values falling between \(200\) and \(400\). The median better represents this bulk of the data, whereas the mean also reflects the pull of the extreme values.
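As a quick check of the mean, the median, and the claim about where most values fall, the following sketch uses the `statistics` module. The list is typed in from the table above; the names `yield_stress_mpa` and `in_bulk` are illustrative.

```python
from statistics import mean, median

# Silk yield stress in MPa for the 21 spiders, copied from the table above.
yield_stress_mpa = [
    164.0, 478.7, 251.3, 351.7, 173.0, 448.9, 300.6,
    362.0, 272.4, 740.2, 329.0, 327.2, 270.5, 332.1,
    288.8, 176.1, 282.2, 236.1, 358.2, 270.5, 290.7,
]

silk_mean = mean(yield_stress_mpa)
silk_median = median(yield_stress_mpa)  # 11th of the 21 sorted values

# How many observations lie in the 200 to 400 MPa band?
in_bulk = sum(200 <= x <= 400 for x in yield_stress_mpa)

print(f"mean   = {silk_mean:.4f} MPa")
print(f"median = {silk_median:.1f} MPa")
print(f"{in_bulk} of {len(yield_stress_mpa)} values lie between 200 and 400 MPa")
```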
\[\begin{aligned} s &= \sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2} \\ &= \sqrt{\frac{1}{20}\left((164.0-319.2476)^2+(478.7-319.2476)^2 + \cdots + (290.7-319.2476)^2\right)} \\ &= \sqrt{\frac{1}{20}\times 312078.9}=\sqrt{15603.95}=124.9158 \end{aligned}\]
The standard deviation is large relative to the mean (\(\approx 39\%\) of the mean), which reflects how strongly the single major outlier affects the spread of the data set. Because of this, we would likely describe the center of this data set with the median, which represents the bulk of the data. If we report the mean instead, we should accompany it with a histogram so that the shape of the distribution is visible.
Fun fact: if we exclude the highest value in the data set, \(740.2\), the standard deviation drops to \(81.44\).
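The effect of the outlier on the standard deviation can be verified with `statistics.stdev`, which uses the \(n-1\) divisor as in the formula above. This is a sketch with the data typed in from the table; the names `sd_all`, `sd_without_max`, and `ratio_to_mean` are illustrative.

```python
from statistics import mean, stdev

# Silk yield stress in MPa for the 21 spiders, copied from the table above.
yield_stress_mpa = [
    164.0, 478.7, 251.3, 351.7, 173.0, 448.9, 300.6,
    362.0, 272.4, 740.2, 329.0, 327.2, 270.5, 332.1,
    288.8, 176.1, 282.2, 236.1, 358.2, 270.5, 290.7,
]

# Sample standard deviation (n - 1 divisor) of the full data set.
sd_all = stdev(yield_stress_mpa)

# Standard deviation relative to the mean, as a fraction.
ratio_to_mean = sd_all / mean(yield_stress_mpa)

# Drop the single largest value (740.2) and recompute.
without_max = sorted(yield_stress_mpa)[:-1]
sd_without_max = stdev(without_max)

print(f"sd (all 21 values)    = {sd_all:.4f}")
print(f"sd as share of mean   = {ratio_to_mean:.1%}")
print(f"sd (without max)      = {sd_without_max:.2f}")
```

Removing one observation out of 21 cuts the standard deviation by roughly a third, which illustrates why the standard deviation, like the mean, is not resistant to outliers.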