These values were used in the calculation of the least-squares regression line. The least-squares regression line is fit to a set of data. If one of the data points has a negative residual, then the: A correlation between the values of the response...

D All of the answers are correct.

Which of the following statements concerning the least-squares regression of Y on X depicted in the graph below is TRUE? A B C D The point 70, 50 is very influential and has a large residual. The point 70,...

We may conclude which of the following? A This is strong, but not conclusive, evidence that being overweight results in lower salaries. B If the annual salary and the number of pounds overweight for each individual in this study were plotted on a scatterplot, the points would lie close to a negatively sloping straight line. C If a larger sample of adults between the ages of 25 and 35 had been studied, the correlation would have been even stronger.

A response Y and explanatory variable X were measured on each of several subjects. A scatterplot of the measurements is given below. The least-squares regression line is shown in the plot. Which of the following is a plot of the residuals for the above data versus X?

The owner of a chain of supermarkets notices that there is a positive correlation between the sales of beer and the sales of ice cream over the course of the previous year. During seasons when sales of beer were above average, sales of ice cream also tended to be above average. Likewise, during seasons when sales of beer were below average, sales of ice cream also tended to be below average. Which of the following would be a valid conclusion from these facts? A There must be an error. There should be no association between beer and ice cream sales. B Evidently, for a significant proportion of customers of these supermarkets, drinking beer causes a desire for ice cream or eating ice cream causes a thirst for beer. C A scatterplot of monthly ice cream sales versus monthly beer sales would show that a straight line describes the pattern in the plot, but it would have to be a horizontal line.
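The beer and ice cream scenario can be illustrated with a small simulation. The sketch below assumes a hypothetical common cause (monthly temperature) that drives both sales figures. The temperatures, coefficients, and noise level are invented for illustration and do not come from the question. Under those assumptions the two sales series are strongly correlated even though neither one influences the other.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical average monthly temperatures (degrees C), January to December.
temperature = [-2, 0, 5, 11, 17, 22, 25, 24, 19, 12, 5, 0]

# Assumed model: each sales series depends on temperature only, plus noise.
# Neither series appears in the formula for the other.
beer_sales = [100 + 2.0 * t + rng.gauss(0, 5) for t in temperature]
ice_cream_sales = [50 + 3.0 * t + rng.gauss(0, 5) for t in temperature]

r = pearson_r(beer_sales, ice_cream_sales)
print(f"correlation between beer and ice cream sales: {r:.2f}")
```

A large positive r here reflects only the shared dependence on temperature, which is why an observed association alone does not justify a causal conclusion.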

In order to use a relatively homogeneous group of students, the researcher examines only data of high school valedictorians (students who graduated at the top of their high school class) who have completed their first year of college. The researcher finds the correlation between total SAT score and GPA at the end of the freshman year to be very close to 0. A Because the group of students studied is a very homogeneous group of students, the results should give a very accurate estimate of the correlation the researcher would find if all college students who have completed their freshman year were studied.

B The correlation we would find if all college students who have completed their freshman year were studied would be even smaller than that found by the researcher. By restricting to valedictorians, the researcher is examining a group that will be more informative than those students who have only completed their freshman year. C The researcher made a mistake. Correlation cannot be calculated (the formula for correlation is invalid) unless all students who completed their freshman year are included.

When exploring very large sets of data involving many variables, which of the following is TRUE?

A Extrapolation is safe because it is based on a greater quantity of evidence. B Associations will be stronger than would be seen in a much smaller subset of the data. C A strong association is good evidence for causation because it is based on a large quantity of information.

Consider the scatterplot below. The point indicated by the plotting symbol x would be: A a residual. B influential. C a z-score. D a least-squares point.

Use the following to answer the questions that follow. The following scatterplot displays the per capita income versus number of deaths due to traffic accidents per , people for each of the 50 states, plus the District of Columbia.

If instead of plotting these variables for each of the 50 states and the District of Columbia, we plotted the values of these variables for each county in the United States, we would expect the value of the correlation r to be: A exactly the same. B smaller. D much higher and probably near 1 because there are many more counties than states.

The least-squares regression line was fit to the data in the scatterplot and the residuals computed. A plot of the residuals versus the percent of miles traveled on urban roads in the state is given below. This plot suggests that: A a high number of deaths per , people implies low per capita income, but only for states with a high percentage of miles traveled on urban roads. B a high number of deaths per , people implies low per capita income, but only for states with a low percentage of miles traveled on urban roads.

C percentage of miles traveled on urban roads may be a lurking variable in understanding the association between income and deaths per , people.

Two variables, x and y, are measured on each of several individuals. The correlation between these variables is found to be 0. To interpret this correlation, we should do which of the following? A Compute the least-squares regression line of y on x and consider whether the slope is positive or negative. B Interchange the roles of x and y i. C Plot the data.

A sample of 79 companies was taken and the annual profits y were plotted against annual sales x.

The plot is given below. The correlation between sales and profits is found to be 0. Based on this information, we may conclude which of the following? A Not surprisingly, increasing sales causes an increase in profits. This is confirmed by the large positive correlation. B There are clearly influential observations present.

The population is all visitors coming to the state of Hawaii. Since airline flights carry the vast majority of visitors to the state, the use of questionnaires for passengers during incoming flights is a good way to reach this population. The questionnaire actually appears on the back of a mandatory plants and animals declaration form that passengers must complete during the incoming flight. A large percentage of passengers complete the visitor information questionnaire. Questions 1 and 4 provide quantitative data indicating the number of visits and the number of days in Hawaii. Questions 2 and 3 provide qualitative data indicating the categories of reason for the trip and where the visitor plans to stay. The two populations are the population of women whose mothers took the drug DES during pregnancy and the population of women whose mothers did not take the drug DES during pregnancy.

It was a survey. The article reported "twice" as many abnormalities in the women whose mothers had taken DES during pregnancy. Thus, a rough estimate would be In many situations, disease occurrences are rare and affect only a small portion of the population. Large samples are needed to collect data on a reasonable number of cases where the disease exists. For those buildings using electricity, the percentage has not changed greatly over the years. For the buildings using natural gas, the majority were constructed in or before; the second largest percentage was constructed in Most of the buildings using oil were constructed in or before.

All of the buildings using propane are older. In the period or before most used natural gas. From From , it is fairly evenly divided between electricity and natural gas. Since almost all new buildings are using electricity or natural gas with natural gas being the clear leader. If you know the change in either, you will have a good idea of the stock market performance for the day.

Introduction to Probability

We should display the offer that appeals to female visitors.

Discrete Probability Distributions

b. Since the shipment is large we can assume that the probabilities do not change from trial to trial and use the binomial probability distribution.

Continuous Probability Distributions

2. The area to the left of z is 1 -.

Sampling and Sampling Distributions

3. Normal distribution:. This would suggest not using the proportion of DJIA stocks going up on a daily basis as a predictor of the proportion of NYSE stocks going up on that day.

The proportion of workers not required to contribute to their company sponsored health care plan has declined. There seems to be a trend toward companies requiring employees to share the cost of health care benefits. However, the statistical analysis shows that the reduction in the mean duration is only 3. The interval estimate shows the reduction in the population mean is 1. Additional data collected by the end of the season would provide a more precise estimate. In any case, most likely the issue will continue in future years. It is expected that major league baseball would prefer that additional steps be taken to further reduce the mean duration of games. The residual plot leads us to question the assumption of a linear relationship between x and y. Even though the relationship is significant at the. Regression or correlation analysis can never prove that two variables are causally related.

Multiple Regression

a.

The standardized residual plot is shown below. There appears to be a very unusual trend in the standardized residuals.

Regression Analysis: Model Building

a. In this case, the recommended decision is to build a large-size community center. That is, it eliminates the risk of a loss, which appears to be a significant factor in the mayor's decision-making process. Mutually exclusive events are dependent. Yes, the probability of default is greater than. Continuous. The expected value of a 3-point shot is higher. So, if these probabilities hold up, the team will make more points in the long run with the 3-point shot. The z value corresponding to a cumulative probability of. The population mean duration of games in is less than the population mean in Management should be encouraged by the fact that steps taken in reduced the population mean duration of baseball games.

The summations needed to compute the slope and the y-intercept are: The sum of squares due to error and the total sum of squares are Minitab output shown in part a did not identify any observations with a large standardized residual; thus, there do not appear to be any outliers in the data. The Minitab output shown in part a identifies observation 2 as an influential observation. If the promotional campaign is conducted, the probabilities will change to 0. Requirement 4. Use the relative frequency method.
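The summations for the slope and y-intercept, and the two sums of squares mentioned above, can be sketched as follows. The data values are hypothetical and invented for illustration, since the original figures did not survive; the variable names (b0, b1, sse, sst) are likewise assumptions rather than the notation of the source.

```python
# Hypothetical (x, y) observations, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 11, 14]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Summations needed for the least-squares slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # y-intercept

# Residual = observed y minus fitted y; negative means the point
# lies below the fitted line.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sse = sum(e ** 2 for e in residuals)         # sum of squares due to error
sst = sum((yi - y_bar) ** 2 for yi in y)     # total sum of squares
r_squared = 1 - sse / sst                    # coefficient of determination

print(f"slope={b1:.3f}, intercept={b0:.3f}")
print(f"SSE={sse:.3f}, SST={sst:.3f}, r^2={r_squared:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
```

With an intercept in the model, the residuals sum to zero (up to rounding), so any data set that is not fit perfectly has residuals of both signs.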

Divide by the total adult population of Age Number Probability 18 to 24 So, if an accident leads to a fatality, the probability a small car was involved is. The probability of spending this much or less is only. Section 8. As the confidence level increases, there is a larger margin of error and a wider confidence interval. This modest positive skewness in the data set can be expected to exist in the population.
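The statement that a higher confidence level gives a larger margin of error can be checked numerically. This is a minimal sketch assuming a known population standard deviation, so that the margin of error is z times sigma over the square root of n; the values of sigma and n are hypothetical and not taken from the exercise.

```python
import math
from statistics import NormalDist

sigma = 15.0  # hypothetical known population standard deviation
n = 36        # hypothetical sample size


def margin_of_error(confidence, sigma, n):
    """Margin of error z * sigma / sqrt(n) for a two-sided interval."""
    # z is the standard normal quantile leaving (1 - confidence) / 2
    # in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


margins = {level: margin_of_error(level, sigma, n)
           for level in (0.90, 0.95, 0.99)}

for level, m in margins.items():
    print(f"{level:.0%} confidence: margin of error = {m:.3f}")
```

Because the interval is the sample mean plus or minus the margin of error, its width is twice the margin and grows along with it.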

Regardless of skewness, this is a pretty small data set. Consider using a larger sample next time. We are not able to conclude that the manager's claim is wrong. The manager's claim can be rejected. Research hypothesis b. There is no statistical evidence that the new bonus plan increases sales volume. We can conclude that the new bonus plan increases the mean sales volume. A mistake could be implementing the plan when it does not help. This could lead to not implementing a plan that would increase sales. Interval Estimation 7. Hypothesis Testing 1. We are unable to conclude there has been a change in the mean CNN viewing audience.

The sample mean of thousand viewers is encouraging but not conclusive for the sample of 40 days. Recommend additional viewer audience data. A larger sample should help clarify the situation for CNN. No reason to change from the 2 hours for cost estimating purposes. These studies help companies and advertising firms evaluate the impact and benefit of commercials. There is not a statistically significant difference between the National mean price per gallon and the mean price per gallon in the Lower Atlantic states. A difference exists with system B having the lower mean checkout time. The DJIA is not that far beyond the range of the data. But there is a huge increase in the Adjusted R-Squared, and both variables have low p-values in part b.

Hence we can expect better predictions from the 2-variable model. The estimated regression equation did not provide a good fit. In fact, the p-value of. The Minitab output is shown below: d. We see that Person is highly correlated with Months the sample correlation coefficient is -. No, it is 1. In part b it represents the marginal change in revenue due to an increase in television advertising with newspaper advertising held constant. Note: The Minitab output is shown in Exercise 5 a. NOTE: These answers seem to imply that a variable whose p-value is above alpha should be dropped. The Minitab output is shown below: Fit Stdev. Confidence interval estimate: Prediction interval estimate:

