# Workshop2

The University of Tasmania - Department of Economics

BEA140 Quantitative Methods

Mini assignment like questions are highlighted in yellow

Module 2

Exercise 1 "Hardways", a road building contractor, buys blue metal from a quarry by the truckload. An overall review of costs is being carried out, and as part of this Hardways wish to gain a better feel as to how much blue metal they are actually receiving. They weigh a random sample of 30 truckloads and the results (in tonnes) are listed below.

22.70 20.98 21.30 21.09 22.60 22.29 20.75 22.11 22.62 21.50
22.17 22.16 18.25 19.57 21.55 22.23 21.12 21.16 19.46 19.45
20.06 17.13 19.04 19.73 21.72 19.06 20.37 20.12 20.59 21.27

Exercise 3 In a sugar factory each day's production of 1kg packs is sampled and weighed as part of the factory's quality control process. A summary of one day's sample appears in the table below, along with the statistics that the foreman calculated from the sample. Unfortunately, the data for one class was accidentally erased.

| Weight (g) | Freq., fj |
|---|---|
| 970 &U 980 | 5 |
| 980 &U 990 | 18 |
| 990 &U 1000 | ?? |
| 1000 &U 1010 | 13 |
| 1010 &U 1020 | 7 |
| 1020 &U 1030 | 4 |

Mean = 997.82 grams

Median = 986.43 grams

Standard Deviation = 67.32 grams

(a) As soon as the quality manager sees the foreman's statistics she suspects that the foreman must have made some calculation errors. Which (if any) statistics are clearly wrong, and why?

CRICOS Provider Code 00586B

BEA140 Quantitative Methods – Mod2 Wkshp

Exercise 4 Explain and distinguish between the following as methods of displaying data. What display mode is most suited to particular types of data? a) a bar chart, b) a histogram, c) a run chart, d) a pie chart.

Exercise 5 a) Explain what is meant by (i) categorical data, (ii) quantitative data. b) Explain the meaning of nominal, ordinal, interval and ratio measures, give examples of each, and explain to which of the two data types these measures belong.

Some Multiple Choice Problems

Exercise 6 If two populations had means of 5 and 7 respectively, the mean of the combined populations would be (a) 6 (b) 12 (c) between 5 and 7 (d) 12 divided by the sum of the number of elements in both populations.

Exercise 7 Three types of apples are blended into a juice concentrate A - 40 %, priced at \$ 2.20 per kg, B - 35 %, \$ 2.80 per kg, C - the rest, \$ 1.60 per kg. The cost per kg of ingredients in the concentrate is (a) \$ 2.20 (b) \$ 2.26 (c) \$ 6.60 (d) \$ 2.15

Exercise 8 A library has data on the number of days each book on loan has been out. The data values range from 1 day to 238 days, with most of the books being out for fewer than 7 days. The best measure of the centre of this data is (a) mode, (b) mean, (c) median, (d) coefficient of variation.

Exercise 9 For a skewed distribution, the best measure of central tendency to report usually is (a) mean, (b) median, (c) mode, (d) depends upon the direction of skewness.

Exercise 10 For which of the following types of distribution is the mean larger than the median? (a) symmetrical, (b) positively skewed, (c) negatively skewed, (d) bimodal.

Exercise 11 Two populations, A and B, have the same coefficient of variation. If A's mean is four times as large as B's, then A's standard deviation is (a) twice as large as B's, (b) half as large as B's, (c) four times as large as B's, (d) unknown from the given information.

Exercise 12 A student received a standard score of 1.5 on a test, for which the class variance was 16. If the student's mark was 88, the class average on the test was (a) 64 (b) 70 (c) 82 (d) 94

Exercise 13 There are seven members of the crew of a space shuttle. On Earth they weigh 98, 77, 63, 101, 85, 49, 94 kg. Find the mean, median, standard deviation and range of their weights (on Earth).

Exercise 14 For each country in a random sample of seven countries, a researcher looked up the average annual income (X) and life expectancy (Y) for female individuals. The researcher wishes to see if income can be used to help predict life expectancy. The data appears in the table below:
| X (\$US '000, 2004) | Y (years) |
|---|---|
| 19 | 45 |
| 23 | 47 |
| 25 | 56 |
| 27 | 68 |
| 28 | 67 |
| 31 | 81 |
| 32 | 82 |

a) Construct a scattergram and comment on the suitability of OLS to model the relationship.
b) Determine the slope and intercept of the OLS line of best fit.
c) Determine and interpret the standard error of the estimate.
d) Determine and interpret the coefficient of determination.
e) Predict the life expectancy of women in a country where the average female income is \$US 5,000. Comment.

Exercise 18 The manager of a department store is investigating the relation between the number of salespersons on duty during a week, X, and the value of merchandise lost (shrinkage) per week, Y, in \$'000. The investigation was done over 12 consecutive weeks, with a different number of sales staff on duty each week, but constant during each week.

| Week | X | Y |
|---|---|---|
| 1 | 20 | 2.0 |
| 2 | 21 | 2.3 |
| 3 | 22 | 1.8 |
| 4 | 23 | 1.7 |
| 5 | 24 | 1.7 |
| 6 | 25 | 1.2 |
| 7 | 25 | 1.7 |
| 8 | 20 | 2.6 |
| 9 | 22 | 2.1 |
| 10 | 24 | 1.1 |
| 11 | 26 | 1.0 |
| 12 | 28 | 0.4 |

(a) Construct a scattergram for the data.
(b) Determine the linear relation between number of sales staff and value of shrinkage per week, and plot this on the scattergram diagram.
(c) Find the standard error of the estimate for this data.
(d) Interpret the coefficient of determination between the two variables. What is the correlation coefficient?
(e) Predict the shrinkage per week when there are 27 staff on duty.
(f) If the store management, having done the analysis, decided to employ a 29th salesperson, but not a 30th, what might this imply about the marginal cost of hiring the 29th person?
(g) If additional part-time floor-walkers could be hired during peak periods at a cost of \$ 50 per week, how many extras would the firm hire?

Exercise 19 Which of the following is not a reason why we need to be careful about ascribing causation from X to Y in regression analysis? (a) the direction of causation may run from Y to X, (b) the observed association may be due entirely to chance, (c) the true relationship between X and Y may not be linear, (d) the observed association between the two variables may be caused by another variable that affects both the X and the Y variable.

Exercise 20 A sample coefficient of determination equal to 0.50 implies that: (a) a one unit change in the independent variable results in a 0.50 unit change in the dependent variable, (b) 50% of the variation in the dependent variable around its mean can be accounted for by the changes in the independent variable, (c) 50% of the variation in the independent variable around its mean can be accounted for by changes in the dependent variable, (d) 50% of the variation in the independent variable around its mean can be explained by changes in the dependent variable.

Exercise 21 If the true relationship between two variables was inverse, the (a) sample intercept would be negative, (b) population intercept would be negative, (c) sample slope coefficient would be negative, (d) population coefficient of correlation would be negative.

Exercise 22 If every data point lies on the regression line, then the sum of squared errors is equal to _____, and the coefficient of determination is equal to _____

Exercise 23 Two tests in statistics for a class of eight students at The Tech produced the coded mark distributions below. It is proposed to use Test 1 as a predictor of performance in Test 2.

| Student | Test 1 | Test 2 |
|---|---|---|
| A | 4.0 | 3.5 |
| B | 3.5 | 2.0 |
| C | 3.0 | 3.5 |
| D | 3.0 | 3.0 |
| E | 2.5 | 2.0 |
| F | 2.0 | 2.5 |
| G | 3.5 | 3.5 |
| H | 4.0 | 3.0 |

a) Determine the estimated slope of the regression line for the data from The Tech.
b) Determine the estimated intercept of the regression line for The Tech data.
c) If for this data Student K scored 1.0 higher on Test 1 than Student L, determine the expected difference between their scores on Test 2 (rounded to the nearest 0.05).
d) If Student M scored 2.25 on Test 1, what is the prediction for Test 2 (rounded to the nearest 0.05)?
e) Determine the sample standard error of estimate.
f) Determine the sample coefficient of correlation.

Exercise 26 A financial planner is asked to rank a random sample of 10 investment portfolios. As a check, you also do some research to determine how much commission they would receive on each portfolio and you rank the portfolios by commission. The data appear below:

| Portfolio | FP Place | Comm Place |
|---|---|---|
| A | 5th | 3rd |
| B | 2nd | 1st |
| C | 7th | 6th |
| D | 4th | 2nd |
| E | 9th | 9th |
| F | 1st | 5th |
| G | 10th | 8th |
| H | 2nd | 4th |
| I | 8th | 9th |
| J | 6th | 7th |

a) Being careful to resolve any ties, determine the Spearman Rank Correlation Coefficient and comment on its value.

Exercise 27 A random sample of 9 students received the following awards in economics and accounting.

| Student | Eco. | Acc. |
|---|---|---|
| 1 | HD | DN |
| 2 | DN | DN |
| 3 | DN | CR |
| 4 | CR | HD |
| 5 | CR | FP |
| 6 | PP | PP |
| 7 | PP | NN |
| 8 | FP | PP |
| 9 | NN | PP |

a) Convert the awards to rankings, resolving any ties.
b) Calculate Spearman's Rank Correlation Coefficient.


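Solutions - Module 2

Exercise 1 a) There are a number of ways of constructing a stem and leaf diagram, given the number of significant digits: ignore the final digit (i), round back to the second last digit (ii), or use double digit leaves (iii), as illustrated below. In each diagram the last two columns give the frequency and cumulative frequency for each stem.

(i) Final digit ignored

| Stem | Leaf (×0.1) | Freq | Cum Freq |
|---|---|---|---|
| 16 | | 0 | 0 |
| 17 | 1 | 1 | 1 |
| 18 | 2 | 1 | 2 |
| 19 | 0 0 4 4 5 7 | 6 | 8 |
| 20 | 0 1 3 5 7 9 | 6 | 14 |
| 21 | 0 1 1 2 3 5 5 7 | 8 | 22 |
| 22 | 1 1 1 2 2 6 6 7 | 8 | 30 |
| 23 | | 0 | 30 |
| Total | | 30 | |

(ii) Rounded to the second last digit

| Stem | Leaf (×0.1) | Freq | Cum Freq |
|---|---|---|---|
| 16 | | 0 | 0 |
| 17 | 1 | 1 | 1 |
| 18 | 2 | 1 | 2 |
| 19 | 0 1 4 5 6 7 | 6 | 8 |
| 20 | 1 1 4 6 8 | 5 | 13 |
| 21 | 0 1 1 2 3 3 5 6 7 | 9 | 22 |
| 22 | 1 2 2 2 3 6 6 7 | 8 | 30 |
| 23 | | 0 | 30 |
| Total | | 30 | |

(iii) Double digit leaves

| Stem | Leaf (×0.01) | Freq | Cum Freq |
|---|---|---|---|
| 16 | | 0 | 0 |
| 17 | 13 | 1 | 1 |
| 18 | 25 | 1 | 2 |
| 19 | 04 06 45 46 57 73 | 6 | 8 |
| 20 | 06 12 37 59 75 98 | 6 | 14 |
| 21 | 09 12 16 27 30 50 55 72 | 8 | 22 |
| 22 | 11 16 17 23 29 60 62 70 | 8 | 30 |
| 23 | | 0 | 30 |
| Total | | 30 | |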

The stem and leaf diagram(s) indicate that the sample of truck loads of blue metal follows a left (negative) skewed distribution.

b) Distribution of Truck Load Weights (tonnes)

| j | Weight (tonnes) | Class Mark, xj | Freq, fj | Cum Freq, Σfj | Rel Cum Freq | fj xj | fj xj² |
|---|---|---|---|---|---|---|---|
| 1 | 16 &U 17 | 16.5 | 0 | 0 | 0.000 | 0.0 | 0.00 |
| 2 | 17 &U 18 | 17.5 | 1 | 1 | 0.033 | 17.5 | 306.25 |
| 3 | 18 &U 19 | 18.5 | 1 | 2 | 0.067 | 18.5 | 342.25 |
| 4 | 19 &U 20 | 19.5 | 6 | 8 | 0.267 | 117.0 | 2281.50 |
| 5 | 20 &U 21 | 20.5 | 6 | 14 | 0.467 | 123.0 | 2521.50 |
| 6 | 21 &U 22 | 21.5 | 8 | 22 | 0.733 | 172.0 | 3698.00 |
| 7 | 22 &U 23 | 22.5 | 8 | 30 | 1.000 | 180.0 | 4050.00 |
| 8 | 23 &U 24 | 23.5 | 0 | 30 | 1.000 | 0.0 | 0.00 |
| Total | | | 30 | | | 628.0 | 13199.50 |

The last three columns have been included for use in subsequent parts of this exercise.
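The frequency table in part b) can be rebuilt from the raw data. The sketch below is one possible way to do it, assuming classes of width 1 tonne that include their lower limit and exclude their upper limit (the "&U" convention); the function name `frequency_table` and the dictionary keys are illustrative choices, not part of the workshop.

```python
# Truck-load weights (tonnes) from Exercise 1.
weights = [
    22.70, 20.98, 21.30, 21.09, 22.60, 22.29, 20.75, 22.11, 22.62, 21.50,
    22.17, 22.16, 18.25, 19.57, 21.55, 22.23, 21.12, 21.16, 19.46, 19.45,
    20.06, 17.13, 19.04, 19.73, 21.72, 19.06, 20.37, 20.12, 20.59, 21.27,
]


def frequency_table(data, lower, upper, width=1.0):
    """Tally data into classes [lo, lo + width) and return one row per class."""
    n_classes = int(round((upper - lower) / width))
    rows = []
    cum = 0
    for j in range(n_classes):
        lo = lower + j * width
        hi = lo + width
        f = sum(1 for x in data if lo <= x < hi)  # "lo and under hi"
        cum += f
        mark = (lo + hi) / 2                      # class mark = midpoint
        rows.append({
            "class": (lo, hi),
            "mark": mark,
            "f": f,
            "cum_f": cum,
            "rel_cum_f": cum / len(data),
            "fx": f * mark,
            "fx2": f * mark ** 2,
        })
    return rows


table = frequency_table(weights, 16, 24)
sum_fx = sum(row["fx"] for row in table)
sum_fx2 = sum(row["fx2"] for row in table)

for row in table:
    print(row["class"], row["f"], row["cum_f"], round(row["rel_cum_f"], 3))
print("sum f*x =", sum_fx, " sum f*x^2 =", sum_fx2)
```

The two totals printed at the end are the Σfx and Σfx² used for the grouped mean and standard deviation later in the exercise.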

[Figure: Histogram of Weights of Truck Loads. Horizontal axis: Weight (tonnes), classes 16 &U 17 to 23 &U 24; vertical axis scaled 0 to 9.]

c) Ogive of truck load weights

[Figure: Ogive of Truck Loads. Horizontal axis: Weight (Tonnes), 16 to 24; vertical axis: Cum. Rel. Freq., 0 to 1; the value 22.0625 tonnes is marked on the weight axis.]

From the chart we can see that 25% of truck loads exceed 22.0625 tonnes. (The exact figure was determined by linear interpolation. Students familiar with the process can check that they obtain the same figure.)
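The linear interpolation behind the 22.0625 figure can be checked with a short sketch. It assumes the ogive is drawn through the class boundaries 16, 17, ..., 24 with the cumulative frequencies from the table in part b); the function name `value_at_cum_proportion` is a hypothetical helper, not something defined in the workshop.

```python
def value_at_cum_proportion(boundaries, cum_freq, p):
    """Read a value off a piecewise-linear ogive.

    boundaries: class boundaries, lowest first.
    cum_freq:   cumulative frequency at each boundary (same length).
    p:          cumulative proportion to look up, 0 < p <= 1.
    """
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    target = p * cum_freq[-1]
    for i in range(1, len(boundaries)):
        if cum_freq[i] >= target:
            lo, hi = boundaries[i - 1], boundaries[i]
            below = cum_freq[i - 1]
            in_class = cum_freq[i] - below
            # Move through the class in proportion to how far the
            # target count lies between the two cumulative counts.
            return lo + (target - below) / in_class * (hi - lo)
    raise ValueError("target not reached")


boundaries = [16, 17, 18, 19, 20, 21, 22, 23, 24]
cum_freq = [0, 0, 1, 2, 8, 14, 22, 30, 30]

# 25% of loads exceed the value below which 75% of loads fall.
p75 = value_at_cum_proportion(boundaries, cum_freq, 0.75)
print(p75)
```

Here 75% of 30 loads is 22.5 loads; 22 loads lie below 22 tonnes and the next class holds 8, so the lookup lands 0.5/8 of the way into the 22 &U 23 class.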

d) Summary measures from raw data:

mean = $\sum x / n = 624.15/30 = 20.805$ tonnes

median position = (30+1)/2 = 15.5th, so the median is the average of the 15th and 16th values from the ordered array. 15th = 21.09, 16th = 21.12, so median = 21.105 tonnes

mode: each value occurs only once, so there is no mode (i.e. no most common value).

range = max – min = 22.70 – 17.13 = 5.57 tonnes

std dev: $s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n-1} = \dfrac{13041.6423 - 624.15^2/30}{29} = 1.9379845$ (NB use sample formula), so $s = \sqrt{1.9379845} = 1.3921$ tonnes

e) As the median (21.105) > mean (20.805) this indicates a negative (or left) skewed distribution, as we have already seen with the histogram. In a negative skewed distribution, extreme values are more likely to be small than large.

f) Summary measures from frequency distribution:

mean = $\sum fx / n = 628/30 = 20.933$ tonnes

median position = 15.5 (as in (d)), which falls in the class 21 &U 22. There are 14 values less than 21 and there are 8 values in the 21 &U 22 class. The 15.5th value is (15.5 – 14) = 1.5 positions into the class, and the class width is 1.00. So, median = 21 + (1.5/8) × 1 = 21.1875 tonnes

modal class: both the 21 &U 22 and the 22 &U 23 classes have eight values, so these are the two modal classes.

std dev: $s^2 = \dfrac{\sum fx^2 - (\sum fx)^2/n}{n-1} = \dfrac{13199.50 - 628.0^2/30}{29} = 1.84023$ (NB use sample formula), so $s = \sqrt{1.84023} = 1.3566$ tonnes

The summary measures are similar to, but not the same as, those obtained using the raw data. When we use grouped data we assume that the class marks are representative of their classes, and this is rarely exactly the case. For example, in the 19 &U 20 class the raw values are 19.04, 19.06, 19.45, 19.46, 19.57, 19.73, which average 19.385, and thus using our class mark of 19.5 introduces some error in the grouped data calculations. (NB If you chose different classes you may have got slightly different results.)

Exercise 2 In this exercise assume speed is continuous, so class marks will be class mid points.

| Speed | fi | Class Mark, xi | fi xi | fi xi² | cum freq |
|---|---|---|---|---|---|
| 0 &U 5 | 2 | 2.5 | 5.0 | 12.50 | 2 |
| 5 &U 10 | 8 | 7.5 | 60.0 | 450.00 | 10 |
| 10 &U 15 | 5 | 12.5 | 62.5 | 781.25 | 15 |
| 15 &U 20 | 2 | 17.5 | 35.0 | 612.50 | 17 |
| 20 &U 25 | 1 | 22.5 | 22.5 | 506.25 | 18 |
| Total | 18 | | 185.0 | 2362.5 | |

mean = $\sum fx / n = 185/18 = 10.27777778$ = 10.28 to 2 decimal places.

Std Dev: $s^2 = \dfrac{\sum fx^2 - (\sum fx)^2/n}{n-1} = \dfrac{2362.5 - 185^2/18}{17} = 27.12418300$, so $s = 5.2080882$ = 5.21 to 2 d.p.

z score: $z = \dfrac{x - \bar{x}}{s} = \dfrac{19 - 10.27777778}{5.2080882} = 1.674745$ = 1.67 to 2 d.p. (i.e. 19 is 1.67 std devs above the mean.)

median position = (n+1)/2 = 9.5, i.e. the median is in the 5 &U 10 class.

median = LCL + width × (how far into class)/(how many in class) = 5 + 5 × 7.5/8 = 9.6875 = 9.69 to 2 d.p.
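The grouped-data calculations for Exercise 2 can be reproduced with a minimal sketch such as the one below. It assumes class marks at the class midpoints and uses the workshop's (n+1)/2 median position; the variable names are illustrative only.

```python
import math

# (lower limit, upper limit, frequency) for each speed class in Exercise 2.
classes = [(0, 5, 2), (5, 10, 8), (10, 15, 5), (15, 20, 2), (20, 25, 1)]

n = sum(f for _, _, f in classes)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)  # sample formula
std_dev = math.sqrt(variance)

# Standard score for an observation of 19.
z = (19 - mean) / std_dev

# Median: find the class holding position (n + 1) / 2, then interpolate.
position = (n + 1) / 2
cum = 0
median = None
for lo, hi, f in classes:
    if cum + f >= position:
        median = lo + (hi - lo) * (position - cum) / f
        break
    cum += f

print(round(mean, 2), round(std_dev, 2), round(z, 2), median)
```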


Exercise 3 The median is wrong, as there are no more than 23 values below it but at least 24 above it, so it is not in the middle. The standard deviation is likely to be wrong: the range usually spans 3 to 8 standard deviations, whereas here the whole range (970 to 1030 g) is less than one standard deviation. The mean looks a fairly central value and may well be OK.

Exercise 4 a) Bar chart - A chart in which the length (or height) of bars is used to represent the frequencies (either relative or absolute) of the items represented by the bars. May be horizontal or vertical. Most appropriate for categorical (qualitative) or discrete variables. Also used where components of totals are to be shown - stacked bars, or multiple (side by side) bars.

b) Histogram - A particular type of bar graph plotted using real class limits as boundaries for the purposes of presenting a distribution of a numerical variable. Heights of bars represent frequencies (absolute or relative) in each interval.

c) Run Chart - A line chart in which the horizontal axis represents time (or some other progressive variable). Often used to identify or display the temporal nature of the variable of interest, such as trends, periodicity, fluctuations, falls etc.

d) Pie charts - Pictorial device to show the proportion of recorded data falling in different categories. Usually for qualitative or count data. An intuitively appealing medium when aiming at a low numeracy target audience.

Exercise 5 a) (i) Categorical data - Data refers to a characteristic of items; literally we place items into different categories. Categories are mutually exclusive (e.g. agree, disagree or have no opinion about a statement). Arithmetic operations are restricted to counting. Measurement scales are nominal or ordinal.

(ii) Quantitative data - Numerically measured characteristic of an object, e.g. the number of successes in a given number of trials, or a dimension such as time, weight or length of an observable phenomenon. Measures can be discrete or continuous, on interval or ratio scales.

b) (i) Nominal measures - A scale of names describing the object, e.g. Sharp, Casio calculator // Male, Female // the colour of houses. Categorises. Arithmetic operations are restricted to counting. In particular there is no order implied across the categories.

(ii) Ordinal measures - Indicate relative order or rank as well as description, e.g. heavy smoker, light smoker, non-smoker; street numbers. Responses can only be counted. There is a danger if numbers are ascribed to orders and arithmetic operations are performed, e.g. if arbitrary values 2, 1, 0 are assigned to each of these, does a heavy smoker consume twice as much? (Or even, is the step from heavy to light the same size as the step from light to non?) SETL scores and market research surveys have a similar problem.

(iii) Interval measures - Indicate the magnitude of a phenomenon and permit a determination of the difference between measures, i.e. subtraction is a permissible operation. A temperature of 15 C is cooler than one of 30 C, but it is incorrect to suggest that 30 C is twice as hot as 15 C. However the difference between 15 C and 30 C is the same as the difference between 40 C and 55 C. An arbitrary zero can be set, e.g. 0 C, but it does not mean that there is an absence of heat.

(iv) Ratio measures - A natural zero point exists, e.g. sales of zero has meaning (zero means none!). Sales of \$ 4 m is twice as much as sales of \$ 2 m.

Exercise 6 (c) The weighted mean must be between the two component means. Consider populations A, B with one element, 5, and four elements 7, 7, 7, 7, respectively. Then the mean of the combined population is ΣX/n = 33/5 = 6.6.

Exercise 7 (b) Weighted average cost of ingredients is (0.4*2.2) + (0.35*2.8) + (0.25*1.6) = 0.88 + 0.98 + 0.40 = \$ 2.26 per kg. NB This follows from the formula for the mean of grouped data:

mean = $(\sum f_i x_i)/n = \sum \{x_i f_i / n\} = \sum \{x_i (f_i/n)\} = \sum w_i x_i$, where the weights $w_i = f_i/n$ are merely the proportion falling into each class.

Exercise 8 (c) In the presence of extreme outliers, the median provides the most meaningful indicator of centrality. The mode may be very low, e.g. perhaps less than 7 days, quite possibly one, but it could freakishly be, say, 60 days. The mean will be affected by the 238-day overdue book, which will be a strong distorter. The median will fall into the middle of the range, in the sense that there are as many books overdue for fewer days as there are books overdue by more days. The coefficient of variation does not measure centrality. For example suppose the data looked like:

1 1 1 1 1 2 2 2 3 3 3 4 4 5 5 7 25 238

mode = 1, mean = 17.11, median = 3

In this case, mean and mode are not very effective at giving a feel for a characterising or typical value.
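The weighted mean of Exercise 7 and the illustrative loan data of Exercise 8 can both be checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
import statistics

# Exercise 7: weighted mean, with the blend proportions as weights.
shares = [0.40, 0.35, 0.25]          # A, B, C (C is "the rest")
prices = [2.20, 2.80, 1.60]          # dollars per kg
blend_cost = sum(w * p for w, p in zip(shares, prices))

# Exercise 8: the example data, with one extreme value of 238 days.
loan_days = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 7, 25, 238]
loan_mode = statistics.mode(loan_days)
loan_mean = statistics.mean(loan_days)
loan_median = statistics.median(loan_days)

print(round(blend_cost, 2))
print(loan_mode, round(loan_mean, 2), loan_median)
```

Dropping the single 238-day value from `loan_days` and recomputing is a quick way to see how much more the mean moves than the median.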

Exercise 9 (b) The median is often the preferred measure of central tendency in skew distributions, especially when there is heavy skewing. The reason follows on from Exercise 8. Where there is substantial skew we have the chance that extreme values will occur that will exert such influence on the mean as to dilute its power as a representative measure. A good real life example is the reporting of house prices, with the Real Estate Institute choosing to quote "median house price". The occasional very expensive property that is sold will influence the mean, causing mean house prices to fluctuate month-to-month and potentially causing uncertainty in the market place (which the Institute would like to avoid). Often the median will fall between the mode (at the peak of the distribution) and the mean (which lies toward the tail of the distribution). See B&L 6th edition p 142 - 144, or 7th edition p 151-152.

Exercise 10 (b) In a positively skewed distribution the mean will exceed the median. If the data are symmetrical, the median and mean coincide. If negatively skewed, the mean is less than the median. If bimodal, there is no consistent relation between median and mean.

Exercise 11 (c) For populations, CV = σ/μ. If $\mu_A = 4\mu_B$, then for $CV_A = CV_B$ we need $\sigma_A = 4\sigma_B$: if the denominator is increased by a factor of four, so must be the numerator, for the value of the fraction to remain constant.

Exercise 12 (c) A standard score is defined to be z = (X - μ)/σ. The class variance is 16, so σ = 4. Then if z = 1.5, σ = 4, and X = 88, μ = X - zσ = 88 - 4*1.5 = 82.

Exercise 13 NB We have all the crew, so we have a population rather than a sample. Let X ≡ weight of crew member in kg. We have ΣX = 567 and ΣX² = 48165, N = 7.

Sorted: 49, 63, 77, 85, 94, 98, 101

Mean: μ = ΣX/N = 567/7 = 81.00 kg

$\sigma^2 = \dfrac{\sum X^2 - (\sum X)^2/N}{N} = \dfrac{48165 - 567^2/7}{7} = 319.7142857$, so σ = 17.880556 = 17.88 to 2 decimal places

range = max – min = 101 - 49 = 52, = 52.00 to 2 decimal places.

Median position = (N+1)/2 = 4, so the median is 85, = 85.00 to 2 decimal places.
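The Exercise 13 figures can be confirmed with the standard library. The sketch below uses `statistics.pstdev`, which divides by N as the population formula requires; `statistics.stdev` is computed as well purely to show how much the sample (N - 1) formula would differ.

```python
import statistics

crew_kg = [98, 77, 63, 101, 85, 49, 94]

mu = statistics.mean(crew_kg)
sigma = statistics.pstdev(crew_kg)      # population formula (divide by N)
s_sample = statistics.stdev(crew_kg)    # sample formula (divide by N - 1)
median = statistics.median(crew_kg)
value_range = max(crew_kg) - min(crew_kg)

print(mu, round(sigma, 2), median, value_range)
print("sample formula would give", round(s_sample, 2))
```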


Exercise 14 a) The scattergram suggests that a linear regression is feasible. (Strong positive relationship.)

[Figure: scattergram "Income vs Life Expectancy". Horizontal axis: Income (\$US '000), 0 to 40; vertical axis: Years, 0 to 90.]

b) We have n = 7, ΣX = 185, ΣY = 446, ΣX² = 5013, ΣY² = 29768, ΣXY = 12183. Solving,

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{7 \times 12183 - 185 \times 446}{7 \times 5013 - 185 \times 185} = 3.199769053 = 3.20 \text{ to 2 d.p.}$$

$$a = \frac{\sum Y - b\sum X}{n} = \frac{446 - 3.199769053 \times 185}{7} = -20.85103926 = -20.85 \text{ to 2 d.p.}$$

c) $s_e^2 = \{\sum Y^2 - a\sum Y - b\sum XY\}/(n-2) = (29768 + 20.85103926 \times 446 - 3.199769053 \times 12183)/5 = 16.95542746$, so $s_e = 4.117696864$ = 4.12 to 2 d.p. (Around 2/3 of the time our prediction will be correct to within +/- 4.12 years.)

d) $SST = \sum Y^2 - (\sum Y)^2/n = 29768 - 446^2/7 = 1351.428571$

$r^2 = 1 - SSE/SST = 0.937268504$ = 0.94 to 2 d.p. (Around 94% of variation in life expectancy can be explained by its relationship with income.) (NB SSE = 84.77713626 was determined as the bracketed bit in part c).)

e) When x = 5, the predicted value of y is $y_c = -20.85103926 + 3.199769053 \times 5 = -4.852194355$ = -4.85 to two decimal places. It doesn't make sense to have a life expectancy of -4.85 years. The problem here is that we have extrapolated beyond the range of the data upon which the model was based. In effect we have assumed that the linear relationship continues even though we have no evidence that it does.
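The slope, intercept and the problematic extrapolation in Exercise 14 can be reproduced from the raw data with the same accumulation formulas. This is a minimal sketch; `ols_fit` is a hypothetical helper name, not a library function.

```python
def ols_fit(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


income = [19, 23, 25, 27, 28, 31, 32]       # $US '000
life_exp = [45, 47, 56, 68, 67, 81, 82]     # years

a, b = ols_fit(income, life_exp)

# Income of $US 5,000 is x = 5, far below the smallest observed x (19),
# so this prediction is an extrapolation.
predicted_at_5 = a + b * 5

print(round(b, 2), round(a, 2), round(predicted_at_5, 2))
print("observed income range:", min(income), "to", max(income))
```

Comparing the requested x with `min(income)` and `max(income)` before predicting is a simple guard against the kind of extrapolation discussed in part e).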

Exercise 18 (a) The relation is fairly tight and negative. A linear relationship is feasible, and thus it is reasonable to proceed with regression.

[Figure: scattergram of shrinkage against number of sales staff. Horizontal axis 15 to 30; vertical axis 0 to 2.5.]

(b) Accumulations are: n = 12, ∑X = 280, ∑Y = 19.6, ∑X² = 6600, ∑Y² = 36.18, ∑XY = 442.1. Solving,

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{12(442.1) - 280 \times 19.6}{12(6600) - 280^2} = \frac{-182.8}{800} = -0.2285$$

$$a = \frac{\sum Y - b\sum X}{n} = \frac{19.6 + 0.2285(280)}{12} = 6.9650$$

Regression equation is Yc = 6.9650 - 0.2285X

(c) $s_e^2 = \{\sum Y^2 - a\sum Y - b\sum XY\}/(n-2) = \{36.18 - 6.965(19.6) + 0.2285(442.1)\}/10 = 0.68585/10 = 0.068585$, from which $s_e = 0.2619$. This means that about 2/3 of the points are within \$262 of the regression line, or that about 95% are within \$524 of the line.

(d) From the accumulations, $SST = \sum Y^2 - (\sum Y)^2/n = 4.166667$ and SSE = 0.68585, so $r^2 = 1 - 0.68585/4.166667 = 0.835396$, which means about 84 % of variation in Y about its mean is explained by variations in X. The correlation coefficient is r = -0.9140.

(e) When there are 27 staff on duty, expected shrinkage is 6.965 - 27*0.2285 = 0.7955, i.e. \$ 795.50 per week. This is within the range of given values for X (i.e. interpolation), so the prediction is reasonable.

(f) For each additional staff member employed, the regression line predicts an associated decrease in shrinkage of \$ 228.50 per week. The marginal cost of hiring the 30th person must exceed this for the store to deem it not worthwhile. Presumably the marginal hiring cost is a rising function, and the 29th employee had a marginal hiring cost of less than \$ 228.50. This assumes the shrinkage saved function continues to be linear beyond X = 28 (i.e. extrapolation). NB The linear relationship MUST break down sooner or later, otherwise Y will become negative for sufficiently large values of X, i.e. if we have enough staff, shrinkage will become negative; that is, stock will start appearing from nowhere!

(g) Whether the firm hires any part-time staff will depend upon the behaviour of the shrinkage saved function for staff levels beyond 28, and how the function might change when peak-hour staff only are being hired. The firm will continue to increase hiring of part-time floorwalkers while the saving per week exceeds \$ 50. If the amount saved per week from prospectively hiring the kth floorwalker is less than the weekly hiring cost of that person, no further hiring will occur.
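The Exercise 18 results (regression line, standard error of the estimate, coefficient of determination, correlation coefficient and the prediction at 27 staff) can be checked from the raw data. The sketch below uses the deviations-from-the-mean form of the formulas, which is algebraically equivalent to the accumulation form used in the solution; all variable names are illustrative.

```python
import math

staff = [20, 21, 22, 23, 24, 25, 25, 20, 22, 24, 26, 28]
shrinkage = [2.0, 2.3, 1.8, 1.7, 1.7, 1.2, 1.7, 2.6, 2.1, 1.1, 1.0, 0.4]  # $'000

n = len(staff)
x_bar = sum(staff) / n
y_bar = sum(shrinkage) / n

s_xx = sum((x - x_bar) ** 2 for x in staff)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(staff, shrinkage))
sst = sum((y - y_bar) ** 2 for y in shrinkage)

b = s_xy / s_xx                 # slope
a = y_bar - b * x_bar           # intercept

# Sum of squared errors about the fitted line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(staff, shrinkage))

s_e = math.sqrt(sse / (n - 2))              # standard error of the estimate
r_squared = 1 - sse / sst                   # coefficient of determination
r = math.copysign(math.sqrt(r_squared), b)  # r takes the sign of the slope

predicted_at_27 = a + b * 27

print(round(a, 4), round(b, 4))
print(round(s_e, 4), round(r_squared, 4), round(r, 4))
print(round(predicted_at_27, 4))
```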

Exercise 19 (c) Regression analysis measures the association between two or more variables, but does not impute any necessary causation. Each of (a), (b) and (d) is possible. The shape of the relation is irrelevant, and even if a non-linear relation exists, direction of causation is not determined by the regression process.

Exercise 20 (b) If r² = 0.5, then 50 % of variation in the dependent variable can be explained by variation in the independent variable.

Exercise 21 (d) If Y = A - BX, then the population correlation coefficient will be negative, as the correlation coefficient has the same sign as the slope. It is possible for the sample slope for a given set of observations to be positive, even though the population slope is negative. Note that the word "true" in the question indicates that we are interested in the population. Often we don't have access to the population and we use a sample. Typically then we use the sample correlation coefficient as an estimate of the TRUE (i.e. population) value.

Exercise 22 For each observation, Y - Yc = 0. Then SSE = ∑(Y - Yc)² = 0, and all variation is explained, so that r² = 1 - SSE/SST = 1.

Exercise 23 (little working) Accumulations (denoting Test 1 and Test 2 results by X and Y respectively): ∑X = 25.5, ∑Y = 23, ∑X² = 84.75, ∑Y² = 69, ∑XY = 74.75. Assuming the results for Test 1 were to be used to predict those for Test 2, the regression line is Yc = 1.5541 + 0.4144X, i.e. slope = 0.4144, intercept = 1.5541.

a) slope = 0.4144
b) intercept = 1.5541
c) A difference of 1.5 between students on Test 1 will lead to a prediction of a difference of 0.4144*1.5 = 0.62, or 0.6 rounded, on Test 2.
d) By substitution in the above regression equation, the prediction is 2.4865 = 2.50 rounded.
e) se² = SSE / 6 = 2.2793 / 6 = 0.3799, se = 0.6163
f) r² = 1 - SSE / SST = 1 - 2.2793 / 2.8750 = 0.2072, from which r = +0.4552.


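Exercise 26 Only 4 of the 20 data entries are involved in ties, so it is OK to use the short formula. The equal seconds in FP together occupy the 2nd and 3rd rank, so we replace them by 2.5. The equal ninths in Comm together occupy the 9th and 10th rank, so we replace them by 9.5. The table below follows:

| Portfolio | FP Place | FP Rank | Comm Place | Comm Rank | d | d² |
|---|---|---|---|---|---|---|
| A | 5th | 5 | 3rd | 3 | 2 | 4 |
| B | 2nd | 2.5 | 1st | 1 | 1.5 | 2.25 |
| C | 7th | 7 | 6th | 6 | 1 | 1 |
| D | 4th | 4 | 2nd | 2 | 2 | 4 |
| E | 9th | 9 | 9th | 9.5 | -0.5 | 0.25 |
| F | 1st | 1 | 5th | 5 | -4 | 16 |
| G | 10th | 10 | 8th | 8 | 2 | 4 |
| H | 2nd | 2.5 | 4th | 4 | -1.5 | 2.25 |
| I | 8th | 8 | 9th | 9.5 | -1.5 | 2.25 |
| J | 6th | 6 | 7th | 7 | -1 | 1 |
| Total | | 55 | | 55 | 0 | 37 |

$$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6 \times 37}{10 \times 99} = 0.776,$$

indicating a strong correlation between portfolio rating and commission received. One would have to wonder whether one would be receiving unbiased advice. Later in the unit we will learn to test whether this apparent association could be reasonably attributed to chance.

Exercise 27 a) Sorting and resolving ties gives the following ranks:

| Eco award | Position | Eco rank | Acc award | Position | Acc rank |
|---|---|---|---|---|---|
| HD | 1 | 1 | HD | 1 | 1 |
| DN | 2 | 2.5 | DN | 2 | 2.5 |
| DN | 3 | 2.5 | DN | 3 | 2.5 |
| CR | 4 | 4.5 | CR | 4 | 4 |
| CR | 5 | 4.5 | PP | 5 | 6 |
| PP | 6 | 6.5 | PP | 6 | 6 |
| PP | 7 | 6.5 | PP | 7 | 6 |
| FP | 8 | 8 | FP | 8 | 8 |
| NN | 9 | 9 | NN | 9 | 9 |

b) Because there are many ties we need to use the long formula for rs. Our data can be tabulated as follows:

| Eco Award | Eco Rank, X | Acc Award | Acc Rank, Y | X² | Y² | XY |
|---|---|---|---|---|---|---|
| HD | 1 | DN | 2.5 | 1 | 6.25 | 2.5 |
| DN | 2.5 | DN | 2.5 | 6.25 | 6.25 | 6.25 |
| DN | 2.5 | CR | 4 | 6.25 | 16 | 10 |
| CR | 4.5 | HD | 1 | 20.25 | 1 | 4.5 |
| CR | 4.5 | FP | 8 | 20.25 | 64 | 36 |
| PP | 6.5 | PP | 6 | 42.25 | 36 | 39 |
| PP | 6.5 | NN | 9 | 42.25 | 81 | 58.5 |
| FP | 8 | PP | 6 | 64 | 36 | 48 |
| NN | 9 | PP | 6 | 81 | 36 | 54 |
| Total | 45 | | 45 | 283.5 | 282.5 | 258.75 |

Then

$$r_s = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}} = \frac{9 \times 258.75 - 45 \times 45}{\sqrt{(9 \times 283.5 - 45^2)(9 \times 282.5 - 45^2)}} = 0.582,$$

indicating moderately strong positive correlation. NB (Using the short cut formula gives 0.5958.)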
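The tie handling and both versions of the Spearman formula for Exercise 27 can be sketched as below. The grade ordering HD, DN, CR, PP, FP, NN is taken from the ranking table in part a); the helper names `average_ranks`, `spearman_long` and `spearman_short` are illustrative, not library functions.

```python
import math

# Award order, best first, as used in the solution's ranking table.
GRADE_ORDER = {"HD": 0, "DN": 1, "CR": 2, "PP": 3, "FP": 4, "NN": 5}


def average_ranks(scores):
    """Rank scores from smallest (rank 1); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_long(xs, ys):
    """Pearson correlation applied to the ranks (valid with ties)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2)
    )


def spearman_short(xs, ys):
    """Short-cut formula 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only without ties."""
    n = len(xs)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


eco = ["HD", "DN", "DN", "CR", "CR", "PP", "PP", "FP", "NN"]
acc = ["DN", "DN", "CR", "HD", "FP", "PP", "NN", "PP", "PP"]

eco_rank = average_ranks([GRADE_ORDER[g] for g in eco])
acc_rank = average_ranks([GRADE_ORDER[g] for g in acc])

rs_long = spearman_long(eco_rank, acc_rank)
rs_short = spearman_short(eco_rank, acc_rank)
print(round(rs_long, 3), round(rs_short, 4))
```

The gap between the two results shows why the long formula is preferred when many observations are tied.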
