Critical Thinking Exercise and SAS project

For the completed SAS project, you will upload the following three items:

1. The DOCX file with the original assignment and rubric (all fields completed).

2. The XLSX file you downloaded, with two tabs added: the “Scatterplots” tab and the “Regression Model” tab containing the regression of Price on the three independent variables.

3. A PDF file you produce from SAS that shows the output of your final regression (with a higher R² than we had with the original model).

SAS® Forecasting Project for Critical Thinking

This project utilizes the “Real Estate – Base” database. The purpose is twofold:

Build critical thinking skills needed to structure data analysis appropriately for effective decision making.

Analyze available data practically and skillfully in order to build an explanatory regression model.

The Real Estate – Base database includes the following variables for 101 homes (NOTE: Variables marked with * are shown as qualitative variables within the database):

a. *Unit# (An assigned database key)
b. *Type (H = House, C = Condo/Apartment)
c. *Location (1 through 10 – voting district where located)
d. *U/S/R (Urban vs. Suburban vs. Rural location)
e. Price (The price the house ended up selling for in 2017)
f. Sq. Ft. (Heated/Cooled & Attached square footage)
g. Lot (Acres) (Acreage of property)
h. Garage (Number of attached covered and/or enclosed parking positions)
i. BRs (Number of qualified bedrooms)
j. Baths (Number of bathrooms – no tub or shower indicated as .5)
k. *Pool (No=No Access; HA=Shared Pool; AG=Above Ground; IG=In Ground)
l. Age (Age of home in rounded year at end of 2017)

At a high level, here are the steps you are going to perform:

1. Download the Excel spreadsheet with the Real Estate Data in it and create the requested Scatterplots. NOTE: It is important that the Dependent Variable (Price) is on the Y-axis and the Independent Variable is on the X-axis. The order of the two columns will dictate that.

2. Perform Regression Analysis within Excel to determine how well the prescribed Independent Variables explain changes in the Dependent Variable.

3. Upload the Real Estate dataset into SAS Studio.

4. Perform a series of Regression Analyses in SAS Studio to find a better set of explanatory variables.

5. Answer a critical thinking exercise regarding forecasting and the data set we have.
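
To make the regression steps above more concrete, here is a minimal Python sketch of what a one-variable regression computes: the Y-intercept, the slope, and R². Python is not part of this assignment (Excel and SAS Studio remain the required tools), the numbers are hypothetical toy values rather than rows from the Real Estate – Base file, and the function name simple_regression is only an illustrative choice.

```python
# Hypothetical toy values for illustration only; NOT taken from the
# Real Estate - Base spreadsheet.
sq_ft = [1200, 1500, 1800, 2100, 2400]               # independent variable (X)
price = [150000, 180000, 210000, 250000, 275000]     # dependent variable (Y)


def simple_regression(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (unexplained variation / total variation).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


intercept, slope, r2 = simple_regression(sq_ft, price)
print(f"intercept={intercept:.2f}  slope={slope:.4f}  R^2={r2:.4f}")
```

An R² close to 1 means the independent variable explains most of the variation in Price; an R² close to 0 corresponds to the "No relationship" end of the scale used for the scatterplots.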

Here are the steps in detail:

1. Create the following charts in Excel using the charting tools and the indicated variables in “Real Estate – Base.xlsx”. (Remember, Price is your Dependent Variable.)

a. Create a new tab in the spreadsheet called “Scatterplots”. After creating each Scatterplot on the original tab, move it to the “Scatterplots” tab you created.
b. Create a Scatterplot using the variables Price and Sq. Ft.
c. Create a Scatterplot using the variables Price and Lot (Acres).
d. Create a Scatterplot using the variables Price and Garage.
e. Create a Scatterplot using the variables Price and BRs.
f. Create a Scatterplot using the variables Price and Baths.
g. Create a Scatterplot using the variables Price and Age.

2. What sort of relationship do you see between these variables based on the scatterplots?

a. Between Price and Sq. Ft. (Circle)?
   No relationship    Weak    Moderate    Strong
b. Between Price and Lot (Circle)?
   No relationship    Weak    Moderate    Strong
c. Between Price and Garage (Circle)?
   No relationship    Weak    Moderate    Strong
d. Between Price and BRs (Circle)?
   No relationship    Weak    Moderate    Strong
e. Between Price and Baths (Circle)?
   No relationship    Weak    Moderate    Strong
f. Between Price and Age (Circle)?
   No relationship    Weak    Moderate    Strong

3. In the Excel spreadsheet provided, using the Data Analysis Add-in, run a regression analysis with Price as the Dependent Variable and Lot, Garage and BRs as the Independent Variables and select to have Excel create a new tab called “Regression Model”. It is recommended that you run individual regressions with each variable alone to see how strong each R² is.

4. Provide the following from the Excel model (the “Regression Model” tab):

a. Coefficient of Determination (R-squared) ___________________
b. Y-Intercept for the Regression Model ___________________
c. Slope value for X1 (Lot) ___________________
d. Slope value for X2 (Garage) ___________________
e. Slope value for X3 (BRs) ___________________

5. Do you think we need all three current Independent Variables in our Regression model to predict changes in Price (Circle)?    Yes    No

Explain: _________________________________________________________________________

_______________________________________________________________________________

_______________________________________________________________________________

6. Which variable(s) would you remove (Circle)?

Lot Size    Garage    BRs

7. Of the following variables in the spreadsheet, which variable would you select next to add to the model (i.e., you think it would create a stronger prediction of Price)?

Type Location U/S/R Sq. Ft. Baths Pool Age

8. Run a SAS Regression Model on the Real Estate – Base database using Price as the Dependent Variable (Y). Include the original Independent Variables (minus any you removed in step 6) and add the variable you chose in step 7. Print your model output and turn it in with the assignment. (NOTE: You may have to repeat this exercise until you find a combination of variables that gives you a higher R².)

9. Provide the following from the SAS Model:

a. Coefficient of Determination (R-squared). ________________________
b. Y-Intercept for the Regression Model ________________________
c. Slope value for each of your Independent Variables.

i. Var_______________________ ________________________

ii. Var_______________________ ________________________

iii. Var_______________________ ________________________

iv. Var_______________________ ________________________

v. Var_______________________ ________________________

10. Did your SAS model provide a stronger Coefficient of Determination (Circle)? Yes No
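
For reference when comparing the Excel and SAS models, the Python sketch below fits a multiple regression by least squares and reports R² for two sets of Independent Variables. It is only an illustration under stated assumptions: Python is not required for this project, the data are hypothetical toy values (not the Real Estate – Base data), and the helper names solve and fit are made up for this example. Because adding an Independent Variable can never lower R² in an ordinary least-squares fit, the sketch also reports adjusted R², which penalizes extra variables.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(columns, y):
    """Least-squares fit of y on the given predictor columns plus an intercept.

    Returns (coefficients, r_squared, adjusted_r_squared);
    coefficients[0] is the Y-intercept, the rest are the slopes.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])                                 # intercept + number of predictors
    # Normal equations: (X'X) * coef = X'y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    return coef, r2, adj_r2


# Hypothetical toy values for illustration only; NOT the assignment data.
lot = [0.25, 0.5, 0.3, 1.0, 0.75, 0.4, 1.5, 0.6]
brs = [2, 3, 3, 4, 3, 2, 5, 4]
sq_ft = [1100, 1600, 1500, 2400, 1900, 1200, 3000, 2100]
price = [140000, 185000, 178000, 265000, 215000, 150000, 330000, 240000]

_, base_r2, base_adj = fit([lot, brs], price)
_, ext_r2, ext_adj = fit([lot, brs, sq_ft], price)
print(f"Lot + BRs:            R^2={base_r2:.4f}  adjusted R^2={base_adj:.4f}")
print(f"Lot + BRs + Sq. Ft.:  R^2={ext_r2:.4f}  adjusted R^2={ext_adj:.4f}")
```

When you compare your own models, a small gain in R² from an added variable is expected; a gain in adjusted R² is stronger evidence that the variable helps explain Price.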

Critical Thinking Question:

11. A large real estate company is trying to use similar data plus their own sales data to forecast total sales for the coming year for each of their agents, and they have pulled data from their Finance records. They are trying to assemble the best data to build a Regression model.

a. Would it make sense to use the same data as we used above in the SAS model? Why or why not?

__________________________________________________________________________________

__________________________________________________________________________________

b. Recommend two data elements you think they probably have available to help them predict sales for each of their salespeople.

1. ______________________________________________

2. ______________________________________________