Critical Thinking Exercise and SAS project
For the completed SAS project, you will upload the following three items:
1. The DOCX file with the original assignment and rubric (all fields completed).
2. The XLSX file you downloaded, with the added “Scatterplots” tab and the “Regression Model” tab containing the regression of Price on the three independent variables.
3. A PDF file produced from SAS that shows the output of your final regression (with a higher R² than the original model).
SAS® Forecasting Project for Critical Thinking
This project utilizes the “Real Estate – Base” database. The purpose is twofold:
– Build the critical thinking skills needed to structure a data analysis appropriately for effective decision making.
– Analyze available data practically and skillfully to build an explanatory regression model.
The Real Estate – Base database includes the following variables for 101 homes (*NOTE: Variables marked with an asterisk are stored as qualitative variables in the database):
a. *Unit# (An assigned database key)
b. *Type (H = House, C = Condo/Apartment)
c. *Location (1 through 10; the voting district where the home is located)
d. *U/S/R (Urban vs. Suburban vs. Rural location)
e. Price (The price the home sold for in 2017)
f. Sq. Ft. (Heated/cooled and attached square footage)
g. Lot (Acres) (Acreage of the property)
h. Garage (Number of attached covered and/or enclosed parking spaces)
i. BRs (Number of qualified bedrooms)
j. Baths (Number of bathrooms; a bathroom with no tub or shower counts as .5)
k. *Pool (No = No access; HA = Shared pool; AG = Above ground; IG = In ground)
l. Age (Age of the home in years, rounded, at the end of 2017)
At a high level, here are the steps you are going to perform:
1. Download the Excel spreadsheet containing the Real Estate data and create the requested
Scatterplots. NOTE: The Dependent Variable (Price) must be on the Y-axis and the Independent
Variable on the X-axis. The order of the two selected columns determines which variable goes on each axis.
2. Perform Regression Analysis within Excel to determine how well the prescribed Independent
Variables explain changes in the Dependent Variable.
3. Upload the Real Estate dataset into SAS Studio.
4. Perform a series of Regression Analyses in SAS Studio to find a better set of explanatory
variables.
5. Answer a critical thinking question about forecasting and the data set we are using.
Here are the steps in detail:
1. Create the following charts in Excel using the charting tools and the indicated variables in “Real
Estate – Base.xlsx”. (Remember, Price is your Dependent Variable.)
a. Create a new tab in the spreadsheet called “Scatterplots”. After creating each Scatterplot on the original tab, move it to the “Scatterplots” tab.
b. Create a Scatterplot using the variables Price and Sq. Ft.
c. Create a Scatterplot using the variables Price and Lot (Acres).
d. Create a Scatterplot using the variables Price and Garage.
e. Create a Scatterplot using the variables Price and BRs.
f. Create a Scatterplot using the variables Price and Baths.
g. Create a Scatterplot using the variables Price and Age.
2. Based on the scatterplots, what sort of relationship do you see between Price and each variable?
a. Between Price and Sq. Ft. (Circle)?
No relationship Weak Moderate Strong
b. Between Price and Lot (Circle)?
No relationship Weak Moderate Strong
c. Between Price and Garage (Circle)?
No relationship Weak Moderate Strong
d. Between Price and BRs (Circle)?
No relationship Weak Moderate Strong
e. Between Price and Baths (Circle)?
No relationship Weak Moderate Strong
f. Between Price and Age (Circle)?
No relationship Weak Moderate Strong
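If you want a numeric check on your visual judgment in step 2, the Pearson correlation coefficient r summarizes how tightly the points in a scatterplot follow a straight line (r near +1 or -1 is a tight line; r near 0 is no linear pattern). The Python sketch below computes r from two columns. The cutoffs in the hypothetical helper `describe_strength` (0.1, 0.4, 0.7) are a rule-of-thumb assumption, not something this assignment defines, and the sample numbers are made up for illustration. For a one-variable regression with an intercept, r² equals the R² that regression reports, which gives a quick way to compare the individual R² values suggested in step 3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r):
    """Map |r| to the handout's categories (cutoffs are an assumed rule of thumb)."""
    a = abs(r)
    if a < 0.1:
        return "No relationship"
    if a < 0.4:
        return "Weak"
    if a < 0.7:
        return "Moderate"
    return "Strong"


# Hypothetical illustration values, NOT taken from the Real Estate - Base file.
sq_ft = [1200, 1500, 1800, 2100, 2400]
price = [150000, 180000, 205000, 240000, 262000]

r = pearson_r(sq_ft, price)
# r, its category, and r squared (the R-squared of a one-variable regression)
print(round(r, 3), describe_strength(r), round(r ** 2, 3))
```

The same r is what Excel's CORREL function returns for two columns, so you can cross-check without leaving the spreadsheet.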
3. In the Excel spreadsheet provided, use the Data Analysis Add-in to run a regression analysis with
Price as the Dependent Variable and Lot, Garage, and BRs as the Independent Variables, and have
Excel place the output on a new tab called “Regression Model”. It is recommended that you also
run individual regressions with each variable alone to see how strong each R² is.
4. Provide the following from the Excel model (the “Regression Model” tab):
a. Coefficient of Determination (R-squared)
___________________
b. Y-Intercept for the Regression Model
___________________
c. Slope value for X1 (Lot)
___________________
d. Slope value for X2 (Garage)
___________________
e. Slope value for X3 (BRs)
___________________
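Excel's Regression tool and SAS both fit the model by ordinary least squares, so the intercept, slopes, and R² requested in step 4 come from solving the normal equations (XᵀX)b = Xᵀy, where X has a leading column of 1s for the intercept. The Python sketch below does exactly that with no external libraries. `fit_ols` is a hypothetical helper meant to show where the numbers come from, not a replacement for the Excel or SAS output; assuming the same rows are used, its results should agree with Excel's Intercept, coefficient, and R Square values up to rounding.

```python
def fit_ols(rows, y):
    """Ordinary least squares with an intercept.

    rows: list of observations, each a list of predictor values (e.g. [lot, garage, brs]).
    y:    list of responses (e.g. price).
    Returns (intercept, slopes, r_squared).
    """
    n = len(rows)
    k = len(rows[0]) + 1  # +1 for the intercept column
    X = [[1.0] + [float(v) for v in row] for row in rows]

    # Augmented matrix for the normal equations: [X'X | X'y]
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         + [sum(X[i][p] * y[i] for i in range(n))]
         for p in range(k)]

    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot solve")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    b = [A[p][k] / A[p][p] for p in range(k)]

    # R-squared = 1 - SSE / SST
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return b[0], b[1:], 1 - sse / sst


# Hypothetical toy data (not from the spreadsheet), built so y = 2 + 3*x1 - 1*x2 exactly.
rows = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]]
y = [5, 1, 4, 7, 6]
intercept, slopes, r2 = fit_ols(rows, y)
print(round(intercept, 6), [round(s, 6) for s in slopes], round(r2, 6))
```

Because the toy data lie exactly on a plane, the fit should recover an intercept of 2, slopes of 3 and -1, and an R² of 1. With the real file, each row would be [Lot, Garage, BRs] and y would be Price.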
5. Do you think we need all three current Independent Variables in our Regression Model to
predict changes in Price (Circle)? Yes No
Explain: _________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
6. Which variable(s) would you remove (Circle)?
Lot Size Garage BRs
7. Of the following variables in the spreadsheet, which would you add to the model next (i.e., which
do you think would most improve the prediction of Price)?
Type Location U/S/R Sq. Ft. Baths Pool Age
8. Run a SAS Regression Model on the Real Estate – Base database using Price as the Dependent
Variable (Y). Include the original Independent Variables (minus any you removed in step 6) plus
the variable you chose in step 7. Print your model output to PDF and turn it in with the
assignment. (NOTE: You may have to repeat this exercise until you find a combination of
variables that gives you a higher R² than the Excel model.)
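Several candidate variables in step 7 (Type, U/S/R, Pool, and Location) are qualitative, while a least-squares regression needs numeric inputs. One common approach, sketched below in Python, is indicator (dummy) coding: each category except one baseline becomes its own 0/1 column, and the baseline is left out so the indicators are not perfectly collinear with the intercept. Location is coded 1 through 10, but those numbers label voting districts rather than measure anything, so entering them as a single numeric variable would impose an ordering the districts may not have; indicators avoid that. The helper name `dummy_code` and the sample observations are hypothetical. Some SAS procedures can build these indicators for you through a CLASS statement; check which approach the SAS Studio task you use supports.

```python
def dummy_code(values, baseline=None):
    """Convert a qualitative column into 0/1 indicator columns.

    One category (the baseline) is dropped so the indicators are not
    perfectly collinear with the regression intercept.
    Returns (kept_levels, rows), where rows[i][j] is 1 if values[i] == kept_levels[j].
    """
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # default: the first category in sorted order
    if baseline not in levels:
        raise ValueError(f"baseline {baseline!r} does not appear in the data")
    kept = [lvl for lvl in levels if lvl != baseline]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


# Pool codes follow the handout (No, HA, AG, IG); these observations are hypothetical.
pool = ["No", "IG", "HA", "AG", "IG", "No"]
names, indicators = dummy_code(pool, baseline="No")
print(names)  # ['AG', 'HA', 'IG']
for code, row in zip(pool, indicators):
    print(code, row)
```

Every Pool = No home gets [0, 0, 0], so the coefficient on, for example, the IG column estimates the price difference between an in-ground pool and no pool access, holding the other variables in the model fixed.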
9. Provide the following from the SAS model:
a. Coefficient of Determination (R-squared)
________________________
b. Y-Intercept for the Regression Model
________________________
c. Slope value for each of your Independent Variables
i. Var_______________________ ________________________
ii. Var_______________________ ________________________
iii. Var_______________________ ________________________
iv. Var_______________________ ________________________
v. Var_______________________ ________________________
10. Did your SAS model provide a stronger Coefficient of Determination (Circle)? Yes No
Critical Thinking Question:
11. A large real estate company wants to use similar data, plus its own sales data pulled from its
Finance records, to forecast total sales for the coming year for each of its agents. The company
is trying to assemble the best data to build a Regression model.
a. Would it make sense to use the same data we used above in the SAS model? Why or
why not?
__________________________________________________________________________________
__________________________________________________________________________________
b. Recommend two data elements the company probably has available that would help it
predict sales for each of its salespeople.
1. ______________________________________________
2. ______________________________________________