* ===================================================================
* Sample Stata do-file: Data cleaning, summary statistics & regression
* Updated with clear, professional variable names
* Lines: 72
* ===================================================================
clear all
set more off
capture log close
log using "analysis_log.smcl", replace
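* Optional guard (a sketch): estpost, eststo and esttab, used in steps 5-7,
* come from the user-written estout package on SSC, not from official Stata.
* This installs estout only if esttab cannot be found; it assumes the machine
* has internet access.
capture which esttab
if _rc {
    ssc install estout, replace
}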
* 1. Load built-in auto dataset
sysuse auto, clear
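* Sanity check (a sketch): the auto.dta shipped with Stata has 74 cars and
* 12 variables. describe, short leaves those counts in r(N) and r(k).
describe, short
assert r(N) == 74
assert r(k) == 12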
* 2. Rename variables to meaningful, lowercase names
rename price car_price
rename mpg miles_per_gallon
rename weight car_weight_lbs
rename length car_length_in
rename rep78 repair_record_78
rename foreign origin // 0=Domestic, 1=Foreign
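* Sanity check (a sketch): confirm variable exits with error 111 if any of
* the new names is absent, so a typo in a rename line stops the do-file here
* instead of failing later with a less obvious error.
confirm variable car_price miles_per_gallon car_weight_lbs ///
car_length_in repair_record_78 origin, exact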
* 3. Data cleaning
drop if missing(car_price) | missing(repair_record_78)
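* Sanity check (a sketch): in the shipped auto.dta, price has no missing
* values and rep78 is missing for 5 of the 74 cars, so 69 observations
* should survive the drop.
assert _N == 69
assert !missing(car_price, repair_record_78)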
* Attach explicit value labels to origin (replaces the label shipped with auto.dta)
label define origin_lbl 0 "Domestic" 1 "Foreign"
label values origin origin_lbl
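* Sanity check (a sketch): origin should be a 0/1 indicator. The extended
* macro function ": label (varname) #" returns the text that the variable's
* value label attaches to value #.
assert inlist(origin, 0, 1)
local foreign_lbl : label (origin) 1
assert "`foreign_lbl'" == "Foreign"
tabulate origin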
* 4. Generate transformed and categorical variables
gen ln_price = ln(car_price)
gen ln_mpg = ln(miles_per_gallon)
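* Sanity check (a sketch): ln() returns missing for zero or negative input,
* so confirm that both logs are defined and that exp() recovers the level.
* The tolerance is loose because gen stores float (about 7 significant
* digits) unless you write "gen double".
assert !missing(ln_price, ln_mpg)
assert reldif(exp(ln_price), car_price) < 1e-5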
recode repair_record_78 (1/2 = 1 "Poor") (3 = 2 "Average") (4/5 = 3 "Good"), ///
gen(repair_cat)
label var repair_cat "Repair record 1978 (categorical)"
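* Sanity check (a sketch): verify the recode mapping row by row
* (1-2 -> 1 "Poor", 3 -> 2 "Average", 4-5 -> 3 "Good"), then inspect the
* cross-tabulation. Missing rep78 values were already dropped in step 3.
assert repair_cat == cond(repair_record_78 <= 2, 1, ///
cond(repair_record_78 == 3, 2, 3))
tabulate repair_record_78 repair_cat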
gen weight_kg = car_weight_lbs * 0.453592
label var weight_kg "Weight (kilograms)"
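* Worked arithmetic (a sketch): 1 lb = 0.453592 kg, so a 3,000 lb car
* weighs about 1,360.8 kg. Every converted weight should be positive and
* numerically smaller than the original value in pounds.
display %9.3f 3000 * 0.453592
assert weight_kg > 0 & weight_kg < car_weight_lbs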
* 5. Summary statistics table (estpost/esttab are from the user-written estout package: ssc install estout)
eststo clear
estpost summarize car_price miles_per_gallon car_weight_lbs ///
car_length_in origin ln_price weight_kg, detail
esttab using "summary_statistics.rtf", replace ///
cells("count mean(fmt(2)) sd(fmt(2)) min max") ///
label title("Descriptive Statistics") nonumber
* 6. Regression models
eststo: regress car_price miles_per_gallon car_weight_lbs i.origin i.repair_cat
eststo: regress ln_price ln_mpg car_weight_lbs i.origin i.repair_cat, robust
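* Reading the log-log model (a sketch): the active estimates are now Model 2.
* The ln_mpg coefficient is an elasticity, the approximate % change in price
* for a 1% change in mpg. For the 0/1 origin dummy in a log-price model,
* 100*(exp(b) - 1) gives the implied % price difference, Foreign vs Domestic.
* lincom reports the elasticity with Model 2's robust standard error and
* does not overwrite the stored estimates.
lincom ln_mpg
display as text "Implied Foreign vs Domestic price gap (%): " ///
as result %6.2f 100 * (exp(_b[1.origin]) - 1)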
* 7. Export publication-ready regression table
esttab using "regression_results.rtf", replace ///
b(4) se(4) ///
star(* 0.10 ** 0.05 *** 0.01) ///
scalars(N r2_a F) sfmt(0 3 3) ///
label ///
mtitles("Level price" "Log-log model (robust SE)") ///
title("OLS Regression Results") ///
addnotes("Model 2 uses robust standard errors")
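* Optional preview (a sketch): esttab without "using" prints the same two
* stored models (est1, est2) to the Results window. This lets you check the
* table before you open the RTF file.
esttab, b(4) se(4) star(* 0.10 ** 0.05 *** 0.01) ///
scalars(N r2_a F) sfmt(0 3 3) label ///
mtitles("Level price" "Log-log model (robust SE)")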
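* 8. Basic diagnostics, run on Model 1 (stored by eststo as est1), because
*    the active estimates at this point are Model 2 with robust standard errors
estimates restore est1
estat vif       // Variance inflation factors (a common rule of thumb flags VIF > 10)
estat hettest   // Breusch-Pagan / Cook-Weisberg test for heteroskedasticity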
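* Acting on the Breusch-Pagan result (a sketch): estat hettest leaves the
* test p-value in r(p). A small p-value is evidence against constant error
* variance, which is the usual motivation for robust SEs as in Model 2.
* The 5% threshold below is a conventional choice, not a requirement.
quietly estat hettest
local bp_p = r(p)
if `bp_p' < 0.05 {
    display as text "Breusch-Pagan p = " as result %5.3f `bp_p' ///
        as text ": constant variance rejected at the 5% level"
}
else {
    display as text "Breusch-Pagan p = " as result %5.3f `bp_p' ///
        as text ": constant variance not rejected at the 5% level"
}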
* 9. Save cleaned dataset
save "auto_cleaned_renamed.dta", replace
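* Round-trip check (a sketch): reload the saved file inside preserve/restore
* so that the data in memory are untouched. Then confirm that the derived
* variables and the observation count survived the save.
preserve
use "auto_cleaned_renamed.dta", clear
assert _N == 69
confirm variable ln_price ln_mpg repair_cat weight_kg, exact
restore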
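* 10. Optional: export summary stats to Excel
*     (export excel would write the whole dataset, so tabstat computes the
*     statistics and putexcel writes the resulting matrix; Stata 14 or later)
tabstat car_price miles_per_gallon car_weight_lbs car_length_in origin ///
ln_price weight_kg, statistics(count mean sd min max) save
matrix summary_stats = r(StatTotal)'
putexcel set "descriptive_stats.xlsx", sheet("Summary") replace
putexcel A1 = matrix(summary_stats), names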
log close
display as text "Analysis completed – check the created files in your working directory!"
* End of do-file