* ===================================================================
* Sample Stata do-file: Data cleaning, summary statistics & regression
* Updated with clear, professional variable names
* Requires the user-written estout package (eststo, estpost, esttab)
* ===================================================================

clear all
set more off
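capture log close

* eststo, estpost and esttab are not official Stata commands; install the
* estout package from SSC if it is missing
capture which esttab
if _rc ssc install estout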

log using "analysis_log.smcl", replace

* 1. Load built-in auto dataset
sysuse auto, clear

* 2. Rename variables to meaningful, lowercase names
rename price      car_price
rename mpg        miles_per_gallon
rename weight     car_weight_lbs
rename length     car_length_in
rename rep78      repair_record_78
rename foreign    origin            // 0=Domestic, 1=Foreign

* 3. Data cleaning
drop if missing(car_price) | missing(repair_record_78)
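* Added sanity check (a sketch, not part of the original workflow): confirm
* that no missing values remain in the two key variables and report the
* analysis sample size. In the shipped auto data only rep78 has missing
* values (5 cars), so count should report 69 of the original 74 cars.
assert !missing(car_price, repair_record_78)
count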

* Label origin properly
label define origin_lbl 0 "Domestic" 1 "Foreign"
label values origin origin_lbl

* 4. Generate transformed and categorical variables
gen ln_price   = ln(car_price)
gen ln_mpg     = ln(miles_per_gallon)
label var ln_price "Log of price"
label var ln_mpg   "Log of miles per gallon"
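* Added check (a sketch): ln() returns missing for zero or negative input,
* so assert that the log transforms did not silently create missing values.
assert !missing(ln_price, ln_mpg)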

recode repair_record_78 (1/2 = 1 "Poor") (3 = 2 "Average") (4/5 = 3 "Good"), ///
    gen(repair_cat)
label var repair_cat "Repair record 1978 (categorical)"
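* Added check (a sketch): cross-tabulate the original and recoded variables
* to confirm that ratings 1-2, 3 and 4-5 map to Poor, Average and Good.
tabulate repair_record_78 repair_cat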

gen weight_kg = car_weight_lbs * 0.453592
label var weight_kg "Weight (kilograms)"

* 5. Summary statistics table (exported to RTF)
eststo clear
estpost summarize car_price miles_per_gallon car_weight_lbs ///
                  car_length_in origin ln_price weight_kg, detail

esttab using "summary_statistics.rtf", replace ///
    cells("count mean(fmt(2)) sd(fmt(2)) min max") ///
    label title("Descriptive Statistics") nonumber

* 6. Regression models
eststo: regress car_price miles_per_gallon car_weight_lbs i.origin i.repair_cat
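* Added sketch: repair_cat enters as two dummies (Poor, the lowest level, is
* the base), so test whether the repair categories matter jointly rather
* than reading the individual t-statistics.
testparm i.repair_cat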

eststo: regress ln_price ln_mpg car_weight_lbs i.origin i.repair_cat, robust
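* Added interpretation sketch for model 2: because both price and mileage
* are in logs, the ln_mpg coefficient is an elasticity (the % change in
* price for a 1% change in mpg, holding the other regressors fixed). With a
* logged outcome, the exact % price gap for foreign vs. domestic cars is
* 100*(exp(b) - 1), not 100*b.
display "Elasticity of price w.r.t. mpg: " %6.3f _b[ln_mpg]
display "Foreign vs. domestic price gap (%): " %6.2f 100*(exp(_b[1.origin]) - 1)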

* 7. Export publication-ready regression table
esttab using "regression_results.rtf", replace ///
    b(4) se(4) ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    scalars(N r2_a F) sfmt(%9.0g %9.3f %9.3f) ///
    label ///
    mtitles("Price (levels)" "Log price (robust SE)") ///
    title("OLS Regression Results") ///
    addnotes("Model 2 uses heteroskedasticity-robust standard errors.")

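* 8. Basic diagnostics on model 1 (the non-robust levels regression);
*    estat hettest does not accept results estimated with robust SEs
estimates restore est1
estat vif       // Check multicollinearity
estat hettest   // Breusch-Pagan test for heteroskedasticity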

* 9. Save cleaned dataset
save "auto_cleaned_renamed.dta", replace

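* 10. Optional: export summary statistics to Excel
*     (export excel would write the whole dataset, so use tabstat + putexcel)
quietly tabstat car_price miles_per_gallon car_weight_lbs car_length_in ///
    origin ln_price weight_kg, statistics(count mean sd min max) save
matrix summary_stats = r(StatTotal)
matrix summary_stats = summary_stats'
putexcel set "descriptive_stats.xlsx", sheet("Summary") replace
putexcel A1 = matrix(summary_stats), names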

log close
display as text "Analysis completed – check the created files in your working directory!"
* End of do-file