Statistical & Data Analysis

1. R programs

The programs below are all available in the same GitHub repository, "R_projects".

  • Descriptive statistics and basic associations between variables

  • ANOVA test 

  • Chi-square test of independence

  • Pearson Correlation

  • Linear & multiple regression

  • Machine learning with decision trees

  • Random forest algorithm


2. SAS programs 

The programs below are all available in the same GitHub repository, "Statistics_with_SAS".

  • ANOVA

  • Chi-square test of independence

  • Pearson Correlation

  • Moderator variables, with chi-square test

Interactive map charts using R and leaflet.

API Programming:


1. Twitter API:

  • R programs to get a list of Twitter follower IDs and their profiles (location, screen name, follower count, etc.)

  • Python program to track changes in Twitter follower count

Text analytics:

News headline analysis: *** NEW ***

  • Text pre-processing

  • Sorting and aggregation by publisher name

  • Word clouds and word association plots

Marketing Analytics

1. R programs

  • mktg_fns.R => computes RFM (recency, frequency and monetary value) and basic customer segmentation

  • cust_revenue.R => calculates revenue per customer and revenue per segment, customer scoring and revenue prediction
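
For readers new to RFM, the idea can be sketched in a few lines. This is an illustration in Python, not the code in mktg_fns.R: the transaction records, the function names compute_rfm and segment, and the segmentation thresholds are all hypothetical assumptions made for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("c1", date(2016, 1, 5), 40.0),
    ("c1", date(2016, 3, 20), 60.0),
    ("c2", date(2015, 11, 2), 500.0),
    ("c3", date(2016, 3, 28), 15.0),
    ("c3", date(2016, 2, 14), 25.0),
    ("c3", date(2016, 1, 30), 20.0),
]


def compute_rfm(rows, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        # Recency is measured from the most recent purchase only
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1      # number of purchases
        monetary[customer] += amount  # total spend
    return {
        c: ((as_of - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


def segment(recency_days, frequency, monetary):
    """Toy rule-based segmentation; the thresholds are arbitrary assumptions."""
    if recency_days <= 30 and frequency >= 3:
        return "loyal"
    if monetary >= 300:
        return "high value"
    if recency_days > 90:
        return "lapsed"
    return "regular"


rfm = compute_rfm(transactions, as_of=date(2016, 4, 1))
for customer, scores in sorted(rfm.items()):
    print(customer, scores, segment(*scores))
```

Real projects often score each of the three measures by quantile instead of using fixed cut-offs; the fixed rules here just keep the sketch short.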

New projects are added during the first week of every month. To stay updated, please subscribe to my blog page.


Sign up here for email notifications.

Pattern analysis and "cyber-security strength" analysis for a password list dataset. *** NEW ***


4. US Education Scorecard

  • GitHub link here.

  • Interactive US state map showing college names, admission rate, average faculty salary, etc.



  • Kaggle score = 2.60, with a multinomial regression algorithm.
  • Heatmap to view the worst-affected regions.
  • Programs to test relationships using chi-square tests, with visualizations using correlograms and ggplot.
  • GitHub link here.


  • Official Score = 0.789 (~79% accurate predictions)
  • Competition link here.
  • GitHub code link here. Please review the file for the program description and submission files.
  • Predictions made using the following algorithms: Naive Bayes, neural network, random forest and decision tree.

1. Airbnb New User Bookings

  • Official Score = 0.832 (~83% accurate predictions)
  • Competition link here.
  • GitHub code link here. Please review the file for the program description and submission files.

Journey of Analytics

  • R program to create scatter plots, histograms, bar plots, box plots, pie charts and density plots.

  • Heatmaps & correlograms - see the Kaggle SFO crime project.

  • Advanced graphics - bubble charts, 3D surface plots and mathematical function graphs.