Statistical & Data Analysis

1. R programs

The programs below are all available in the same GitHub repository, "R_projects".

  • Descriptive statistics and basic associations between variables.

  • ANOVA test 

  • Chi-square test of independence

  • Pearson Correlation

  • Linear & Multiple regression

  • Machine learning with decision trees.

  • Random forest algorithm.
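For readers who want to see what two of these analyses compute, here is a minimal Python sketch of Pearson correlation and a simple (one-predictor) least-squares regression. It is an illustration only, not the R code in "R_projects"; the function names `pearson_r` and `simple_linear_fit` and the sample data are made up for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Co-variation of x and y around their means, and each variable's own spread
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    # Made-up example data (hours studied vs. test score), not from any project above
    hours = [1, 2, 3, 4, 5]
    score = [52, 55, 61, 64, 70]
    print("r =", round(pearson_r(hours, score), 3))
    print("intercept, slope =", simple_linear_fit(hours, score))
```

In R the same quantities come from `cor()` and `lm()`; multiple regression extends the fit to several predictors.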


2. SAS programs

The programs below are all available in the same GitHub repository, "Statistics_with_SAS".

  • ANOVA

  • Chi-square test of independence

  • Pearson Correlation

  • Moderator variables, analyzed with the chi-square test.
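The SAS code is not reproduced here. As a language-neutral sketch of what a one-way ANOVA computes, the Python function below (a hypothetical helper, `one_way_anova_f`) returns the F statistic and its degrees of freedom from the between-group and within-group sums of squares.

```python
def one_way_anova_f(*groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up measurements for three groups
    print(one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11]))
```

The p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom, for example `scipy.stats.f.sf(f_stat, df_between, df_within)` if SciPy is installed.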

Interactive map charts using R and leaflet.

API Programming:


1. Twitter API:

  • R programs to retrieve a list of Twitter follower IDs and their profiles (location, screen name, follower count, etc.)

  • Python program to track changes in a Twitter account's follower count.
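The follower-tracking idea can be sketched without depending on any particular Twitter API version: log a timestamped follower count on each run, then difference consecutive snapshots. The minimal Python sketch below assumes the count has already been fetched elsewhere; the CSV log format and the helper names (`append_snapshot`, `load_snapshots`, `follower_changes`) are hypothetical, not taken from the original program.

```python
import csv
from datetime import datetime, timezone


def append_snapshot(path, follower_count, when=None):
    """Append one (UTC ISO timestamp, follower count) row to a CSV log file."""
    when = when or datetime.now(timezone.utc)
    with open(path, "a", newline="") as fh:
        csv.writer(fh).writerow([when.isoformat(), follower_count])


def load_snapshots(path):
    """Read the CSV log back as a list of (datetime, int) pairs sorted by time."""
    with open(path, newline="") as fh:
        rows = [(datetime.fromisoformat(ts), int(count)) for ts, count in csv.reader(fh)]
    return sorted(rows)


def follower_changes(snapshots):
    """Return (timestamp, change since the previous snapshot) for every snapshot after the first."""
    return [(t2, c2 - c1) for (_, c1), (t2, c2) in zip(snapshots, snapshots[1:])]
```

Run on a schedule (for example with cron), each run calls `append_snapshot("followers.csv", count)`, and `follower_changes(load_snapshots("followers.csv"))` lists the gains and losses between runs.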

Text analytics:

News headline analysis: *** NEW ***

  • Text pre-processing

  • Sorting and aggregation by publisher name

  • Word clouds and word association plots
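A rough sketch of the pre-processing and aggregation steps, in Python rather than the project's own code: lower-case each headline, keep alphabetic tokens, drop stop words, then count headlines per publisher, word frequencies (the usual input to a word cloud) and within-headline word pairs (a simple basis for word association plots). The stop-word list, the (publisher, headline) input format and the function names are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list)
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "is", "at", "as"}


def preprocess(headline):
    """Lower-case, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def summarize(headlines):
    """headlines: iterable of (publisher, headline) pairs.

    Returns (headlines per publisher, word frequencies, word-pair co-occurrence counts).
    """
    per_publisher = Counter()
    word_freq = Counter()
    pair_freq = Counter()
    for publisher, text in headlines:
        per_publisher[publisher] += 1
        tokens = preprocess(text)
        word_freq.update(tokens)
        # Each distinct pair of words appearing in the same headline counts once
        pair_freq.update(combinations(sorted(set(tokens)), 2))
    return per_publisher, word_freq, pair_freq
```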

Marketing Analytics

1. R programs

  • mktg_fns.R => computes RFM (recency, frequency and monetary value) and performs basic customer segmentation.

  • cust_revenue.R => calculates revenue per customer and revenue per segment, and performs customer scoring and revenue prediction.
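To make the RFM idea concrete, here is a minimal Python sketch (not the logic of `mktg_fns.R`): recency is the number of days since the last purchase as of a reference date, frequency is the number of purchases and monetary value is total spend. Each metric is then rank-scored into buckets and the three scores are concatenated into a segment code. The three-bucket rank scoring and the helper names are assumptions; many RFM implementations use quintiles instead.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """transactions: iterable of (customer_id, purchase_date, amount).

    Returns {customer_id: (recency_days, frequency, monetary)}.
    """
    last = {}
    freq = defaultdict(int)
    money = defaultdict(float)
    for cust, day, amount in transactions:
        last[cust] = max(last.get(cust, day), day)
        freq[cust] += 1
        money[cust] += amount
    return {c: ((as_of - last[c]).days, freq[c], money[c]) for c in last}


def score(values, higher_is_better=True, buckets=3):
    """Rank-based score 1..buckets per key; ties keep insertion order."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {k: 1 + i * buckets // n for i, k in enumerate(ordered)}


def rfm_segments(table, buckets=3):
    """Concatenate recency, frequency and monetary scores into a code such as '332'."""
    r = score({c: v[0] for c, v in table.items()}, higher_is_better=False, buckets=buckets)
    f = score({c: v[1] for c, v in table.items()}, buckets=buckets)
    m = score({c: v[2] for c, v in table.items()}, buckets=buckets)
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in table}


if __name__ == "__main__":
    # Made-up transactions for three customers
    tx = [("A", date(2024, 1, 5), 50.0), ("B", date(2023, 11, 20), 500.0),
          ("C", date(2024, 3, 15), 40.0)]
    table = rfm_table(tx, as_of=date(2024, 3, 31))
    print(table, rfm_segments(table))
```

With three buckets, a code like "332" means top recency and frequency scores and a middle monetary score.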

Pattern analysis and "cyber-security strength" analysis for a password list dataset. *** NEW ***
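One common way to approach this kind of analysis is sketched below in Python: map each password to a structural pattern (upper-case, lower-case, digit, symbol), count the most frequent patterns, and estimate strength as length * log2(character pool size). That entropy figure is a naive upper bound that ignores dictionary words and common substitutions, and the weak/medium/strong thresholds and helper names are illustrative assumptions, not the ones used in the original project.

```python
import math
from collections import Counter


def char_class(ch):
    """Classify one character as L (lower), U (upper), D (digit) or S (anything else)."""
    if "a" <= ch <= "z":
        return "L"
    if "A" <= ch <= "Z":
        return "U"
    if "0" <= ch <= "9":
        return "D"
    return "S"


def pattern(password):
    """Structural pattern of a password, e.g. 'Pass123!' -> 'ULLLDDDS'."""
    return "".join(char_class(ch) for ch in password)


# Pool size per class, assuming printable ASCII: 26 + 26 + 10 + 33 symbols (incl. space) = 95
POOL = {"L": 26, "U": 26, "D": 10, "S": 33}


def naive_entropy_bits(password):
    """Length * log2(pool size); ignores dictionary words, reuse and predictable substitutions."""
    pool = sum(POOL[c] for c in set(pattern(password)))
    return len(password) * math.log2(pool) if pool else 0.0


def strength_label(password, weak_below=40.0, strong_from=60.0):
    """Bucket the naive entropy estimate; the thresholds are illustrative assumptions."""
    bits = naive_entropy_bits(password)
    if bits < weak_below:
        return "weak"
    return "strong" if bits >= strong_from else "medium"


def pattern_summary(passwords, top=5):
    """Most common structural patterns across a password list."""
    return Counter(pattern(p) for p in passwords).most_common(top)
```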


4. US Education Scorecard

  • GitHub link here.

  • Interactive US state map showing college names, admission rate, average faculty salary, etc.



  • Kaggle score = 2.60, using a multinomial regression algorithm.
  • Heatmap to view the worst-affected regions.
  • Programs to test relationships using chi-square tests, with visualizations using correlograms and ggplot.
  • GitHub link here.
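The chi-square test of independence used here (and in the R and SAS programs listed earlier) compares the observed counts in a contingency table with the counts expected if the two variables were independent. A minimal Python sketch of the statistic, using a hypothetical helper name:

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


if __name__ == "__main__":
    # Made-up 2 x 2 table of counts
    print(chi_square_independence([[10, 20], [20, 10]]))
```

The p-value is the upper tail of the chi-square distribution with `dof` degrees of freedom (e.g. `scipy.stats.chi2.sf(stat, dof)`). R's `chisq.test()` applies Yates' continuity correction to 2 x 2 tables by default, so for such tables its statistic will differ from this uncorrected version.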


  • Official Score = 0.789 (~79% accurate predictions)
  • Competition link here.
  • GitHub code link here. Please review the file for the program description and submission files.
  • Predictions made using the following algorithms: Naive Bayes, neural network, random forest and decision tree.

1. Airbnb New User Bookings

  • Official Score = 0.832 (~83% accurate predictions)
  • Competition link here.
  • GitHub code link here. Please review the file for the program description and submission files.

Journey of Analytics

  • R program: creates scatter plots, histograms, bar plots, box plots, pie charts and density plots.

  • Heatmaps & correlograms: see the Kaggle SFO crime project.

  • Advanced graphics: bubble charts, 3D surface plots and mathematical function graphs.