Selected projects and code repositories

Some of my selected open-source projects and code repositories are listed here. Clicking on a heading takes you straight to the respective GitHub repo. All of them have permissive licenses such as MIT or BSD-2. Please feel free to fork them and leave a star if you like them!


Practice and tutorial-style notebooks covering a wide variety of machine learning techniques.

Here is the detailed documentation.


A collection of deep learning (DL) code examples, tutorial-style Jupyter notebooks, and projects. Quite a few of the Jupyter notebooks are built on Google Colab and may use functions exclusive to Google Colab (for example, uploading data or pulling data directly from a remote repo using standard Linux commands).

Here is the detailed documentation.


This is a lightweight Python library for generating random database tables. It is useful for data science beginners who want to create SQL database tables with synthetic data for practicing machine learning and data extraction algorithms. It can generate Pandas DataFrames, MySQL and SQLite tables, and Excel files with random but contextual data such as name, address, city, zip code, telephone number, birthday, license plate, organization, job title, etc.

pip install pydbgen

Read the docs here.
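To give a feel for the idea of "random but contextual" tables, here is a minimal sketch that uses only the Python standard library. It is not the pydbgen API: the value pools, the `random_row` and `build_table` helpers, and the `people` table name are all hypothetical and chosen purely for illustration.

```python
import random
import sqlite3

# Hypothetical value pools; a real generator would draw from much larger lists.
FIRST_NAMES = ["Alice", "Bob", "Carla", "Deepak", "Elena"]
LAST_NAMES = ["Nguyen", "Smith", "Garcia", "Khan", "Olsen"]
CITIES = ["Austin", "Boston", "Denver", "Fresno", "Tampa"]


def random_row(rng):
    """Return one (name, city, phone) tuple of plausible-looking fake data."""
    name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
    city = rng.choice(CITIES)
    # A fixed 555 prefix keeps the numbers obviously fake.
    phone = f"555-{rng.randint(0, 9999):04d}"
    return name, city, phone


def build_table(conn, n_rows, seed=0):
    """Create a 'people' table on an open SQLite connection and fill it."""
    rng = random.Random(seed)  # seeded so the table is reproducible
    conn.execute("CREATE TABLE people (name TEXT, city TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?)",
        [random_row(rng) for _ in range(n_rows)],
    )
    conn.commit()


# An in-memory database is enough for practice queries.
conn = sqlite3.connect(":memory:")
build_table(conn, n_rows=10)
rows = conn.execute("SELECT name, city, phone FROM people").fetchall()
```

The resulting table can be queried with ordinary SQL, which is the kind of practice data the library is meant to provide at larger scale and with richer fields.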

'MLR' - Linear Regression Library with Statistical Modeling

This is a lightweight, easy-to-use Python package that combines a simple scikit-learn-like API with the statistical inference tests, visual residual analysis, outlier visualization, and multicollinearity tests found in packages like statsmodels and in the R language.

pip install mlr

Read the docs here.
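As a rough illustration of what "regression plus statistical inference" means, the sketch below fits a one-variable least-squares line in plain Python and reports the quantities a statistical summary usually includes. It does not use the package's API; the `simple_ols` function and the toy data are assumptions made for this example.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the coefficients together with R^2, the standard error of the
    slope, and its t-statistic (residual variance uses n - 2 degrees of freedom).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in residuals)          # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)    # total variation
    se_slope = math.sqrt(sse / (n - 2) / sxx)

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / sst,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,
        "residuals": residuals,
    }


# Toy data that is roughly y = 2x, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
```

The residuals returned here are what a residual plot or outlier check would be built from; a full library adds multi-variable fits, p-values, and diagnostics on top of this core.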


This is a simple and intuitive API written in Python to interface with the famous UC Irvine Machine Learning Repository. It helps a user search for and download relevant datasets, or select a dataset based on its size or machine learning task category (regression, classification, clustering, etc.).

Here is the detailed documentation.


Design of Experiment (DOE) is an important activity for any scientist, engineer, or statistician planning to conduct experimental analysis, especially now that data science and the associated statistical modeling and machine learning are expanding so rapidly. This codebase is a collection of functions that wrap around the core packages (pyDOE and DiversiPy), generate DOE matrices from an arbitrary range of input variables, and save them to the local disk as CSV or Excel files. It covers factorial designs, response-surface methods (RSM), and Latin Hypercube sampling.

Read the detailed documentation here.
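For readers new to DOE, the following standard-library sketch shows two of the design types mentioned above: a full factorial design and a basic Latin hypercube sample, both written out as CSV text. It does not call pyDOE, DiversiPy, or this project's functions; the helper names and the factor names (`pressure`, `temperature`, `flow_rate`) are hypothetical.

```python
import csv
import io
import itertools
import random


def full_factorial(factors):
    """Return every combination of the listed levels, one dict per run."""
    names = list(factors)
    levels = (factors[name] for name in names)
    return [dict(zip(names, combo)) for combo in itertools.product(*levels)]


def latin_hypercube(ranges, n_samples, seed=0):
    """Basic Latin hypercube sample.

    Each variable's range is split into n_samples equal strata, one point is
    drawn inside every stratum, and the points are shuffled independently
    per variable before being combined into runs.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        width = (high - low) / n_samples
        points = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


def to_csv(rows):
    """Serialize a list of run dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


factors = {
    "pressure": [40, 50, 70],
    "temperature": [290, 320, 350],
    "flow_rate": [0.2, 0.4],
}
design = full_factorial(factors)  # 3 * 3 * 2 runs
lhs = latin_hypercube({"pressure": (40, 70), "temperature": (290, 350)}, n_samples=5)
csv_text = to_csv(design)
```

The full factorial grows multiplicatively with the number of levels, which is why space-filling designs such as the Latin hypercube are attractive when runs are expensive.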


This is the formal release of the above-mentioned design-of-experiment project on PyPI for easy installation.

pip install doepy

Read the docs here.

Synthetic data generation for machine learning

Various methods for generating synthetic data for data science and ML.
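As one small example of the kind of method meant here, the sketch below draws labeled 2-D points from Gaussian clusters, a common way to make a toy classification dataset. It uses only the standard library and is not taken from the repo; the `gaussian_clusters` helper and its parameters are assumptions for illustration.

```python
import random


def gaussian_clusters(n_per_class, centers, spread=1.0, seed=0):
    """Draw n_per_class 2-D points around each center; the label is the center's index."""
    rng = random.Random(seed)  # seeded for reproducibility
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


# Two well-separated classes, 50 points each.
points, labels = gaussian_clusters(50, centers=[(0.0, 0.0), (5.0, 5.0)])
```

Moving the centers closer together or increasing `spread` makes the classes overlap, which is a simple way to control how hard the synthetic problem is.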

Statistics and mathematical computing with Python

General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python.

Read the description here.

Apache Spark with Python

Notebooks on Apache Spark fundamentals (using PySpark), covering RDDs and DataFrames, and on machine learning with Spark (MLlib).

Read the description here.