Some of my selected open-source projects and code repositories are listed here. Clicking a heading takes you straight to the respective GitHub repo. All of them have permissive licenses such as MIT or BSD-2. Please feel free to fork them, and leave a star if you like them!

Practice and tutorial-style notebooks covering a wide variety of machine learning techniques.

- Regression
- Classification
- Clustering
- Synthetic data generation for machine learning
- Learning and complexity curve generation
- How to mix object-oriented programming into machine learning
- Deployment of ML models using a microservice web framework

A collection of Deep Learning (DL) code examples, tutorial-style Jupyter notebooks, and projects. Quite a few of the Jupyter notebooks were built on Google Colab and may use functions exclusive to Google Colab (for example, uploading data or pulling data directly from a remote repo using standard Linux commands).

- Deep learning vs. linear model
- Demo of a general-purpose regression module
- Simple Conv Net
- Using Keras `ImageDataGenerator` and other utilities
- Transfer learning
- Activation maps
- Keras Callbacks using ResNet
- Simple RNN
- Text generation using LSTM
- Bi-directional LSTM for sentiment classification
- Generative adversarial network (GAN) using a simple 1-D algebraic function
- Scikit-learn wrapper for Keras

This is a lightweight Python library for generating random database tables. It is useful for beginners in data science who want to create SQL database tables with synthetic data for practicing machine learning and data extraction algorithms. It can generate pandas DataFrames, MySQL and SQLite tables, and Excel files with random but contextual data such as name, address, city, zip code, telephone number, birthday, license plate, organization, job title, etc.

` pip install pydbgen `
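To illustrate the idea of a random-but-contextual table, here is a minimal sketch that uses only the standard-library `random` and `sqlite3` modules. It is not pydbgen's own API: the value pools, the `random_phone` and `make_random_table` helpers, and the table schema are all hypothetical names chosen for this example.

```python
import random
import sqlite3

# Hypothetical value pools; a real generator would draw from much larger lists.
FIRST_NAMES = ["Alice", "Bob", "Carol", "David", "Eve"]
CITIES = ["Austin", "Boston", "Chicago", "Denver", "Seattle"]


def random_phone(rng):
    """Return a random phone number formatted like 555-123-4567."""
    return f"{rng.randint(200, 999)}-{rng.randint(100, 999)}-{rng.randint(1000, 9999)}"


def make_random_table(conn, table="people", num_rows=10, seed=0):
    """Create `table` in `conn` and fill it with random name/city/phone rows."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    conn.execute(f"CREATE TABLE {table} (name TEXT, city TEXT, phone TEXT)")
    rows = [
        (rng.choice(FIRST_NAMES), rng.choice(CITIES), random_phone(rng))
        for _ in range(num_rows)
    ]
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    conn.commit()
    return rows


# Build a small in-memory SQLite table and check how many rows it holds.
conn = sqlite3.connect(":memory:")
rows = make_random_table(conn, num_rows=5)
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(count, rows[0])
```

Swapping `":memory:"` for a file path would persist the table to disk, which is closer to the practice-database use case described above.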

This is a lightweight, easy-to-use Python package that combines a simple scikit-learn-like API with the statistical inference tests, visual residual analysis, outlier visualization, and multicollinearity tests found in packages like statsmodels and in the R language.

` pip install mlr `

This is a simple and intuitive API written in Python to interface with the well-known UC Irvine Machine Learning Repository. It helps a user easily search for and download relevant datasets, or select a dataset based on its size or machine learning task category (regression, classification, clustering, etc.).

Design of Experiment (DOE) is an important activity for any scientist, engineer, or statistician planning to conduct experimental analysis, especially in this age of the rapidly expanding field of data science and the associated statistical modeling and machine learning. This codebase is a collection of functions that wrap around the core packages (pyDOE and DiversiPy), generate DOE matrices from an arbitrary range of input variables, and save them to the local disk as CSV or Excel files. It covers *factorial designs*, *response-surface methods (RSM)*, and *Latin Hypercube sampling*.
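As a rough illustration of what a full factorial design matrix looks like, the sketch below builds one with `itertools.product` and writes it out as CSV text. It does not use pyDOE, DiversiPy, or this project's own functions; `full_factorial`, `write_design_csv`, and the factor names and levels are hypothetical and chosen only for the example.

```python
import csv
import io
import itertools


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts (one per run)."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*level_lists)]


def write_design_csv(design, file_obj):
    """Write the design matrix to an open text file object in CSV format."""
    writer = csv.DictWriter(file_obj, fieldnames=list(design[0]))
    writer.writeheader()
    writer.writerows(design)


# Hypothetical input ranges: 3 x 2 x 2 levels give 12 experimental runs.
factors = {
    "Pressure": [40, 50, 70],
    "Temperature": [290, 320],
    "Flow rate": [0.2, 0.4],
}
design = full_factorial(factors)

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_design_csv(design, buffer)
csv_text = buffer.getvalue()
print(len(design), "runs")
```

The number of runs grows as the product of the level counts, which is one reason response-surface and Latin Hypercube designs are attractive when there are many factors.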

This is the formal release of the above-mentioned design-of-experiment project on the PyPI repository, for easy installation.

` pip install doepy `

Various methods for generating synthetic data for data science and ML.

- Scikit-learn data generation (regression/classification/clustering) methods
- Random regression and classification problem generation from symbolic expressions
- Synthesizing time series
- Generating Gaussian mixture model data
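As one small example of the last item, here is a minimal sketch of drawing 1-D Gaussian mixture data with NumPy: pick a component for each sample according to the mixture weights, then draw from that component's normal distribution. The `sample_gaussian_mixture` helper and its parameter values are hypothetical, not code taken from the notebooks.

```python
import numpy as np


def sample_gaussian_mixture(n_samples, weights, means, stds, seed=0):
    """Draw 1-D samples from a Gaussian mixture; return (samples, component labels)."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)

    # Pick a component index for each sample according to the mixture weights.
    labels = rng.choice(len(weights), size=n_samples, p=weights)
    # Draw each sample from the normal distribution of its chosen component.
    samples = rng.normal(loc=means[labels], scale=stds[labels])
    return samples, labels


samples, labels = sample_gaussian_mixture(
    1000, weights=[0.3, 0.7], means=[-2.0, 3.0], stds=[0.5, 1.0]
)
print(samples.shape, np.bincount(labels))
```

The returned labels double as ground-truth cluster assignments, which is handy when the data is used to test a clustering algorithm.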

General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python.

- Set algebra basics
- Permutations and combinations
- Discrete probability distributions
- How to do linear regression in 8 ways
- R-style statistical functions using Python
- Statistical diagnostics on a linear regression model
- Recognizing the nature of a statistical distribution from its histogram using deep learning
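To show how the same linear regression can be done in more than one way, the sketch below fits a straight line with three NumPy routes: `np.polyfit`, `np.linalg.lstsq`, and the normal equations. These are not necessarily the eight methods covered in the notebook, and the synthetic data (true slope 2.5, intercept 1.0) is an assumption made for the example.

```python
import numpy as np

# Synthetic data: a noisy line with assumed slope 2.5 and intercept 1.0.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Way 1: degree-1 polynomial fit (coefficients come back highest power first).
slope_pf, intercept_pf = np.polyfit(x, y, deg=1)

# Way 2: least-squares solve on the design matrix [x, 1].
A = np.column_stack([x, np.ones_like(x)])
(slope_ls, intercept_ls), *_ = np.linalg.lstsq(A, y, rcond=None)

# Way 3: normal equations, (A^T A) beta = A^T y.
slope_ne, intercept_ne = np.linalg.solve(A.T @ A, A.T @ y)

print(slope_pf, slope_ls, slope_ne)
```

All three should agree to numerical precision on a well-conditioned problem like this one; they differ mainly in speed and in how gracefully they handle ill-conditioned or multi-feature data.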

Notebooks on Apache Spark fundamentals (using PySpark), covering RDDs and DataFrames, and on machine learning with Spark (MLlib).