Some of my selected open-source projects and code repositories are listed here. Clicking on a heading takes you straight to the respective GitHub repo. All of them have permissive licenses such as MIT or BSD-2. Please feel free to fork them, and leave a star if you like them!
Practice and tutorial-style notebooks covering a wide variety of machine learning techniques.
A collection of Deep Learning (DL) code examples, tutorial-style Jupyter notebooks, and projects. Quite a few of the Jupyter notebooks are built on Google Colab and may use functions exclusive to Google Colab (for example, uploading data or pulling data directly from a remote repo using standard Linux commands).
This is a lightweight Python library for generating random database tables. It is useful for beginners in data science who want to create SQL database tables with synthetic data for practicing machine learning and data extraction algorithms. It can generate Pandas DataFrames, MySQL and SQLite tables, and Excel files with random but contextual data such as name, address, city, zip code, telephone number, birthday, license plate, organization, job title, etc.
pip install pydbgen
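To illustrate the idea of a random but contextual table, here is a minimal sketch that uses only the Python standard library. It does not use or reproduce the pydbgen API; the value pools, the `people` table name, and the helper functions `random_row` and `build_table` are all hypothetical names chosen for this example.

```python
import random
import sqlite3

# Hypothetical value pools; a real generator would draw from far larger ones.
FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Elena"]
LAST_NAMES = ["Khan", "Lopez", "Meyer", "Okafor", "Tanaka"]
CITIES = ["Austin", "Boise", "Denver", "Fresno", "Tulsa"]


def random_row(rng):
    """Return one (name, city, zip_code, phone) tuple of fake data."""
    name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
    city = rng.choice(CITIES)
    zip_code = f"{rng.randint(0, 99999):05d}"
    phone = (
        f"{rng.randint(200, 999)}-{rng.randint(200, 999)}-"
        f"{rng.randint(0, 9999):04d}"
    )
    return name, city, zip_code, phone


def build_table(conn, n_rows, seed=None):
    """Create a 'people' table on conn and fill it with n_rows random rows."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE people (name TEXT, city TEXT, zip_code TEXT, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?, ?)",
        [random_row(rng) for _ in range(n_rows)],
    )
    conn.commit()


# An in-memory database keeps the example free of side effects on disk.
conn = sqlite3.connect(":memory:")
build_table(conn, n_rows=10, seed=42)
for row in conn.execute("SELECT * FROM people LIMIT 3"):
    print(row)
```

Passing a fixed `seed` makes the generated table reproducible, which is handy when the same practice data needs to be shared across exercises.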
This is a lightweight, easy-to-use Python package that combines a simple scikit-learn-like API with the power of statistical inference tests, visual residual analysis, outlier visualization, and multicollinearity tests found in packages like statsmodels and the R language.
pip install mlr
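As a rough illustration of what pairing a fit with inference statistics means, the sketch below fits a one-predictor least-squares line in plain Python and reports R², the standard error of the slope, and the residuals. It is not the mlr API and handles only a single predictor; the function name `fit_simple_ols` and the sample data are assumptions made for this example.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares and return summary stats."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # Standard error of the slope, using n - 2 residual degrees of freedom.
    slope_se = math.sqrt(ss_res / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1.0 - ss_res / ss_tot,
        "slope_se": slope_se,
        "residuals": residuals,
    }


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = fit_simple_ols(x, y)
print(result["slope"], result["intercept"], result["r_squared"])
```

Dividing the slope by its standard error gives the t-statistic used in the usual significance test, and plotting `residuals` against `x` is the starting point for a visual residual analysis.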
This is a simple and intuitive API written in Python to interface with the well-known UC Irvine Machine Learning Repository. It helps a user easily search for and download relevant datasets, or select a dataset based on its size or machine learning task category (regression, classification, clustering, etc.).
Design of Experiment (DOE) is an important activity for any scientist, engineer, or statistician planning to conduct experimental analysis, especially given the rapid expansion of data science and the associated statistical modeling and machine learning. This code is a collection of functions that wrap the core packages (pyDOE and DiversiPy), generate DOE matrices from an arbitrary range of input variables, and save them to the local disk as a CSV or Excel file. It covers factorial designs, response-surface methods (RSM), and Latin Hypercube sampling.
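For readers new to DOE matrices, here is a minimal standard-library sketch of the simplest case, a full-factorial design, where every combination of factor levels becomes one experimental run. It does not call pyDOE, DiversiPy, or this project's functions; `full_factorial`, `write_design_csv`, and the factor names and levels are hypothetical.

```python
import csv
import itertools


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(factors[name] for name in names))
    ]


def write_design_csv(design, path):
    """Write the design matrix to a CSV file, one experimental run per row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(design[0]))
        writer.writeheader()
        writer.writerows(design)


# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "pressure": [40, 50, 70],
    "temperature": [290, 320, 350],
    "flow_rate": [0.2, 0.4],
}
design = full_factorial(factors)
print(len(design))  # 3 * 3 * 2 = 18 runs
```

Calling `write_design_csv(design, "design.csv")` would then save the matrix to disk. The number of runs grows as the product of the level counts, which is why designs such as Latin Hypercube sampling are often preferred when there are many factors.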
This is the formal release of the above-mentioned design-of-experiment project on the PyPI repository for easy installation.
pip install doepy
Various methods for generating synthetic data for data science and ML.
General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python.
Notebooks on Apache Spark fundamentals (using PySpark), covering RDDs and DataFrames, and machine learning with Spark (MLlib).