Introduction
What it is
Geochemistry π is a Python framework for data-driven geochemistry discovery. It provides an extensible tool and a one-stop shop for geochemical data analysis on tabular data. The goal of Geochemistry π is to create a series of user-friendly, extensible and highly automated products covering the full cycle of geochemistry research.
Key features are:
Easy to use: Because the data mining process is automated, users only need to choose from simple numbered options.
Extensible: New algorithms can be added through Scikit-learn, with AutoML functionality provided by FLAML and Ray.
First Phase
It works as a software application with a command-line interface (CLI) that automates the data mining process with frequently used machine learning algorithms and statistical analysis methods, further lowering the barrier to entry for geochemists.
The highlight is that, by choosing simple numbered options, users can carry out a complete data mining cycle without any knowledge of the SciPy, NumPy, Pandas, Scikit-learn, FLAML or Ray packages.
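To make the numbered-option idea concrete, here is a minimal sketch of how a CLI menu can map typed numbers to pipeline steps. It is not Geochemistry π's actual CLI: the menu entries, the `build_menu`, `render_menu` and `choose_option` helpers, and the placeholder actions are all hypothetical.

```python
"""Hypothetical numbered-option menu in the spirit of a CLI pipeline.

Menu entries and helper names are illustrative assumptions, not the
actual Geochemistry π command-line interface.
"""
from typing import Callable, Dict, Tuple

# Maps the number a user types to (label, action).
Menu = Dict[int, Tuple[str, Callable[[], str]]]


def build_menu() -> Menu:
    # Placeholder actions; a real tool would launch the corresponding step.
    return {
        1: ("Regression", lambda: "running regression"),
        2: ("Classification", lambda: "running classification"),
        3: ("Clustering", lambda: "running clustering"),
        4: ("Dimensionality reduction", lambda: "running dimensionality reduction"),
    }


def render_menu(menu: Menu) -> str:
    # One "number - label" line per option, in numeric order.
    return "\n".join(f"{num} - {label}" for num, (label, _) in sorted(menu.items()))


def choose_option(menu: Menu, raw: str) -> str:
    # Parse the typed number, validate it and dispatch to its action.
    try:
        num = int(raw.strip())
    except ValueError:
        raise ValueError(f"please enter a number, got {raw!r}") from None
    if num not in menu:
        raise ValueError(f"option {num} is not available")
    _, action = menu[num]
    return action()
```

In an interactive session, one would print `render_menu(build_menu())` and pass the result of `input()` to `choose_option`; keeping the parsing separate from `input()` makes the dispatch logic easy to test.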
The following figure shows the activity diagram of the automated ML pipeline in Geochemistry π:
Its data section provides feature engineering based on arithmetic operations. It also lets users perform statistical analysis on the data set and on the imputation results, supported by a combination of Monte Carlo simulation and hypothesis testing.
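The sketch below illustrates both ideas under stated assumptions: an arithmetic feature built as the ratio of two hypothetical oxide columns (`SiO2`, `Al2O3`), and a Monte Carlo permutation test that asks whether mean-imputed values shift a column's mean relative to the originally observed values. The column names, the mean-imputation strategy and the `add_ratio_feature` / `permutation_test` helpers are assumptions for illustration; Geochemistry π's own feature engineering and imputation checks may use different operations and tests.

```python
"""Arithmetic feature engineering and a Monte Carlo check on imputation.

Hypothetical sketch: column names, mean imputation and both helper
functions are illustrative assumptions, not Geochemistry π's own code.
"""
import numpy as np
import pandas as pd


def add_ratio_feature(df: pd.DataFrame, num: str, den: str) -> pd.DataFrame:
    """Return a copy of df with a new column num/den (arithmetic feature)."""
    out = df.copy()
    out[f"{num}/{den}"] = out[num] / out[den]
    return out


def permutation_test(a: np.ndarray, b: np.ndarray,
                     n_iter: int = 2000, seed: int = 0) -> float:
    """Monte Carlo permutation test on the absolute difference in means.

    H0: a and b are drawn from the same distribution. Returns an
    estimated two-sided p-value.
    """
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b]).astype(float)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # random relabelling of samples under H0
        diff = abs(pooled[: len(a)].mean() - pooled[len(a):].mean())
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_iter + 1)


# Hypothetical major-element data (wt%) with missing values.
df = pd.DataFrame({
    "SiO2": [50.1, 52.3, np.nan, 48.7, 51.0, 49.5],
    "Al2O3": [14.2, 15.1, 13.8, np.nan, 14.9, 15.3],
})
df = add_ratio_feature(df, "SiO2", "Al2O3")

# Compare observed SiO2 values with the mean-imputed column.
observed_si = df["SiO2"].dropna().to_numpy()
imputed_si = df["SiO2"].fillna(df["SiO2"].mean()).to_numpy()
p_value = permutation_test(observed_si, imputed_si)
print(f"p-value: {p_value:.3f}")
```

A large p-value is expected in this example, since mean imputation leaves the column mean unchanged; a test on spread (for example, variance) would be more sensitive to the variance shrinkage that mean imputation introduces.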
Its models section provides both supervised and unsupervised learning methods from the Scikit-learn framework, covering four types of algorithms: regression, classification, clustering and dimensionality reduction. Integrated with the FLAML and Ray frameworks, it lets users run AutoML easily, quickly and cost-effectively on the built-in supervised learning algorithms.
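As a rough illustration of the four algorithm families, the sketch below fits one plain Scikit-learn estimator per family on synthetic data. The specific estimators (`LinearRegression`, `LogisticRegression`, `KMeans`, `PCA`) and the data are assumptions chosen for brevity; they are not necessarily the models Geochemistry π ships, and the AutoML layer is not shown.

```python
"""One Scikit-learn estimator per algorithm family, on synthetic data.

Estimator choices and data are illustrative assumptions only.
"""
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in for a tabular geochemical data set: 200 samples, 5 features.
X = rng.normal(size=(200, 5))
y_reg = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
y_clf = (y_reg > 0).astype(int)

# Split features and both targets consistently in one call.
X_train, X_test, yr_train, yr_test, yc_train, yc_test = train_test_split(
    X, y_reg, y_clf, test_size=0.25, random_state=0
)

# Supervised: regression (R^2 score) and classification (accuracy).
reg = LinearRegression().fit(X_train, yr_train)
clf = LogisticRegression(max_iter=1000).fit(X_train, yc_train)
reg_r2 = reg.score(X_test, yr_test)
clf_acc = clf.score(X_test, yc_test)

# Unsupervised: clustering and dimensionality reduction on all samples.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)

print(f"R^2 = {reg_r2:.3f}, accuracy = {clf_acc:.3f}")
```

With FLAML, the supervised step would typically swap the hand-picked estimator for `flaml.AutoML().fit(X_train, y_train, task="regression", time_budget=60)`, letting the library search over learners and hyperparameters within the time budget (in seconds). How Geochemistry π configures FLAML and Ray internally is not covered by this sketch.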
Second Phase
We are currently building three access points to provide a more user-friendly service: a web portal, a CLI package and an API. These will allow users to train models continuously by automating the ML pipeline at different layers.
The following figure shows the system architecture diagram of Geochemistry π:
The whole package is under construction and the documentation is progressively evolving.
In-house Materials
Materials are available in both Chinese and English. Materials not listed below are internal.
Guideline Manual – Geochemistry π (International - Google drive)
Learning Steps for Newbies – Geochemistry π (International - Google drive)
Learning Steps for Newbies - Geochemistry π (China - Tencent Docs)
Code Specification v2.1.2 - Geochemistry π (International - Google drive)
Code Specification v2.1.2 - Geochemistry π (China - Tencent Docs)
Cycle Report - Geochemistry π (International - Google drive)
In-house Videos
Technical recording videos are published on both Bilibili and YouTube, while other meeting videos are internal materials. More videos will be recorded soon.
ZJU_Earth_Data Introduction (Geochemical Data, Python, Geochemistry π) - Prof. Zhang
How to Collaborate and Provide Bug Report on Geochemistry π Through GitHub - Can He (Sany)
How to Create and Use Virtual Environment on Geochemistry π - Can He (Sany)
How to use Github-Desktop in conflict resolution - Qiuhao Zhao (Brad)
Virtual Environment & Packages On Windows - Jianming Zhao (Jamie)
Git Workflow & Coordinating Synchronization - Jianming Zhao (Jamie)