Photo by Martha Dominguez de Gouveia on Unsplash

Doctors are widely respected: they hold high social status, earn high salaries, and play an important role in the healthcare system. By comparison, another indispensable group in the healthcare system, nurses, is easily overlooked. Whether we admit it or not, nurses normally spend more time with patients than doctors do. They observe more, communicate more, and know more about patients’ conditions, so they can sometimes play a more important role in treatment than doctors. …

Photo by Joshua Sortino on Unsplash

Ever since Roger Mougalas coined the term “big data” in 2005, it has been popular with the public. This article attempts to draw a big picture of the main “circles” related to big data, in terms of the techniques it requires and its applications. Hopefully, it can help those who intend to become experts in this area.

Photo by Morning Brew on Unsplash

COVID-19 has affected everyone in the world, and unfortunately it is unlikely to end soon. Many research groups worldwide are working to control the spread of COVID-19 and to provide forecast data that support policy making. The Centers for Disease Control and Prevention (CDC) in the United States lists 24 groups that have been actively contributing predictions of the spread of COVID-19. Their models forecast the total number of infections, the total number of deaths, the duration of COVID-19, and the effect of policy interventions. Based on the descriptions of the predictive models on the CDC website, these predictive models can…

This essay continues the summary of Python scripts for common graphs in data exploration. The sample data used here are the Boston housing data (sklearn.datasets.load_boston()) and Singapore PSI data. We will cover the following groups of graphs:

5. combo graphs

6. specific graphs

All the scripts are available:

Combo Graphs

  1. Line + Line

Figure: How do house price and the percentage of lower-status population relate to crime rate?

This essay summarises Python scripts for common graphs in data exploration. The sample data used here are the Boston housing data (sklearn.datasets.load_boston()) and Singapore PSI data. We will cover the following groups of graphs:

  1. line graphs
  2. bar graphs
  3. dot graphs
  4. area graphs

All the scripts are available:

Line Graphs

We are going to provide scripts for three types of line graphs: the standard line graph, the time trend, and the density curve. Before going into the detailed graphs, let’s get familiar with the following options.


Photo by Maxi am Brunnen on Unsplash

In Part 1, we discussed the basic algorithm of Gradient Tree Boosting.

Let’s start Part 2 today. We are going to focus on the competing algorithms in Gradient Tree Boosting: XGBoost, CatBoost and LightGBM. By reading the passage below, you will know the answers to the following questions for these 3 algorithms:

1. How does it handle missing values in splitting the nodes?

2. How does it handle categorical features in splitting the nodes?

3. How efficiently does it split the nodes?
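To make the first question concrete, here is a minimal sketch of one known approach: XGBoost learns a "default direction" for missing values at each split by trying both sides and keeping the one with the higher gain. The sketch is an assumption-laden illustration, not library code: it uses a plain squared-error gain instead of XGBoost's gradient-based gain, and the function names and toy data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values)


def best_default_direction(xs, ys, threshold):
    """Try sending rows with a missing feature left, then right; keep the better side.

    Missing feature values are represented as None (an assumption of this sketch).
    """
    left = [y for x, y in zip(xs, ys) if x is not None and x <= threshold]
    right = [y for x, y in zip(xs, ys) if x is not None and x > threshold]
    missing = [y for x, y in zip(xs, ys) if x is None]

    total = sse(ys)
    gain_if_left = total - (sse(left + missing) + sse(right))
    gain_if_right = total - (sse(left) + sse(right + missing))

    if gain_if_left >= gain_if_right:
        return "left", gain_if_left
    return "right", gain_if_right


# Toy data: the two rows with a missing feature have targets close to the right-hand group.
xs = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [1.0, 1.2, 5.1, 5.0, 5.3, 4.9]

direction, gain = best_default_direction(xs, ys, threshold=5.0)
print(direction, round(gain, 3))
```

In this toy example the missing rows look like the right-hand group, so sending them right leaves both child nodes more homogeneous.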


XGBoost was developed in 2014 by Tianqi Chen, who was then a PhD student in…

Photo by Richard Gatley on Unsplash

XGBoost, LightGBM and CatBoost are among the most commonly used algorithms in competitions. To understand their differences, we will split this topic into three parts: Part 1 covers the mathematics of Gradient Tree Boosting, Part 2 compares XGBoost, LightGBM and CatBoost, and Part 3 discusses the code and exercises for each algorithm.

Let’s start with Part 1 today.

XGBoost, LightGBM and CatBoost are the competing algorithms in Gradient Tree Boosting. To better understand their differences in model training, we need to know how Gradient Tree Boosting works first. It is assumed that you have…
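As a rough illustration of the idea behind Gradient Tree Boosting, the sketch below repeatedly fits a one-split tree (a stump) to the current residuals under squared-error loss and adds a scaled copy of it to the model. It is a minimal sketch, not the code from this series: the function names, the toy data, the learning rate and the number of rounds are all assumptions chosen for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single-threshold split that minimises squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then add one stump per round fitted to the residuals."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Toy one-feature regression data (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]
model = fit_gradient_boosting(xs, ys)
print([round(predict(model, x), 2) for x in xs])
```

Each round only nudges the predictions by a fraction of the stump's output, which is why the learning rate and the number of rounds are usually tuned together.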

Photo by Thomas Q on Unsplash

When you continue Python work on a new computer, an appropriate initial setup helps ensure a smooth transition, e.g. package installation, virtual environment management, and GPU setup. I will discuss the most common setup tasks below, and hopefully this helps you move your work over smoothly. All the tasks are done in the command window on Windows 10 with Python 3.7.

  1. Install Python Packages (Offline)
  2. Create Virtual Environment
  3. Edit Default Setting of Jupyter Notebook
  4. Install Jupyter Notebook Extension (Offline)
  5. Set Up GPU for Tensorflow
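The second task in the list, creating a virtual environment, can also be scripted. The sketch below uses Python's standard-library venv module rather than the command-window steps the article goes on to describe, so treat it as an alternative illustration; the helper name and the "demo-env" folder are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_virtual_environment(target_dir):
    """Create a virtual environment and return the path of its config file.

    with_pip=False skips bootstrapping pip, so the sketch needs no network access.
    """
    venv.create(target_dir, with_pip=False)
    return Path(target_dir) / "pyvenv.cfg"


# Build the environment in a throwaway folder so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp:
    config_file = create_virtual_environment(Path(tmp) / "demo-env")
    created = config_file.exists()

print("environment created:", created)
```

For real work you would point the helper at a permanent folder and pass with_pip=True if you want pip available inside the environment.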

Install Python Packages (Offline)

Installing the right version of Python packages is critical…

Photo by Lukas Juhas on Unsplash

In classification problems, AUC (Area Under the Curve) is one of the most important evaluation metrics for measuring a model’s performance. However, do you know which “curve” the AUC should be computed from? The default choice is typically AUC-ROC (Receiver Operating Characteristic). However, there is another common choice: AUC-PRC (Precision-Recall Curve). In this article, we will learn their differences and their application scenarios.

What is a confusion matrix?

It helps to introduce some notation using a confusion matrix first. A confusion matrix is a 2-by-2 table that shows all the combinations of actual labels and predicted results: {True Positive (TP), False…
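A small sketch of how the four cells of a confusion matrix can be counted, and how the quantities plotted on the two curves follow from them: the ROC curve plots the true positive rate against the false positive rate, while the precision-recall curve plots precision against recall. The function names and the toy labels are assumptions for illustration, with 1 marking the positive class.

```python
def confusion_counts(actual, predicted):
    """Count the four cells of a binary confusion matrix (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def curve_points(counts):
    """Derive the rates used by the ROC curve and the precision-recall curve."""
    tp, fp, tn, fn = counts["TP"], counts["FP"], counts["TN"], counts["FN"]
    return {
        "recall": tp / (tp + fn),        # also called the true positive rate
        "false_positive_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }


# Toy, imbalanced example: 3 positives and 7 negatives (hypothetical labels).
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
points = curve_points(counts)
print(counts)
print(points)
```

Note that the true negatives only enter the false positive rate, so a large negative class affects the ROC view but not precision or recall.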

Beverly Wang
