These tests help us determine whether the patterns we observe in our data are real or simply due to chance. Think of statistical tests as tools that allow researchers to make sense of their data scientifically. At its core, a statistical test examines the relationship between variables, for example whether two groups differ in their average scores on a test. Deciding when to conduct a statistical test depends on factors such as the nature of the data, the research question, and the study’s design.

*Check out part 1 of the article.*

## ARIMA models

ARIMA (autoregressive integrated moving average) models forecast a time series from its own past values and past forecast errors. To forecast future values in time series data, start by collecting past data points for what you want to predict, such as previous stock prices. Then plot this data on a graph to identify trends or patterns over time.

Next, choose a forecasting model. Simple, common choices include the moving average, which averages the most recent data points; linear regression, which fits a straight line through your data; and exponential smoothing, which gives more weight to recent data.

After selecting your model, train it using your collected data. This step adjusts the model to align well with your observed data. Once the model is trained, you can use it to make predictions about future values.

Finally, evaluate your predictions by comparing them with the actual values once they become available.
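The workflow above can be sketched in a few lines of plain Python. This is a minimal illustration rather than an ARIMA implementation: it uses the moving average as the model, holds out the last few points to stand in for "future" values, and scores the forecasts with mean absolute error. The prices and the function names `moving_average_forecast` and `mean_absolute_error` are made up for this example.

```python
from statistics import mean

def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    return mean(history[-window:])

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Made-up daily closing prices, for illustration only.
prices = [100, 102, 101, 105, 107, 106, 110, 112]

# Hold out the last three points to play the role of future values.
train, test = prices[:5], prices[5:]

# Walk forward: forecast one step ahead, then reveal the actual value.
history = list(train)
forecasts = []
for actual in test:
    forecasts.append(moving_average_forecast(history, window=3))
    history.append(actual)

mae = mean_absolute_error(test, forecasts)
print("forecasts:", forecasts)
print("mean absolute error:", mae)
```

A lower error on the held-out points suggests the model tracks the series better; the same loop can be reused to compare different window sizes or different models.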

## Exponential smoothing (ES)

Use when: You need to forecast future values of a series by weighting recent observations more heavily than older ones.

Example: You want to estimate future sales based on past patterns.
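As a rough sketch, simple exponential smoothing updates a running "level" with each new observation: the new level is `alpha` times the latest value plus `1 - alpha` times the previous level. The sales figures, the choice of `alpha = 0.5`, and the function name below are assumptions made for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    The last level serves as the one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    levels = [level]
    for value in series[1:]:
        # Blend the newest observation with the previous level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels

# Made-up monthly sales, for illustration only.
sales = [10, 12, 14, 16]
levels = simple_exponential_smoothing(sales, alpha=0.5)
forecast = levels[-1]
print("next-month forecast:", forecast)
```

A larger `alpha` makes the forecast react faster to recent changes; a smaller one smooths out more noise.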

## Seasonal decomposition

Use when: You want to break down time series data into three parts: trend, seasonality, and residuals. This helps you see patterns more clearly.

Example: You want to look at website traffic data. By breaking it down, you can find regular patterns (like more visitors on weekends) and spot unusual changes (like a sudden spike in traffic).
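The sketch below shows one simple way to do this, a classical additive decomposition, under a few assumptions: the data is daily with a weekly cycle (period 7), the function only handles odd periods, and the traffic numbers are invented, with a weekend bump and one artificial spike. The function name `decompose_additive` is hypothetical.

```python
from statistics import mean

def decompose_additive(series, period):
    """Classical additive decomposition for an odd `period`.

    Returns (trend, seasonal, residual). Trend and residual are None at the
    edges, where the centred moving average is undefined.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(series)

    # Trend: centred moving average over one full period.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(series[i - half:i + half + 1])

    # Seasonal: average detrended value at each position in the cycle,
    # shifted so that the seasonal effects sum to zero.
    by_position = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            by_position[i % period].append(series[i] - trend[i])
    raw = [mean(values) for values in by_position]
    offset = mean(raw)
    pattern = [r - offset for r in raw]
    seasonal = [pattern[i % period] for i in range(n)]

    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual

# Made-up visits: Monday to Friday at 100, weekends at 135, for four weeks.
weekly = [100, 100, 100, 100, 100, 135, 135]
visits = weekly * 4
visits[10] += 70  # an artificial one-day spike

trend, seasonal, residual = decompose_additive(visits, period=7)

# The day with the largest residual is the most unusual one.
candidates = [i for i in range(len(visits)) if residual[i] is not None]
spike_day = max(candidates, key=lambda i: abs(residual[i]))
print("most unusual day (index):", spike_day)
```

The seasonal component captures the weekend pattern, while the spike shows up in the residuals because neither the trend nor the weekly cycle explains it.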

## Kaplan-Meier estimator

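Use when: You want to estimate the probability that members of a group survive beyond a given time, including when some observations are censored (the event had not occurred by the end of follow-up).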

Example: You want to study how long patients with a certain disease are likely to survive.
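A minimal sketch of the estimator: at each time where an event is observed, the running survival probability is multiplied by the fraction of at-risk subjects who did not have the event. The follow-up times and event flags are invented, and the function name `kaplan_meier` is an assumption for this example.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the share of at-risk subjects who survived time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve

# Made-up follow-up times in months; 0 marks a censored patient.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
```

Censored patients still count as at risk up to the time they leave the study, which is what lets the estimator use incomplete follow-up instead of discarding it.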

## Cox proportional hazards model

A regression model often used in medical studies to explore how the survival time of patients relates to one or more factors.
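To give a feel for how the model is fitted, here is a deliberately small sketch with a single factor (a 0/1 treatment flag). It estimates the coefficient by Newton-Raphson on the partial likelihood and handles tied times in the simplest, Breslow-style way. The data, the function name `fit_cox_single`, and the fixed iteration limit are assumptions for illustration; a real analysis would normally rely on a dedicated survival library.

```python
import math

def fit_cox_single(times, events, x, iterations=25):
    """Estimate one Cox coefficient by Newton-Raphson on the partial likelihood."""
    beta = 0.0
    for _ in range(iterations):
        gradient = 0.0
        information = 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects only contribute through risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            weights = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(weights)
            s1 = sum(w * x[j] for w, j in zip(weights, risk))
            s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk))
            gradient += x[i] - s1 / s0
            information += s2 / s0 - (s1 / s0) ** 2
        if information == 0:
            break  # no variation in x within any risk set
        step = gradient / information
        beta += step
        if abs(step) < 1e-10:
            break
    return beta

# Made-up data: survival time, event flag (0 = censored), treatment flag.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 0, 1, 1, 0]
treated = [1, 1, 0, 1, 0, 0, 1, 0]

beta = fit_cox_single(times, events, treated)
print("log hazard ratio:", beta)
print("hazard ratio:", math.exp(beta))
```

A hazard ratio above 1 means the factor is associated with a higher event rate at any given time; below 1 means a lower rate.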

## Log-rank test

Use when: You want to compare the survival curves of two or more groups.

Example: You want to test whether patients on different treatments survive for different lengths of time.
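The two-group version of the test can be sketched as follows. At every event time it compares the number of events observed in the first group with the number expected if both groups shared the same survival curve, then combines the differences into a chi-square statistic with one degree of freedom. The patient data and the function name `logrank_test` are made up for this example.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value)."""
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t in each group.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t in each group.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Made-up survival times in months; 0 marks a censored patient.
times_a, events_a = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
times_b, events_b = [1, 2, 3, 4, 5, 8], [1, 1, 1, 1, 1, 1]

chi_square, p_value = logrank_test(times_a, events_a, times_b, events_b)
print("chi-square:", chi_square, "p-value:", p_value)
```

A small p-value suggests the survival curves differ by more than chance alone would explain.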

## K-means clustering

Use when: You want to divide data into a fixed number (k) of groups so that the points in each group are similar to one another and different from the points in other groups.
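A minimal sketch of the usual k-means loop (often called Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. The points, the random starting centroids, and the function names are assumptions made for illustration.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, iterations=100, seed=0):
    """Return (labels, centroids) after running the k-means loop."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Made-up 2-D points forming two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = k_means(points, k=2)
print("labels:", labels)
print("centroids:", centroids)
```

The result can depend on the starting centroids, so in practice the algorithm is often run several times and the best grouping kept.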

## Hierarchical clustering

Hierarchical clustering groups similar items so that items in the same group are alike and items in different groups are not. The groups are nested inside one another, and the result is shown as a tree diagram called a dendrogram.

Use when: You want to group similar observations into clusters based on their features and see how those clusters nest within a hierarchy.
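One common bottom-up variant can be sketched like this: start with every observation in its own cluster and repeatedly merge the two closest clusters, recording each merge. This sketch assumes single linkage (the distance between two clusters is the distance between their closest members); other linkage rules exist. The points and the function name are made up for illustration.

```python
import math

def agglomerative_single_linkage(points):
    """Merge the two closest clusters repeatedly; return the merge history.

    Each history entry is (members_a, members_b, distance), which is the
    information a dendrogram draws.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Made-up 2-D observations: two tight pairs and one distant point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
history = agglomerative_single_linkage(points)
for left, right, distance in history:
    print(left, "+", right, "at distance", round(distance, 2))
```

Reading the merge history from top to bottom is the same as reading a dendrogram from its leaves to its root; cutting it at a chosen distance gives a flat set of clusters.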

## DBSCAN (density-based spatial clustering of applications with noise)

Use when: You want to group observations into clusters of densely packed points while labeling isolated points as noise.

Example: You want to analyze spatial data to identify clusters of high density.
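A compact sketch of the idea: a point with at least `min_samples` neighbours within distance `eps` (counting itself) is a core point, clusters grow outward from core points, and anything that cannot be reached is labelled noise. The coordinates and the parameter values below are invented for illustration, and the function name `dbscan` is an assumption.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels

# Made-up coordinates: two dense areas and one isolated point.
points = [
    (1, 1), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8),
    (5, 5), (5.1, 5.2), (4.8, 5.1), (5.2, 4.9),
    (9, 1),
]
labels = dbscan(points, eps=0.6, min_samples=3)
print(labels)
```

Unlike k-means, the number of clusters is not chosen in advance; it follows from `eps` and `min_samples`, which usually need some tuning for the data at hand.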
