Reading-Notes

Project maintained by eslamakram Hosted on GitHub Pages — Theme by mattgraham

What is Linear Regression?

Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression is to examine two things: (1) does a set of predictor variables do a good job in predicting an outcome (dependent) variable? (2) Which variables in particular are significant predictors of the outcome variable, and in what way do they–indicated by the magnitude and sign of the beta estimates–impact the outcome variable?

These regression estimates are used to explain the relationship between one dependent variable and one or more independent variables. The simplest form of the regression equation with one dependent and one independent variable is defined by the formula y = c + b*x, where y = estimated dependent variable score, c = constant, b = regression coefficient, and x = score on the independent variable. In other words, you need to find a function that maps some features or variables to others sufficiently well. The dependent features are called the dependent variables, outputs, or responses. The independent features are called the independent variables, inputs, or predictors.

MLM

Implementing Linear Regression in Python

The package scikit-learn is a widely used Python library for machine learning, built on top of NumPy and some other packages. It provides the means for preprocessing data, reducing dimensionality, implementing regression, classification, clustering, and more. Like NumPy, scikit-learn is also open source.

There are five basic steps when you’re implementing linear regression:

Import the packages and classes you need. Provide data to work with and eventually do appropriate transformations. Create a regression model and fit it with existing data. Check the results of model fitting to know whether the model is satisfactory. Apply the model for predictions.

Step 1: Import packages and classes

The first step is to import the package numpy and the class LinearRegression from sklearn.linear_model: import numpy as np from sklearn.linear_model import LinearRegression

Step 2: Provide data

The second step is defining data to work with. The inputs (regressors, 𝑥) and output (predictor, 𝑦) should be arrays (the instances of the class numpy.ndarray) or similar objects. This is the simplest way of providing data for regression:

    x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
    y = np.array([5, 20, 14, 32, 22, 38])

Step 3: Create a model and fit it

The next step is to create a linear regression model and fit it using the existing data. Let’s create an instance of the class LinearRegression, which will represent the regression model: model = LinearRegression()

Step 4: Get results

Once you have your model fitted, you can get the results to check whether the model works satisfactorily and interpret it. You can obtain the coefficient of determination (𝑅²) with .score() called on model:

        r_sq = model.score(x, y)
        print('coefficient of determination:', r_sq)
        coefficient of determination: 0.71587561374795

Step 5: Predict response

Once there is a satisfactory model, you can use it for predictions with either existing or new data. To obtain the predicted response, use .predict():

        y_pred = model.predict(x)
        print('predicted response:', y_pred, sep='\n')