Bird’s Eye View of the machine learning workflow:

Our machine learning blueprint is designed around those 3 elements.
There are 5 core steps:
Should you clean your data more? Engineer features? Test new algorithms? There’s a lot of trial and error, so how do you avoid chasing dead ends? The answer is “Exploratory Analysis.” (Which is just fancy-talk for “getting to know” your data.)
Better data beats fancier algorithms…
Choose the best, most appropriate algorithms without wasting your time.
Finally, train your models. This step is pretty formulaic once you’ve done the first 4.
At last, it’s time to build our models!
It might seem like it took us a while to get here, but professional data scientists actually spend the bulk of their time on the steps leading up to this one.
Split Dataset
Let’s start with a crucial but sometimes overlooked step: spending your data. Think of your data as a limited resource.
You can spend some of it to train your model (i.e. feed it to the algorithm). You can spend some of it to evaluate (test) your model. But you can’t reuse the same data for both!
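To make "spending" concrete, here is a minimal sketch in plain Python. The helper name `split_dataset`, the 80/20 ratio, and the fixed seed are illustrative assumptions, not something prescribed by this guide. The point is simply that every observation lands in exactly one of the two sets.

```python
import random

def split_dataset(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows, then split it into (train, test).

    Hypothetical helper for illustration; the 20% test share is an assumption.
    """
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # stand-in for 100 observations
train, test = split_dataset(data)
```

Shuffling before splitting matters when the rows arrive in a meaningful order (sorted by date or by label, say), because otherwise the test set would not look like the training set.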

Model parameters
Model parameters are learned attributes that define individual models.
e.g. regression coefficients
e.g. decision tree split locations
They can be learned directly from the training data.

Hyperparameters
Hyperparameters express “higher-level” structural settings for algorithms.
e.g. strength of the penalty used in regularized regression
e.g. the number of trees to include in a random forest
They are decided before fitting the model because they can’t be learned directly from the training data.
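The difference is easier to see in a toy case. The sketch below fits a one-feature regularized regression with no intercept, which is an assumption chosen to keep the math short. The function name `fit_ridge_slope` is hypothetical. The penalty strength `alpha` is a hyperparameter we pick up front, and the slope `w` is the model parameter the fit learns from the data.

```python
def fit_ridge_slope(xs, ys, alpha):
    """Fit y ~ w * x with an L2 penalty of strength alpha and return w.

    Minimizing sum((y - w*x)**2) + alpha * w**2 gives the closed form
    w = sum(x*y) / (sum(x*x) + alpha).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly y = 2x

w_no_penalty = fit_ridge_slope(xs, ys, alpha=0.0)   # 2.0
w_penalized = fit_ridge_slope(xs, ys, alpha=10.0)   # 1.5, shrunk toward zero
```

Same data, same algorithm, different hyperparameter value, different learned parameter. Nothing in the fit itself tells you which `alpha` is better. That is what tuning is for.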
Fit and Tune Models
Now that we’ve split our dataset into training and test sets, and we’ve learned about hyperparameters and cross-validation, we’re ready to fit and tune our models.
Basically, all we need to do is perform the entire cross-validation loop detailed above on each set of hyperparameter values we’d like to try.
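As a rough sketch of that loop in plain Python: the toy data, the candidate `alphas`, the 5 folds, and the helper names `fit_ridge_slope`, `mse`, and `cv_error` are all assumptions made for illustration. Each candidate value gets a full cross-validation pass over the training set only, and the value with the lowest average held-out error wins.

```python
import random

def fit_ridge_slope(xs, ys, alpha):
    """Fit y ~ w * x with an L2 penalty of strength alpha and return w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_error(xs, ys, alpha, k=5):
    """Average held-out MSE across k folds for one hyperparameter value."""
    n = len(xs)
    errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th point belongs to this fold
        x_fit = [xs[i] for i in range(n) if i not in held_out]
        y_fit = [ys[i] for i in range(n) if i not in held_out]
        x_val = [xs[i] for i in sorted(held_out)]
        y_val = [ys[i] for i in sorted(held_out)]
        w = fit_ridge_slope(x_fit, y_fit, alpha)  # train on the other k-1 folds
        errors.append(mse(w, x_val, y_val))       # score on the held-out fold
    return sum(errors) / k

# Toy training set (assumption): y = 3x plus noise. The test set stays untouched.
rng = random.Random(0)
x_train = [rng.uniform(-1, 1) for _ in range(50)]
y_train = [3 * x + rng.gauss(0, 0.5) for x in x_train]

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]  # hyperparameter values to try
cv_scores = {a: cv_error(x_train, y_train, a) for a in alphas}
best_alpha = min(cv_scores, key=cv_scores.get)

# Refit on the full training set using the winning hyperparameter value.
final_w = fit_ridge_slope(x_train, y_train, best_alpha)
```

The test set never appears in this loop. It is held back so the final model can be evaluated on data that played no part in choosing `best_alpha`.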

