by Brian Ko
2 min read

Categories

  • Machine Learning

Tags

  • Machine Learning
  • Beginner

When first diving into machine learning, you often hear about epochs, batch size, and iterations.

You know they each count something during training, but what is the difference between them? Knowing this is actually crucial for setting up your algorithm.

I am going to assume you’ve read my other post, “Machine Learning In 3 Minutes”, or that you already know about gradient descent. Just to quickly recap, gradient descent is an iterative algorithm where we try to find the model that produces the lowest loss (or error) between the predicted outcome and the actual outcome.
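To make that recap concrete, here is a minimal sketch of gradient descent fitting a single weight w in y = w * x. The data, learning rate, and iteration count are made up purely for illustration:

```python
# Toy dataset following y = 2x; gradient descent should recover w ≈ 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0    # initial guess for the weight
lr = 0.01  # learning rate

for _ in range(1000):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # step against the gradient to lower the loss

print(round(w, 3))  # → 2.0
```

Each loop pass nudges w in the direction that reduces the loss, which is all "iterative" means here.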

So, how do epochs, batch size, and iterations come into play?

Well, to train our model, we have data. And in this day and age… we can have ridiculous amounts of data. Say, for example, that you have 1 million rows of patient data, each row containing basic information about the patient. Can you process 1 million records at once with your trusty Mac? Oh, you have a state-of-the-art gaming PC? Okay, what if the dataset is 10 billion?

There are a few reasons why we divide up data into batches, but this is probably the biggest one. Thus, if we had 1 million records and we picked a batch size of 10,000, we would have 100 batches.

Total dataset count = Number of batches * batch size
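The formula above is easy to sanity-check in code, using this post's example numbers (the variable names are just for illustration):

```python
# 1 million records split into batches of 10,000
total_records = 1_000_000
batch_size = 10_000

num_batches = total_records // batch_size
print(num_batches)  # → 100

# Total dataset count = Number of batches * batch size
assert num_batches * batch_size == total_records
```

(With real data the last batch is often smaller when the batch size doesn't divide the dataset evenly; the example here divides cleanly.)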

Once you have fully gone through all 1 million records, the epoch count will be 1.

Epoch = One full pass through the entire dataset (in training)

Iteration count is how many batches you must run to complete 1 epoch. Consequently, it’s equal to the number of batches.

You can summarize this as:

Total Training Dataset Count = 1 Epoch = Batch Size * Number of Batches (or Iterations)
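All three terms fall out naturally from a standard training loop. Here is a sketch with the post's numbers; the commented-out `train_on_batch` call is a hypothetical stand-in for one gradient descent update:

```python
# Pretend dataset: 1 million records (just indices here)
dataset = list(range(1_000_000))
batch_size = 10_000
num_epochs = 3

iterations = 0
for epoch in range(num_epochs):            # one epoch = one full pass
    for start in range(0, len(dataset), batch_size):
        batch = dataset[start:start + batch_size]
        # train_on_batch(model, batch)     # one iteration = one batch
        iterations += 1

print(iterations)                # → 300 (100 iterations per epoch, 3 epochs)
print(iterations // num_epochs)  # → 100 (= number of batches)
```

Notice the inner loop runs exactly "number of batches" times per epoch, which is why iterations per epoch and batch count are the same number.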

And that’s it! You won’t have to be confused by these terms ever again!

Bonus: What is the difference between Steps vs Periods?

Sometimes batches are big enough that we want to divide them up and output some values as our training algorithm goes through each batch. Thus, periods are divisions of batches.

Steps are the finest division we have. One step is one processing of the gradient descent iteration. In our case, since the batch size was 10,000, it takes 10,000 steps to run through one batch. So, if we decided to output a console statement 10 times per batch, you can see it as:

Batch Size = Number of Periods * Steps per Period

Or in our case:

10,000 = 10 periods * 1,000 steps
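Using this post's definitions of period and step (note that some libraries instead use "step" to mean one whole batch update), the bookkeeping is simple arithmetic:

```python
# One batch of 10,000, logged 10 times as we work through it
batch_size = 10_000
num_periods = 10

steps_per_period = batch_size // num_periods
print(steps_per_period)  # → 1000

# Batch Size = Number of Periods * Steps per Period
assert num_periods * steps_per_period == batch_size
```

So every 1,000 steps you would print one console statement, giving 10 progress updates per batch.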

I hope that makes everything crystal clear for you!