Exploratory Data Analysis (EDA) is a crucial step before you start further analyzing or modeling your data. It helps you prepare and clean your data, and provides the context needed to develop an appropriate model and interpret the results correctly. Some of the key steps in EDA include:

  • Summarizing main characteristics of the data
  • Gaining a better understanding of the data set
  • Uncovering relationships between variables
  • Extracting important variables
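
As a rough illustration, the steps listed above could look like this in pandas. This is only a minimal sketch: the toy DataFrame, its column names, and the choice of correlation as a measure of "importance" are all assumptions made for the example, not taken from the article's dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 58, 69, 77, 90],
    "group": ["a", "b", "a", "b", "a"],
})

# 1. Summarize the main characteristics of the numeric columns.
summary = df.describe()

# 2. Get a better understanding of the data set: size, types, missing values.
shape = df.shape
dtypes = df.dtypes
missing = df.isna().sum()

# 3. Uncover relationships between the numeric variables.
numeric = df[["height_cm", "weight_kg"]]
correlations = numeric.corr()

# 4. One simple way to rank variables: absolute correlation with a target.
#    Treating "weight_kg" as the target is an assumption of this sketch.
importance = correlations["weight_kg"].drop("weight_kg").abs().sort_values(ascending=False)

print(summary)
print(correlations)
print(importance)
```

Correlation only captures linear relationships, so on real data it is usually one of several checks rather than the final word on which variables matter.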

Importing the dataset

The complete code and dataset can be found on my GitHub. Please feel free to download it at:

Let’s begin by importing the necessary libraries in Python. These libraries…

Building a Machine Learning algorithm takes more than just feeding data into it; several flaws can impair a model’s performance. Overfitting is one such flaw, and it reduces the model’s accuracy and efficiency.

A mathematical model is overfitted when we feed it much more data (including noise) than it needs. To make this more relatable, imagine trying to fit into oversized clothing. When a model fits more data than it needs, it starts to pick up noise and incorrect values in the data. As a consequence, the model’s performance and consistency suffer.
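
The effect can be sketched with a small experiment. The example below is only an illustration under stated assumptions: the data are synthetic (a straight line plus random noise), and polynomial fitting with NumPy stands in for whatever model the article goes on to use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption of this sketch): a straight line plus noise.
x_train = np.linspace(-1.0, 1.0, 12)
y_train = 2.0 * x_train + rng.normal(scale=0.3, size=x_train.size)

# Noise-free points between the training points, used to check generalization.
x_test = np.linspace(-0.95, 0.95, 50)
y_test = 2.0 * x_test

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# A simple model (degree 1) and a far more flexible one (degree 9).
simple = np.polyfit(x_train, y_train, deg=1)
flexible = np.polyfit(x_train, y_train, deg=9)

train_mse_simple = mse(simple, x_train, y_train)
train_mse_flexible = mse(flexible, x_train, y_train)
test_mse_simple = mse(simple, x_test, y_test)
test_mse_flexible = mse(flexible, x_test, y_test)

print("train MSE:", train_mse_simple, train_mse_flexible)
print("test MSE: ", test_mse_simple, test_mse_flexible)
```

The flexible model always reaches a lower training error because it has enough freedom to chase the noise. Its error on the held-out points is typically higher than that of the simple model, which is the signature of overfitting.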

Training with more data will help…

In this lab, I completed a series of exercises exploring movie rating data from IMDb. I conducted basic exploratory data analysis on IMDb’s movie data, looking to answer questions such as: What is the average rating per genre? How many different actors are in a movie?

Basic level

Let’s import the necessary libraries:

Read in ‘imdb_1000.csv’ and store it in a DataFrame named ‘movies’.
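
A minimal sketch of this step with pandas is shown below. So that it runs without the file, it reads a small in-memory stand-in; the column names (`star_rating`, `title`, `genre`, `duration`) and the three rows are placeholders invented for this example, not the real contents of the CSV.

```python
import io
import pandas as pd

# Stand-in for 'imdb_1000.csv' so the sketch runs without the file.
# Column names and rows are assumptions made up for illustration.
sample_csv = io.StringIO(
    "star_rating,title,genre,duration\n"
    "8.0,Movie A,Drama,120\n"
    "9.0,Movie B,Drama,150\n"
    "7.0,Movie C,Comedy,95\n"
)

# With the real file this would be: movies = pd.read_csv('imdb_1000.csv')
movies = pd.read_csv(sample_csv)

# Quick checks that the data loaded as expected.
print(movies.shape)
print(movies.head())

# One of the questions from the lab: average rating per genre.
avg_rating_by_genre = movies.groupby("genre")["star_rating"].mean()
print(avg_rating_by_genre)
```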


When analyzing your data, it is crucial to recognize and understand the importance of each data type. Depending on the type of your data, a specific analysis will be appropriate. Similarly, the data type will also drive the choice of data visualization techniques. In this series of 2 posts, I first presented the different types of statistical data and provided some examples of each type. I further explained the importance of data types using Python and R programming.

In part 1, I covered data types with Python, converting data types with Python, and visualizing quantitative and qualitative data.
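
As a small illustration of inspecting and converting data types, here is a sketch in pandas. The DataFrame and its column names are hypothetical, and the conversions shown (`pd.to_numeric`, `astype("category")`) are just one common way to do this, not necessarily the approach taken in the post.

```python
import pandas as pd

# Hypothetical data: 'age' is stored as text, 'city' is a qualitative variable.
df = pd.DataFrame({
    "age": ["23", "35", "41"],
    "income": [42000.0, 58000.5, 61000.0],
    "city": ["Paris", "Lagos", "Paris"],
})

# Inspect the types pandas inferred for each column.
original_dtypes = df.dtypes
print(original_dtypes)

# Convert text that holds numbers into a numeric (quantitative) column.
df["age"] = pd.to_numeric(df["age"])

# Mark a text column as categorical (qualitative).
df["city"] = df["city"].astype("category")

# Split the columns by kind, which then guides the choice of analysis and plot.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(include="category").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```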

Numerical data versus categorical data

Various data…

Armel Djangone

A Data Scientist with demonstrable, superior, data-driven problem-solving skills. Passionate about data.
