Classification of Machine Learning-Part 3: Reinforcement Learning

3 min read · Jul 22, 2021

Reinforcement learning is the third type of machine learning. It enables a system, called an agent, to learn by trial and error in an interactive environment, using feedback from its own actions and experiences. Rewards and penalties signal which behavior is desirable and which is not, and the agent's goal is to choose actions that maximize the reward in a given situation. Many machines and software systems use this approach to find the best behavior for a particular case. Unlike supervised learning, there is no labeled correct answer to learn from; instead, the agent must determine for itself what to do to perform a specific task.

The machine learns to reach a goal in a complex and uncertain environment. In reinforcement learning, the AI faces a game-like situation and uses trial and error to solve the problem. To steer the machine toward what the programmer wants, the AI receives rewards or penalties for the actions it performs. The programmer sets the rules of the game but gives the model no hints about how to solve the problem. The model has to figure out for itself how to perform the task so as to maximize the reward. It begins with random trials and can finish with sophisticated tactics, in some cases superhuman skills. For this reason, reinforcement learning is often regarded as one of the most effective ways to elicit creative behavior from a machine.
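The trial-and-error loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "environment" is a hypothetical three-action game in which each action pays a reward (+1) or a penalty (-1) with a fixed probability, and the agent uses a simple epsilon-greedy rule. The function name `run_bandit` and all parameter values are made up for this example.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a hypothetical multi-action game."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # running average outcome of each action
    counts = [0] * n_actions       # how often each action was tried
    for _ in range(steps):
        # Explore a random action with probability epsilon,
        # otherwise exploit the action with the best estimate so far.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment gives no hints, only a reward (+1) or a penalty (-1).
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        counts[action] += 1
        # Incremental update of the average outcome for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Assumed success probabilities for the three actions (illustrative only).
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", [round(e, 2) for e in estimates])
print("times each action was tried:", counts)
print("preferred action:", best_action)
```

The agent starts out knowing nothing about the actions and, through rewards and penalties alone, ends up spending most of its trials on the action that pays off most often.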

Two commonly used model-free reinforcement learning algorithms are SARSA (State-Action-Reward-State-Action) and Q-learning. SARSA is an on-policy technique: it learns the value of an action based on the next action actually chosen by its current policy. Q-learning is an off-policy technique: it learns the value based on the best (greedy) action in the next state, regardless of which action the agent's exploring policy actually takes. Both are easy to implement; however, in their basic tabular form they cannot estimate values for states they have not seen. More advanced algorithms address this shortcoming by using neural networks as function approximators: Deep Q-Networks (DQN) handle high-dimensional state spaces, and DDPG (Deep Deterministic Policy Gradient) learns policies for continuous action spaces.
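To make the on-policy versus off-policy distinction concrete, here is a minimal tabular sketch of both update rules. The environment is an assumption made for illustration: a hypothetical five-state corridor where the agent starts at state 0 and is rewarded for reaching state 4. The names `step`, `choose`, and `train`, and the values of `alpha`, `gamma`, and `epsilon`, are illustrative choices, not taken from any library.

```python
import random

N_STATES = 5      # states 0..4 in a hypothetical corridor; state 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right

def step(state, action):
    """Toy environment: +1 for reaching the goal, a small penalty otherwise."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else -0.01), done

def choose(q, state, epsilon, rng):
    """Epsilon-greedy behaviour policy with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(method, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with either the SARSA or the Q-learning target."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        action = choose(q, state, epsilon, rng)
        while not done:
            nxt, reward, done = step(state, action)
            next_action = choose(q, nxt, epsilon, rng)
            if done:
                target = reward
            elif method == "sarsa":
                # On-policy: bootstrap from the action the policy will take next.
                target = reward + gamma * q[nxt][next_action]
            else:
                # Off-policy (Q-learning): bootstrap from the greedy action.
                target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state, action = nxt, next_action
    return q

for method in ("sarsa", "q_learning"):
    q = train(method)
    policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
    print(method, policy)
```

The only difference between the two branches is the bootstrap target. Note also that the table `q` only holds entries for states listed in advance, which is the limitation that function approximators such as DQN are meant to address.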

Reinforcement comes in two types: positive and negative. Positive reinforcement occurs when an event that follows a particular behavior increases the frequency and strength of that behavior; in other words, it has a positive effect on the behavior. Negative reinforcement strengthens a specific behavior because a harmful condition is stopped or avoided as a result. Applied well, reinforcement can maximize performance and sustain a change for an extended period.

Reinforcement learning can be applied in industrial automation, machine learning, and data processing. It can also be used to build training systems that provide custom instruction and materials according to each student's needs, and it is widely used to develop AI that plays computer games. There are challenges, however: realistic environments are often only partially observable, the choice of parameters can affect the speed of learning, and too much reinforcement can lead to an overload of states, which may diminish the results.

Final Words

Reinforcement learning allows a computer or machine to interact with its environment and maximize the reward based on its own experience. It is a cutting-edge technology with the potential to transform the world; however, it is not required in every situation. It can help machines find creative, innovative ways to perform tasks. Hence, reinforcement learning may prove to be a ground-breaking technology and the next step in the development of AI.



Written by Armel Djangone