A Deeper Dive into the Fields of Artificial Intelligence
So far, we have covered the types of AI, the AI family, and its features and limitations. In this post, we take a deeper dive into the types of learning and the variety of learning algorithms in AI.
AI follows a hierarchy given as:
- Artificial Intelligence: AI is the broadest concept — it refers to the simulation of human intelligence by machines. Its major goal is to enable machines to think, learn, and solve problems like humans.
- Machine Learning (ML): ML, a subset of AI, enables machines to learn from data instead of following explicitly programmed rules. In ML, models improve their performance over time by finding patterns in data.
- Deep Learning (DL): DL, a subset of ML, uses artificial neural networks with many layers (hence “deep”). It learns features automatically from raw data (e.g., pixels, text).
Learning in Artificial Intelligence (AI) refers to the process by which a system improves its performance on a task over time through experience, data, or interaction with the environment. The main types of learning algorithms in AI are:
TYPES OF LEARNING IN MACHINE LEARNING
Supervised Learning
The model learns from labeled data: input-output pairs where the correct answer is known. The user provides labeled examples, e.g. Image -> Label (“Dog”). The model learns to map inputs to outputs. Common supervised learning algorithms are:
- Linear Regression: A statistical method that models the relationship between a dependent variable and one or more independent variables. It learns from a labeled dataset and fits the best linear function to the data points, which can then be used to predict outputs for new data. It assumes a linear relationship between input and output; with a single input, this relationship is a straight line.
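To make the straight-line idea concrete, here is a minimal sketch of least-squares fitting for a single input feature. The function names (`fit_line`, `predict`) and the toy data are our own illustrative choices, not part of any library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the output for a new input using the fitted line."""
    return slope * x + intercept


# Hypothetical labeled data generated from y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = predict(5.0, slope, intercept)
```

Because the toy data lies exactly on a line, the fit recovers that line; real data is noisy, and the fitted line is then only the best linear approximation.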
- Logistic Regression: Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that an input belongs to a specific class. It is used in binary classification, where the output is one of two possible categories such as Yes/No, True/False, or 0/1. It uses the sigmoid function to convert a weighted sum of the inputs into a probability between 0 and 1.
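The prediction side of logistic regression can be sketched in a few lines. The weights and bias below are made-up values for illustration (a real model would learn them from data), and the 0.5 threshold is a common default rather than a requirement.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability that the input belongs to class 1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 decision."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical, hand-picked parameters (not learned)
weights = [1.5, -0.5]
bias = -1.0
p = predict_proba([2.0, 1.0], weights, bias)  # weighted sum z = 1.5
```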
- Decision Trees: A decision tree makes decisions by mapping out different choices and their possible outcomes. It mimics human decision-making using a tree-like structure and is used for both classification and regression. It works as follows: start at the root -> ask a question about a feature -> move down a branch depending on the answer -> end at a leaf node with a prediction.
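The root-to-leaf walk can be sketched as follows. The tree here is hand-built and purely hypothetical (feature names, thresholds, and labels are invented); a real decision tree algorithm would learn these splits from data.

```python
# Hypothetical hand-built tree: internal nodes ask a question, leaves hold a prediction.
tree = {
    "feature": "temperature",
    "threshold": 20.0,
    "left": {"prediction": "stay in"},  # temperature <= 20
    "right": {  # temperature > 20
        "feature": "humidity",
        "threshold": 70.0,
        "left": {"prediction": "play outside"},  # humidity <= 70
        "right": {"prediction": "stay in"},  # humidity > 70
    },
}


def predict(node, sample):
    """Start at the root and follow the answers until a leaf is reached."""
    while "prediction" not in node:
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["prediction"]


decision = predict(tree, {"temperature": 25.0, "humidity": 50.0})
```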
- Support Vector Machines (SVM): SVM is a supervised machine learning algorithm used for classification and regression tasks. It tries to find the best boundary, known as a hyperplane, that separates the different classes in the data. Its main goal is to maximise the margin between the two classes; in general, the larger the margin, the better the model tends to perform on new, unseen data. Key terms:
  - Hyperplane: The decision boundary that separates the classes. In 2D it is a line, in 3D a plane, and in n-D a hyperplane.
  - Margin: The distance between the hyperplane and the closest data points.
  - Support Vectors: The data points closest to the hyperplane; they determine the decision boundary.
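The three terms can be illustrated with a little geometry. The sketch below does not train an SVM; it assumes a hyperplane (w, b) is already given and only measures the margin and finds the support vectors for it. The points and the hyperplane are invented for the example.

```python
import math


def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def margin_and_support_vectors(points, w, b, tol=1e-9):
    """Margin = distance to the closest points; those points are the support vectors."""
    distances = [abs(signed_distance(p, w, b)) for p in points]
    margin = min(distances)
    support = [p for p, d in zip(points, distances) if d - margin <= tol]
    return margin, support


# Hypothetical 2D data and a given hyperplane: the vertical line x1 = 2
points = [(0.0, 1.0), (1.0, 3.0), (3.0, 0.0), (5.0, 2.0)]
w, b = [1.0, 0.0], -2.0
margin, support = margin_and_support_vectors(points, w, b)
```

Training an SVM amounts to searching for the (w, b) that makes this margin as large as possible while still separating the classes.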
- k-Nearest Neighbours (KNN): Generally used for classification, KNN can also be used for regression. It finds the k closest data points (neighbours) to a given input in the training data and predicts the majority class among them (for classification) or their average value (for regression). It is a non-parametric, instance-based learning method. Choosing k:
  - Small k (e.g., k = 1) -> more flexible, high variance, overfitting risk.
  - Large k (e.g., k = 10) -> smoother, less variance, but may underfit.
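Both uses of KNN fit in a short sketch. We assume Euclidean distance here (other distance measures are possible), and the training points are made up for the example.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """train is a list of (features, label); return the majority label of the k nearest."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def knn_regress(train, query, k):
    """train is a list of (features, value); return the average value of the k nearest."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    return sum(value for _, value in by_distance[:k]) / k


# Hypothetical training data
labeled = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.0), "B"), ((6.0, 7.0), "B"),
]
valued = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

label = knn_classify(labeled, (2.0, 2.0), k=3)
value = knn_regress(valued, (2.0,), k=3)
```

Note that there is no training step: the "model" is simply the stored data, which is what instance-based means.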
- Naive Bayes: A classification algorithm that uses probability to predict which category a data point belongs to. It is based on Bayes’ Theorem with a strong (“naive”) assumption that all features are independent of each other given the class. It is used for spam detection, sentiment analysis, document categorization, etc. There are three common types of Naive Bayes classifier:
  - Gaussian Naive Bayes: Assumes features are continuous and follow a normal distribution.
  - Multinomial Naive Bayes: Used for document classification (word counts, term frequencies).
  - Bernoulli Naive Bayes: Used for binary features (e.g., presence or absence of a word).
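As an illustration of the multinomial variant, here is a small word-count spam classifier. It is a sketch under our own assumptions: the four training messages are invented, and we use add-one (Laplace) smoothing so that unseen words do not produce zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs is a list of (words, label). Collect the counts needed for prediction."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict_nb(words, class_counts, word_counts, vocab):
    """Pick the class with the highest log prior + sum of log word likelihoods."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing; each word is treated as independent given the class
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages
docs = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]
model = train_nb(docs)
guess = predict_nb("free money".split(), *model)
```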
- Random Forests: An ML algorithm that combines many decision trees to make better predictions. Each tree is trained on a different random part of the data, and the results are combined by voting (for classification) or averaging (for regression). This helps improve accuracy and reduce overfitting. It works as follows:
  - Bootstrap Sampling (Bagging): From the original dataset with N samples, randomly select N data points with replacement.
  - Grow Decision Trees: For each bootstrap sample, build a decision tree and continue until the stopping criteria are met.
  - Repeat for Multiple Trees: Repeat the two steps above n_estimators times. This gives us a forest of decision trees.
  - Aggregating Predictions: The final prediction is the majority vote or the average of all the tree outputs.
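The four steps can be sketched with a deliberately tiny stand-in for a decision tree. Assumptions to note: each "tree" below is a one-split stump on a single numeric feature (`fit_stump` is our own helper, not a library function), the data is invented, and the random feature selection that full random forests typically add is left out.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(sample):
    """Stand-in for a full decision tree: a single threshold on one feature."""
    fallback = majority([y for _, y in sample])
    best = None  # (number correct, threshold, left label, right label)
    for threshold, _ in sample:
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        left_label = majority(left)
        right_label = majority(right) if right else fallback
        correct = sum(
            1 for x, y in sample
            if (left_label if x <= threshold else right_label) == y
        )
        if best is None or correct > best[0]:
            best = (correct, threshold, left_label, right_label)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def fit_forest(data, n_estimators, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_estimators):  # repeat for multiple trees
        sample = [rng.choice(data) for _ in data]  # bootstrap: N draws with replacement
        forest.append(fit_stump(sample))  # grow one tree per sample
    return forest


def forest_predict(forest, x):
    """Aggregate the individual predictions by majority vote."""
    return majority([stump_predict(stump, x) for stump in forest])


# Hypothetical 1D dataset: (feature, class)
data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
forest = fit_forest(data, n_estimators=25)
```

Individual stumps can be wrong when their bootstrap sample happens to be unrepresentative, but the vote across many of them is much more stable, which is the point of the method.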
Unsupervised Learning
The model is trained on an unlabelled dataset to discover patterns or structures. Common unsupervised learning algorithms are:
- K-Means Clustering: An unsupervised learning algorithm used to group data into k distinct, non-overlapping clusters based on similarity. It tries to partition the dataset into k clusters such that:
  - Each data point belongs to the nearest cluster center (centroid).
  - The intra-cluster distance is minimized.
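A common way to reach such a partition is to alternate between assigning points to their nearest centroid and moving each centroid to the mean of its points. The sketch below assumes the starting centroids are passed in by the caller (real implementations usually pick them randomly or with a smarter scheme), and the points are made up.

```python
import math


def kmeans(points, centroids, max_iters=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2D points forming two obvious groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```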
- Principal Component Analysis (PCA): A technique used to reduce the dimensionality of data, extracting key patterns while preserving as much variance as possible and removing redundancy and multicollinearity. PCA finds new axes (principal components) such that:
  - They capture the maximum variance in the data.
  - Each component is orthogonal to the others.
  - Dimensionality can be reduced by keeping only the top few components.
- Hierarchical Clustering: Hierarchical clustering builds a tree-like structure (called a dendrogram) of nested clusters. Unlike k-means clustering, it does not require the number of clusters to be specified beforehand. There are two main types:
  - Agglomerative (Bottom-Up): Start with each point as its own cluster and merge them.
  - Divisive (Top-Down): Start with all points in one cluster and split them.
Reinforcement Learning
The model learns by interacting with an environment and getting feedback in the form of rewards or penalties. The AI agent takes an action -> gets a reward or penalty -> learns what to do next time. This type of learning is used in game-playing agents, self-driving cars, and robotic control systems. Key concepts and algorithms in reinforcement learning are:
- Markov Decision Processes (MDPs): A mathematical framework in which outcomes are partly random and partly under the control of an agent (the decision maker). This framework is used to model decision-making problems and so forms a key foundation of reinforcement learning. An MDP is defined by a 5-tuple:
  - S: Set of states, all possible situations the agent can be in.
  - A: Set of actions, all possible actions the agent can take.
  - P(s′∣s,a): Transition probability, the probability of reaching state s′ after taking action a in state s.
  - R(s,a): Reward function, the immediate reward for taking action a in state s.
  - γ: Discount factor, how much future rewards are valued (0 to 1).
- Q-Learning: A model-free reinforcement learning algorithm used to find the optimal action-selection policy for a given finite MDP (Markov Decision Process). The agent learns how to act optimally through interaction with the environment, without needing a model of it. Q-values are stored in a 2D Q-table in which rows represent states and columns represent actions. Each cell holds the Q-value of one state-action pair.
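Here is a minimal sketch of tabular Q-learning. Everything about the environment is an assumption made for illustration: a four-state corridor where the agent starts at state 0, can move left or right, and receives a reward of 1 on reaching state 3. The hyperparameters (alpha, gamma, epsilon) are arbitrary but typical values.

```python
import random

N_STATES, GOAL = 4, 3  # hypothetical corridor: states 0..3, goal at state 3
LEFT, RIGHT = 0, 1


def step(state, action):
    """Environment dynamics: move left or right; reaching the goal pays 1 and ends the episode."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore sometimes, otherwise take a best-valued action (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: rows = states, columns = actions
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
greedy_policy = [row.index(max(row)) for row in q_table[:GOAL]]
```

The agent never sees the `step` function's rules directly; it only observes the resulting states and rewards, which is what "model-free" refers to.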
- Deep Q-Network (DQN): It is based on Q-learning but uses a deep neural network to approximate the Q-function instead of storing a Q-table. Its key components are:
  - Q-Network: Input is the state; output is a Q-value for each possible action.
  - Experience Replay Buffer: Stores past experiences and randomly samples mini-batches during training, which breaks correlation in the data.
  - Target Network: A copy of the Q-network used to compute target Q-values.
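Two of these components are simple enough to sketch without a deep learning library. The class and function names below are our own, and the neural networks themselves are left out: `td_target` simply assumes that the Q-values for the next state were produced by the target network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past experiences; the oldest are dropped when full."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Random mini-batch, so consecutive (correlated) steps are not trained on together."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Target Q-value; next_q_values are assumed to come from the target network."""
    return reward if done else reward + gamma * max(next_q_values)


# Hypothetical usage with placeholder integer states
buffer = ReplayBuffer(capacity=5)
for i in range(10):
    buffer.push(state=i, action=0, reward=0.0, next_state=i + 1, done=False)
batch = buffer.sample(3)
```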
- Bellman Equation: The Bellman equation expresses the relationship between the value of a state and the values of its successor states. It is a fundamental recursive formula in reinforcement learning and dynamic programming. It is used to evaluate and improve policies and is the backbone of algorithms like Q-learning, DQN, and policy iteration. It is based on the intuition that “the value of being in a state is the best you can get by taking an action, getting the immediate reward, and then following the optimal path forward”.
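That intuition can be turned into code by applying the Bellman optimality equation repeatedly until the values stop changing, a procedure known as value iteration. The three-state MDP below is invented for the example, and the dictionary layout for transitions and rewards is just one convenient choice.

```python
def value_iteration(states, actions, transitions, rewards, gamma, tol=1e-9):
    """Repeatedly apply V(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) * V(s') ].

    transitions[(s, a)] is a list of (probability, next_state) pairs;
    rewards[(s, a)] is the immediate reward.
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            candidates = [
                rewards[(s, a)] + gamma * sum(p * values[s2] for p, s2 in transitions[(s, a)])
                for a in actions
                if (s, a) in transitions
            ]
            # States with no available actions are terminal and keep value 0
            new_value = max(candidates) if candidates else 0.0
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
        if delta < tol:
            return values


# Hypothetical MDP: start -> middle -> goal, with a reward of 1 for entering the goal
states = ["start", "middle", "goal"]
actions = ["stay", "go"]
transitions = {
    ("start", "stay"): [(1.0, "start")],
    ("start", "go"): [(1.0, "middle")],
    ("middle", "stay"): [(1.0, "middle")],
    ("middle", "go"): [(1.0, "goal")],
}
rewards = {
    ("start", "stay"): 0.0,
    ("start", "go"): 0.0,
    ("middle", "stay"): 0.0,
    ("middle", "go"): 1.0,
}
values = value_iteration(states, actions, transitions, rewards, gamma=0.9)
```

The value of "start" ends up being the discounted value of "middle", which shows the recursive structure: each state's value is defined through its successors.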
Deep Learning
Deep learning focuses on using neural networks with many layers to model and understand complex patterns and representations in large datasets. Loosely inspired by the neural networks of the human brain, it enables computers to uncover patterns autonomously and make informed decisions from vast amounts of unstructured data. The building blocks and main types of deep learning models are:
- Single Neuron: A neuron is a single unit in a neural network. A neural network is an interconnection of nodes (or neurons) that mimics the neural structure of the human nervous system, letting the system loosely replicate how the human brain works, which is what makes it “artificially intelligent”.
- Single Layer Perceptron (SLP): It is inspired by biological neurons and their ability to process input information. It is a single artificial neuron and a fundamental building block of neural networks: a simplified computational model that mimics the behaviour of a biological neuron. It works as follows:
  - Receive input signals from outside.
  - Multiply each input by its weight, sum the results, and add a bias.
  - Pass the sum through a nonlinear (activation) function to produce the output.
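These three steps map directly onto a few lines of code. In this sketch we assume a simple step function as the nonlinearity, and the weights and bias are hand-picked (not learned) so that the neuron behaves like a logical AND gate.

```python
def step_activation(z):
    """Nonlinear function: fire (1) if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0


def perceptron_output(inputs, weights, bias):
    # Weight each incoming signal, sum them up, and add the bias
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Pass the sum through the nonlinear function
    return step_activation(weighted_sum)


# Hypothetical, hand-picked parameters that implement logical AND
weights, bias = [1.0, 1.0], -1.5
outputs = [perceptron_output(pair, weights, bias) for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```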
- Multi-Layer Perceptron (MLP): Consists of fully connected (dense) layers that transform input data from one dimension to another. It is called “multi-layer” because it contains one or more hidden layers in addition to the input and output layers. It models complex relationships between inputs and outputs. Its components are:
  - Input layer: Each node in this layer corresponds to an input feature.
  - Hidden layers: An MLP can have any number of hidden layers, each containing any number of nodes. These layers process the information received from the input layer.
  - Output layer: This layer generates the final prediction from the values processed by the hidden layers.
- Artificial Neural Networks (ANNs): An ANN contains artificial neurons (called units) arranged in a series of layers. A layer may have a dozen units or millions, depending on the complexity of the network. In a hidden layer, each neuron receives input from the neurons of the previous layer, computes the weighted sum, and sends it to the neurons in the next layer. The connections are weighted, meaning that each input from the previous layer is given more or less influence by the weight assigned to it. ANNs are trained on a training set: when the network predicts incorrectly, it adjusts its weights by an amount scaled by the learning rate, and these adjustments improve model performance over the course of training.
- Feedforward Neural Network (FNN): An ANN in which information flows in one direction, from the input layer through the hidden layers to the output layer, without loops or any kind of feedback. It is mainly used for pattern recognition tasks like image and speech classification. Activation functions introduce non-linearity into the network, enabling it to learn and model complex data patterns. A feedforward network is trained as follows:
  - Forward propagation: The input data passes through the network and the output is calculated.
  - Loss calculation: The loss (or error) is calculated using a loss function such as Mean Squared Error (MSE) for regression tasks or Cross-Entropy Loss for classification tasks.
  - Backpropagation: The error is propagated back through the network to update the weights. The gradient of the loss function with respect to each weight is calculated, and the weights are adjusted using gradient descent.
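The training loop can be written out by hand for a very small network. This is a sketch under several assumptions of ours: one input, one hidden layer of two tanh units, one linear output, MSE loss, and made-up data, starting weights, and learning rate. Real projects would use a library with automatic differentiation instead of deriving gradients manually.

```python
import math


def forward(x, params):
    """Forward propagation: input -> tanh hidden layer -> linear output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    output = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, output


def mse_loss(params, data):
    """Loss calculation: mean squared error over the dataset."""
    return sum((forward(x, params)[1] - y) ** 2 for x, y in data) / len(data)


def train_step(params, data, lr):
    """Backpropagation followed by one gradient descent update."""
    w1, b1, w2, b2 = params
    n = len(data)
    g_w1, g_b1 = [0.0] * len(w1), [0.0] * len(b1)
    g_w2, g_b2 = [0.0] * len(w2), 0.0
    for x, y in data:
        hidden, output = forward(x, params)
        d_out = 2.0 * (output - y) / n  # gradient of the loss w.r.t. the output
        g_b2 += d_out
        for j, h in enumerate(hidden):
            g_w2[j] += d_out * h
            d_pre = d_out * w2[j] * (1.0 - h * h)  # chain rule through tanh
            g_w1[j] += d_pre * x
            g_b1[j] += d_pre
    return (
        [w - lr * g for w, g in zip(w1, g_w1)],
        [b - lr * g for b, g in zip(b1, g_b1)],
        [w - lr * g for w, g in zip(w2, g_w2)],
        b2 - lr * g_b2,
    )


# Hypothetical data and starting parameters: (w1, b1, w2, b2)
data = [(-1.0, -0.5), (0.0, 0.0), (1.0, 0.5), (2.0, 0.8)]
params = ([0.5, -0.3], [0.0, 0.1], [0.4, 0.2], 0.0)
initial_loss = mse_loss(params, data)
for _ in range(200):
    params = train_step(params, data, lr=0.1)
final_loss = mse_loss(params, data)
```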
- Convolutional Neural Networks (CNN): An advanced ANN, primarily designed to extract features from grid-like matrix data. It is particularly useful for visual data such as images or videos, where spatial patterns play a crucial role.
  - A typical CNN consists of:
    - Input layer
    - Convolutional layers
    - Activation layer
    - Pooling layers
    - Fully connected layers
    - Output layer
  - Working of a CNN: CNNs are also called ConvNets. An image can be represented as a 3D volume with dimensions (l, w, h), where l is the length, w is the width, and h is the height of the volume, i.e. the number of channels of the image. The image is processed through a ConvNet as follows:
    - Input layer: The image is provided as input to this layer in the format (l, w, h).
    - Convolutional layer: Extracts features from the input and creates feature maps. Each value in a feature map is the dot product of the kernel weights and the corresponding patch of the input image. Kernels are small matrices of learnable weights that slide across the image to generate the feature maps.
    - Activation layer: Adds non-linearity to the output of the convolutional layer by applying an activation function such as ReLU, Tanh, or Leaky ReLU. The output is passed to the pooling layer.
    - Pooling layer: Reduces the size of the volume, which makes computation faster, reduces memory use, and helps prevent overfitting. The compressed output is passed on to the next layer.
    - Output layer: The output from the fully connected layers is fed into a function such as sigmoid or softmax for classification tasks, which converts the raw score of each class into a probability.
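The convolution, activation, and pooling stages can be sketched on a tiny single-channel image. The assumptions here are ours: no padding, a stride of 1 for the convolution, 2x2 max pooling, and a hand-picked kernel that responds to vertical edges (in a real CNN the kernel weights are learned).

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; each output is the dot product with one patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Activation: replace negative values with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Pooling: keep only the largest value in each size x size block."""
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 image: dark on the left, bright on the right
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-picked kernel that responds where brightness increases from left to right
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

feature_map = conv2d(image, kernel)  # 4x4 feature map
activated = relu(feature_map)
pooled = max_pool(activated)  # 2x2 after pooling
```

The pooled output is largest in the column where the edge sits, which is the kind of feature a fully connected layer could then use for classification.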
- Recurrent Neural Network (RNN): While standard neural networks pass information in one direction, i.e. from input to output, RNNs feed information back into the network at each step. RNNs work by “remembering” past information and passing the output from one step as input to the next; in text prediction, for example, an RNN considers the earlier words to choose the most likely next word. This memory of previous steps helps the network understand context and make better predictions. Key concepts in an RNN are:
  - Recurrent Neurons: A recurrent neuron is a single unit of an RNN, holding a hidden state that maintains information about previous inputs in a sequence. It remembers information from prior steps by feeding back its hidden state, allowing it to capture dependencies across time.
  - RNN Unfolding: The process of expanding the recurrent structure over time steps. During unfolding, each step of the sequence is represented as a separate layer in a series, illustrating how information flows across each time step.
  - RNN Working: The units in an RNN have an internal hidden state that acts as a memory retained from previous steps. This memory allows the network to store past knowledge and adapt based on new inputs.
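The hidden-state idea can be shown with a single recurrent neuron. To keep it readable, this sketch assumes scalar inputs and a scalar hidden state with a tanh activation, and the weights are arbitrary values chosen for illustration rather than learned ones.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One time step: combine the new input with the previous hidden state."""
    return math.tanh(w_xh * x + w_hh * h_prev + b_h)


def run_rnn(sequence, w_xh, w_hh, b_h, h0=0.0):
    """Unfold the recurrence over the sequence, carrying the hidden state forward."""
    h = h0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states


# Hypothetical weights; the two sequences differ only in their first element
w_xh, w_hh, b_h = 1.0, 0.5, 0.0
states_a = run_rnn([1.0, 0.0, 0.0], w_xh, w_hh, b_h)
states_b = run_rnn([0.0, 0.0, 0.0], w_xh, w_hh, b_h)
```

Although both sequences end with the same inputs, the final hidden states differ, because the first input is still "remembered" through the hidden state. The fading of that memory over the steps also hints at why plain RNNs struggle with long sequences.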
- Long Short-Term Memory (LSTM) networks: An enhanced version of the RNN designed to capture long-term dependencies in sequential data, making it well suited to tasks like language translation, speech recognition, and time series forecasting. Standard RNNs often struggle to learn long-term dependencies, where information from distant time steps becomes crucial for making accurate predictions for the current state; this is caused by the vanishing gradient or exploding gradient problem. LSTMs address this challenge by introducing a memory cell that holds information over extended periods.