Vital few, trivial many: November 2021

Saturday, November 20, 2021

Garbage in, garbage out (GIGO)

17 Common Issues In Machine Learning: Simplified

INTRODUCTION

Machine Learning, or ML, is one of the most successful applications of Artificial Intelligence. It lets systems learn automatically without being explicitly programmed for each task. It has gained considerable prominence lately because it can be applied across scores of industries to tackle complex problems quickly and effectively. From digital assistants that play your music to products recommended based on prior searches, Machine Learning has taken over many aspects of our lives. It is a skill in high demand, as companies require software that can make sense of data and provide accurate results. The core objective is to learn functions that fit the data well with as little error as possible.

1. WHAT IS MACHINE LEARNING?

Machine Learning is a branch of Artificial Intelligence (AI) that improves the quality of applications by using previously collected data. It enables systems to learn from data without new code having to be written for every similar task. The aim is for the process to be automated rather than continuously modified by hand. Hence, through experience and past data, the program improves by itself.

2. WHY MACHINE LEARNING?

Machine Learning is a continuously evolving field in high demand. With little human intervention, it can deliver real-time results using existing, processed data. It helps analyze and assess large amounts of data by building data-driven models. Today, Machine Learning has become a fast and efficient way for firms to build models and plan strategy.

3. ADVANTAGES OF MACHINE LEARNING

Completely automated (zero human intervention)
Analyzes large amounts of data
More efficient than traditional data analysis methods
Identifies trends and patterns with ease
Reliable and efficient
Requires less manual work
Handles a variety of data
Accommodates most kinds of applications

4. COMMONLY USED ALGORITHMS IN MACHINE LEARNING

There are many different models in Machine Learning. Here are the most commonly used algorithms in the world today:

Gradient Boosting Algorithms
Dimensionality Reduction Algorithms
Random Forest
K-Means
KNN
Naive Bayes
SVM
Decision Tree
Logistic Regression
Linear Regression

Machine Learning is slowly but steadily having a huge impact on data-driven business decisions across the globe. It has also given organizations the information they need to make more informed choices more quickly than conventional methods allow. Yet, in spite of its high productivity, there are many issues in Machine Learning that cannot be overlooked.

5. LIST OF COMMON PRACTICAL ISSUES IN MACHINE LEARNING

1) LACK OF QUALITY DATA

One of the main issues in Machine Learning is the absence of good data. Although developers tend to spend most of their time refining algorithms, data quality is fundamental for the algorithms to work as intended. Incomplete data, unclean data, and noisy data are the quintessential foes of ideal ML. Common causes of low data quality are:

Noisy data, which results in inaccurate predictions. It often leads to lower classification accuracy and low-quality results, and it is one of the most common data errors.

Incorrect or incomplete information, which can also lead to faulty models. With too little information, the program bases its analysis on the minimal data present, which decreases the accuracy of the results.

Poor generalization. To act well on future data, a model must generalize the relationship between past inputs and outputs, but a common issue is that this relationship can be difficult to generalize.
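As a rough illustration of how incomplete or unclean data might be spotted before training, here is a minimal sketch in plain Python. The function name `audit_rows` and the sample records are hypothetical, and real projects usually run far more thorough checks.

```python
import math

def audit_rows(rows):
    """Count rows that are incomplete (None or NaN values) or exact duplicates."""
    seen = set()
    incomplete = 0
    duplicates = 0
    for row in rows:
        # A row is incomplete if any field is None or a float NaN
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in row):
            incomplete += 1
        key = tuple(row)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"total": len(rows), "incomplete": incomplete, "duplicates": duplicates}

# Hypothetical records: (age, income)
records = [(34, 52000.0), (None, 61000.0), (34, 52000.0), (45, float("nan"))]
report = audit_rows(records)
print(report)
```

A report like this only counts problems. Deciding whether to drop, repair, or re-collect the affected rows remains a judgment call.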

2) FAULT IN CREDIT CARD FRAUD DETECTION

Although ML-driven software can help detect credit card fraud, data limitations can make the process unreliable. A system struggles to spot anything without adequate amounts of data, which leaves it blind to illicit connections. Detecting fraud without a significant amount of data is therefore close to impossible.

3) GETTING BAD RECOMMENDATIONS

Recommendation engines are quite common today. While some are dependable, others do not provide the results users need. Machine Learning algorithms tend to reinforce only what these engines have already suggested, so if the user's needs change, the recommendations become useless. Building a complex algorithm, collecting large amounts of data, and deploying the system, only to get incorrect results once priorities change, is one of the biggest issues with Machine Learning.

4) TALENT DEFICIT

Although many people are drawn to the ML industry, there are still few experts who have fully mastered the technology. It is quite rare to find a trained professional who can both understand the problems in Machine Learning and arrive at a reliable software solution for them.

5) IMPLEMENTATION

Organizations often already have analytics engines in place when they decide to move up to ML. Integrating newer ML methods with existing processes is a complicated task. Maintaining proper documentation and interpretation goes a long way toward getting the most out of the system. Implementation issues in Machine Learning include:

Slow deployment: Although Machine Learning models are time efficient once running, building them is quite the opposite. Because the technology is still young, implementation is slow.

Data security: Saving confidential data on ML servers is a risk, as the model cannot differentiate between sensitive and non-sensitive data.

Lack of data: This is another key issue during implementation. Without adequate data, a model cannot produce valuable insights.

6) MAKING THE WRONG ASSUMPTIONS

Many ML models can’t handle datasets containing missing data points. Features with a large share of missing data should therefore be dropped. On the other hand, if a feature has only a few missing values, we can fill in those empty cells rather than deleting the feature. The best way to deal with this issue is to make sure that your data doesn’t come with gaping holes, so that it can support the assumptions your model makes.
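The two options described above, dropping sparse features and filling occasional gaps, can be sketched as follows. This is a minimal example under stated assumptions: the 50% threshold, the choice of mean imputation, and the name `clean_features` are illustrative and do not come from the original text.

```python
from statistics import mean

def clean_features(columns, max_missing=0.5):
    """Drop columns whose missing fraction exceeds max_missing; mean-impute the rest.

    Assumes every column is a non-empty list where None marks a missing value.
    """
    cleaned = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_fraction = 1 - len(present) / len(values)
        if missing_fraction > max_missing or not present:
            continue  # too sparse to be useful: drop the whole feature
        fill = mean(present)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned

data = {
    "age": [30, None, 40, 50],           # 25% missing, so the gap is imputed
    "income": [None, None, None, 60.0],  # 75% missing, so the feature is dropped
}
cleaned = clean_features(data)
print(cleaned)
```

Mean imputation is only one option. The median or a model-based estimate may suit skewed data better.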

7) DEFICIENT INFRASTRUCTURE

ML requires a tremendous amount of data-crunching capacity. Legacy systems often can’t handle the workload and buckle under pressure. You should check whether your infrastructure can handle ML workloads. If it can’t, you should plan to upgrade it with good hardware and flexible storage.

8) HAVING ALGORITHMS BECOME OBSOLETE WHEN DATA GROWS

ML algorithms always require a lot of data for training. Frequently, they are trained on a specific data set and then used to predict future data, a cycle that can only be sustained with a significant amount of effort. A model that was “accurate” on the earlier data set may no longer be accurate once the distribution of the data changes.

9) ABSENCE OF SKILLED RESOURCES

Another issue in Machine Learning is that deep analytics and ML in their present forms are still new technologies. From the first line of code to the maintenance and monitoring of the process, Machine Learning experts are needed at every stage. The Artificial Intelligence and Machine Learning industries are still newcomers to the market, and finding enough skilled people is difficult. Hence, there is a shortage of talented professionals available to develop and manage the analytical content for ML. Data scientists typically need a mix of domain knowledge and in-depth knowledge of mathematics, technology, and science.

10) CUSTOMER SEGMENTATION

Consider data on a user's behavior during a trial period, together with relevant past behavior. An algorithm is needed to identify which customers will convert to the paid version of a product and which won’t. A model that solves this decision problem would allow a program to trigger relevant recommendations for users based on their behavior in the catalog.

Supervised learning approaches in ML include:

Neural Networks
Naive Bayesian Model
Classification
Support Vector Machines
Regression
Random Forest Model
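To make the conversion problem concrete, here is a minimal sketch using k-nearest neighbors (KNN, one of the algorithms listed earlier), chosen only because it fits in a few lines. The features, labels, and the name `knn_predict` are hypothetical. A production system would use a tested library and far more data.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical trial-period behavior: (sessions, features used) -> outcome
train = [
    ((12, 8), "paid"), ((15, 9), "paid"), ((10, 7), "paid"),
    ((2, 1), "free"), ((3, 2), "free"), ((1, 1), "free"),
]
heavy_user = knn_predict(train, (11, 6))
light_user = knn_predict(train, (2, 2))
print(heavy_user, light_user)
```

The predicted label could then drive which recommendation, if any, is shown to the user.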

11) COMPLEXITY

Although Machine Learning and Artificial Intelligence are booming, much of the field is still in an experimental phase, proceeding by trial and error. From setting up the system to preparing complex data and writing the code, the procedure is complicated and tedious. It is a time-consuming and strenuous process that leaves little room for error.

12) SLOW RESULTS

Another common issue in Machine Learning is slow execution. Machine Learning models can be highly accurate, but their results take time to produce. Because of the volume of data and the demands placed on the system, results often take longer than expected. This is mainly due to the complexity of the algorithms and the time it takes to derive usable results. Another reason is that the process requires constant monitoring at every step.

13) MAINTENANCE

The results required for different tasks are bound to change, and so is the data needed to produce them. This means editing the code and devoting more resources to monitoring the changes. Because the outputs need to generalize, regular monitoring and maintenance are necessary. Consistent maintenance is the key to keeping the program up to date.

14) CONCEPT DRIFT

This occurs when the statistical properties of the target variable change over time, so that the model's results become inaccurate. Models decay because they cannot easily adapt to such changes. To address this problem, a model that can adapt to a whole array of changes, or that is monitored and retrained regularly, is necessary.
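One simple way to notice this kind of decay is to track accuracy over a sliding window of recent predictions. The sketch below is an assumption-laden illustration: the class name `DriftMonitor`, the window size, and the tolerance are hypothetical choices, not a standard drift-detection method.

```python
from collections import deque

class DriftMonitor:
    """Flag possible drift when rolling accuracy falls below a baseline by a margin."""

    def __init__(self, baseline_accuracy, window=5, tolerance=0.2):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # keeps only the latest results

    def update(self, prediction, actual):
        self.outcomes.append(prediction == actual)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        rolling = sum(self.outcomes) / len(self.outcomes)
        return rolling < self.baseline - self.tolerance

monitor = DriftMonitor(baseline_accuracy=0.9)
stream = [(1, 1), (0, 0), (1, 1), (1, 1), (0, 0),   # model still correct
          (1, 0), (1, 0), (0, 1), (1, 0), (0, 1)]   # target behavior changed
flags = [monitor.update(pred, actual) for pred, actual in stream]
print(flags)
```

A raised flag is a prompt to investigate and possibly retrain. It does not prove that drift has occurred.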

15) DATA BIAS

This occurs when certain aspects of a data set are given more weight than others. Focusing on particular features within the data in order to generalize the outcomes is very common in Machine Learning models. It leads to inaccurate results, poor outcomes, and other such errors.
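One easily checked form of data bias is an imbalance between classes in the training labels. The following minimal sketch reports each label's share of the data. The labels and the name `class_shares` are hypothetical, and an imbalance is only one of many possible sources of bias.

```python
from collections import Counter

def class_shares(labels):
    """Return each label's fraction of the data set."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Hypothetical loan decisions used as training labels
labels = ["approved"] * 9 + ["rejected"] * 1
shares = class_shares(labels)
print(shares)
```

With a split this lopsided, a model that always predicts the majority label scores 90% accuracy while learning nothing about the minority class.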

16) HIGH CHANCES OF ERROR

Many models carry bias, whether it comes from biased programming or from biased datasets. Such a model will not deliver the right output and will produce irrelevant information, and relying on it can lead to bigger errors in business models. This commonly occurs when the planning process is not done right. As you must have figured out by now, Machine Learning is all about precise algorithms. Before creating the model, it is imperative that the team identify the exact problem statement and create a strategy. The major issues that arise in Machine Learning stem from errors in planning before implementation.

17) LACK OF EXPLAINABILITY

Machine Learning is often termed a “black box” because deciphering how an algorithm arrived at its outcomes is often complex and sometimes fruitless. The outputs cannot be easily explained, because the model is tuned in specific ways to deliver results under certain conditions. This lack of explainability makes reverse engineering an algorithm nearly impossible, which reduces its credibility.

CONCLUSION

Although there are many issues in Machine Learning, today it is one of the fastest-evolving industries, with advanced technological developments. Many of the biggest companies use Machine Learning to assist with large-scale data analytics. From medical diagnosis to prediction and classification, ML has become an important and widely used process.

Differentiate Between Linear and Nonlinear Equations

Linear Equations	Nonlinear Equations
A linear equation is an equation whose maximum degree is one.	A nonlinear equation is an equation whose maximum degree is two or more.
A linear equation forms a straight line on the graph.	A nonlinear equation forms a curve on the graph.
The general form of a linear equation is y = mx + c, where x and y are the variables, m is the slope of the line, and c is a constant.	A typical form of a nonlinear equation is ax^2 + by^2 = c, where x and y are the variables and a, b, and c are constants.
Examples: 3x + 2 = 5; 4x + 5 = 1	Examples: 2x^2 + 3y^2 = 7; a^2 + 2ab + b^2 = 0
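The straight-line versus curve distinction in the table can be checked numerically: a linear function has the same slope between any two points, while a nonlinear one does not. The sketch below samples a few points. The helper name `has_constant_slope` is hypothetical, and a sample-based check like this is suggestive rather than a proof.

```python
def has_constant_slope(f, xs, tol=1e-9):
    """Return True if f changes at the same rate between all consecutive xs."""
    slopes = [(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:])]
    return all(abs(s - slopes[0]) <= tol for s in slopes)

def linear(x):
    return 3 * x + 2       # y = mx + c with m = 3, c = 2

def quadratic(x):
    return 2 * x ** 2      # degree 2, so nonlinear

xs = [0, 1, 2, 3, 4]
linear_result = has_constant_slope(linear, xs)
quadratic_result = has_constant_slope(quadratic, xs)
print(linear_result, quadratic_result)
```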

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are neural networks that automatically extract useful features (without manual feature engineering) from data such as images in order to solve a given task, such as image classification or object detection.
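The feature extraction in a CNN is built on sliding a small kernel over the image. The sketch below shows that single operation in plain Python. The kernel here is hand-picked for illustration, whereas a real CNN learns its kernel weights during training, and the function name `conv2d` is simply a label for this example.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# 4x4 image with a vertical edge between columns 1 and 2
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge kernel (an assumption for this example)
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)
```

The output is largest exactly where the dark-to-bright edge sits, which is the sense in which a kernel "detects" a feature.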

Friday, November 19, 2021

Change in Entropy

Here the entropy of the system has reached its maximum value, so that there is no more change in entropy.

Monday, November 15, 2021

The Central Limit Theorem

Binomial Distribution

The binomial distribution is a discrete probability distribution. It describes the outcome of n independent trials in an experiment. Each trial is assumed to have only two outcomes, either success or failure. If the probability of a successful trial is p, then the probability of having x successful outcomes in an experiment of n independent trials is as follows.

$$f(x) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$$

Problem

Suppose there are twelve multiple-choice questions in an English class quiz. Each question has five possible answers, and only one of them is correct. Find the probability of having four or fewer correct answers if a student attempts to answer every question at random.

Solution

Since only one out of five possible answers is correct, the probability of answering a question correctly at random is 1/5 = 0.2. We can find the probability of having exactly 4 correct answers by random attempts as follows.

> dbinom(4, size=12, prob=0.2) 
[1] 0.1329
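As a check by hand, the same value follows from the binomial formula with n = 12, x = 4, and p = 0.2:

```latex
f(4) = \binom{12}{4} (0.2)^{4} (0.8)^{8}
     = 495 \times 0.0016 \times 0.16777216
     \approx 0.1329
```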

To find the probability of having four or fewer correct answers by random attempts, we apply the function dbinom with x = 0,…,4 and add up the results.

> dbinom(0, size=12, prob=0.2) + 
+ dbinom(1, size=12, prob=0.2) + 
+ dbinom(2, size=12, prob=0.2) + 
+ dbinom(3, size=12, prob=0.2) + 
+ dbinom(4, size=12, prob=0.2) 
[1] 0.9274

Alternatively, we can use pbinom, the cumulative probability function for the binomial distribution.

> pbinom(4, size=12, prob=0.2) 
[1] 0.92744

Answer

The probability of four or fewer questions being answered correctly at random in a twelve-question multiple-choice quiz is 92.7%.