From rumors of radiologists being replaced to Twitter bots going rogue, artificial intelligence (AI) and machine learning (ML) have become buzzwords we’ve all heard. There has been much discussion of the breakthroughs both could bring to how fields such as health care traditionally function, for example in patient diagnosis. Many have also warned of their potential for, and history of, bias and abuse. Given the gravity surrounding AI and ML, and with much help from an excellent Ars Technica piece by Haomiao Huang, we’ve created a guide geared toward non-technical readers to help them learn the basics of these technologies.

What is AI?

That’s a difficult question to answer, as there’s no single agreed-upon definition. In the textbook Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig define it as the design and building of intelligent agents that receive percepts from the environment and take actions that affect that environment.

The AI that companies use in products and services is “weak AI”: AI focused on one narrow task. The technology behind “Alexa, turn on the lights” would be one example.

The other type of AI is “strong AI,” which is closer to the Hollywood conception of robots capable of defeating humans and controlling society. It refers to machine intelligence that is at least equal to human intelligence. Such a system would be able to fulfill the full range of human cognitive ability: reasoning, making judgements, solving puzzles, learning, planning and communicating.

While we’re nowhere near strong AI given the techniques researchers currently have access to, the capability of weak AI has rapidly progressed with the help of machine learning and deep learning.

What is Machine Learning? 

ML has improved the capability of many AI systems. ML is a way of creating machine intelligence by letting the system work out its own internal rules after being shown many examples. For instance, if one wants a system to identify a cat in a picture, the researcher first shows the system a picture of a cat and tells it, “This is a cat.” Then the researcher shows the system a picture of a dog and tells it, “This is not a cat.” Given a large enough set of such labeled examples, the system uses predictive modeling to find patterns and comes up with its own mechanisms for identifying a cat.

One of the reasons for the recent proliferation of ML is that society has gotten better at collecting a lot of data, that is, a lot of examples to show systems. In general, the more relevant data a system is trained on, the better it tends to perform.

How do you store what you’ve learned from the data? 

The computational representation of the learning that one stores is called the model, and it can take many forms. It can be as simple as a linear regression model or as complex as a reinforcement learning model, as stated by Medium. The model determines what kind of questions you can ask of it. For example, if you’re trying to create a machine learning AI to figure out which fruit is a fig, you might study a sample of fruits on the vectors of taste, color, shape and softness to help identify one over another. The sample could be limited to oranges, bananas and figs. The system would store information on each vector for each fruit—the model—and from there build a function that captures all the data. You could then pick a fruit, judge it by the four vectors, and with the function predict whether it was a fig. 

This model would work well if you picked up a banana, since bananas are already in our dataset and their color and shape are markedly different from a fig’s. But what if you picked up a plum? Given its likeness in color, shape and softness to a fig, the model might mistake it for one. To improve the model’s accuracy, we must add fruits beyond oranges, bananas and figs so that the model has a better picture of the diversity of fruits and the nuanced differences between them.
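
To make the fig example concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the feature scores are made up, and the "function" is a simple nearest-neighbor rule (pick the stored fruit that looks most similar) that we chose for brevity, not necessarily what a production system would use.

```python
import math

# Hypothetical scores on a 0-1 scale for each fruit:
# (sweetness of taste, darkness of color, roundness of shape, softness).
TRAINING_FRUITS = [
    ((0.6, 0.2, 0.9, 0.3), "orange"),
    ((0.7, 0.1, 0.1, 0.6), "banana"),
    ((0.9, 0.9, 0.7, 0.8), "fig"),
]


def predict(features, examples):
    """Return the label of the stored example closest to `features`."""
    nearest = min(examples, key=lambda example: math.dist(example[0], features))
    return nearest[1]


def is_fig(features, examples):
    """Answer the one question this model supports: is this fruit a fig?"""
    return predict(features, examples) == "fig"


banana_like = (0.7, 0.15, 0.1, 0.55)
plum = (0.8, 0.85, 0.9, 0.7)

print(is_fig(banana_like, TRAINING_FRUITS))  # False: closest to the banana
print(is_fig(plum, TRAINING_FRUITS))         # True: the plum is mistaken for a fig

# Adding a (hypothetical) plum example to the dataset fixes the mistake.
EXPANDED_FRUITS = TRAINING_FRUITS + [((0.8, 0.85, 0.95, 0.7), "plum")]
print(is_fig(plum, EXPANDED_FRUITS))         # False: now closest to the plum
```

With only three fruits in the dataset, the plum lands closest to the fig and is misclassified; once a plum example is added, the same rule gets it right.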

Even this light example shows the potential dangers of having a limited dataset in ML. Leaving out certain points could lead to an inaccurate model and therefore poor predictions. 

This example also shows the importance of choosing the right model for the dataset. The model needs to be sophisticated enough to capture the patterns in the data but nimble enough to be trained on it in practice. This is where deep learning comes in.

What is Deep Learning? 

Deep learning (DL) is a type of machine learning loosely inspired by the structure of neurons in the brain. The models that DL uses to store learning and make predictions are called neural networks. The units in these networks, which loosely resemble neurons, are organized in layers: each layer performs a set of computations and passes its answer on to the next.

The stacking allows the system to perform complex computations; a few layers would probably be enough to capture the relationships in the fig example above. Datasets with more difficult relationships can be handled by networks with tens or hundreds of layers, without a human having to specify the rules. This autonomy has contributed to DL’s recent success, as it has allowed researchers to create algorithms for previously challenging problems.
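
For readers curious what "layers passing answers along" looks like, the following Python sketch stacks two tiny layers. Everything in it is an assumption for illustration: the parameter values are hand-picked rather than learned, and the sigmoid "squashing" step is just one common choice.

```python
import math


def layer(inputs, weights, biases):
    """One layer: a weighted sum of the inputs per unit, squashed into (0, 1)."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))  # sigmoid
    return outputs


def forward(inputs, layers):
    """Pass the inputs through every layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations


# Hypothetical parameters: 4 inputs -> 3 hidden units -> 1 output.
NETWORK = [
    (
        [[0.5, -0.2, 0.1, 0.4], [-0.3, 0.8, 0.2, -0.1], [0.6, 0.1, -0.5, 0.3]],
        [0.0, 0.1, -0.1],
    ),
    ([[1.0, -1.0, 0.5]], [0.0]),
]

# Four made-up fruit scores in, one "how fig-like is this?" score out.
score = forward([0.9, 0.9, 0.7, 0.8], NETWORK)
print(score)  # a list holding a single number between 0 and 1
```

A deep network follows the same pattern with many more layers and units, and its parameters are learned from data instead of typed in by hand.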

How do you improve a neural network? 

A model’s memory is a set of numerical parameters that govern how it generates answers to the questions it’s asked. To improve the model’s ability to answer questions, one can adjust those parameters.

A powerful neural network in DL carries a multitude of parameters, so tweaking a network with hundreds of layers is not an easy task. It can, however, be done in successive steps by training the network, much as a student tests their knowledge of a subject by taking an exam. The researcher asks the network a question and compares its answer to the correct one. Then the researcher can tweak the parameters and test again.

Rather than tweaking parameters by blind trial and error, researchers can have the network compute how much its output will change in response to a small shift in each parameter. The researcher can thus “hill-climb” toward a better model: changing the parameters in directions that improve it until it stops improving.
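
The idea can be sketched with a model that has just one parameter. This is a simplified illustration, not how a full network is trained: the data, the learning rate and the number of steps are all assumptions, and the technique shown is plain gradient descent, which walks downhill on the error (equivalent to climbing uphill on the score).

```python
# Hypothetical training data generated by the rule y = 3 * x.
DATA = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]


def loss(w):
    """Average squared error of the model y = w * x on DATA."""
    return sum((w * x - y) ** 2 for x, y in DATA) / len(DATA)


def gradient(w):
    """How much the loss changes per unit change in w (the exact derivative)."""
    return sum(2 * (w * x - y) * x for x, y in DATA) / len(DATA)


def train(w, learning_rate=0.05, steps=100):
    """Repeatedly nudge w in the direction that lowers the loss."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_trained = train(w=0.0)
print(w_trained, loss(w_trained))  # w ends up very close to 3, loss close to 0
```

Because the gradient says which direction reduces the error, each step is an informed adjustment instead of a guess. Real networks do the same thing across millions of parameters at once.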

Hill-climbing is another reason DL has become so popular so quickly. Researchers can take the bones of an existing neural network structure and repurpose it to a new dataset by training the network. For example, if you have an existing AI that was used for identifying figs and want to use it to instead identify peaches, you can score it on identifying peaches and then hill-climb to get better.  

What are the applications? 

Given the massive amounts of data at hand, the ability to easily adapt models to different datasets, and deep learning networks that can create powerful models out of challenging datasets, AI and ML have exploded with exciting applications. Some of the most popular ones include object recognition, facial recognition, speech recognition, and content creation.  

Object recognition  

A specific kind of neural network in DL called a convolutional neural network (CNN) has made the task of recognizing objects in pictures more tractable. To recognize an object, one must recognize the patterns, or features, inherent in it. For example, a face contains two roughly circular shapes where the eyes are. Before CNNs, recognition required a researcher to come up with features manually and then program a computer to look for them. CNNs allow the network to figure out its own hierarchy of features, building more complex ones from simple ones.
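
As a rough illustration of what "looking for a feature" means, the Python sketch below slides a small filter over a tiny image. The image and the filter are hypothetical and hand-written; in a CNN the filter values would be learned during training, and many such filters would be stacked in layers.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the feature map.

    As in most CNN libraries, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            ))
        feature_map.append(row)
    return feature_map


# Tiny grayscale image: dark (0) on the left, bright (1) on the right.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written filter that responds to a dark-to-bright vertical edge.
VERTICAL_EDGE = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(IMAGE, VERTICAL_EDGE)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The filter produces a strong response only in the middle column, where the edge sits. Later layers can combine simple responses like this one into more complex features.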

Facial recognition  

Researchers can now train a network to recognize one specific face. Without a deep neural network, a researcher would have to train a network to recognize the features of each new face in a dataset. Instead, a researcher can take an existing network already trained to recognize faces in general and change its output so that it describes one specific face as a set of numbers that capture, for example, the uniqueness of the nose or eyes.
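
Once a face is described as a list of numbers, recognizing a person can come down to comparing two lists. The sketch below assumes such descriptions already exist; the numbers, the names and the 0.5 threshold are all hypothetical, and real systems use far longer lists produced by a trained network.

```python
import math


def same_person(description_a, description_b, threshold=0.5):
    """Treat two faces as the same person if their descriptions are close."""
    return math.dist(description_a, description_b) < threshold


# Hypothetical 4-number face descriptions.
SALLY_REFERENCE = (0.12, 0.80, 0.33, 0.57)
SALLY_NEW_PHOTO = (0.15, 0.78, 0.30, 0.60)
STRANGER = (0.70, 0.20, 0.90, 0.10)

print(same_person(SALLY_REFERENCE, SALLY_NEW_PHOTO))  # True
print(same_person(SALLY_REFERENCE, STRANGER))         # False
```

The appeal of this design is that adding a new person only requires storing one more description, not retraining the whole network.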

Speech recognition 

Speech recognition works similarly to object recognition in that the network learns to recognize items as collections of smaller features. In vision, features are organized spatially, while in sound these features, such as syllables, are organized in time. Recognizing speech can be tricky because two phrases can sound nearly identical, so context is needed to identify the correct one. With a large enough sample of spoken words, a model can learn which phrase is more likely.

Researchers use a specific kind of neural network in DL called a recurrent neural network (RNN) to take context into account in speech recognition. Unlike in a CNN used for object recognition, in an RNN the output from a layer is fed back into that layer and into layers further upstream. This feedback loop allows the network to maintain a memory, so it can consider syllables as they arrive, see them come together to form a word, and judge how likely a phrase is in context.
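
The feedback loop can be sketched in a few lines of Python. This is a deliberately tiny illustration: the two weights are hypothetical fixed numbers, the inputs are single numbers standing in for sounds, and a real RNN would learn its weights and work with long lists of numbers.

```python
import math


def rnn_step(hidden, x, w_hidden=0.8, w_input=1.0):
    """One recurrent step: blend the previous memory with the new input."""
    return math.tanh(w_hidden * hidden + w_input * x)


def run(sequence):
    """Feed a sequence through the cell, carrying the memory forward."""
    hidden = 0.0  # the network starts with an empty memory
    for x in sequence:
        hidden = rnn_step(hidden, x)
    return hidden


# Both sequences end with the same input (0.5) but have different histories.
state_a = run([1.0, 1.0, 0.5])
state_b = run([-1.0, -1.0, 0.5])
print(state_a, state_b)
```

Although the last input is identical, the two final states differ because each one still carries a trace of what came before. That carried-over state is the "memory" that lets the network weigh context.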

For example, to differentiate whether someone said, “Christmas Eve” versus “kiss my feet,” the network can pick up the words that came before and after the phrase. If it also hears “Santa Claus” and “mistletoe” it might say that “Christmas Eve” is the more likely phrase.  
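
The role of context can be shown with a toy scoring scheme. Note that this is not an RNN: it is a simple count-based stand-in we use for illustration, and the co-occurrence counts are invented.

```python
# Hypothetical counts of how often each context word appears near each phrase.
CONTEXT_COUNTS = {
    "christmas eve": {"santa": 40, "mistletoe": 25, "shoes": 1},
    "kiss my feet": {"santa": 1, "mistletoe": 2, "shoes": 30},
}


def score(phrase, context_words):
    """Multiply the counts for each context word (plus 1 to avoid zeros)."""
    counts = CONTEXT_COUNTS[phrase]
    total = 1
    for word in context_words:
        total *= counts.get(word, 0) + 1
    return total


def most_likely(context_words):
    """Pick the candidate phrase that best fits the surrounding words."""
    return max(CONTEXT_COUNTS, key=lambda phrase: score(phrase, context_words))


print(most_likely(["santa", "mistletoe"]))  # christmas eve
print(most_likely(["shoes"]))               # kiss my feet
```

A trained network arrives at a similar judgment, but it learns which words matter from large amounts of speech instead of relying on a hand-built table.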

Content creation  

Some models are discriminative, meaning they differentiate between objects for the purposes of recognition. Other models are generative, meaning they generate objects given a description of an object. Generative models use the CNNs from object recognition to create objects, but training a generative model is more difficult than training a discriminative one. It is easy to tell a network that a picture it believes to be Sally is not, in fact, Sally, but how does one tell a network that its drawing of a cat is not good enough?  

A generative model can be trained using a generative adversarial network (GAN). A GAN consists of two neural networks that work against each other so that both improve. One network creates content, and the other is trained to tell the difference between the “fake” (network-created) content and the real content. The two networks then compete, as the generative network tries to make convincing fakes that meet the discriminative network’s standards. Each network gets better from competing, and when the models get good enough, the generative model can be used on its own. In this competition, the human plays the role of an arbiter, tweaking the networks as the game goes along.

Interrogate and learn 

No matter how autonomous these models and applications may seem, they are still pre-defined by humans. From the initial datasets and rules, our biases can seep into the outputs and have negative consequences for projects with even the best intentions.  

Given the rapid momentum AI and ML are gaining in fields devoted to improving livelihoods, such as health and criminal justice, it is important for all those involved in building and applying these models to interrogate their biases and assumptions. And for us non-techies, it’s important to learn the basics of these technologies so we can inform the discussion on how to make them best serve society.

Keep an eye out for our upcoming pieces on the debate on AI and ML’s role in healthcare.

The views and opinions expressed by the authors on this blog website and those providing comments are theirs alone, and do not reflect the opinions of Softheon. Please direct any questions or comments to