Notes AI Playbook
- 1. History
- 2. Defining AI Terms
- 3. Giving Your Software AI Superpowers
- 4. Natural Language Processing
- 5. Computer Vision
- 6. Training Your Own Models
- 7. Neural Network Architectures
- 8. Ways In Which Machines Learn
source: AI Playbook
The Dartmouth Summer Research Project on Artificial Intelligence was the name of a 1956 summer workshop now considered by many (though not all) to be the seminal event for artificial intelligence as a field.
To a first approximation, AI’s six-decade history divides reasonably well into “Classical AI” and “Modern AI”.
In Classical AI, researchers used logical rules to model intelligence.
If Classical AI was about very smart researchers creating rules attempting to understand the world, Modern AI techniques focus on letting computers derive their own “rules” using lots and lots of data. Rather than explicitly telling a computer how to find a cat, we’ll just show the computer lots of examples of cats, and see if the computer can construct a cat detector by figuring out what differentiates cats from dogs or muffins or couches or motorcycles.
This is the third time I’ve come across this book; as the saying goes, things come in threes, so it’s finally time to read it: Artificial Intelligence: A Modern Approach.
Precisely defining artificial intelligence is tricky. For the inaugural 1956 summer research project, John McCarthy proposed that AI is the simulation of human intelligence by machines. Others have defined AI as the study of intelligent agents, human or not, that can perceive their environments and take actions to maximize their chances of achieving some goal. Jerry Kaplan wrestles with the question for an entire chapter in his book Artificial Intelligence: What Everyone Needs To Know before giving up on a succinct definition.
Rather than try to define AI precisely, we’ll simply differentiate AI’s goals and techniques:
On the one hand, AI has a set of goals such as recognizing what’s in a picture (vision recognition), converting a recording of your voice into the words you meant (natural language processing), or finding the best way to get to grandmother’s house (route planning). In this guide, we’ll cover natural language processing, vision, and pattern recognition. But there are many goals, including autonomy, knowledge representation, logical reasoning, planning, learning, manipulating objects, classifying documents, and so on.
On the other hand, we have a toolkit of computer science techniques (think algorithms and data structures) we use in trying to achieve those goals. In this guide, we’ll use techniques such as deep learning and supervised learning. There are many other AI techniques, including symbolic computation, search and mathematical optimization, probabilistic techniques, and many others.
Artificial Intelligence comprises all ML techniques, but it also includes other techniques such as search, symbolic reasoning, logical reasoning, statistical techniques that aren’t deep learning based, and behavior-based approaches.
One other distinction you might see as you continue your AI journey is between hard/soft, strong/weak, and deep/narrow AI. All of them basically distinguish systems that work in a specific domain (soft, weak, narrow), such as vision recognition or language translation, from systems that can generalize across many specific problems and continuously learn. Google’s DeepMind and OpenAI (in general) are working on hard/strong/deep AI. Google Brain (in general) is working on concrete capabilities that make all Google products better (e.g., better Inbox Smart Replies, better face and object detection in Google Photos, better search results in Google search, etc.).
For example, this HTTP request sends a sentence to a sentiment analyzer over the Web:
Your code will get an answer like the one below, which means the sentiment of the sentence is positive. More precisely, the answer you get back says “this is an English sentence with a positive score (negative sentiments have a negative score) and a ‘sentiment strength’ of 0.6, with 0 being weak and 1 being strong”. So if you are an airplane designer, may we suggest a hawk shape next time you have a clean sheet of paper out?
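The actual request and response aren’t reproduced in these notes; as a hedged sketch (the endpoint, field names, and sample sentence here are all made up for illustration), the exchange might look like:

```text
POST https://nlp.example.com/v1/sentiment
{"text": "Watch a hawk swoop down on its prey: grace and energy!"}

Response:
{"language": "en", "polarity": "positive", "strength": 0.6}
```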
If your application needs to work without an Internet connection, you can embed software libraries to call locally. For example, you can grab the Stanford CoreNLP toolkit to add language processing capabilities to your software, such as “part of speech tagging”, which tries to identify where the nouns, verbs, adjectives, adverbs, and so on are in your text.
Just as with regular programming, if you can’t find a pre-built Web service or library to do what you want, you’ll have to create your own special functions. These days, the most popular way to do this is by training a machine learning model with labeled data using something such as scikit-learn or Spark’s MLlib (for a wide collection of machine learning techniques) or Tensorflow, Keras, Caffe2 or MXnet (for deep learning models). We’ll walk through a few examples using a Web service called Clarifai and Google’s TensorFlow later in this guide.
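To make the idea concrete, here is a minimal sketch of training a model on labeled data with scikit-learn; the features and labels are invented for illustration, not a real data set:

```python
# A tiny supervised model: made-up [height_cm, weight_kg] features,
# labeled 0 = cat, 1 = dog. Real projects need far more data.
from sklearn.linear_model import LogisticRegression

X = [[25, 4.0], [30, 5.0], [28, 4.5], [60, 25.0], [55, 20.0], [70, 30.0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)  # "learn" the boundary between the two classes
print(model.predict([[27, 4.2], [65, 28.0]]))  # a small animal, then a large one
```

The same `fit`/`predict` pattern carries over to most scikit-learn estimators, which is what makes trying several algorithms on the same data cheap.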
Natural Language Processing (NLP) will enable better understanding all around: we’ll talk to our computers; our computers will understand us; and we’ll have the Star Trek universal translator in our ears translating any language into our native language in real time (and vice versa).
Before we get to long, philosophical, and emotional natural conversations with our computers (as in the movie Her), we can build a lot of extremely useful language-enabled applications that help us do things like understand whether someone is getting angry on a support call, write better job descriptions, and disambiguate words whose meaning changes depending on context (see this Wikipedia page for a fun list of examples, including one of my favorite perfectly grammatical sentences: Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo).
This branch of AI includes such capabilities as:
- Automatic speech recognition (ASR), which converts what you say into computer-readable text, also called “speech to text”
- Text to speech, which goes the other way and gives computers an increasingly human-sounding voice
- Language detection, which figures out what language a document is written in
- Machine translation, which translates text from any language to any language. Some language pairs work better than others, mostly because of the availability of data sets.
- Sentiment analysis, which figures out the emotional tilt of text
- Entity extraction, which highlights all the “things, places, people, and products” in a piece of text
- Information extraction, which finds relationships between extracted entities, such as “who did what to whom and how did they do it”?
- Document analysis, which categorizes documents, figures out what they are talking about (topic modeling), and makes the content easy to search (find documents about “fund raising” even if the documents never contain that exact phrase)
- Natural language generation, which generates well-formed sentences so that when you are chatting with your bot, it sounds like a real person
- Summarization, which generates readable summaries of arbitrary text documents, preserving as much of the meaning as possible
- Question answering, which answers questions from people about a dataset. Part of the challenge is figuring out which questions mean the same thing. In bots, a hot topic these days is “intent categorization”, which is about trying to figure out what the user is trying to accomplish (e.g., book a flight, schedule a meeting, make a withdrawal)
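As a toy illustration of intent categorization, the sketch below matches an utterance against keyword sets; real systems use trained classifiers, and the intents and keywords here are invented:

```python
# Hypothetical intents mapped to keyword sets (illustrative only).
INTENT_KEYWORDS = {
    "book_flight": {"flight", "fly", "ticket"},
    "schedule_meeting": {"meeting", "schedule", "calendar"},
    "make_withdrawal": {"withdraw", "withdrawal", "cash"},
}

def categorize_intent(utterance: str) -> str:
    # Pick the intent whose keywords overlap the utterance the most.
    words = set(utterance.lower().split())
    best = max(INTENT_KEYWORDS, key=lambda i: len(words & INTENT_KEYWORDS[i]))
    if not words & INTENT_KEYWORDS[best]:
        return "unknown"
    return best

print(categorize_intent("I want to book a flight to Boston"))
```

Keyword matching breaks down quickly (“fly fishing gear”), which is exactly why intent categorization is usually treated as a classification problem over labeled utterances.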
These symbols have many forms: they can be labels from a set used for training, captions, text extracted from the image via OCR, colors, and so on. Not all images are created alike: In general, systems that are good at processing attributes for still images are not necessarily as good for processing video, and vice-versa.
Sub-domains of computer vision include scene reconstruction, motion/event detection, tracking, object recognition, and image restoration among many others.
There is a large set of machine learning algorithms with fun names such as decision trees, random forests, support vector machines, logistic regression, and so on. Each algorithm is best suited for a specific situation depending on how much data you have, how many “features” or dimensions of data you can feed the algorithms, how sparse or dense the data set is, and so on. Sometimes it’s hard to figure out which algorithm to use, and you will have to try a few different algorithms (and combinations of algorithms) to see how they do.
Here are a few good starting points to picking the right ML algorithm to solve your specific problem:
- How to choose machine learning algorithms for Microsoft Azure Machine Learning
- Stack Overflow answer to “When to choose which machine learning classifier?”
- Scikit-learn documentation: Choosing the right estimator
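A hedged sketch of “try a few and see”: cross-validate several scikit-learn classifiers on the same data and compare mean scores. The data set and model choices here are illustrative, not recommendations:

```python
# Compare a few classifiers on the classic iris data set using
# 5-fold cross-validation; higher mean score = better fit for this data.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
for model in [DecisionTreeClassifier(), LogisticRegression(max_iter=1000), SVC()]:
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, round(scores.mean(), 3))
```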
For both these reasons (namely, (1) it’s working and (2) it figures out features on its own), we’ll spend the rest of the time in this playbook digging into deep learning. But before we continue, this tweet is spot on:
Having said that, deep learning algorithms are incredibly powerful and getting amazing results across many different domains. Professor Christopher Manning, a longtime veteran of NLP research at Stanford, says in his introductory lecture for “CS Natural Language Processing with Deep Learning” that “in the length of my lifetime, I’d actually say it’s unprecedented [for] a field to progress so quickly”.
Deep learning data structures and algorithms were originally inspired by the way neurons in the brain work, but most researchers today will tell you that brains and neural networks used in software like TensorFlow are very different. But if you are interested in the history of how we got here, check out these excellent resources which we’ve ordered by depth, from most concise to most comprehensive, for your reading pleasure.
- Andrew L. Beam, Deep Learning 101
- Andrey Kurenkov, A “Brief” History of Neural Nets and Deep Learning
- Haohan Wang and Bhiksha Raj, On the Origin of Deep Learning, which provides a good historical overview, explaining the concepts including the math
- Jürgen Schmidhuber, Deep Learning in Neural Networks: An Overview, which provides the most comprehensive and technically dense overview
The fundamental data structure of a neural network is loosely inspired by brains. Each of your brain cells (neurons) is connected to many other neurons by synapses. As you experience and interact with the world, your brain creates new connections, strengthens some connections, and weakens others. A neural network’s data structure has many similarities: its “neurons” are nodes in a network connected to other nodes. Connections between nodes have a strength. Neurons activate (that is, generate an electrical signal) based on inputs they receive from other neurons.
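In code, a single artificial “neuron” is just a weighted sum of its inputs passed through an activation function; the weights below are arbitrary made-up numbers:

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus a bias, squashed by a sigmoid activation
    # so the "firing strength" lands between 0 and 1.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

# Two inputs with arbitrary connection strengths (weights).
print(round(neuron([1.0, 0.5], [0.4, -0.2], 0.1), 3))
```

Training amounts to adjusting the weights and bias, which plays the role of strengthening and weakening the connections.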
If you want to see some of this heat, read the IEEE interview with UC Berkeley professor Michael Jordan from October 2014.
The Wikipedia article on the types of artificial neural networks is a good reference for further exploration.
Feedforward networks were the first type of artificial neural network devised. In this network the information moves in only one direction: forward. Input nodes receive data and pass it along, as seen in Figure 1.
From the input nodes data flows through the hidden nodes (if any) and to the output nodes, without cycles or loops, and may be modified along the way by each node. They are called feedforward because processing moves forward from left to right.
When feedforward networks have multiple layers, they are called multilayer networks.
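A multilayer feedforward pass in miniature: data flows strictly left to right, one layer at a time, with no cycles. All weights here are made up:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each output node is a weighted sum of all inputs, plus a bias,
    # passed through the activation function.
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

# Two inputs -> two hidden nodes -> one output node.
hidden = layer([1.0, 0.5], weights=[[0.4, -0.2], [0.3, 0.8]], biases=[0.1, -0.1])
output = layer(hidden, weights=[[1.2, -0.6]], biases=[0.05])
print([round(h, 3) for h in hidden], round(output[0], 3))
```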
Convolutional neural networks are a specific type of multilayer feedforward network typically used in image recognition and (more recently) some natural language processing tasks.
First, RNNs support bi-directional data flow, propagating data from later processing stages back to earlier stages as well as linearly from input to output. This diagram from Christopher Olah’s excellent overview article shows the shape of an RNN:
This architecture enables the RNN to “remember” things, which makes them great for processing time-series data (like events in an event log) or natural language processing tasks (like understanding the roles each word plays in a sentence, in which remembering what word came before can help you figure the role of the current word).
Second, RNNs can process arbitrarily sized inputs and outputs by handling vectors in a sequence, one at a time. Where feedforward networks and CNNs only work on fixed-size inputs and outputs, RNNs process one vector after another and can thereby work on any shape of input and output. Andrej Karpathy comes to the rescue with a diagram that shows this in his excellent blog post.
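The recurrent idea can be sketched in a few lines: a hidden value carried from step to step lets the network “remember” earlier inputs. The weights are invented for illustration:

```python
import math

def rnn_step(x, h, w_x=0.5, w_h=0.8):
    # The new hidden state mixes the current input (x) with the previous
    # hidden state (h); h is how earlier inputs echo forward in time.
    return math.tanh(w_x * x + w_h * h)

h = 0.0
for x in [1.0, 0.0, 0.0]:  # only the first step has a nonzero input
    h = rnn_step(x, h)
    print(round(h, 3))  # stays positive: the first input is "remembered"
```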
Supervised Learning trains networks using examples where we already know the correct answer. Imagine we are interested in training a network to recognize pictures from your photo library that have your parents in them. Here are the steps we’d take in that hypothetical scenario.
We would start the process by going through your photos (the data set), identifying all the pictures that have your parents in them, and labeling them. We would then take the whole stack of photos and split them into two piles. We would use the first pile to train the network (training data) and the second pile to see how accurate the model is at picking out photos with your parents (validation data).
To continue the process, the model makes a prediction for each photo by following rules (activation function) to decide whether to light up a particular node in the network. The model works from left to right, one layer at a time; we will ignore more complicated networks for the moment. After the network calculates this for every node, we’ll get to the rightmost node (output node), which lights up or not.
Once we’ve processed all the photos from our first stack we will be ready to test the model. We would grab the second stack of photos and use them to see how accurately the trained model can pick out photos of your parents.
Unsupervised Learning is for situations where you have a data set but no labels. Unsupervised learning takes the input set and tries to find patterns in the data, for instance by organizing items into groups (clustering) or finding outliers (anomaly detection).
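As a toy sketch of clustering, here is a tiny 1-D k-means with k=2. Note that no labels are involved: the algorithm discovers the two groups on its own. The data points are made up:

```python
# Simple 1-D k-means: repeatedly assign points to the nearest center,
# then move each center to the mean of its group.
def kmeans_1d(points, iters=10):
    c1, c2 = min(points), max(points)  # crude initialization
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted([c1, c2])

print(kmeans_1d([1.0, 1.2, 0.8, 9.9, 10.1, 10.0]))  # finds the two groups
```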
You’ll read about a number of other unsupervised learning techniques as you explore the literature.
Semi-supervised learning combines a lot of unlabeled data with a small amount of labeled data during the training phase. Models trained this way can be highly accurate and much less expensive to train than models trained entirely on labeled data. Our friend Delip Rao at the AI consulting company Joostware, for example, built a semi-supervised solution using just 30 labels per class that matched the accuracy of a supervised model requiring ~1360 labels per class. This enabled their client to scale their prediction capabilities from 20 categories to 110 categories very quickly.
Reinforcement learning is for situations where you again don’t have labeled data sets, but you do have a way of telling whether you are getting closer to your goal (a reward function). The classic children’s game hotter or colder (a variant of Huckle Buckle Beanstalk) is a good illustration of the concept. Your job is to find a hidden object, and your friends will call out whether you are getting “hotter” (closer to) or “colder” (farther from) the object. “Hotter/colder” is the reward function, and the goal of the algorithm is to maximize the reward function. You can think of the reward function as a delayed and sparse form of labeled data: rather than getting a specific “right/wrong” answer with each data point, you’ll get a delayed reaction and only a hint of whether you’re heading in the right direction.
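The hotter/colder game can be sketched as a miniature reward-driven loop: the “agent” never sees the answer, only whether its latest move got it closer. The target and step sizes below are invented:

```python
# The only feedback is "hotter" (closer) or "colder" (farther), yet the
# guess still converges on the hidden target by maximizing that signal.
def play_hotter_colder(target, guess=0.0, step=1.0, rounds=50):
    best_dist = abs(target - guess)
    for _ in range(rounds):
        trial = guess + step
        if abs(target - trial) < best_dist:       # "hotter": keep this move
            guess, best_dist = trial, abs(target - trial)
        else:                                     # "colder": turn around, smaller steps
            step = -step / 2
    return guess

print(round(play_hotter_colder(target=7.3), 2))
```

Real reinforcement learning replaces this hand-rolled search with algorithms that learn a policy from the reward signal, but the feedback loop is the same shape.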