# Machine Learning in Action Chapter04 Classifying with probability theory: naive Bayes

## Classification Based on Bayesian Decision Theory

• Pros: effective even with small amounts of data; can handle multi-class problems
• Cons: sensitive to how the input data is prepared
• Works with: nominal data
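As a concrete illustration, here is a minimal sketch of a bag-of-words naive Bayes classifier in Python. The function names and the Laplace-smoothing choice are my own for this sketch, not the book's exact implementation:

```python
from math import log

def train_naive_bayes(vectors, labels):
    """Estimate log-probabilities for a binary bag-of-words classifier.

    vectors: list of 0/1 word-presence lists; labels: list of 0/1 classes.
    Counts start at 1 (Laplace smoothing) to avoid zero probabilities.
    """
    n = len(vectors[0])
    p_class1 = sum(labels) / len(labels)
    counts = {0: [1] * n, 1: [1] * n}   # smoothed per-word counts per class
    totals = {0: 2.0, 1: 2.0}
    for vec, label in zip(vectors, labels):
        for i, v in enumerate(vec):
            counts[label][i] += v
        totals[label] += sum(vec)
    log_p = {c: [log(cnt / totals[c]) for cnt in counts[c]] for c in (0, 1)}
    return log_p, p_class1

def classify(vec, log_p, p_class1):
    """Pick the class with the higher log-posterior (naive independence assumption)."""
    s1 = sum(v * w for v, w in zip(vec, log_p[1])) + log(p_class1)
    s0 = sum(v * w for v, w in zip(vec, log_p[0])) + log(1.0 - p_class1)
    return 1 if s1 > s0 else 0
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow when there are many features.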

# Machine Learning in Action Chapter02 Classifying with k-Nearest Neighbors

## kNN Overview

The kNN algorithm classifies a sample by measuring the distances between its features and those of known samples.

• Pros: high accuracy, insensitive to outliers, no assumptions about the input data
• Cons: high computational complexity, high space complexity
• Works with: numeric and nominal data
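The distance-based classification described above can be sketched in a few lines. This is a hypothetical helper, not the book's implementation; `math.dist` requires Python 3.8+:

```python
from collections import Counter
from math import dist  # Euclidean distance between two points

def knn_classify(query, points, labels, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(points, labels), key=lambda pl: dist(query, pl[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Sorting the whole training set on every query is what makes the computational cost high, as the "Cons" bullet notes; storing all training points is the space cost.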

# Machine Learning Week11 Application Example Photo OCR

## Photo OCR

### Problem Description and Pipeline

#### What is the photo OCR problem?

• Photo OCR = photo optical character recognition
• With growth of digital photography, lots of digital pictures
• One idea which has interested many people is getting computers to understand those photos
• The photo OCR problem is getting computers to read text in an image
• Possible applications for this would include
  • Making searching easier (e.g. searching for photos based on the words in them)
• OCR of documents is a comparatively easy problem
• From photos it's really hard

# Machine Learning Week10 Large Scale Machine Learning

## Gradient Descent with Large Datasets

### Learning With Large Datasets

#### Why large datasets?

• One of the best ways to get high performance is to take a low-bias algorithm and train it on a lot of data
  • e.g. classification between confusable words
  • We saw that so long as you feed the algorithms lots of data, they all perform pretty similarly
• So it's good to learn with large datasets
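The rest of this week covers techniques that make gradient descent practical at this scale, such as stochastic gradient descent, which updates the parameters one example at a time so the cost per step does not grow with the dataset. A minimal sketch for linear regression (the hyperparameter values here are illustrative assumptions):

```python
import random

def sgd_linear_regression(data, alpha=0.01, epochs=100):
    """Stochastic gradient descent for y ≈ w*x + b.

    Each update uses the gradient of a single example's squared error,
    so one step costs O(1) regardless of how large `data` is.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        random.shuffle(data)            # visit examples in random order
        for x, y in data:
            err = (w * x + b) - y       # error on this one example only
            w -= alpha * err * x
            b -= alpha * err
    return w, b
```

Batch gradient descent would sum the error over every example before taking a single step, which is what becomes expensive with millions of examples.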

# Machine Learning Week9 Anomaly Detection

## Density Estimation

### Problem Motivation

• We have a dataset which contains normal (non-anomalous) data
  • How we ensure the examples are normal is up to us
  • In reality it's OK if there are a few which aren't actually normal
• Using that dataset as a reference point, we can see whether other examples are anomalous
• First, using our training dataset, we build a model
  • We can evaluate this model as p(x)
  • This asks, "What is the probability that example x is normal?"
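For a single feature, this model is typically a Gaussian fitted to the training data: low p(x) flags a likely anomaly. A minimal sketch (the threshold ε here is an arbitrary illustrative value, not one from the course):

```python
from math import exp, pi, sqrt

def fit_gaussian(samples):
    """Estimate the mean and variance of one feature from normal training data."""
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return mu, var

def p(x, mu, var):
    """Gaussian density for x; small values suggest x is anomalous."""
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def is_anomaly(x, mu, var, epsilon=1e-3):
    """Flag x as anomalous when its density falls below the threshold ε."""
    return p(x, mu, var) < epsilon
```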

# Machine Learning Week8 Unsupervised Learning

## Clustering

### Unsupervised Learning Introduction

• What is clustering good for?
  • Market segmentation - group customers into different market segments
  • Social network analysis - Facebook "smartlists"
  • Organizing computer clusters and data centers for network layout and location
  • Astronomical data analysis - understanding galaxy formation
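Groupings like these are typically found with an algorithm such as k-means, introduced next in this week. A minimal sketch with deterministic initialization (real implementations choose the initial centroids randomly; this simplification is mine):

```python
def kmeans(points, k, iters=10):
    """Plain k-means: alternately assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    centroids = list(points[:k])  # deterministic init, for the sketch only
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for pt in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(pt, centroids[c])))
            clusters[nearest].append(pt)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if its cluster emptied out
                centroids[i] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids
```

In market segmentation terms, each returned centroid is the "average customer" of one segment.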

# Machine Learning Week7 Support Vector Machines

## Large Margin Classification

### Optimization Objective

#### An alternative view of logistic regression

• Begin with logistic regression, then see how we can modify it to get the SVM
  • With hθ(x) close to 1, θᵀx must be much larger than 0
  • With hθ(x) close to 0, θᵀx must be much less than 0
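These two claims follow from the sigmoid hθ(x) = 1/(1 + e^(−θᵀx)), and can be checked numerically alongside the hinge-style cost the course substitutes for the logistic cost on y = 1 examples (function names here are my own):

```python
from math import exp, log

def sigmoid(z):
    """h_theta(x) as a function of z = theta^T x."""
    return 1.0 / (1.0 + exp(-z))

def logistic_cost_y1(z):
    """Logistic regression cost for a y = 1 example: -log(h(z))."""
    return -log(sigmoid(z))

def svm_cost_y1(z):
    """SVM surrogate for y = 1: exactly zero once z >= 1, linear below."""
    return max(0.0, 1.0 - z)
```

The surrogate's flat region is what produces the margin: the cost is not just small but exactly zero once θᵀx ≥ 1, so the optimizer pushes examples past the margin rather than merely past 0.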