Machine Learning in Action Chapter07 Improving classification with the AdaBoost meta-algorithm




Machine Learning in Action Chapter05 Logistic regression


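(1) Collect data: any method.
(2) Prepare data: numeric values are required because distances are calculated. A structured data format is best.
(3) Analyze data: any method.
(4) Train the algorithm: most of the time is spent here. The goal of training is to find the best regression coefficients for classification.
(5) Test the algorithm: once training is complete, classification is fast.
(6) Use the algorithm: first, take some input data and convert it into the corresponding structured numeric values. Next, apply the trained regression coefficients with a simple regression calculation to decide which class the values belong to. After that, you can do further analysis on the output classes.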
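To make steps (4) and (6) concrete, here is a minimal sketch in plain Python. It assumes batch gradient ascent on the log-likelihood as the way to find the coefficients (one common choice, not necessarily the book's exact code), and the names `train_logistic`, `classify_logistic`, `alpha`, and `n_iter` are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function; the two branches keep math.exp from overflowing."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, x):
    """weights[0] is the intercept; weights[1:] pair up with the features of x."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

def train_logistic(data, labels, alpha=0.1, n_iter=1000):
    """Hypothetical helper: batch gradient ascent on the log-likelihood.

    data: list of numeric feature vectors; labels: list of 0/1 classes.
    """
    weights = [0.0] * (len(data[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(weights)
        for x, y in zip(data, labels):
            error = y - sigmoid(score(weights, x))  # label minus predicted probability
            grad[0] += error
            for j, xj in enumerate(x):
                grad[j + 1] += error * xj
        # Step uphill along the gradient of the log-likelihood.
        weights = [w + alpha * g for w, g in zip(weights, grad)]
    return weights

def classify_logistic(weights, x):
    """Hypothetical helper: apply the trained coefficients, threshold at 0.5."""
    return 1 if sigmoid(score(weights, x)) > 0.5 else 0

# Usage on a tiny one-feature toy dataset.
weights = train_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
print(classify_logistic(weights, [0.0]), classify_logistic(weights, [3.0]))
```

Training is the slow part (it loops over the whole dataset `n_iter` times), while classification is a single weighted sum, which matches the observation in step (5).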

Machine Learning in Action Chapter04 Classifying with probability theory: naive Bayes



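  • Pros: still effective with a small amount of data; can handle multi-class problems
  • Cons: sensitive to how the input data is prepared
  • Works with: nominal values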
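As an illustration of naive Bayes on nominal data with several classes, here is a minimal sketch in plain Python. It assumes add-one (Laplace) smoothing and log probabilities to avoid zero products and underflow; the function names and the toy weather data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Hypothetical helper: count classes and per-class feature values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, feature index) -> Counter of values
    feature_values = defaultdict(set)     # feature index -> all values seen
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[(c, i)][v] += 1
            feature_values[i].add(v)
    return class_counts, value_counts, feature_values

def classify_nb(x, model):
    """Hypothetical helper: pick the class with the highest log posterior."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior P(c)
        for i, v in enumerate(x):
            k = len(feature_values[i])
            # Add-one smoothing: an unseen value never gives log(0).
            score += math.log((value_counts[(c, i)][v] + 1) / (n_c + k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(samples, labels)
print(classify_nb(("sunny", "hot"), model))
```

The smoothing constant and the way features are encoded both change the probabilities, which is one reason the method is sensitive to how the input data is prepared.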

Machine Learning in Action Chapter03 Splitting datasets one feature at a time: decision trees


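A classification decision tree model is a tree structure that describes how instances are classified. A decision tree consists of nodes and directed edges. There are two types of nodes: internal nodes and leaf nodes. An internal node represents a feature (attribute), and a leaf node represents a class (label).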
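One lightweight way to store this structure is a nested dictionary: an internal node maps a feature name to its outgoing edges (keyed by feature value), and a leaf is just a class label. The sketch below assumes that representation; `classify_tree` and the feature names are hypothetical.

```python
def classify_tree(tree, feature_names, x):
    """Hypothetical helper: walk from the root to a leaf and return its label.

    An internal node is {feature_name: {value: subtree, ...}};
    a leaf is a plain class label.
    """
    node = tree
    while isinstance(node, dict):
        feature = next(iter(node))                  # feature tested at this node
        edges = node[feature]                       # directed edges, keyed by value
        value = x[feature_names.index(feature)]     # this instance's value
        node = edges[value]                         # follow the matching edge
    return node                                     # a leaf: the class label

tree = {"no surfacing": {0: "no", 1: {"flippers": {0: "no", 1: "yes"}}}}
names = ["no surfacing", "flippers"]
print(classify_tree(tree, names, [1, 1]))
```

Classifying an instance only visits one path from the root, so its cost is the depth of the tree rather than the size of the training set.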


Machine Learning in Action Chapter02 Classifying with k-Nearest Neighbors

KNN Overview

The KNN algorithm classifies by measuring the distance between different feature values.

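  • Pros: high accuracy, insensitive to outliers, no assumptions about the input data
  • Cons: high computational complexity, high space complexity
  • Works with: numeric values and nominal values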
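A minimal sketch of the idea in plain Python, assuming Euclidean distance (via `math.dist`, Python 3.8+) and a majority vote among the k nearest training points; `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(query, data, labels, k=3):
    """Hypothetical helper: label the query by majority vote of its k nearest neighbors."""
    # Distance from the query to every stored training point.
    distances = sorted(
        (math.dist(query, point), label) for point, label in zip(data, labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

data = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ["A", "A", "B", "B"]
print(knn_classify([0.0, 0.0], data, labels, k=3))
```

There is no training step: the whole dataset is kept in memory and every query computes a distance to every stored point, which is where the high time and space costs listed above come from.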

Machine Learning Week11 Application Example Photo OCR

Photo OCR

Problem Description and Pipeline

What is the photo OCR problem?

  • Photo OCR = photo optical character recognition
    • With the growth of digital photography, there are lots of digital pictures
    • One idea which has interested many people is getting computers to understand those photos
    • The photo OCR problem is getting computers to read text in an image
      • Possible applications for this would include
        • Make searching easier (e.g. searching for photos based on words in them)
        • Car navigation
  • OCR of documents is a comparatively easy problem
    • From photos it's really hard

Machine Learning Week10 Large Scale Machine Learning

Gradient Descent with Large Datasets

Learning With Large Datasets

Why large datasets?

  • One of the best ways to get high performance is to take a low-bias algorithm and train it on a lot of data

    • e.g. Classification between confusable words
  • We saw that, so long as you feed them lots of data, different algorithms all perform pretty similarly
  • So it's good to learn with large datasets

Machine Learning Week9 Anomaly Detection

Density Estimation

Problem Motivation

  • We have a dataset which contains normal (non-anomalous) data
    • How we ensure they're normal is up to us
    • In reality it's OK if there are a few which aren't actually normal
  • Using that dataset as a reference point we can see if other examples are anomalous
  • First, using our training dataset we build a model

    • We can access this model using p(x)
      • This asks, "What is the probability that example x is normal?"
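One common way to build p(x), assumed here, is to fit an independent Gaussian to each feature of the training set and multiply the per-feature densities; an example is then flagged as anomalous when p(x) falls below a threshold. The names `fit_gaussian`, `p`, `is_anomaly`, and the threshold `epsilon` are hypothetical, and the sketch assumes no feature has zero variance.

```python
import math

def fit_gaussian(data):
    """Hypothetical helper: per-feature mean and variance of the (mostly normal) training set."""
    n = len(data)
    params = []
    for j in range(len(data[0])):
        col = [x[j] for x in data]
        mu = sum(col) / n
        var = sum((v - mu) ** 2 for v in col) / n  # assumed non-zero
        params.append((mu, var))
    return params

def p(x, params):
    """Density of x as a product of independent per-feature Gaussians."""
    prob = 1.0
    for xj, (mu, var) in zip(x, params):
        prob *= math.exp(-((xj - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return prob

def is_anomaly(x, params, epsilon):
    """Flag x when its density under the model is below the (hypothetical) threshold."""
    return p(x, params) < epsilon

train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
params = fit_gaussian(train)
print(is_anomaly([1.0, 2.0], params, 1e-3), is_anomaly([5.0, 5.0], params, 1e-3))
```

Choosing epsilon is a separate step; a small labeled validation set is one way to tune it.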

Machine Learning Week8 Unsupervised Learning


Unsupervised Learning Introduction

  • What is clustering good for?
    • Market segmentation - group customers into different market segments
    • Social network analysis - Facebook "smartlists"
    • Organizing computer clusters and data centers for network layout and location
    • Astronomical data analysis - Understanding galaxy formation

Machine Learning Week7 Support Vector Machines

Large Margin Classification

Optimization Objective

An alternative view of logistic regression

  • Begin with logistic regression and see how we can modify it to get the SVM

    • With hθ(x) close to 1, θᵀx must be much larger than 0
    • With hθ(x) close to 0, θᵀx must be much less than 0
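In logistic regression the hypothesis is hθ(x) = g(θᵀx), where g is the sigmoid function. A quick numeric check of the two statements above: the sketch below evaluates hθ(x) for a few values of θᵀx. The function name `h` and the choice of `theta` are illustrative only.

```python
import math

def h(theta, x):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x), g = sigmoid."""
    z = sum(t * xi for t, xi in zip(theta, x))  # theta^T x
    return 1.0 / (1.0 + math.exp(-z))

theta = [0.0, 1.0]  # illustrative parameters, so theta^T x equals x1
for x1 in (-10, -2, 0, 2, 10):
    x = [1.0, x1]   # x0 = 1 is the intercept term
    print(f"theta^T x = {x1:>3}  ->  h = {h(theta, x):.5f}")
```

The output only gets close to 0 or 1 once θᵀx is several units away from 0; at θᵀx = 0 the hypothesis sits exactly at 0.5, the decision boundary.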