Natural Language Processing Lecture02

Graph Search Algorithms

Depth-first search (DFS)

  • Time Complexity
    • Expands some left prefix of the tree
    • If the maximum depth m is finite, takes time O(b^m)
  • Space Complexity
    • Only stores siblings on the path to the root, so O(bm)
  • Complete?
    • No, the tree can be infinite
  • Optimal?
    • No, it finds the "leftmost" solution, regardless of depth or cost
  • When does DFS outperform BFS? (see the sketch after this list)
    • The tree is deep and solutions are deep and frequent
    • Its space complexity, O(bm), is much lower than BFS's O(b^s)
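
A minimal iterative DFS sketch. The children(node) successor function and is_goal(node) test are hypothetical placeholders, not part of the lecture code:

# iterative DFS: always expand the deepest node first (LIFO stack)
def dfs(start, children, is_goal):
    stack = [start]              # frontier: last in, first out
    visited = set()
    while stack:
        node = stack.pop()       # take the deepest unexpanded node
        if is_goal(node):
            return node
        if node in visited:
            continue
        visited.add(node)
        stack.extend(children(node))
    return None                  # only terminates on finite graphs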

Breadth-first search (BFS)

  • Time Complexity
    • Processes all nodes above the shallowest solution
    • Let the depth of the shallowest solution be s
    • Search takes time O(b^s)
  • Space Complexity
    • Keeps roughly the last tier, so O(b^s)
  • Complete?
    • Yes
  • Optimal?
    • Yes, under the conditions listed below

  • When does BFS outperform DFS?
    • The branching factor is relatively small
    • The optimal solution is relatively shallow
  • Conditions required for BFS to be optimal (see the sketch after this list)
    • All step costs between two nodes are positive or zero
    • The frontier list is sorted by path cost in every iteration
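
A matching BFS sketch, using the same hypothetical children/is_goal helpers; replacing the FIFO queue with a frontier sorted by path cost gives the optimal variant described above:

from collections import deque

# BFS: always expand the shallowest node first (FIFO queue)
def bfs(start, children, is_goal):
    queue = deque([start])       # frontier: first in, first out
    visited = {start}
    while queue:
        node = queue.popleft()   # take the shallowest unexpanded node
        if is_goal(node):
            return node
        for child in children(node):
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return None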

A* search

Similar to BFS under the optimal conditions above.

Differences:

  • It sorts the list by the sum of the cost from the initial node to the current node and the estimated cost from the current node to the goal:

    \[ total = cost(s, c) + estimated\_cost(c, g) \]

  • In every iteration, it checks all candidate paths and prunes those that can never be optimal (sketched below).
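
A compact sketch of this search (A*). Here children(node) is assumed to yield (child, step_cost) pairs and h(node) is the estimated cost to the goal; both are hypothetical helpers:

import heapq
import itertools

# A*: pop the path with the smallest total = cost(s, c) + estimated_cost(c, g)
def a_star(start, children, is_goal, h):
    tie = itertools.count()      # tie-breaker so nodes are never compared directly
    frontier = [(h(start), next(tie), 0, start)]   # (total, tie, cost so far, node)
    best_cost = {start: 0}
    while frontier:
        total, _, cost, node = heapq.heappop(frontier)
        if is_goal(node):
            return cost
        for child, step in children(node):
            new_cost = cost + step
            if new_cost < best_cost.get(child, float('inf')):
                best_cost[child] = new_cost        # prune paths that can never be optimal
                heapq.heappush(frontier, (new_cost + h(child), next(tie), new_cost, child))
    return None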

Machine Learning

Introduction

Tom Mitchell, 1996: Machine learning is the study of how to make programs improve their performance on certain tasks from (their own) experience.

  • performance = speed, accuracy
  • experience = earlier observations = data

Conclusion: use data to learn a model that has good performance on unseen data.

Optimization (Gradient Descent)

Define a cost function for the target function, then minimize it by gradient descent (taking partial derivatives and stepping with a suitable learning rate); the parameters that minimize the cost function give the best fit of the target function.
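
One gradient-descent step updates each parameter \(\theta\) (here k and b) against its partial derivative, scaled by the learning rate \(\alpha\):

\[ \theta \leftarrow \theta - \alpha \frac{\partial loss}{\partial \theta} \]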

This example fits a price-prediction model to the Boston housing dataset.

from sklearn.datasets import load_boston
import matplotlib.pyplot as plt
import random

dataset = load_boston()

x, y = dataset['data'], dataset['target']

# dataset.feature_names
# dataset['DESCR']
# x.shape = (506, 13)
# y.shape = (506,)

X_rm = x[:, 5]  # use only one of the 13 features (RM, average number of rooms) for the linear model
plt.scatter(X_rm, y)

  1. Define the target function:

    \[ y = k \cdot rm + b \]

    # define the target function
    def price(rm, k, b):
        return k * rm + b
  2. The mean squared error is:

    \[ loss = \frac{1}{n} \sum{(y_i - \hat{y_i})}^2\]

    \[ loss = \frac{1}{n} \sum{(y_i - (kx_i + b))}^2 \]

    # define the loss function (mean squared error)
    def loss(y, y_hat):
        return sum((y_i - y_hat_i) ** 2 for y_i, y_hat_i in zip(list(y), list(y_hat))) / len(list(y))
  3. The partial derivatives are:

    \[ \frac{\partial{loss}}{\partial{k}} = -\frac{2}{n}\sum(y_i - \hat{y_i})x_i\]

    \[ \frac{\partial{loss}}{\partial{b}} = -\frac{2}{n}\sum(y_i - \hat{y_i})\]

    # define the partial derivatives
    def partial_derivative_k(x, y, y_hat):
        n = len(y)
        gradient = 0
        for x_i, y_i, y_hat_i in zip(list(x), list(y), list(y_hat)):
            gradient += (y_i - y_hat_i) * x_i
        return -2 / n * gradient

    def partial_derivative_b(y, y_hat):
        n = len(y)
        gradient = 0
        for y_i, y_hat_i in zip(list(y), list(y_hat)):
            gradient += (y_i - y_hat_i)
        return -2 / n * gradient
# initialize parameters randomly in [-100, 100]
k = random.random() * 200 - 100
b = random.random() * 200 - 100

learning_rate = 1e-3

iteration_num = 200
losses = []
for i in range(iteration_num):

    price_use_current_parameters = [price(r, k, b) for r in X_rm]  # \hat{y}

    current_loss = loss(y, price_use_current_parameters)
    losses.append(current_loss)
    print("Iteration {}, the loss is {}, parameters k is {} and b is {}".format(i, current_loss, k, b))

    k_gradient = partial_derivative_k(X_rm, y, price_use_current_parameters)
    b_gradient = partial_derivative_b(y, price_use_current_parameters)

    # step against the gradient, scaled by the learning rate
    k = k + (-1 * k_gradient) * learning_rate
    b = b + (-1 * b_gradient) * learning_rate

best_k = k
best_b = b
# plot the loss curve over the iterations
plt.plot(list(range(iteration_num)), losses)

# plot the fitted line against the data
price_use_best_parameters = [price(r, best_k, best_b) for r in X_rm]

plt.scatter(X_rm, y)
plt.scatter(X_rm, price_use_best_parameters)

