算法岗面试题

深信服算法岗

选择自己觉得做的最好的项目从需求、样本、特征工程、模型选择等进行项目介绍
模型融合的优势以及方法。
优势：随着集成中个体分类器数目T 的增大，集成的错误率将指数级下降，最终趋向于零。
方法（基学习器是否存在强依赖）：bagging（优化方差），boosting（优化偏差），stacking，blending
会哪些机器学习算法：决策树，朴素贝叶斯分类，最小二乘法，逻辑回归，支持向量机，集成方法；聚类算法，主成分分析，奇异值分解，独立成分分析，
LR和SVM的区别：都是分类算法；若不考虑核函数，分类决策面都是线性的；都是监督学习算法；都是判别模型
区别：
1. 1）损失函数不同，LR损失函数，SVM损失函数
2. 2） SVM 只考虑局部的边界线附近的点，LR 考虑全局，远离的点对边界线的确定也起作用
3. 3）在解决非线性问题时，SVM 采用核函数的机制，而 LR 通常不采用核函数的方法
4. 4）线性 SVM 依赖数据表达的距离测度，所以需要先对数据做 normalization, LR 则不受影响。
5. 5） SVM损失函数自带正则。LR需要添加正则项
白样本多，黑样本少，选择哪个模型更加合适
决策树往往对类别不均衡的数据表现不错；使用SVM，其超平面的选取只与支持向量有关。解决方法：数据重采样，扩充数据集（数据合成），加权（为样本数较少的损失函数设置较大的权重），更换其他分类标准
LR过拟合是什么样的情形，如果样本有很多重复的特征，对于LR训练效果有没有影响
LR过拟合：过度拟合训练数据，对测试数据表现不友好。降低维度（PCA算法降维），正则化（给代价函数添加正则项）。LR处理重复特征：较为敏感，例如两个高度相关自变量同时放入模型，可能导致较弱的一个自变量回归符号不符合预期，符号被扭转。需要利用因子分析或者变量聚类分析等手段来选择代表性的自变量，以减少候选变量之间的相关性；（逐步回归分析常用与解决多重共线性）
深度学习模型的初始化要注意哪些问题：权重随机初始化，避免网络退化，初始化范围要小，缩小样本空间，输入样本要BN，防止梯度消失
问了给一本圣经和一个单词，如何统计圣经中单词出现的次数(mapreduce)：grep –o 单词文件名|wc -l
怎么做word2vec，rmsprop优化器底层的原理（和动量方法类似）
kmeans原理，优缺点，如何不自己设置k就能知道k取多少
原理：对于给定的样本集，按照样本之间的距离大小，将样本集划分为K个簇。让簇内的点尽量紧密的连在一起，而让簇间的距离尽量的大。
优缺点：优（原理简单，实现方便，收敛速度快，聚类效果较优，模型的可解释性较强，调参只需要调整k）；缺（k的取值不好把控，非凸数据集难收敛，类型不平衡的数据聚类效果不好，得到局部最优[迭代]，对于噪音和异常点敏感）
K值的确定：手肘法（误差平方和）；轮廓系数（凝聚度a和分离度b）
grid search：网格搜索，一种调参手段
nlp现在发展：单个学习器，集成学习，强化学习；完全监督，半监督，无监督，迁移学习
overfitting解决，如何从100个特征中选20个特征
过拟合：使用更小的batch size，正则化模型，减少模型参数，扩大数据集，提早结束训练，droupot，batch normalize，集成学习。特征选择：过滤式（通过对每一维进行打分选择高分特征），包裹式（通过启发算法将特征子集的选择看成是搜索寻优问题），嵌入式（在确定模型的过程中，挑选出那些对模型的训练有重要意义的特征，类似回归）
垃圾短信多分类任务(如何分开发票，广告，商铺信息等)：SVM
L1正则和L2正则的区别：正则化的本质：加入惩罚项，约束优化空间
L1和L2的区别：他们都可以看做是损失函数的惩罚项，L1正则添加了Laplace先验（结构化风险项）是一个一范数；L2正则添加了Gaussian先验，是一个二范数。使用L1范数，可以使得参数稀疏化（权重稀疏），用L2范数，倾向于使参数稠密地接近于0（权重平滑）。L1减少的是一个常量，L2减少的是权重的固定比例
Python相关面试题（迭代器，字典的键的要求[可hash对象，不可变类型，比如数字，字符串，元组]）
迭代器：只要一个对象有__iter__方法和__next__方法，那么这个对象就可以叫做迭代器。对一个可迭代对象调用它的__iter__方法，得到的就是迭代器对象。
优缺点：优（不依赖索引，惰性计算，节省内存）；缺（不如按索引取值方便；一次性，只能往后取，不能往回退）

原文链接：https://blog.csdn.net/qq_27095007/article/details/100087399

深信服提前批C++面试题

1、小强想要抓兔子，但狡兔三窟。·························

这题应该用递归做：

#include "stdafx.h"
#include <iostream>
#include <algorithm>
using namespace std;
#define K 1000
int getmaxlivedays(int cur_pos, int cur_days, int tot_holes, int tot_days, int days_catch[]) {
	if (cur_pos == days_catch[cur_days])
		return 0;
	else {
		if (cur_days + 1 >= tot_days)
			return 1;
		int left = 0, right = 0;
		if (cur_pos - 1 >= 0)
			left = getmaxlivedays(cur_pos - 1, cur_days + 1, tot_holes, tot_days, days_catch) + 1;
		if (cur_pos + 1 < tot_holes)
			right = getmaxlivedays(cur_pos + 1, cur_days + 1, tot_holes, tot_days, days_catch) + 1;
		return
			left > right ? left : right;
	}
}
int main()
{
	int n, k;
	while (cin >> n >> k) {
		int days_catch[K];
		for (int i = 0; i < k; i++) {
			int temp;
			cin>>temp;
			days_catch[i] = temp - 1;
		}
		int maxlivedays = 0;
		for (int i = 0; i < n; i++) {
			maxlivedays = max(maxlivedays, getmaxlivedays(i, 0, n, k, days_catch));
		}
		if (maxlivedays < k)
			cout << "Yes" << endl;
		else
			cout << "No" << endl;
	}
    return 0;
}

2、小明的个人网站开业了。小明决定给最先访问网站的10个用户派发礼品。·············

题目中有一句，“有50%的输入数据满足：1<=n<=1000”,可能是因为我题目没截全，不知道有啥用。

#include "stdafx.h"
#include <iostream>
#include <set>
using namespace std;
#define N 100005
int main()
{
	int n;
	while (cin>>n)
	{
		int num[N];
		for (int i = 0; i < n; i++) {
			cin >> num[i];
		}
		set<int> user;
		for (int i = 0; i < n; i++) {
			user.insert(num[i]);
			if (user.size() == 10)
				break;
		}
		cout << user.size() << endl;
		set<int>::iterator ite1 = user.begin();
		set<int>::iterator ite2 = user.end();
		for (; ite1 != ite2;ite1++) {
			cout << *ite1 << endl;
		}
	}
    return 0;
}

3、小明帮数学老师出试卷，················，让这些题目的分数总和正好是100分。

还是有一句“有50%的输入数据满足：1<=n<=3”不懂，数据保证只有唯一解是一定有唯一解得意思吗？

#include "stdafx.h"
#include <iostream>
#include <vector>
using namespace std;
#define N 10
int main()
{
	int n;
	while (cin >> n) {
		int score[N];
		for (int i = 0; i < n; i++) {
			cin >> score[i];
		}
		int sum = 0;
		vector<int> num;
		int j = 1 << n;
		while (j > 0) {
			j--;
			for (int i = 0; i < n; i++) {
				int label = (1 << i) & j;
				bool tag = (1 << i) & j != 0;
				if (((1 << i) & j) != 0) {
					sum = sum + score[i];
					num.push_back(i);
				}
			}
			if (sum == 100)
				break;
			else {
				sum = 0;
				num.clear();
			}
		}
		cout << num.size() << endl;
		vector<int>:: iterator iter1 = num.begin();
		vector<int>:: iterator iter2 = num.end();
		for (; iter1 != iter2; iter1++) {
			cout << *iter1 + 1 << endl;
		}
	}
    return 0;
}

4、最长重复子串的长度。

#include "stdafx.h"
#include <iostream>
#include <algorithm>
using namespace std;
#define L 10005
bool isdupli(int start, int end, char s[L]) {
	if (end <= start)
		return false;
	int len = end - start;
	if (end + len > strlen(s))
		return false;
	int i = 0;
	while (i < len) {
		if (s[start + i] != s[end + i])
			break;
		i++;
	}
	if (i == len)
		return true;
	else
		return false;
}
int main()
{
	char s[L];
	while (cin >> s) {
		int length = 0;
		for (int i = 0; i < strlen(s); i++) {
			for (int j = strlen(s) - 1; j >= 0; j--) {
				if (isdupli(i, j, s) == true) {
					length = max(length, j - i);
				}
			}
		}
		cout << 2 * length << endl;
	}
    return 0;
}

5、给定1到n的一个排列，至少需要几次翻转能使得数组从小到大排列。
数逆序对。

#include "stdafx.h"
#include <iostream>
using namespace std;        
int main()
{
	int n;
	int num[9];
	while (cin >> n) {
		for (int i = 0; i < n; i++) {
			cin >> num[i];
		}
		int count = 0;
		for (int i = 0; i < n - 1; i++) {
			for (int j = i + 1; j < n; j++) {
				if (num[i] > num[j])
					count++;
			}
		}
		cout << count << endl;
	}
    return 0;
}

————————————————

深信服算法岗

原文链接：https://blog.csdn.net/qq_27095007/article/details/100087399

深信服提前批C++面试题

1、小强想要抓兔子，但狡兔三窟。·························

2、小明的个人网站开业了。小明决定给最先访问网站的10个用户派发礼品。·············

3、小明帮数学老师出试卷，················，让这些题目的分数总和正好是100分。

4、最长重复子串的长度。

原文链接：https://blog.csdn.net/allenqian0531/article/details/81070842