Control Engineering of China ›› 2019, Vol. 26 ›› Issue (11): 2013-2018.


Research of Modeling with Small Sample for Complex Problem

  

  • Online: 2019-11-20 Published: 2023-11-29

Research on a Small-Sample Modeling Method for Complex Problems

  

Abstract: The high test cost of complex products leads to small sample sizes in machine learning, so experiments must be designed carefully to maximize the information obtained under the size constraint on the data set. This paper proposes a sample selection method for multiple linear regression (MLR): Hamming distance is used to evaluate the similarity of samples, and a depth-first strategy generates a data set of a specified size that maximizes the minimum pairwise Hamming distance (max-min Hamming distance). The selected data set is then used to evaluate generalization performance. Finally, a high-pressure turbine disc design case is used to verify the strategy; the results show that the proposed strategy reduces experiment cost while retaining the necessary accuracy.

Key words: Small sample learning, design of experiments (DOE), Hamming distance, generalization performance

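Abstract (Chinese version): In scientific research, excessive test costs often lead to the small-sample problem in machine learning. The difficulty is that the data set carries too little information to describe all the features of the original problem, so experiments must be arranged carefully, according to the characteristics of the learning algorithm, to maximize the information in the small-sample data set. A sample selection method is proposed for multiple linear regression modeling: relying on the level structure of orthogonal design variables, Hamming distance is used to evaluate the similarity of test samples, and the minimum Hamming distance of the sample set is used to characterize the bias of the data set. Given the minimum sample size required for regression modeling, a depth-first algorithm builds the max-min Hamming distance sample set from which the regression model is fitted. Finally, an aero-engine high-pressure turbine disc is used as an example to verify the method; the tests show that this sample selection strategy reduces test costs while maintaining modeling accuracy.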

Key words (Chinese version): Small sample learning, design of experiments, Hamming distance, generalization ability
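
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hamming` and `max_min_subset`, the pruning rule, and the use of a full 3-factor, 3-level factorial as the candidate set are all assumptions made for illustration. Each candidate run is a tuple of factor levels, the Hamming distance counts the factors on which two runs differ, and a depth-first search picks the subset of a given size whose smallest pairwise distance is largest.

```python
from itertools import product


def hamming(a, b):
    """Number of factors on which two runs use different levels."""
    return sum(x != y for x, y in zip(a, b))


def max_min_subset(candidates, k):
    """Depth-first search for k runs whose smallest pairwise Hamming
    distance is as large as possible (illustrative, exhaustive with pruning)."""
    if not 2 <= k <= len(candidates):
        raise ValueError("k must be between 2 and the number of candidates")
    best = {"score": -1, "subset": None}

    def dfs(start, chosen, current_min):
        if len(chosen) == k:
            if current_min > best["score"]:
                best["score"] = current_min
                best["subset"] = list(chosen)
            return
        # Stop if too few candidates remain to complete the subset.
        if len(candidates) - start < k - len(chosen):
            return
        for i in range(start, len(candidates)):
            # Distance from the new run to the closest run already chosen.
            d = min(
                (hamming(candidates[i], candidates[j]) for j in chosen),
                default=float("inf"),
            )
            new_min = min(current_min, d)
            if new_min <= best["score"]:
                continue  # prune: this branch cannot beat the best subset so far
            chosen.append(i)
            dfs(i + 1, chosen, new_min)
            chosen.pop()

    dfs(0, [], float("inf"))
    return [candidates[i] for i in best["subset"]], best["score"]


# Hypothetical candidate set: all runs of 3 factors at 3 levels each.
candidates = list(product(range(3), repeat=3))
subset, score = max_min_subset(candidates, 4)
print(subset, score)
```

The search is exhaustive, so it suits only small candidate sets; the pruning simply skips branches whose minimum distance is already no better than the best complete subset found so far.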