Master's thesis
Segment-based feature proposal for the morphological classification of T Tauri star light curves
Date
2023-07-14
Registered in:
instname:Universidad de los Andes
reponame:Repositorio Institucional Séneca
Author
León, Benjamín
Institution
Abstract
T Tauri stars are young stellar objects that exhibit a wide range of
morphological variability in their light curves, the product of multiple
physical processes. Numerical features can be designed to identify the
characteristics of these brightness variations. In the literature, this
identification process is automated with machine-learning algorithms.
This Master's thesis applies supervised machine-learning algorithms to
the morphological classification of T Tauri stars, using a set of proposed
features based on segmentation strategies. The features are tested on an
external data set for evaluation, and the best parameters for
classification are discussed.
We evaluate the features with Welch's t-test, the Mann-Whitney U test and
Levene's test for equality of variances. After this testing, seven
algorithms are trained with light curves from the Orion star formation
complex obtained from the TESS project. The algorithms undergo sequential
reduction of the feature space, hyperparameter grid search and recurrent
importance calculations to optimize classification results in terms of F1
score and Cohen's kappa. The optimized algorithms are then applied to a
sample of over 2000 hand-classified, confirmed T Tauri stars.
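
As an illustration of the feature evaluation step, the three statistical
tests named above can be run on a single feature with SciPy. This is a
minimal sketch, not the thesis pipeline: the function name
evaluate_feature, the synthetic samples and the 0.05 threshold mentioned
afterwards are assumptions made for the example.

```python
import numpy as np
from scipy import stats


def evaluate_feature(values_a, values_b):
    """Compare one feature's values between two classes (hypothetical helper)."""
    a = np.asarray(values_a, dtype=float)
    b = np.asarray(values_b, dtype=float)

    # Welch's t-test: difference in means without assuming equal variances.
    welch = stats.ttest_ind(a, b, equal_var=False)
    # Mann-Whitney U test: rank-based comparison of the two distributions.
    mwu = stats.mannwhitneyu(a, b, alternative="two-sided")
    # Levene's test for equal variances (SciPy centers on the median by default).
    levene = stats.levene(a, b)

    return {
        "welch_p": float(welch.pvalue),
        "mann_whitney_p": float(mwu.pvalue),
        "levene_p": float(levene.pvalue),
    }


# Synthetic feature values for two classes (illustrative data only).
rng = np.random.default_rng(0)
class_a = rng.normal(loc=0.0, scale=1.0, size=200)
class_b = rng.normal(loc=2.0, scale=3.0, size=200)
result = evaluate_feature(class_a, class_b)
print(result)
```

A feature whose p-values fall below a chosen significance level (0.05 is a
common choice, assumed here) separates the two classes in location or in
spread and is a candidate to keep.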
In this work, we propose 61 features based on robust statistics,
pseudo-time-series analysis and autocorrelation measurements. These
features were used in 13 implementations of binary and multi-class
classifiers and optimized with the help of a high-performance cluster. We
implement statistical tests as feature evaluation and light curve
filtering strategies that are innovative in astronomical feature design.
The highest-achieving features on the testing data set were analyzed
individually and conceptually connected to physical processes, signal
crowding and systematic effects. The algorithms obtained F1 scores higher
than 0.4 for all classes with a maximum feature dimension of 10.
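
To make the idea of segment-based features concrete, the sketch below
splits a light curve into equal-length segments and summarizes them with
robust statistics and a lag-1 autocorrelation. It is only an illustration
under stated assumptions: the function segment_features, the feature names
and the choice of four segments are hypothetical and are not taken from
the 61 features proposed in the thesis.

```python
import numpy as np


def segment_features(flux, n_segments=4):
    """Summarize a light curve by per-segment robust statistics (hypothetical)."""
    flux = np.asarray(flux, dtype=float)
    segments = np.array_split(flux, n_segments)

    # Robust location and scale of each segment: median and
    # median absolute deviation (MAD).
    medians = np.array([np.median(s) for s in segments])
    mads = np.array([np.median(np.abs(s - np.median(s))) for s in segments])

    # Lag-1 autocorrelation of the full light curve.
    centered = flux - flux.mean()
    denom = np.sum(centered ** 2)
    lag1 = float(np.sum(centered[:-1] * centered[1:]) / denom) if denom > 0 else 0.0

    return {
        # Spread of segment medians: large for long-term trends or dips.
        "median_range": float(medians.max() - medians.min()),
        # Ratio of largest to smallest segment scatter: large for bursts.
        "mad_ratio": float(mads.max() / mads.min()) if mads.min() > 0 else float("inf"),
        "lag1_autocorr": lag1,
    }


# Illustrative light curve: a step in brightness halfway through.
step = segment_features([1, 1, 1, 1, 5, 5, 5, 5], n_segments=2)
print(step["median_range"], step["lag1_autocorr"])  # 4.0 0.625
```

Each returned value is a single number per light curve, so the output can
be used directly as a row of a feature matrix for a supervised classifier.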
This work contributes a new set of useful features that consistently
achieve high importance when compared with features used in the
literature. Innovative feature design, evaluation stages and algorithm
optimization pipelines were implemented.