
Introduction
Electroencephalography (EEG) has emerged as a powerful tool for non-invasive brain-computer interface (BCI) applications, particularly in decoding speech directly from neural signals. This project explores the feasibility of translating EEG data into intelligible sentences by classifying phonemes—fundamental units of sound in speech. Inspired by the work of Moreira et al. and Zhao & Rudzicz, the motivation for this project is rooted in accessibility: enabling individuals who are unable to verbally communicate to express themselves through thought-driven BCI systems. Using the EEG dataset for speech decoding by Moreira et al., this project aims to classify both consonant and vowel phonemes from EEG readings and parse them into words and sentences.


Methods
The project uses EEG data from Moreira et al.’s open-access dataset, which consists of recordings from subjects who listened to and repeated six consonants and five vowels. Data were collected with 64-channel EEG caps under both standard and transcranial magnetic stimulation (TMS) conditions. Preprocessing was conducted using the open-source code provided by Moreira et al., which allowed clean EEG data suitable for model training to be extracted in a streamlined way.
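To make this step concrete, the following is a minimal sketch of the kind of epoch extraction such a pipeline performs, written with the MNE library. The file name, filter band, event handling, and epoch window are illustrative assumptions; the actual parameters are those of Moreira et al.'s released code.

import mne

# Load one subject's recording (file name is hypothetical).
raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)

# Band-pass filter to remove slow drift and high-frequency noise (band is an assumption).
raw.filter(l_freq=1.0, h_freq=40.0)

# Build one epoch per phoneme presentation from the stimulus annotations.
events, event_id = mne.events_from_annotations(raw)
epochs = mne.Epochs(raw, events, event_id=event_id,
                    tmin=-0.2, tmax=0.8, baseline=(None, 0.0), preload=True)

# Resulting array has shape (n_trials, 64 channels, n_time_samples).
X = epochs.get_data()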
Following preprocessing, the data were engineered into a structured feature set incorporating the stimulus type, the TMS condition and target, articulatory features (place, manner, and voicing), and the phonological category. The two phoneme fields, phoneme1 and phoneme2, were then combined into a single output label for classification.
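The label construction can be illustrated with a small pandas sketch. The column names other than phoneme1 and phoneme2, and the example values, are assumptions about the engineered feature table rather than the dataset's exact schema.

import pandas as pd

# Toy rows standing in for the engineered feature table.
df = pd.DataFrame({
    "stimulus_type": ["listen", "repeat"],
    "tms_condition": ["sham", "active"],
    "place":    ["bilabial", "alveolar"],
    "manner":   ["stop", "fricative"],
    "voicing":  [1, 0],
    "phoneme1": ["p", "s"],
    "phoneme2": ["a", "i"],
})

# Combine the two phoneme fields into one classification target (e.g. "pa", "si")
# and encode it as integer class indices for model training.
df["label"] = df["phoneme1"] + df["phoneme2"]
df["label_id"] = df["label"].astype("category").cat.codes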
Two deep learning models were developed to classify phonemes from the EEG input. The first was a multilayer perceptron (MLP) with 7 hidden units, trained with a dropout rate of 0.1, a learning rate of 0.0005, and a batch size of 256 for 100 epochs. The second was a convolutional neural network (CNN) with 5 convolutional channels followed by 2 fully connected layers, trained with a dropout rate of 0.3, a learning rate of 0.0001, and a batch size of 64, also for 100 epochs. Both models were evaluated with an 80/20 train-validation split. After classification, the predicted phonemes were parsed into words using the NLTK library and joined to form simple sentences; sketches of the model architectures and the parsing step are shown below.
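The two architectures can be sketched as follows. PyTorch is an assumption (the framework is not named above), as are the input shapes, the convolution kernel size, the width of the fully connected layers, the optimizer, and the class count of 30 (six consonants by five vowels).

import torch
import torch.nn as nn

class PhonemeMLP(nn.Module):
    # MLP with the stated hyperparameters: 7 hidden units, dropout 0.1.
    def __init__(self, n_features, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 7),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(7, n_classes),
        )

    def forward(self, x):  # x: (batch, n_features) flattened EEG features
        return self.net(x)

class PhonemeCNN(nn.Module):
    # CNN with 5 convolutional channels followed by 2 fully connected layers, dropout 0.3.
    # The 1-D convolution over time and the kernel size are assumptions.
    def __init__(self, n_eeg_channels, n_times, n_classes):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_eeg_channels, 5, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Dropout(0.3),
        )
        self.fc = nn.Sequential(
            nn.Linear(5 * n_times, 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):  # x: (batch, n_eeg_channels, n_times)
        return self.fc(self.conv(x).flatten(1))

# Training setup for the CNN with the stated learning rate (Adam is an assumption).
model = PhonemeCNN(n_eeg_channels=64, n_times=256, n_classes=30)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

The phoneme-to-word step can be approximated with NLTK's CMU Pronouncing Dictionary by inverting its word-to-phoneme mapping. Treating this as the parsing method, and the ARPAbet symbols in the example, are assumptions rather than the exact procedure used.

import nltk
from nltk.corpus import cmudict

nltk.download("cmudict", quiet=True)
pron = cmudict.dict()  # word -> list of ARPAbet pronunciations

# Invert the dictionary so a predicted phoneme sequence can be looked up as a word.
by_phones = {}
for word, prons in pron.items():
    for p in prons:
        by_phones.setdefault(tuple(p), word)

def phones_to_word(phones):
    # Fall back to the raw phoneme string when no dictionary word matches.
    return by_phones.get(tuple(phones), "".join(phones).lower())

predicted = [["HH", "IY1"], ["R", "AE1", "N"]]
sentence = " ".join(phones_to_word(p) for p in predicted)  # "he ran" with the standard CMU dictionary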


Results
The CNN model achieved the best performance, reaching an accuracy of 70.87% on the phoneme classification task, while the MLP reached 63.62%. These results are particularly promising given the limitations of the dataset and the lack of a live EEG device for real-time testing. The dataset itself presents a key limitation, however: it does not include all phonemes of the English language, so EEG signals cannot be mapped to the full vocabulary, which limits the expressiveness of the system. In addition, the absence of a user acceptance testing environment means the system has not yet been validated in real-world scenarios, although it does accept EEG input in the same format as the dataset and returns predicted phoneme sequences parsed into structured, intelligible sentences.


Conclusion
This project demonstrates the early-stage feasibility of decoding speech from EEG data for the purpose of assistive communication. By combining EEG preprocessing, feature engineering, and deep learning through both MLP and CNN architectures, it is possible to classify phonemes with strong accuracy and parse them into sentences. Although the system is limited by the coverage of the dataset and the absence of a real-time EEG testing environment, the results suggest that with a more comprehensive dataset and further development, this approach could form the basis of a powerful BCI communication tool for individuals with speech impairments.

