Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory, or CSAIL, are working to breathe new life into dead languages with help from machine learning. Their new system can automatically decipher lost languages that can no longer be understood, and it can do so without advance knowledge of their relation to early forms of other languages, such as Greek or Hebrew.
Many languages are considered to be lost because there’s not enough knowledge about their grammar, vocabulary or syntax to be able to understand their texts.
“This is a true historical tragedy: Without these languages, we lose out on an entire body of knowledge about the people who spoke them,” MIT Ph.D. student Jiaming Luo said in a statement Wednesday.
Luo developed the system with MIT Professor Regina Barzilay. It uses an algorithm that relies on principles from historical linguistics, such as the predictable ways in which languages evolve. The algorithm can find patterns and relate them back to those principles. It can also assess the proximity between two languages, and when it’s tested on known languages, the system can identify language families.
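To give a rough sense of what "proximity between two languages" could mean in practice, here is a minimal Python sketch. It is not the MIT team's algorithm, which the article does not describe in detail. It swaps in a much simpler technique, normalized edit distance between word lists, and every name in it (`edit_distance`, `proximity`, the toy word lists) is a hypothetical chosen for illustration.

```python
from statistics import mean


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer word's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def proximity(lost_words: list[str], known_words: list[str]) -> float:
    """Crude proximity score in [0, 1]; higher means the vocabularies look closer.

    Assumption: each lost word is compared against its best-matching known word.
    """
    if not lost_words or not known_words:
        raise ValueError("both word lists must be non-empty")
    best = [min(normalized_distance(w, k) for k in known_words)
            for w in lost_words]
    return 1.0 - mean(best)


# Illustrative toy lists only, not real decipherment data.
lost = ["mlk", "klb", "bt"]
candidate_a = ["malk", "kalb", "bayt"]
candidate_b = ["rex", "canis", "domus"]

print("candidate A:", round(proximity(lost, candidate_a), 3))
print("candidate B:", round(proximity(lost, candidate_b), 3))
```

With these toy lists, candidate A scores higher than candidate B because its words differ from the "lost" words by only a few characters. A real system would need far more than surface spelling, since scripts differ and sounds shift over time.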
Going forward, the team hopes to expand its work to identify the meaning of words, even if they can’t be read yet. Ultimately, it hopes to be able to resurrect lost languages using just a few thousand words.
“We often don’t sit back and realize that languages are kind of like species of animals,” Luo said. “They can evolve and grow, but they can also diminish and even die out.”