The Physics of Language

thoughts on linguistics and related fields

Utilitarian versus Ranking Committees

In Manolo Martínez’s recent talk at CUNY on the Rationalizing Role of Pains, he presented a model describing the decision-making process of a mind faced with multiple conflicting sensory messages of varying reliability. This model was by no means the focus of his talk, but it is what I’d like to focus on here.

I present an alternative model and then restrict myself to posing a few mathematical questions as to the relative power of the models. Questions of cognitive plausibility and interpretation are left unaddressed.

Please note that what follows is based on my imperfect recollection of his talk, and so all errors and misinterpretations are my own.

Read the rest of this entry »


Unsupervised Image Descrambling and the Retina


One of the primary goals of contemporary neuroscience is the reverse-engineering of the brain’s functional architecture. In recent years our understanding has passed from largely descriptive to more and more functional, thanks in large part to the borrowing of ideas from computer science (especially information theory) and natural selection. Perhaps the best understood neural design is that of the mammalian retina. See, for example, V. Balasubramanian and P. Sterling’s paper, which explains several aspects of retinal design using information- and selection-theoretic arguments in conjunction with computer simulation.

As the above paper demonstrates, it can be fruitful to reduce a complicated biological problem to a toy model and then ask basic mathematical questions about that model. This approach works so long as the model captures the essence of the biological feature of interest yet is simple enough to be analytically or computationally tractable.

In what follows I pose a simple machine learning problem, the answer to which may shed light on the relationship between natural selection and retinal design.

Read the rest of this entry »

Towards a (More) Biologically Plausible Neural Net

Of the many machine learning models, the artificial neural network (ANN) is of particular interest because of the obvious analogy to the function of the brain. However, the standard supervised cost function and error back-propagation algorithm are entirely implausible from a biological perspective. In practice, moreover, the performance of back-prop decreases sharply with the number of hidden layers, requiring more and more labeled training examples, which are often in short supply.

Read the rest of this entry »

The Neurolinguistic Paradigm Shift

As is the case in the physical sciences, I believe that linguistics proceeds in a pattern of punctuated equilibrium, those punctuations being paradigm shifts whereby the foundations of accepted theory are rewritten. In this way old mysteries become puzzles, explanations become more elegant, and new technological advances are enabled. Here I will briefly sketch the idea that generative linguistics should be reformulated in terms of quantitative neuroscience, and that such a reformulation will be immensely fruitful.

Read the rest of this entry »

Speech-Transcript Alignment


We describe a method for aligning a transcript to recorded speech, motivated by the need for alignment of audiobooks for use by SLA software. A combination of text-to-speech technology, mel-spectrum feature extraction, and dynamic time warping is employed to obtain a word-alignment for the input speech sample.
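To make the dynamic time warping step concrete, here is a minimal sketch in plain Python. It is not the code from the full post: the function name `dtw_path` and the toy one-dimensional features are assumptions for illustration, and the text-to-speech and mel-spectrum stages are not reproduced. Each sequence stands in for a list of per-frame feature vectors, one from the synthesized reference and one from the recording.

```python
import math

def dtw_path(ref, query):
    """Align two sequences of feature vectors with dynamic time warping.

    Returns (total_cost, path), where path is a list of (ref_index,
    query_index) pairs. The name and interface are illustrative only.
    """
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], query[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both
                                 cost[i - 1][j],      # advance ref only
                                 cost[i][j - 1])      # advance query only
    # Backtrack from the end to recover the frame-to-frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

# Toy features: the "recording" lingers on the first frame of the reference.
reference = [(0.0,), (1.0,), (2.0,)]
recording = [(0.0,), (0.0,), (1.0,), (2.0,)]
total, path = dtw_path(reference, recording)
print(total, path)
```

If the word boundaries of the reference frames are known, as they would be for synthesized speech, the path can be used to carry each boundary over to the corresponding frame of the recording.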

Read the rest of this entry »

Some Questions about Syntax

This week I have been reading up on syntax. I started with Baker’s The Atoms of Language, continued with Chomsky’s Syntactic Structures, and have just finished the syntax chapter of the textbook Contemporary Linguistics (3rd edition). I have learned a great deal thus far, but the more I read the more questions I have. I write these notes into the margins of my books, but I now feel the need to post the more salient questions for later reference, lest they get buried when I continue my linguistic inquiry.

In brief, my appetite for syntax has been whetted by Syntactic Structures, but now I hunger to know how to analyze all the language I come across. A more advanced syntax text should answer most of my specific questions. Then, more generally, I am becoming only more excited about the possibility of understanding grammar from a historical perspective and about reformulating grammar based on an understanding of how language actually works in the brain, although the time for the latter may yet be decades away.

Read the rest of this entry »

The Lomb Method

In a previous post I made some recommendations from among the established commercial language learning methods such as Berlitz, Pimsleur, Rosetta Stone, and a few course books. Over the past year I have branched out to experiment with the methods of various on-line linguaphile gurus, the likes of Piotr Wozniak, Tim Ferriss, Mike Campbell, Prof Arguelles, and Khatzumoto, all of whom are worth checking out. But just today I stumbled across the method of polyglot and simultaneous interpreter Kató Lomb, as explained in her book “This is How I Learn Languages”. I found Lomb’s book absolutely captivating. She makes a number of excellent points. Although her book is by no means a one-stop shop for polyglottery, it is definitely worth a read for those aspiring to learn one or a few L2s.

I will give a few highlights of her book here.

Read the rest of this entry »

A Meta-Etymological Dict

Here is a very rough sketch of the idea:

Motivation: aspiring polyglots face the huge challenge of massive vocabulary acquisition. Even in closely related languages, it is easy to miss cognates because of sound changes and different orthographies. I have found that using etymological information from dictionary look-ups has boosted my recall when learning a single language, and I conjecture that this boost would carry over to other languages of interest if the dictionary provided related words in every language of interest whenever they exist. In its present form, however, this procedure requires searching through possibly 12 different books! I believe an electronic version would be a huge boon to the modern polyglot community.

Read the rest of this entry »

Syllables of Mandarin Chinese

Here is a list of the 3000 most commonly used characters in written Chinese, sorted by Mandarin reading, with tones ignored.

Some interesting points:

  • If one looks at only up to the 1000 most common characters, /shi/ is the most prevalent syllable. However, by 3000, /ji/ has clearly taken the lead.
  • The shape of the distribution is what I would call “Zipf’s law with a fat tail.”
  • Down in the “tail”, we find that there are many familiar characters with rather unpopular readings; e.g. ‘meng’, ‘neng’, ‘gei’, and ‘nv’.
  • Despite all of this syllable-level polysemy, the spoken language proceeds without issue. This is primarily because of tones and the fact that many words in Chinese are two syllables long, so the other syllable helps greatly to disambiguate. Also, the spoken language is highly context-dependent, which further aids in the disambiguation. However, things could get interesting if we were to work with a pinyin-based written language, especially if tones were not indicated!
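For readers who want to reproduce this kind of tally, here is a small sketch of counting syllables with tones ignored. The function names and the nine-character sample are assumptions for illustration only; the sample is not the 3000-character list, and the readings are written as pinyin with a trailing tone digit.

```python
from collections import Counter

def strip_tone(reading):
    """Drop a trailing tone digit, e.g. 'shi4' -> 'shi'."""
    return reading.rstrip("012345").lower()

def syllable_counts(readings):
    """Count how many characters share each toneless syllable."""
    return Counter(strip_tone(r) for r in readings.values())

# Tiny illustrative sample (hypothetical input, not the post's data set).
sample = {
    "是": "shi4", "十": "shi2", "时": "shi2", "事": "shi4",
    "几": "ji3", "机": "ji1",
    "能": "neng2", "给": "gei3", "女": "nv3",
}

counts = syllable_counts(sample)
for syllable, n in counts.most_common():
    print(syllable, n)
```

Sorting the full list by these counts is what would produce the rank-frequency distribution described in the points above.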

Click ‘more’ to see the data.

Read the rest of this entry »

Adding Accents to Romanized Japanese

While working through Japanese the Spoken Language (JSL), I feel the need to make Anki cards for the spoken words that I’m having trouble recalling the meanings of. Now, I haven’t yet decided exactly what card format is best, but I was dreading having to type in accents on the romanized words. (See the introduction to JSL to see what these accents are.) So, I wrote a quick program to allow easy input of accented words.

This Python function makes it easy to write JSL-romanized Japanese text with accent markings. It takes as input text in which capitalization indicates high pitch, and outputs JSL-style accent-marked text. This is particularly useful when entering a list of spoken Japanese words into Anki.


“KYOo” ==> “kyôo”

“aTARASIi ZIsyo” ==> “atárasìi zîsyo”

“aSITA IKIMAsu yo” ==> “asíta ikimàsu yo”

Click on ‘More…’ to see the Python code.

Read the rest of this entry »