Abstraction via Repetition
In a fastAI class, I asked the following:
Do any language models attempt to provide meaning? For instance, “I’m going to the store” is the opposite of “I’m not going to the store.” Or “I barely understand this stuff” and “That ball came so close to my ear I heard it whistle” both contain the idea of something almost happening, being right on the border. Is there a way to indicate this kind of subtlety in a language model?”
And the answer came back:
Yeah, absolutely our language model will have all of that in it, or hopefully it will have it, or learn about it. We don’t have to program that. The whole point of machine learning is it learns it for itself, but when it sees a sentence like ‘Hey careful that ball nearly hit me,’ the expectation of what word is going to happen next is going to be different to the sentence, ‘Hey that ball hit me!” So, so yeah, language models generally you see in practice tend to get really good at understanding all of these nuances of, of English or whatever language it’s learning about.
Let’s take a closer look:
WHAM.
He had stayed still a second too long. The Bludger had hit him at last, smashed into his elbow, and Harry felt his arm break.

It is possible to extract sense from the above by doing a kind of simultaneous transation from words to meanings. Meanings expressed, that is, by other words. Notice that it isn’t a one- or two- or n-word regular affair. Meanings are contained in variable-length strings; sometimes the meaning can be provided by a single word; sometimes it will require more. The general rule of thumb is: what would it take to explain to another human being?
Automated scan-labeling is what we’re after. Scanning over various-length ideas, mathematically represented, also multi-labeled. It’s like a bear’s eye, a bear’s pupil, a bear’s angular contrast diagonal (at a low level); a bear’s fur, a bear’s head, etc. An enumeration of all the properties of a bear, adding up to a bear. But this is done via averaging, via approximation. Recurrent averaging is what allows the process of machine generalization in the first place.
What follows are some speculative takes on related concepts.
Plunging and Control
He plunged downward. Downward — directional, with gravity (z-axis)
Plunged. Likely implies ‘without control.’ ‘Fell is without control. ‘Dived’ suggests control. ‘hurtled’ could be either. ‘Streaked’ usually (not always) suggests control. Control/intention, intent.
Loss of control, accident, helpless.
Few-shot. So if we provide 30 examples of excess or overflow, would it be able to recognize it elsewhere?
Overwhelmed
He had waited a moment too long
He’d left the water running and it had spilled onto the floor
I’m overloaded
I’m exhausted (ie I’ve overstepped my energy limits)
I passed him on the fourth lap (exceeded)
Beyond all expectations
I did too much
I overdid it
“Yes!” he blurted out. It was not his place to speak. A stunned silence fell over the room.
swamped
buried
inundated
I’m older than he is
He’s a megalomaniac
You’re too old!
You’ve really gone above and beyond
You broke the record!
Numericize the following, which are both concepts drawn from words, and words themselves. What we are interested in is whether the abstracted concepts can be numericized :
1 = overflow
2 = speech
3 = physical effort
4 = distance (measured within space)
5 = time
6 = speed
7 = certainty
8 = doubt (any question has doubt (needs to resolve). This is a reason not to remove punctuation marks from text corpora. For instance “Is this any reason to doubt his word? I think not” would otherwise become “is this any reason to doubt his word I think not.” We can understand it, but we are then filling in the meaning that has been removed.
9 = need/requirement
10 = desire (want, crave, etc.)
11 = internal personal state(dreamed, thought, loved, cried, laughed, imagination) Actually it’s just personal state, with multilabel potential. Cried is both internal and externally visible. So the general category is ‘human’
13 = positive
14 = negative
Doubt is a subset of lacking. Physical is a scoop missing, an incompleteness. Musical corollary would suggests a dominant seventh chord (needs to resolve). A G7 chord in blues in the key of C.
And then how about addition and subtraction a la word2vec?
Thought + sleep = dream
Space + measurement device = distance
Movement through space from point A to point B = distance
Space + ruler = distance (or is it space * ruler = distance?)