    nmw 17:40:42 on 2015/09/14
    Fuzzy Language and Fuzzy Vocabulary 

    Since I wrote about words vs. names a couple of weeks ago (see “Names vs. Words: Strings for Identity vs. Strings for Information”), I have had an uneasy feeling that something is not well… and I think maybe I have figured it out now.

    The problem is this: online, there is no clear demarcation of words vs. names. As I indicated in the post I linked to above, this is not strictly true offline either. However, even though many dictionaries exist for each of the most common languages (and even though they differ in the vocabularies they document, how they document these vocabularies, etc.), there is nonetheless a somewhat reliable order … such that anyone can be expected to “look up” any word in any dictionary and get a more or less reasonable explanation. Part of becoming literate involves being able to use a dictionary, and indeed any dictionary (more or less). Of course there are dictionaries which are unusable (as they are not well researched), but they are exceptions, not the rule. Most people depend on the notion of some standard dictionary, and such standard dictionaries describe the standard language.

    As I wrote about 10 years ago in my first “Wisdom of the Language” article, languages will always be moving targets. We have to be able to deal with such “facts of life”.

    But upon reflecting on the juxtaposition of “strings for identity” (names) versus “strings for information” (words), I notice a much more severe issue: It leaves no room for dictionaries. In the back of my mind, I have reasoned that all of the registered strings in COM would make up the “commercial” dictionary, all of the strings in DE would make up the German dictionary, and so on. But each of these lists of registered strings also includes a significant number of brand names (in other words: “strings for identities”).

    How will we know whether a string has been registered for a specific identity or whether it is registered for informative purposes? My hunch is that there may very well be attributes of the website / content that more or less clearly categorize the string as this type or that type. It might go like this: The more evidence there is of a “grass roots” type of community involvement in how the content is managed, the more the string would tend to be a word used by that community. Less evidence of this, and more evidence of a “top down” authoritarian management of the content, would point towards an individual or organization identifying himself / herself / itself with the string.

    I realize this seems rather wishy-washy. Maybe someday I will figure out something clearer, but until then I guess I will just have to cope with such fuzzy notions: fuzzy vocabulary and fuzzy language.


    nmw 08:43:05 on 2015/08/25
    Names vs. Words: Strings for Identity vs. Strings for Information 

    There is a long-standing tradition of distinguishing names from words. Although it is not formally codified (even language itself is not “set in stone”, but rather is undergoing constant evolution, much like a living being), it can be roughly said that whereas words are in the (or a) dictionary, names are strings listed on specialized lists (e.g. baby names, trademark names, names identifying a company, product, service, etc.).

    The purpose of names is to identify, and ideally a name should uniquely identify something (or someone). In an ideal world, a name would signify just one thing and exclude everything else. Try to explain that to John or to Joe or to Jane Doe and you will quickly realize we do not live in an ideal world. It seems that so-called “last” names were introduced several hundred years ago in order to more specifically exclude other Johns and other Joes … and perhaps some day soon we will again have to figure out how to make names like Jane Doe uniquely refer to this Jane and not that Jane.

    Such thinking is what leads many to believe that the ideal name is the one that is most exclusive. Rare is good, unique is best.

    Names have some similarity to words in selecting particular things, but they are also crucially different. Certainly no one would ever want to confuse eating with sleeping, and to mix up sex with rex would make many actors and tyrannosauri in heaven rather “most irate”. Language does aim to specify, but it does not aim to specify uniquely. We eat many times over, we sleep many times over, we do and experience all sorts of things over and over again, more or less.

    Therefore, although rare words exist (as do common words), I do not know of a single word that could be described as unique. Likewise, I very much doubt that a universal word ever existed (though this seems to be alleged as the beginning of everything: the Gospel of John in the Bible opens with the statement that “in the beginning was the Word”).

    Today, we have a much more refined understanding of language. We understand that languages are not formed by decree or by other kinds of government regulation or state control. Instead they evolve according to the needs, wishes, whims and rational preferences of living people, in a sort of evolutionary process guided by principles much like “supply” and “demand” (and also the physical dexterity of the vocal apparatus, etc.). In some ways, language may perhaps be understood to evolve in symbiosis with humans, rather than being a technology devised by humans.

    There is little doubt in my mind that so-called natural language is the most basic of all information technologies, especially if you include such messages as are conveyed by intonation, gesticulation, facial expressions, body language, etc. in your notion of natural language. The importance of this fact is usually overlooked when people discuss “information technology” (IT) today.

    The main point I wish to make with this post is the following: It is very important to distinguish between names and words. The information technology (IT) functions referred to as “search”, and also “community”-oriented information retrieval, must rely heavily on words (as only words/language can function as the basis of communication). In sharp contrast, identity is the opposite of what is commonly referred to as “social”: it must be exclusively private, and ideally it would actually be unique.


    nmw 13:20:03 on 2015/07/23
    To Read or to Be Read 

    When I was a kid, I used to go to the library a lot… and read books. Before reading them, I would need to find them. For those of you unfamiliar with this process: This was the prototype for most search engines (back then, people studying this process went to “library schools”, graduate programs for “information science”, and the field specifically focused on what is today referred to as “search” was back then called “information retrieval”).

    But reading is not really something “for specialists only”. Before graduating high school, regular folks also had to learn about publishing. For example: They needed to know that books have authors, that they were published by publishing companies, and so on.

    Online, titles, authors and publishing houses became domain names. There are also numbers which refer to computers — perhaps this is roughly equivalent to the way people would refer to specific shelves where specific books were stored (this system of naming shelves, which was used in the earliest libraries, would later give way to so-called “call-numbers”, a system whereby a book was given a specific sequential number where it could be found). The biggest difference between traditional libraries and the Internet is probably the fact that online, the cataloging and indexing systems are integrated into the same system as the writing that they catalog / index. Although professional abstracting and indexing services also published such volumes (which looked very much like “regular” books), and these books were usually also given call numbers, putting them on par with the more ordinary literature, the librarian was the person who made this decision… and the librarian was the person ultimately responsible for maintaining the catalog (and also for choosing what would be included in the library’s collection).

    I guess only quite novice users would assume that if something was not in the library (and/or the library’s catalogs), it would not exist.

    Contrast that with today, when there is an entire generation of kids who seem to believe that if something cannot be found in Google, it doesn’t exist.

    Even though the rate of illiteracy today is quite astounding already, I now observe also that in recent years an entirely new trend is catching on. People are becoming ever less concerned with reading or writing or behaving as functionally literate persons. Instead: They are becoming more obsessed with being read, meaning that someone (or some company) is able to trace their moves. Whereas it is becoming ever more rare for people to read or write anything resembling written texts (and/or “literature”), it is becoming ever more commonplace for people to clutch gadgets which track everything on which such quantified fetishists seem to place such a high value. The typical quantified fetishist will feel much the same way about their gadget fetish as a democratic idealist might feel about the sanctity of the voting booth.

    In this milieu, there also seems to be a widespread belief that the companies collecting this data will share it publicly out of the goodness of their hearts.

    nmw 14:11:49 on 2015/03/02
    Wisdom of the Language — Nooblogs Essay 

    Today I wrote another “Wisdom of the Language” essay. If you are not familiar with the Wisdom of the Language, then don’t fret: This essay will hopefully summarize all of the salient points and also bring you up to speed with respect to the difference between generic top level domains vs. proprietary top level domains.

