Tagged: data

    nmw 18:49:07 on 2016/06/25
    Tags: citation analysis, data, few, many, sample, sampling

    Don’t Listen to One Single Piece of Good Advice — Listen to Many 

    Several months ago, I mentioned on one of my other blogs that I enjoy listening to Gretchen Rubin’s „Happier“ podcast. I still do, even though I think content sponsored by advertising is by and large fake.

    Recently, Gretchen (and Elizabeth) asked their listeners for the best piece of advice they had ever received. I responded (they asked people to phone in their comments), though I think my remarks may have arrived a little too late for episode 70; perhaps they will appear in episode 71(?).

    This was the gist of my message: Don’t Listen to One Single Piece of Good Advice — Listen to Many!

    Jason Calacanis also made this point in a recent episode of his „This Week in Startups“ podcast (I can’t remember which one): you should never rely on just one source of information. I remember thinking as I listened to Jason (and of course I had heard such advice decades before from many of my school teachers): „does that mean that when you search for information, you will not rely on Google alone?“ Strange as it may seem, my hunch is that for the vast majority of the population, this is not the case. Indeed, my experience has been that most people search for information using only Google’s algorithms: if they do not see anything that appeals to them via Google, they assume that no such thing exists.

    Incidentally, there is another kind of parochialism that I feel is closely related to this fanatical belief in Google’s scoring algorithm. In a recent episode of HBR’s „Ideacast“ podcast, Todd Rose was interviewed about his recent book on measurement and statistics („The End of Average: How to Succeed in a World That Values Sameness“). His argument echoes something I have long held to be true (and I believe one of my comments on this matter was also aired on a German radio program, perhaps five or more years ago).

    Oddly, Google fan-boys (and fan-girls, too, of course) often overlook the fact that Google also ranks results according to such „cooked“ statistics. In fact, the situation is even worse: Google calculates its metrics for websites and then applies them regardless of how relevant they are (or aren’t) to a given query. So while SAT scores at least attempt to measure mathematical ability and verbal ability separately, Google’s statistical measurement of quality (which was shown to be totally bogus decades ago) is applied whether or not the source is reliable for the search query at hand. It is essentially a „one-size-fits-all“ metric (which also happens to be totally unreliable). Yet very few people really care, because most people mainly use Google to search for domain names anyway (in other words: they „search“ for ebay because they are too lazy to type in ebay.com). I bet if people stopped doing that, the energy saved might actually reduce global warming significantly! 😉
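To make the „one-size-fits-all“ point concrete, here is a minimal sketch under stated assumptions. Each page carries a single, query-independent quality score. The page names, scores, topics and both helper functions are hypothetical, and this is not a model of Google's actual ranking system. Because the global score ignores the query, the ordering comes out the same no matter what is asked.

```python
# Hypothetical pages: one precomputed, query-independent "quality" score each,
# plus the topics the page actually covers. All values are invented.
PAGES = {
    "bigportal.example": {"score": 0.9, "topics": {"news", "shopping"}},
    "sofa-repair.example": {"score": 0.2, "topics": {"sofa", "furniture"}},
    "birdwatch.example": {"score": 0.4, "topics": {"birds"}},
}


def rank_by_static_score(query: str):
    """Rank pages by their global score alone; the query is deliberately ignored."""
    return sorted(PAGES, key=lambda name: PAGES[name]["score"], reverse=True)


def rank_with_relevance(query: str):
    """Keep only pages whose topics include the query term, then order by score."""
    relevant = [name for name, page in PAGES.items() if query in page["topics"]]
    return sorted(relevant, key=lambda name: PAGES[name]["score"], reverse=True)


for q in ("sofa", "birds"):
    print(q, rank_by_static_score(q), rank_with_relevance(q))
```

The second function is only a crude stand-in for query-dependent relevance. The point is simply that a single global score, on its own, cannot tell which source is reliable for a particular question.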

     
    nmw 17:48:26 on 2016/05/31
    Tags: Bible, data, file, file name, file names, filename, filenames, files, graphical user interface, GUI, hardware, HCI, human-computer interaction, text

    The Ubiquity of the Text Box (excursus) 

    One of my favorite authors in the field of „search“ is John Battelle. Although he was not trained in information science or information retrieval, his experience in journalism and publishing at the cusp of the so-called „information revolution“ apparently led him to learn many things more or less by osmosis.

    One of my favorite ideas of his is the way he talks about human-computer interaction. Initially, this was almost exclusively text-based. Then, he notes, with the advent of „graphical user interfaces“ (GUIs), computers increasingly became instruments with which humans would point at stuff. He has presented this idea so often that I don’t even know which presentation I should refer, link or point to (which one I should index).

    In the early days of search, the book was ubiquitous. Indeed, several hundred years ago it almost seemed as though each and every question could be answered with one single codex, and this codex was called „Bible“ (which means, essentially, „the books“). We have come a long way, baby. Today, we might say that online, „the text box is king“ (Tom Paine, eat your heart out! 😉 ).

    Although computer manufacturers desperately try to limit the choices consumers have by loading their machines with pre-installed (and usually heavily sponsored) software, it will not be very long before the typical consumer is confronted with a text box in order to interact with his or her mish-mash of hardware and software. Even without typing out any text whatsoever, whenever a human presses a button to take a picture or clicks on an icon to record audio or video, the gizmo machinery gives the associated files a text-string filename. The source code of every program running on every machine was written out in plain text somewhere. If computers ever write their own Bible, they will quite probably start off with something like „In the beginning was the text, and it was human.“
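As a small illustration of how a device might turn a button press into a text-string filename, here is a sketch of a camera-style naming scheme. The `IMG_` prefix, the four-digit counter, the `.JPG` extension and the `filename_generator` helper are assumptions modeled on a common convention (e.g. IMG_0001.JPG). They do not describe the behavior of any particular device.

```python
import itertools


def filename_generator(prefix="IMG_", extension=".JPG", start=1):
    """Yield camera-style names: prefix + zero-padded 4-digit counter + extension.

    Hypothetical scheme for illustration; real devices differ in the details.
    """
    for number in itertools.count(start):
        # Wrap around after 9999 so the counter always fits in four digits.
        counter = (number - 1) % 9999 + 1
        yield f"{prefix}{counter:04d}{extension}"


names = filename_generator()
first_three = [next(names) for _ in range(3)]
print(first_three)  # ['IMG_0001.JPG', 'IMG_0002.JPG', 'IMG_0003.JPG']
```

The user never types a single character, yet every picture ends up identified by a plain-text string.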

    If humans ever asked an „artificially intelligent“ computer a question like „what is love?“, the computer would probably be very hard-pressed not to respond „a four-letter word“.

     
    nmw 19:04:09 on 2016/05/27
    Tags: artificial languages, data, emerge, human intelligence, training set, training sets

    Literacy and Machine Readability: Some First Attempts at a Derivation of the Primary Implications for Rational Media 

    Online, websites are accessed exclusively via machine-readable text. Specifically, the character set prescribed for domain names by ICANN, IANA and similar regulatory organizations consists of the 26 letters of the Latin alphabet, the „hyphen“ character and the 10 Arabic numerals (i.e. the digits 0-9). Several years ago, there was a move to accommodate the character sets of other languages (generally referred to as „Internationalized Domain Names“ [IDN]), but in reality this accommodation is nothing more than an algorithm which translates strings written with such „international“ symbols into strings from the regular Latin character set, using a reserved space (all such strings begin with „xn--“) within the enormous set of strings managed by ICANN. In reality, there is no way to register a string directly using such „international“ characters. Another rarely mentioned tidbit is that this obviously means the set of IDN strings that can be registered is vastly smaller than the set of strings using only the standard character set approved for direct registration.
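The translation described above can be observed with Python's built-in `idna` codec. That codec implements the older IDNA 2003 rules, and current registries follow IDNA 2008, which differs in some edge cases, so treat this as an approximate sketch. The regular expression encodes the plain letters-digits-hyphen rule for a single label. The helper name `is_ldh_label` is my own invention.

```python
import re

# One DNS label: ASCII letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen (case-insensitive).
LDH_LABEL = re.compile(
    r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$", re.IGNORECASE | re.ASCII
)


def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only the directly registrable character set."""
    return bool(LDH_LABEL.match(label))


unicode_name = "bücher.example"
ascii_name = unicode_name.encode("idna").decode("ascii")  # IDNA 2003 via the stdlib codec

print(ascii_name)                        # xn--bcher-kva.example
print(is_ldh_label("bücher"))            # False: 'ü' is outside the allowed set
print(all(is_ldh_label(part) for part in ascii_name.split(".")))  # True
print(ascii_name.encode("ascii").decode("idna"))  # bücher.example
```

Only the ASCII form is ever registered. The Unicode form is reconstructed from it on the user's side.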

    All of that is probably much more than you wanted to know. The „long story short“ is that all domain names are machine readable (note, however, that as far as I know, no search engine available on the world-wide web today uses algorithms to translate IDN domain name strings back into their intended „international“ character strings). All of the web works exclusively via this approved character set. Even the so-called „dotted decimals“ (the numbers which refer to individual computers, i.e. the „servers“) are written exclusively using Arabic numerals, though in reality they are based on groups of bits: each number represents a „byte“-sized group of 8 bits; in other words, each number could be translated into a character set of 256 characters. In the past several years, there has also been a movement to extend the number of addresses available to accommodate more computers, from 4 bytes (commonly referred to as IPv4 or „IP version 4“) to 16 bytes (commonly referred to as IPv6 or „IP version 6“), thereby accommodating 2^96 (roughly 7.9 x 10^28) times as many computers as before. Note, however, that each computer can host many websites / domains, and the number of domain names available exceeds the number of computers by many orders of magnitude (coincidentally, the number of domain names available in each top level domain [TLD] is approximately 1 x 10^100, or about 6 x 10^98 if you count every valid label of up to 63 characters; in the decimal system, 10^100 is a one with one hundred zeros, also known as 1 googol).
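The byte structure behind dotted decimals and the size difference between the two address formats can be checked with Python's standard `ipaddress` module. IPv6 addresses are 16 bytes, i.e. 128 bits, long. The addresses below come from the ranges reserved for documentation (192.0.2.0/24 and 2001:db8::/32).

```python
import ipaddress

v4 = ipaddress.IPv4Address("192.0.2.1")
v6 = ipaddress.IPv6Address("2001:db8::1")

# Each dotted-decimal number is one byte (0-255); .packed exposes the raw bytes.
print(list(v4.packed))   # [192, 0, 2, 1]
print(len(v4.packed))    # 4 bytes = 32 bits
print(len(v6.packed))    # 16 bytes = 128 bits

# How many times larger the IPv6 address space is than the IPv4 one.
growth = 2 ** (8 * len(v6.packed)) // 2 ** (8 * len(v4.packed))
print(growth == 2 ** 96, f"{growth:.2e}")  # True 7.92e+28
```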

    Again: Very much more than you wanted to know. 😉

    The English language has a much smaller number of words: a very large and extensive dictionary might have something like 100,000 entries. With variants such as plural forms or conjugated verb forms, that will still probably amount to far less than a million possible strings, in other words about 94 orders of magnitude less than the number of strings available as domain names. What is more, most people you might meet on the street probably use only a couple thousand words in their daily use of „common“ language, and they will use even fewer than that when they use the web to search for information (for example, instead of searching for „sofa“ directly, they may very well first search for something more general like „furniture“).
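The domain-name arithmetic can be made explicit. Here is a minimal sketch, under these assumptions: a registrable name is a single label of 1 to 63 characters, drawn from the 37-character set (a-z, 0-9, hyphen), and it neither starts nor ends with a hyphen. Reserved strings such as the „xn--“ space are ignored, and the `count_labels` helper is hypothetical. Under these assumptions the count works out to exactly 36 x 37^62, roughly 6 x 10^98. That is about 1.2 orders of magnitude short of a googol. The 1,000,000 figure for English word forms is the generous upper bound from the paragraph above. Measured against the exact count rather than a round googol, the gap is closer to 93 orders of magnitude than 94, which does not change the point.

```python
import math

LETTERS_AND_DIGITS = 36  # a-z plus 0-9
WITH_HYPHEN = 37         # the same set plus "-"
MAX_LABEL_LENGTH = 63


def count_labels(max_length: int = MAX_LABEL_LENGTH) -> int:
    """Count LDH labels of length 1..max_length with no leading/trailing hyphen."""
    total = LETTERS_AND_DIGITS  # single-character labels
    for length in range(2, max_length + 1):
        # First and last characters: no hyphen; the ones in between: anything.
        total += LETTERS_AND_DIGITS ** 2 * WITH_HYPHEN ** (length - 2)
    return total


labels = count_labels()
english_word_forms = 1_000_000  # generous upper bound from the text

print(labels == 36 * 37 ** 62)   # True: the sum collapses to a closed form
print(len(str(labels)))          # 99 digits, i.e. about 6 x 10^98
print(round(math.log10(labels / english_word_forms), 1))  # 92.8 orders of magnitude
```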

    What does „machine readable“ mean? It means a machine can take in data and process it algorithmically to produce a result (you might call the result „information“). For example, there is a hope that machines will someday be able to process strings, or even groups of strings such as this sentence, and thereby derive („grok“ or „understand“) their meaning. This dream has existed for decades, but the successes so far have been extremely limited. As I wrote over a decade ago (in my first „Wisdom of the Language“ essay), it seems rather clear that languages change faster than machines will ever be able to understand them. Indeed, this is almost tautologically true, because machines (and so-called „artificial intelligence“) require training sets in order to learn, and training sets drawn from so-called „natural language“ must consist of expressions from the past; not only from the past, but also approved by speakers of the language, i.e. „literate“ people. So-called „pattern recognition“, a crucial concept in the AI field, always recognizes patterns which have previously been defined by humans. You cannot train a machine to do anything without a human trainer who designs a plan (i.e., an algorithmic set of instructions) which flows from human intelligence.
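Here is a toy illustration of the training-set argument. It is a tiny word-overlap classifier built from a handful of human-labeled examples. The sentences, labels and function names are invented for this purpose, and real systems are far more elaborate. Still, it shows two things: it can only ever answer with labels a human supplied, and a word that never occurred in its training data simply does not register.

```python
from collections import Counter

# Human-labeled training data: the labels are chosen by people, not discovered.
TRAINING_SET = [
    ("the sofa and the chair are comfortable", "furniture"),
    ("a wooden table with four legs", "furniture"),
    ("the striker scored a goal", "sports"),
    ("the team won the match", "sports"),
]


def train(examples):
    """Collect word counts per human-assigned label, plus the overall vocabulary."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    vocabulary = set().union(*counts.values())
    return counts, vocabulary


def classify(text, counts, vocabulary):
    """Pick the label whose training words overlap most with the known words in text."""
    known = [word for word in text.split() if word in vocabulary]
    if not known:
        return None  # nothing the model has ever seen: no basis for an answer
    return max(counts, key=lambda label: sum(counts[label][w] for w in known))


counts, vocabulary = train(TRAINING_SET)
print(classify("a comfortable chair", counts, vocabulary))  # furniture
print(classify("yeet", counts, vocabulary))  # None: coined after the training data
```

Everything this toy system can say was put there by the people who wrote and labeled its training examples.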

    Several years ago, a very trendy movement popularized the view that data might self-organize, that trends might „emerge from the data“ without the nuisance of consulting costly humans, and this movement eventually led to what is now commonly hyped as „big data“. All of this hype about „emergence“ is hogwash. If you don’t know what I mean when I say „hogwash“, then please look it up in a dictionary. 😉

     
    nmw 17:08:46 on 2016/03/16
    Tags: data, in real life, incoherence, incoherent, IRL, issue, issues, Mark Twain, Mysterious Stranger, situation, situations, The Mysterious Stranger

    There is No Such Thing as Context-Free Meaning 

    For this post, I would like to start out with two operational definitions:

    1. Context-Free Websites
    2. Contextual Websites

    „Context-Free“ websites purport to be containers without context. Supposedly, there is no „situation“ or „issue“ which constrains the content – it is assumed that all data contained in such a website is provided without any context whatsoever. Some readers may be reminded that this is one marker of a „Retard Media“ website (but there are others – for more on „Retard Media“, see this definition).

    The diametric opposite of context-free websites are „Contextual“ websites. Contextual websites are clearly situated in the real world, and they have clearly defined „in real life“ (IRL) issues. They are anything but open spaces, free-for-all playgrounds or romping rooms released to anyone, be they representatives of brand-name marketing departments or visitors from spamming outbacks, hacker havens or other terrorist enclaves.

    Contextual websites are most definitely constrained: if you do not appreciate such constraints, then you are, plain and simple, not welcome.

    What is less clear to most is that context-free websites are by their nature meaningless. Yet if you reflect a little, it should be easy to see and understand that incoherent babble – as it does not cohere to anything – must be meaningless babble.

    One of my favorite authors, Mark Twain, described such a „detached from life“ being at the end of one of the last stories / novels he wrote (see, e.g., “The Mysterious Stranger and Other Stories”, by Mark Twain):

    “Life itself is only a vision, a dream.”

    It was electrical. By God! I had had that very thought a thousand times in my musings!

    “Nothing exists; all is a dream. God—man—the world—the sun, the moon, the wilderness of stars—a dream, all a dream; they have no existence. Nothing exists save empty space—and you!”

    “I!”

    “And you are not you—you have no body, no blood, no bones, you are but a thought. I myself have no existence; I am but a dream—your dream, creature of your imagination. In a moment you will have realized this, then you will banish me from your visions and I shall dissolve into the nothingness out of which you made me….

    “I am perishing already—I am failing—I am passing away. In a little while you will be alone in shoreless space, to wander its limitless solitudes without friend or comrade forever—for you will remain a thought, the only existent thought, and by your nature inextinguishable, indestructible. But I, your poor servant, have revealed you to yourself and set you free. Dream other dreams, and better!

    “Strange! that you should not have suspected years ago—centuries, ages, eons, ago!—for you have existed, companionless, through all the eternities. Strange, indeed, that you should not have suspected that your universe and its contents were only dreams, visions, fiction! Strange, because they are so frankly and hysterically insane—like all dreams: a God who could make good children as easily as bad, yet preferred to make bad ones; who could have made every one of them happy, yet never made a single happy one; who made them prize their bitter life, yet stingily cut it short; who gave his angels eternal happiness unearned, yet required his other children to earn it; who gave his angels painless lives, yet cursed his other children with biting miseries and maladies of mind and body; who mouths justice and invented hell—mouths mercy and invented hell—mouths Golden Rules, and forgiveness multiplied by seventy times seven, and invented hell; who mouths morals to other people and has none himself; who frowns upon crimes, yet commits them all; who created man without invitation, then tries to shuffle the responsibility for man’s acts upon man, instead of honorably placing it where it belongs, upon himself; and finally, with altogether divine obtuseness, invites this poor, abused slave to worship him!…

    “You perceive, now, that these things are all impossible except in a dream. You perceive that they are pure and puerile insanities, the silly creations of an imagination that is not conscious of its freaks—in a word, that they are a dream, and you the maker of it. The dream-marks are all present; you should have recognized them earlier.

    “It is true, that which I have revealed to you; there is no God, no universe, no human race, no earthly life, no heaven, no hell. It is all a dream—a grotesque and foolish dream. Nothing exists but you. And you are but a thought—a vagrant thought, a useless thought, a homeless thought, wandering forlorn among the empty eternities!”

    He vanished, and left me appalled; for I knew, and realized, that all he had said was true.

    The more incoherent a site is, the more meaningless are the messages the site aims to convey. The content, whether big data or small bits, thereby becomes as insignificant as a heap of garbage. To top it off, one could of course lie to people, for example by telling them it is all about faces even though in reality it is all about brands. 😉

     

     