Tagged: big data

  • feedwordpress 11:52:30 on 2018/01/02 Permalink
    Tags: big data

    The Cooperative Principle in Conversation versus the Prejudice in Silence 

    In the following, I understand the Internet as a massive text connected by many participants conversing with one another. Parts of the text are in close connection, and the discussion can be viewed as heated insofar as the sub-texts reference each other in some way (links are merely one example of such cross-references). Other parts of the text are fairly isolated, hardly discussed, rarely (if ever) referenced. I want to argue that the former parts are “well formed” in the sense that they follow Grice’s (1975) cooperative principle, and that the latter seem to evidence a sort of prejudice (performed by the disengaged participants), which I hope to elucidate more clearly.

    Before I embark on this little adventure, let me ask you to consider two somewhat complementary attitudes people commonly choose between when they are confronted with conversational situations. These are usually referred to as “feelings”, and in order to simplify, I will portray them as if they were logically diametrically opposed, whereas I suspect most situations involve a wide variety of factors, each varying in shades of gray rather than simple binary black versus white, one versus zero. Let’s call them trust and distrust, and perhaps we can characterize elements of any situation as trustworthy versus distrustworthy.

    Next, let me introduce another scale — ranging from uncertainty (self-doubt) to certainty (self-confidence).

    Together, these two factors of prejudice (in other words: preliminary evaluations of other-trustworthiness and self-confidence) crucially impact our judgment of whether or not to engage in conversations, discussions, to voice our own opinions, whether online or offline.
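    As a toy illustration of how these two factors might jointly govern the decision to engage, consider the following sketch. The function name and thresholds are my own hypothetical choices, not anything prescribed above:

```python
# Toy model: a person engages in a conversation only if their trust in the
# other participants AND their own self-confidence both clear a threshold.
# All names and threshold values here are invented for illustration.
def will_engage(trust: float, confidence: float,
                trust_threshold: float = 0.5,
                confidence_threshold: float = 0.5) -> bool:
    """trust and confidence are 'shades of gray' in [0, 1], not binary."""
    return trust >= trust_threshold and confidence >= confidence_threshold

print(will_engage(trust=0.8, confidence=0.7))  # True: engages
print(will_engage(trust=0.8, confidence=0.2))  # False: self-doubt silences
print(will_engage(trust=0.1, confidence=0.9))  # False: distrust silences
```

    Either factor falling short is enough to produce silence, which is the point of treating them as two separate scales rather than one.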

    As we probably all know, the world is not as simple as a reduction to two factors governing the course of all conversations. For example: How does it happen that a person comes to fall on this end or that end of either scale? No doubt a person’s identity is influenced by a wide variety of group affiliations and/or social mores, norms and similar contextual cues which push and pull them into some sort of category, whether left or right, right or wrong, up or down, in or out of mainstream groupings. One of the most detailed investigations of the vast complexity and multiplicity woven into the social fabric is the seminal work by Berger and Luckmann titled “The Social Construction of Reality”.

    While I would probably be the first to admit the above approach is a huge oversimplification of something as complex as all of human interactions on a global scale, I do feel the time is ripe for us to admit that the way we have approached the issue thus far has been so plagued with falsehoods and downright failures that we cannot afford to continue down this path. In an extreme “doomsday” scenario, we might face nuclear war, runaway global warming, etc., all hidden behind “fake news” propaganda spread by robots run amok. In other words, continuing this way could be tantamount to mass suicide, annihilation of the human race, and perhaps even all life on the planet. Following Pascal, rather than asking ourselves whether there is a meaning to life, I venture to ask whether we can afford to deny life has any meaning whatsoever — lest we be wrong.

    If I am so sure that failing to act could very well lead to total annihilation, then what do I propose is required to save ourselves from our own demise?

    First and foremost, I propose we give up the fantasy of a simplistic true-or-false type binary logic that usually leads to the development of “Weapons of Math Destruction”. That, in my humble opinion, would be a good first step.

    What ought to follow next might be a realization that there are infinite directions any discussion might lead (rather than a simplistic “pro” vs. “contra”). I could echo Wittgenstein’s insight that the limits of directions are the limits of our language — and in this age of devotion to ones and zeros, we can perhaps find some solace in the notion of a vocabulary of more than just two cases.

    Once we have tested the waters and begun to move forward toward the vast horizons available to us, we may begin to understand the vast multi-dimensionality of reality — for example including happy events, sad events, dull events, exciting events and many, many more possibilities. Some phenomena may be closely linked; other factors may be mutually orthogonal in a wide variety of different ways. Most will probably be neither diametrically opposed nor completely aligned — the interconnections will usually be interwoven in varying degrees, and the resulting complexity will be difficult to grasp simply. Slowly but surely we will again become familiar with the notion of “subject expertise”, which in our current era of brute-force mechanistic algorithms has become so direly neglected.

    If all goes well, we might be able to start wondering again, to experience amazement, to become dazzled with the precious secrets of life and living, to cherish the mysterious and puzzling evidence of fleeting existence, and so on.

    Tags: propaganda, rational media, language, natural language, algorithm, algorithms, algorithmic, big data, data, research, science, quantitative, qualitative, AI, artificial intelligence


  • nmw 15:27:59 on 2016/07/12 Permalink
    Tags: academia, academic, bandwagon, bandwagon effect, big data, compute, corrupt, corrupted, corruption, group think, groupthink, majority, populism, populist, reason, systemic, trusted, universities, valid, validity, vote, votes, voting

    The Spectre of Populism 

    There is a spectre haunting the Web: That spectre is populism.

    Let me backtrack a moment. This piece is a part of an ongoing series of posts about „rational media“ – a concept that is still not completely hard and fast. I have a hunch that the notion of „trust“ is going to play a central role… and trust itself is also an extremely complex issue. In many developed societies, trust is at least in part based on socially sanctioned institutions (cf. e.g. „The Social Construction of Reality“) – for example: public education, institutions for higher education, academia, etc. Such institutions permeate all of society – be it a traffic sign at the side of a road, or a crucifix as a central focal element on the altar in a church, or even the shoes people buy and walk around in on a daily basis.

    The Web has significantly affected the role many such institutions play in our daily lives. For example: one single web site (i.e. the information resources available at a web location) may be more trusted today than an encyclopedia produced by thousands of writers ever was – whether centuries ago, decades ago, or even just a few years past.

    Similarly, another web site may very well be trusted by a majority of the population to answer any and all questions whatsoever – whether of encyclopedic nature or not. Perhaps such a web site might use algorithms – basically formulas – to arrive at a score for the „information value“ of a particular web page (the HTML encoded at one sub-location of a particular web site). A large part of this formula might involve a kind of „voting“ performed anonymously – each vote might be no more than a scratch mark presumed to indicate a sign of approval (an „approval rating“) given from disparate, unknown sources. Perhaps a company might develop more advanced methods in order to help gauge whether the vote is reliable or whether it is suspect (for example: one such method is commonly referred to as a „nofollow tag“ – a marker indicating that the vote should not be trusted).
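    A minimal sketch of how such a vote-based score might be tallied follows. The vote format, field names and weighting are illustrative inventions, not any real search engine’s algorithm:

```python
# Hypothetical sketch of a populist "information value" score for a page:
# each vote is an anonymous approval mark; votes flagged "nofollow" are
# discounted entirely, per the marker's meaning of "do not trust this vote".
def information_value(votes):
    """votes: list of dicts like {"approve": True, "nofollow": False}."""
    score = 0.0
    for vote in votes:
        if vote.get("nofollow"):
            continue  # untrusted vote: contributes nothing to the score
        if vote.get("approve"):
            score += 1.0
    return score

votes = [
    {"approve": True, "nofollow": False},
    {"approve": True, "nofollow": True},   # discounted
    {"approve": True, "nofollow": False},
]
print(information_value(votes))  # 2.0
```

    Note that nothing in this tally inspects the page itself: the score is entirely a function of who raised their hand.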

    What many such algorithms have in common is that on a very basic level, they usually rely quite heavily on some sort of voting mechanism. This means they are fundamentally oriented towards populism – the most popular opinion is usually viewed as the most valid point of view. This approach is very much at odds with logic, the scientific method and other methods that have traditionally (for several centuries, at least) been used in academic institutions and similar „research“ settings. At their core, such populist algorithms are not „computational“ – since they rely not on any kind of technological solution to questions, but rather scan and tally up the views of a large number of human (and/or perhaps robotic) „users“. While such populist approaches are heralded as technologically advanced, they are actually – on a fundamental level – very simplistic. While I might employ such methods to decide which color of sugar-coated chocolate to eat, I doubt very much that I, personally, would rely on such methods to make more important – for example: „medical“ – decisions (such as whether or not to undergo surgery). I, personally, would not rely on such populist methods much more than I would rely on chance. As an example of the kind of errors that might arise from employing such populist methods, consider the rather simple and straightforward case that some of the people voting could in fact be color-blind.
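    The color-blindness example can be simulated directly. In this toy model (all parameters invented for illustration), a fraction of voters cannot distinguish the colors and simply guess, so the populist tally silently mixes unreliable guesses in with genuine judgments:

```python
import random

# Toy simulation: voters report the color of an object, but some fraction
# is color-blind and guesses at random. The tally cannot tell the difference.
def majority_verdict(true_color, n_voters=1000, color_blind_rate=0.08, seed=1):
    random.seed(seed)  # deterministic for the sake of the example
    tally = {"red": 0, "green": 0}
    for _ in range(n_voters):
        if random.random() < color_blind_rate:
            tally[random.choice(["red", "green"])] += 1  # a guess, not a judgment
        else:
            tally[true_color] += 1
    return max(tally, key=tally.get), tally

verdict, tally = majority_verdict("red")
print(verdict, tally)  # verdict is "red", but with spurious "green" votes mixed in
```

    Here the majority still gets it right, but some of the votes are pure noise – and the tally offers no way to identify which ones.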

    Yet that is just the beginning. Many more problems lurk under the surface, beyond the grasp of merely superficial thinkers. Take, for example, the so-called „bandwagon effect“ – namely, that many people are prone to fall into a sort of „follow the leader“ kind of „groupthink“. Similarly, it is quite plausible that such bandwagon effects could influence not only people’s answers, but also the kinds of questions they feel comfortable asking (see also my previous post). On a more advanced level, complex systems may also be influenced by the elements they comprise. For example: while citation indexes were originally designed on the assumption that citation data ought to be reliable, over the years it was demonstrated that such citations are in fact prone to a wide variety of corruption, and that citation analysis is therefore not a reliable method – citation data may have been somewhat reliable originally, but citation fraud eventually corrupted the system.

  • nmw 16:58:00 on 2016/06/13 Permalink
    Tags: big data, celeb, celebrities, celebs, dictators, mesmerization, mesmerize, mesmerized, politician, politicians

    The Big Data Rationality of Large Numbers: Quantitative Statistics + Fanatical Delusions 

    There are virtually innumerable fans of so-called „big data“. Countless fanatics of this quasi-scientific method will swear on a stack of bibles that if you count anything – it really doesn’t matter what, as that minute detail will certainly „emerge“ from the data itself – you will be rewarded with insights beyond your wildest dreams. Such descendants of bean-counters from previous centuries have moved on to grains of sand, dust particles, the colors of a beautiful sunset, whatever.

    These people may strongly believe in science – without actually understanding much about scientific methods.

    There seems to be a link between such lacking understanding and fanaticism. Let’s go back to one of the greatest leaders of fanatical movements ever: Adolf Hitler was probably one of the most (if not the most) quintessential dictators of all time. What many people overlook in this example, though, is not that he was able to mesmerize such humongous masses, but rather how the masses let themselves become mesmerized.

    Fans follow leaders (perhaps they should instead watch the parking meters 😉 ). There is a sort of quirky rationality to this behavior: when fans follow their leader, they apparently feel they no longer have to think for themselves… – they simply accept whatever their leader says (i.e., dictates). This saves energy, because thinking can be quite difficult. Not thinking is easier than thinking.

    The important takeaway is this: If people feel able to let someone else do the thinking, they seem very willing to do so. One way they feel able to enable a dictator to think for them is if / when other people seem to approve of the dictator. Other people’s approval of a dictator seems to make it „OK“ to let the dictator do as he / she pleases… – whether the dictator is a politician, a celebrity, a brand name, or anything anyone happens to be a fan (i.e., a fanatical follower) of.

    When popular brand names such as Google or Facebook sell „big data“, of course they tell naive and innocent consumers a story about how important big data is in order for consumers to be able to find leaders. What they don’t tell such consumers (i.e., those people willing to believe this story) is that the „big data“ plans are actually all about tracking consumer behavior. What they don’t tell advertisers is that the consumer behavior they track isn’t actually a pot of gold at the end of a rainbow, but merely a fanatical delusion hardly worth any more than a single grain of sand.

  • nmw 19:04:09 on 2016/05/27 Permalink
    Tags: artificial languages, big data, emerge, human intelligence, training set, training sets

    Literacy and Machine Readability: Some First Attempts at a Derivation of the Primary Implications for Rational Media 

    Online, websites are accessed exclusively via machine-readable text. Specifically, the character set prescribed by ICANN, IANA, and similar regulatory organizations consists of the 26 characters of the Latin alphabet, the „hyphen“ character and the 10 Arabic numerals (i.e. the digits 0-9). Several years ago, there was a move to accommodate other language character sets (this movement is generally referred to as „Internationalized Domain Names“ [IDN]), but in reality this accommodation is nothing more than an algorithm which translates writing using such „international“ symbols into strings from the regular Latin character set, and uses reserved spaces from the enormous set of strings managed by ICANN for such „international“ strings. In reality, there is no way to register a string directly using such „international“ characters. Another rarely mentioned tidbit is that this obviously means that the set of IDN strings that can be registered is vastly smaller than the set of strings exclusively using the standardized character set approved for direct registration.

    All of that is probably much more than you wanted to know. The „long story short“ is that all domain names are machine readable (note, however, that – as far as I know – no search engine available today on the world-wide-web uses algorithms to translate IDN domain name strings into their intended „international“ character strings). All of the web works exclusively via this approved character set (even the so-called „dotted decimals“ – the numbers which refer to individual computers [the „servers“] – are named exclusively using arabic numerals, though in reality they are based on groups of bits: each number represents a „byte“-sized group of 8 bits… in other words, it could be translated into a character set of 256 characters). In the past several years, there has also been a movement to extend the number of addresses available to accommodate more computers, from 4 bytes (commonly referred to as IPv4 or „IP version 4“) to 16 bytes, i.e. 128 bits (commonly referred to as IPv6 or „IP version 6“), thereby accommodating 2^96 times as many computers as before. Note, however, that each computer can accommodate many websites / domains, and the number of domain names available exceeds the number of computers available by many orders of magnitude (coincidentally, the number of domain names available in each top level domain [TLD] is approximately 1 x 10^100 – in the decimal system, that’s a one with one hundred zeros, also known as 1 googol).
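    The arithmetic behind these counts can be checked directly. Note that an IPv6 address is 128 bits (16 bytes), and that a domain label of up to 63 characters drawn from the 37 approved symbols (26 letters, 10 digits, the hyphen) gives on the order of 37^63 possible names:

```python
import math

# IPv4: 4 bytes of 8 bits each -> 256**4 addresses (about 4.3 billion).
ipv4_addresses = 256 ** 4
print(ipv4_addresses)  # 4294967296

# IPv6: 16 bytes (128 bits) -> 2**128 addresses, i.e. 2**96 times as many.
growth_factor = 2 ** 128 // ipv4_addresses
print(growth_factor == 2 ** 96)  # True

# Domain names per TLD: roughly 37**63 strings of up to 63 characters,
# which is on the order of 10**98 -- in the neighbourhood of one googol.
print(round(63 * math.log10(37), 1))  # 98.8
```

    So the „approximately 10^100“ figure is in the right neighbourhood, though the exact count depends on label-length and hyphen-placement rules that this back-of-the-envelope check ignores.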

    Again: Very much more than you wanted to know. 😉

    The English language has a much smaller number of words – a very large and extensive dictionary might have something like 100,000 entries. With variants such as plural forms or conjugated verb forms, that will still probably amount to far less than a million possible strings – in other words: about 94 orders of magnitude less than the number of strings available as domain names. What is more, most people you might meet on the street probably use only a couple thousand words in their daily use of „common“ language. Beyond that, they will use even fewer when they use the web to search for information (for example: instead of searching for „sofa“ directly, they may very well first search for something more general like „furniture“).
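    Using the figures from the paragraph above (10^100 possible names per TLD versus a generous one million English strings), the gap works out as stated:

```python
import math

domain_names_per_tld = 10 ** 100   # the googol figure cited earlier
english_strings = 10 ** 6          # generous bound: dictionary entries plus inflections

# Orders of magnitude separating the two spaces.
gap = math.log10(domain_names_per_tld) - math.log10(english_strings)
print(gap)  # 94.0
```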

    What does „machine readable“ mean? It means a machine can take in data and process it algorithmically to produce a result – you might call the result „information“. For example: There is a hope that machines will someday be able to process strings – or even groups of strings, such as this sentence – and be able to thereby derive („grok“ or „understand“) the meaning. This hope is a dream that has already existed for decades, but the successes so far have been extremely limited. As I wrote over a decade ago (in my first „Wisdom of the Language“ essay), it seems rather clear that languages change faster than machines will ever be able to understand them. Indeed, this is almost tautologically true, because machines (and so-called „artificial intelligence“) require training sets in order to learn (and such training sets from so-called „natural language“ must be expressions from the past – and not even just from the past, but also approved by speakers of the language, i.e. „literate“ people). So-called „pattern recognition“ – a crucial concept in the AI field – is always recognizing patterns which have been previously defined by humans. You cannot train a machine to do anything without a human trainer, who designs a plan (i.e., an algorithmic set of instructions) which flows from human intelligence.
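    A minimal illustration of this point: a „pattern recognizer“ can only label patterns that appeared in its (human-approved, historical) training set. The words and labels below are invented for the example:

```python
# The "training set": labels supplied by human trainers, drawn from past usage.
training_set = {"sofa": "furniture", "chair": "furniture", "apple": "fruit"}

def recognize(word):
    # The machine has no resource beyond what its human trainers provided;
    # anything the language produced after training is simply unrecognized.
    return training_set.get(word, "unknown pattern")

print(recognize("sofa"))    # 'furniture' -- seen in training
print(recognize("selfie"))  # 'unknown pattern' -- the language moved on
```

    However sophisticated the lookup is made, the same structural limit applies: the patterns it can recognize were defined by humans, in the past.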

    There was a very trendy movement, quite popular several years ago, which led to the view that data might self-organize, that trends might „emerge from the data“ without the nuisance of consulting costly humans, and this movement eventually led to what is now commonly hyped as „big data“. All of this hype about „emergence“ is hogwash. If you don’t know what I mean when I say „hogwash“, then please look it up in a dictionary. 😉
