Tagged: indexing


    nmw 15:27:59 on 2016/07/12 Permalink
    Tags: academia, academic, bandwagon, bandwagon effect, compute, corrupt, corrupted, corruption, group think, groupthink, indexing, majority, populism, populist, rason, systemic, trusted, universities, valid, validity, vote, votes, voting

    The Spectre of Populism 

    There is a spectre haunting the Web: That spectre is populism.

    Let me backtrack a moment. This piece is part of an ongoing series of posts about „rational media“ – a concept that is still not completely hard and fast. I have a hunch that the notion of „trust“ is going to play a central role… and trust itself is also an extremely complex issue. In many developed societies, trust is at least in part based on socially sanctioned institutions (cf. e.g. „The Social Construction of Reality“) – for example: public education, institutions for higher education, academia, etc. Such institutions permeate all of society – be it a traffic sign at the side of a road, or a crucifix as the central focal element on the altar in a church, or even the shoes people buy and walk around in on a daily basis.

    The Web has significantly affected the role many such institutions play in our daily lives. For example: one single web site (i.e. the information resources available at a web location) may be more trusted today than an encyclopedia produced by thousands of writers ever was – whether centuries ago, decades ago, or even just a few years ago.

    Similarly, another web site may very well be trusted by a majority of the population to answer any and all questions whatsoever – whether of encyclopedic nature or not. Perhaps such a web site might use algorithms – basically formulas – to arrive at a score for the „information value“ of a particular web page (the HTML encoded at one sub-location of a particular web site). A large part of this formula might involve a kind of „voting“ performed anonymously – each vote might be no more than a scratch mark presumed to indicate a sign of approval (an „approval rating“) given from disparate, unknown sources. Perhaps a company might develop more advanced methods in order to help gauge whether the vote is reliable or whether it is suspect (for example: one such method is commonly referred to as a „nofollow tag“ – strictly speaking an attribute value on a link, a marker indicating that the vote should not be trusted).
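
    As a rough illustration of the kind of „voting“ described above, here is a minimal sketch that counts every link to an address as one vote and skips links marked „nofollow“. It is not any real search engine's formula: the class name, the function name and the example pages are all made up for illustration, and real scoring is certainly more involved than a plain tally.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkVoteParser(HTMLParser):
    """Collects (target, trusted) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.votes = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.votes.append((href, "nofollow" not in rel_tokens))


def tally_votes(pages):
    """Count trusted links per target across a collection of HTML pages."""
    tally = Counter()
    for html in pages:
        parser = LinkVoteParser()
        parser.feed(html)
        for target, trusted in parser.votes:
            if trusted:
                tally[target] += 1
    return tally


# Hypothetical pages: three links to a.example (one marked nofollow), one to b.example.
pages = [
    '<a href="http://a.example/">A</a> <a href="http://b.example/">B</a>',
    '<a href="http://a.example/">A again</a>',
    '<a rel="nofollow" href="http://a.example/">A, untrusted</a>',
]
scores = tally_votes(pages)
print(scores.most_common())
```

    Note that nothing in this tally looks at what the pages actually say: the score is nothing more than a count of anonymous marks.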

    What many such algorithms have in common is that on a very basic level, they usually rely quite heavily on some sort of voting mechanism. This means they are fundamentally oriented towards populism – the most popular opinion is usually viewed as the most valid point of view. This approach is very much at odds with logic, the scientific method and other methods that have traditionally (for several centuries, at least) been used in academic institutions and similar „research“ settings. At their core, such populist algorithms are not „computational“ – since they rely not on any kind of technological solution to questions, but rather scan and tally up the views of a large number of human (and/or perhaps robotic) „users“. While such populist approaches are heralded as technologically advanced, they are actually – on a fundamental level – very simplistic. While I might employ such methods to decide which color of sugar-coated chocolate to eat, I doubt very much that I, personally, would rely on such methods to make more important – for example: „medical“ – decisions (such as whether or not to undergo surgery). I, personally, would not rely on such populist methods much more than I would rely on chance. As an example of the kind of errors that might arise from employing such populist methods, consider the rather simple and straightforward case that some of the people voting could in fact be color-blind.

    Yet that is just the beginning. Many more problems lurk under the surface, beyond the grasp of merely superficial thinkers. Take, for example, the so-called „bandwagon effect“ – namely, that many people are prone to fall into a sort of „follow the leader“ kind of „groupthink“. It is quite plausible that such bandwagon effects influence not only people’s answers, but also the kinds of questions they feel comfortable asking (see also my previous post). On a more advanced level, complex systems may also be influenced by the elements they comprise. For example: While citation indexes were originally designed with the assumption that citation data ought to be reliable, over the years it was demonstrated that citations are very prone to corruption of many kinds, and that citation analysis is not at all a reliable method. While citation data may have been somewhat reliable originally, it became clear that citation fraud eventually corrupted the system.

     

    nmw 18:49:07 on 2016/06/25 Permalink
    Tags: citation analysis, few, indexing, many, sample, sampling

    Don’t Listen to One Single Piece of Good Advice — Listen to Many 

    Several months ago, I mentioned on one of my other blogs that I enjoy listening to Gretchen Rubin’s „Happier“ podcast. I still do, even though I think content sponsored by advertising is by and large fake.

    Recently, Gretchen (and Elizabeth) asked her (their) listeners what the best piece of advice was that they ever got. I responded (they asked for people to phone in their comments – I think my remarks may have arrived a little too late for episode 70, but perhaps they might appear in episode 71?).

    This was the gist of my message: Don’t Listen to One Single Piece of Good Advice — Listen to Many!

    This is also something Jason Calacanis mentioned in a recent episode of his „This Week in Startups“ podcast (I can’t remember which one): that you should never rely on just one source of information. I remember thinking as I listened to Jason (and of course I had heard such advice decades before from many of my school teachers): „does that mean if you search for information you will not only listen to Google?“ Strange as it may seem, my hunch is that for the vast majority of the population, this is not the case. Indeed, my experience has been that most people will only search for information using Google’s algorithms – if they do not see anything that appeals to them via Google, they will assume that no such thing exists.

    Incidentally, there is also another kind of parochialism that I feel is closely related to this fanatical belief in Google’s scoring algorithm. In a recent episode of HBR’s „Ideacast“ podcast, Todd Rose was interviewed about a book he had recently written („The End of Average: How We Succeed in a World That Values Sameness“) about measurement and statistics. His argument echoes something I have long held to be true (and I think I recall that one of my comments regarding this matter also appeared on a German radio program – perhaps 5 or more years ago).

    Oddly, Google fan-boys (and fan-girls, too, of course) often overlook the fact that Google also ranks results according to such „cooked“ statistics. In fact the situation is even worse: when Google calculates its metrics for websites, those metrics are applied regardless of how relevant they are (or aren’t). So while SAT scores attempt to measure both mathematical ability and verbal ability, Google’s statistical measurement for quality (which was shown to be totally bogus decades ago) is applied whether or not the source is reliable for the search query. It is essentially a „one-size-fits-all“ metric (which also happens to be totally unreliable). Yet very few people really care, because most people use Google mainly to search for domain names anyway (in other words: they „search“ for ebay because they are too lazy to type in ebay.com). I bet if people stopped doing that, then the reduction in energy required might actually reduce global warming significantly! 😉

     

    nmw 17:57:54 on 2016/05/23 Permalink
    Tags: character, characters, indexes, indexing, informationretrievel, intelligability, intelligable

    Fundamental Principles of Rational Media 

    In my previous post, I noted that my concept of rationality differs from the general, widely accepted views of this notion. I do not disagree with these views. Instead, I believe the way I view rationality is more generalized.

    To put it simply: Rationality can be interpreted as any idea – in other words: any idea can be considered rational – if it can be expressed in language. What language is / isn’t – that’s perhaps a more difficult question to answer, but as mathematics is one such language… and as logic, i.e. „mathematical logic“ can be interpreted as a subset of mathematics, logic can also be interpreted as a language.

    Most so-called „programming“ languages are also, well: languages. „Natural“ languages are also languages (indeed: the distinction between „natural“ language and „artificial“ language is really not very distinct, clear, obvious or anything like that). And as I mentioned in my previous post, even facial expressions, scents, DNA and many other things can also be interpreted as language.

    In the context of „rational media“, however, I suggest limiting the meaning of the expression to what is often referred to as „machine readable“ language. I would even suggest limiting the extent of „rational media“ more than that, because there are actually many types of machine-readable expressions which are usually considered to be unintelligible to humans without machines. For example: Hollerith cards, magnetic tape and discs, compact discs, USB sticks, bar codes and QR codes, to name just a few. There are also some expressions which are simply difficult to express in the traditional notion of natural language – for example: numerical values written in hexadecimal formats.

    All of this is by and large simple and straightforward in an online setting, because web addresses are almost all written using what most people consider to be natural language expressions (though note that so-called „internationalized domain names“ / IDNs are written in a code which allows for algorithmic translation between the Latin character set used in all domain names and expressions in specialized character sets [and vice versa]). In general, surfing the web is very much like using an encyclopedia, a lexicon or what used to be called a „card catalog“. The primary difference is that whereas the web is considered to be distributed, the traditional forms were usually viewed as created by a single author, organization or institution. Therefore, whereas for many decades and even centuries people had become very accustomed to indexes being something created by specialized „indexers“ or „indexing services“, today the „index“ to the web is considered to be integrated into the web itself (note, however, that the registries of „top level domains“ [TLDs] are actually sort of like the „indexes of last resort“ … that is, „last resort“ excluding ICANN).
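
    To make the remark about IDNs a little more concrete, here is a small sketch of that algorithmic translation using the „idna“ codec that ships with Python (as far as I know it follows the older IDNA 2003 rules). The domain name is made up for illustration; „example“ is a reserved name.

```python
# A made-up internationalized domain name ("example" is a reserved TLD).
unicode_name = "bücher.example"

# Python's built-in "idna" codec converts each label to its ASCII-compatible form.
ascii_name = unicode_name.encode("idna")

# The same codec reverses the transformation.
round_trip = ascii_name.decode("idna")

print(ascii_name)   # b'xn--bcher-kva.example'
print(round_trip)   # bücher.example
```

    The ASCII form is what actually gets registered and looked up; the version with the umlaut is only how the name is displayed to people.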

    I will simply abruptly stop here for now – as I feel this is probably already quite a lot to digest. If you would like to add comments, ideas, questions or anything like that, please feel free to register @ nooblogs.com, which is intended to be more for discussion and/or sharing of ideas.

     

    nmw 13:20:03 on 2015/07/23 Permalink
    Tags: author, authors, book, books, catalogs, indexing

    To Read or to Be Read 

    When I was a kid, I used to go to the library a lot… and read books. Before reading them, I would need to find them. For those of you unfamiliar with this process: This was the prototype for most search engines (back then, people studying this process went to “library schools”, graduate programs for “information science”, and the field specifically focused on what is today referred to as “search”; back then it was called “information retrieval”).

    But reading is not really something “for specialists only”. Before graduating high school, regular folks also had to learn about publishing. For example: They needed to know that books have authors, that they were published by publishing companies, and so on.

    Online, titles, authors and publishing houses became domain names. There are also numbers which refer to computers — perhaps this is roughly equivalent to the way people would refer to specific shelves where specific books were stored (this system of naming shelves, which was used in the earliest libraries, would later give way to so-called “call-numbers”, a system whereby a book was given a specific sequential number where it could be found). The biggest difference between traditional libraries and the Internet is probably the fact that online, the cataloging and indexing systems are integrated into the same system as the writing that they catalog / index. Although professional abstracting and indexing services also published such volumes (which looked very much like “regular” books), and these books were usually also given call numbers, putting them on par with the more ordinary literature, the librarian was the person who made this decision… and the librarian was the person ultimately responsible for maintaining the catalog (and also for choosing what would be included in the library’s collection).

    I guess only quite novice users would assume that if something was not in the library (and/or the library’s catalogs), it did not exist.

    Contrast that with today, when there is an entire generation of kids who seem to believe that if something cannot be found in Google, it doesn’t exist.

    Even though the rate of illiteracy today is quite astounding already, I now observe also that in recent years an entirely new trend is catching on. People are becoming ever less concerned with reading or writing or behaving as functionally literate persons. Instead: They are becoming more obsessed with being read… meaning that someone (or some company) is able to trace their moves. Whereas it is becoming ever more rare for people to read or write anything resembling written texts (and/or “literature”), it is becoming ever more commonplace for people to clutch on to gadgets which track everything such quantified fetishists seem to place such a high value on. The typical quantified fetishist will feel much the same way about their gadget fetish as a democratic idealist might feel about the sanctity of the voting booth.

    In this milieu, there seems to also be a widespread belief that the companies collecting this data will share it publicly out of the warmness of their hearts.

     