Data In The World Of AI: Why We Can't Trust & Measure Everything And Why We Need To Be Skeptical About It
Introduction: AI in the Real World
Although artificial intelligence has been evolving for a while, it is now entering a critical phase in both research and application. Over the past ten years, a convergence of factors, including improved methods such as deep learning and the expansion of data availability and computing power, has made it possible to apply this technology far more widely. There never seems to be a shortage of sensationalist headlines about how AI could boost human creativity, speed up innovation, and cure diseases. Judging by the media headlines alone, you would assume we already live in a world where AI has entered every part of society. AI has undeniably created a multitude of exciting possibilities, and out of that success a worldview known as "AI solutionism" has emerged: the idea that, given enough data, machine learning algorithms can solve every problem facing humanity. There is a serious flaw in this notion. By trusting data too much and holding inflated expectations about what AI can genuinely accomplish, it undermines the value of machine intelligence rather than advancing it. In short, using AI brings as many risks and errors as it does opportunities.
Main Discussion
Correlation detection by AI, such as the finding that clouds increase the likelihood of rain, is essentially the most basic form of causal inference. It is effective enough to have sparked the recent surge in deep learning. Given a wealth of information about familiar situations, this approach can produce very reliable predictions: a computer can estimate the probability that a patient with a given set of symptoms has a certain disease because it knows how frequently hundreds or even millions of other patients with the same symptoms have had that condition. However, there is a growing consensus that if computers cannot get better at figuring out causality, advances in AI will stagnate. If machines could understand how certain things lead to others, they would not have to relearn everything from scratch; they could apply what they learned in one area to another. Furthermore, if machines were capable of common sense, we could trust them more to act independently, knowing they would not make foolish mistakes. Even when AI is in use, certain things still have to be checked manually, at least once in a while. The main reason is that management, with or without AI, tends to trust the data too much; beyond that, working with such statistics is fraught with four major issues:
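As a concrete illustration of the frequency-based reasoning above, here is a minimal sketch in Python that estimates the probability of a condition given a set of symptoms purely from counts of comparable past cases. The patient records and the estimate_probability helper are invented for illustration; a real system would be far more careful about matching and sample size.

```python
# A minimal sketch of frequency-based reasoning: estimate the probability of a
# condition given a set of symptoms purely from counts of past cases.
# All numbers below are invented for illustration.

past_patients = [
    {"symptoms": {"fever", "cough"}, "has_condition": True},
    {"symptoms": {"fever", "cough"}, "has_condition": True},
    {"symptoms": {"fever", "cough"}, "has_condition": False},
    {"symptoms": {"fever"}, "has_condition": False},
    {"symptoms": {"cough"}, "has_condition": False},
]

def estimate_probability(symptoms, records):
    """P(condition | symptoms), estimated as a simple relative frequency."""
    matching = [r for r in records if r["symptoms"] == symptoms]
    if not matching:
        return None  # no comparable cases: the model has nothing to say
    positives = sum(r["has_condition"] for r in matching)
    return positives / len(matching)

print(estimate_probability({"fever", "cough"}, past_patients))  # 0.666...
```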
1. Metrics can be addictive:
Mathematicians are generally regarded as trustworthy because they are known to deal in carefully considered logical arguments built on premises and axioms. We also trust mathematics because it is hard, and because it is thought to be objective: it proceeds from axioms, statements that everyone accepts as true. It is ingrained in us that understanding something requires measuring it, and we are all instructed, directly or indirectly, never to ask a stupid question and always to project a sense of certainty and success. When we use mathematical models, our confidence in arithmetic has a clear effect: even if a conclusion is incorrect, the mere fact that arithmetic is involved makes it appear precise and undeniable. This is probably true of any internal language, whether it is used by alchemists trying to turn lead into gold or by bankers designing a credit default swap. Once we begin to view mathematical models as reliable instruments, we can develop the habit of measuring more and more things as a way of controlling and comprehending them. This is not a problem in itself, but it can easily develop into a kind of addiction, particularly if we only acknowledge things that can be measured and crave having everything fit into the data box. Once we become used to the sense of control that modelling and measurement provide, a related issue arises: how do we handle uncertainty? Accuracy becomes a problem, especially when what we really want is control.
For example: if you love someone, how would you measure it? How would you measure the importance or value of a reputation in a field, or the influence and power that certain officials wield? How would you quantify those?
Along the same lines, consider how this data-dependent mentality makes us blind and naive in some situations. To keep something private and out of reach of the data collectors, all we need to do is keep it away from sensors and data-gathering bots. Organisations like the NSA and CSS may easily be missing exactly the people they are aiming to capture when they collect data on citizens, since some people have far greater reasons than others to keep themselves concealed.
Measuring something does not guarantee a clean or accurate outcome. Business data is a good example: monthly sales changes are sometimes predictable and other times not. Not all the information gets used, and not all hard figures are accurate. Yet taking decisive action in the face of uncertainty is a fairly typical outcome, because it is hard to admit that one lacks the data needed to act. Avoid this error by not trying to measure what you can't, and by being explicit about the uncertainty in what the model does with its inputs. Every time you collect data, make sure enough care has been taken. If you have to, keep a "black box" record: track both inputs and outputs, how the data is being used, the odds of something going wrong, and, if something odd does occur, where it came from. Some of this depends on the model you choose, so always pick the model that corresponds most closely to reality. If you are supposed to measure income but are, for instance, using census data to estimate it, then say so. Convey your best estimates using several different approaches, which helps you gauge the errors. Surveys and polls can be a great supplement to data-driven analysis. You don't need a PhD in mathematics or statistics to understand things at this level (and if you feel you do, your data person is probably hiding something from you).
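As a rough sketch of reporting an estimate together with its uncertainty rather than a single hard figure, the snippet below bootstraps an error bar around mean monthly sales. The sales numbers and the bootstrap_mean_interval helper are invented; the point is the habit of attaching an interval to the estimate.

```python
# A sketch of reporting an estimate together with its uncertainty instead of a
# single "hard" number. The monthly sales figures are invented; the bootstrap
# resampling gives a rough error bar on the mean.

import random
import statistics

monthly_sales = [120, 95, 140, 110, 180, 90, 130, 105, 160, 115, 125, 100]

def bootstrap_mean_interval(data, n_resamples=10_000, alpha=0.05):
    """Return a rough (low, high) percentile interval for the mean."""
    means = []
    for _ in range(n_resamples):
        resample = [random.choice(data) for _ in data]
        means.append(statistics.mean(resample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

low, high = bootstrap_mean_interval(monthly_sales)
print(f"estimated mean sales: {statistics.mean(monthly_sales):.1f} "
      f"(95% interval roughly {low:.1f} to {high:.1f})")
```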
2. Focusing More on Numbers Than on Characteristics/Patterns:
Even with AI in almost every field, modellers still use proxies whenever something cannot be measured directly; in fact, that is almost always the case. We can monitor how many pages a user has read and how much time they spend on them (in total and per page), but it is essentially impossible to measure the user's actual interest in a website, with or without AI. Generally speaking, engagement serves as an indicator of interest, but there are always exceptions. When selecting proxies, we exert a great deal of influence over what sort of data counts as relevant; the remainder is pushed to the margins and made invisible to the models. The strength of proxies varies, and they can be fairly weak. Making do with what you happen to have may be accidental or situational, but at other times it is planned and part of a bigger, more deliberate agenda (a political paradigm).
The cleansing effect of mathematical modelling frequently causes us to misinterpret the findings of data analysis as objective, when in fact they are only as objective as the underlying process and depend on the chosen proxies in opaque and complicated ways. The end result is a metric that appears to be both powerful and objective but is in fact neither; this is also known as the garbage-in, garbage-out problem. Selection bias is one instance: even shining big data success stories like Netflix's movie recommendation system suffer from it, if only because their model of "people" is biased toward people who have the time and interest to rate a lot of movies online. That is setting aside other modelling issues Netflix has displayed, such as assuming that anyone living in a neighbourhood with a high concentration of people from South Asia must be a Bollywood fan.
Netflix is an example of an after-the-fact interpretation problem: when the service first took off, we believed we had a consensus opinion when in reality we had the opinion of a particular population. This is different from a direct proxy problem, because we can probably trust each person to give their honest opinion (or maybe not). We will return to this hidden assumption that "N = all." More recently there has been a significant effort to quantify schooling. How does one evaluate something as complex and significant as teaching a subject in high school? The answer, for the time being at least (before we start putting sensors everywhere), comes in the form of student test scores as a stand-in. There is a plethora of proprietary models, most of them marketed by private education consulting firms, that claim to quantify the "value added" by a particular teacher based on the annual test scores of their pupils.
Note how we start off by measuring a teacher's effectiveness with a poor proxy: we never get to see how teachers engage with their pupils, or whether the pupils come away motivated and eager to learn more. How well do these models actually work? Interestingly, the models themselves lack an evaluation metric, which makes it hard to say for sure. There is, however, indirect evidence that they are rather noisy: teachers who received two evaluations for the same subject in the same year, for separate classes, show only a 24% correlation between their two ratings.
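To make the noise argument concrete, here is a hypothetical simulation of that two-ratings check: the same latent teacher quality is observed through two very noisy scores, and the correlation between the two scores comes out low. The noise level is an assumption chosen so the result lands near the reported 24%; it is an illustration, not a reconstruction of any actual value-added model.

```python
# A sketch of the check described above: if the same teachers are scored twice
# in the same year, how well do the two scores agree? A low correlation is a
# sign the "value-added" metric is mostly noise. All values are simulated.

import numpy as np

rng = np.random.default_rng(0)
true_quality = rng.normal(0, 1, size=200)        # latent teacher quality
noise = 1.8                                      # assumed measurement noise
rating_class_a = true_quality + rng.normal(0, noise, size=200)
rating_class_b = true_quality + rng.normal(0, noise, size=200)

r = np.corrcoef(rating_class_a, rating_class_b)[0, 1]
print(f"correlation between the two ratings: {r:.2f}")
# With noise this large the correlation comes out in the low twenties of a
# percent, i.e. the metric barely agrees with itself.
```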
Let's examine a third instance. In the run-up to the financial crisis, credit rating agencies used incredibly flimsy proxies to award AAA ratings to poor mortgage products. There was, of course, no historical information on the default rates of the new types of mortgages, such as the no-income, no-job, no-assets ("NINJA") loans, that were being pushed on people, packaged, and sold. Instead, the modellers substituted historical data on mortgages of higher quality, and the models failed utterly. The lesson is that it is important to communicate what proxies you use, and what the resulting limitations of your models are, to the people who will be explaining and using those models.
Sometimes taking enough care about objectivity is tricky. If you're tasked with building a model to decide who to hire, for example, you might find yourself comparing women and men with exactly the same qualifications who have been hired in the past. Looking at what happened next, you learn that those women tended to leave more often, get promoted less often, and give more negative feedback on their environments than the men. Your model might then be tempted to hire the man over the woman the next time the two show up, rather than considering the possibility that the company doesn't treat female employees well.
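Here is a tiny, entirely hypothetical sketch of that hiring example: the historical outcomes reflect how the company treated people rather than how qualified they were, so a naive scoring rule built on those outcomes ends up preferring the man over an identically qualified woman. All records are invented.

```python
# A sketch of how a model trained on historical outcomes inherits the
# environment's bias. "stayed_and_promoted" reflects how the company treated
# people, not how qualified they were. All data below is invented.

past_hires = [
    # (qualification_score, gender, stayed_and_promoted)
    (8, "F", False), (8, "M", True), (9, "F", False), (9, "M", True),
    (7, "F", True),  (7, "M", True), (8, "F", False), (8, "M", False),
]

def success_rate(gender):
    group = [stayed for _, g, stayed in past_hires if g == gender]
    return sum(group) / len(group)

# A naive model that scores candidates by the historical "success" rate of
# people like them will rank the man above an identically qualified woman,
# encoding the company's treatment of women as if it were a fact about them.
print({"F": success_rate("F"), "M": success_rate("M")})  # {'F': 0.25, 'M': 0.75}
```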
In business, metaphors are a poor substitute for detail, because in data science the devil is always in the details. The more clearly you understand what your data analysts are doing and how they get from raw data to conclusions, the better you will see how individuals can elude a model's detection and what the models are missing.
3. Failing to Frame the Issue:
The first step in the data science process is a translation stage: we take a question and turn it into a mathematical model. The translation is not well defined, however, since we often have to make important decisions along the way; even something as straightforward as a measurement can be modelled in a variety of ways. For instance, how would one evaluate a business? By the amount it earns, the number of people it employs, or both? Do we count its effects on the environment? What, in these circumstances, constitutes progress?
Once we've made a decision, especially if it is seen as an important measurement, we frequently find ourselves optimising to that progress bar, sometimes without checking that the progress it measures actually corresponds to the definition of progress we truly want to use. Even when the problem is quite well described, the evaluation of the solution still has to be done carefully. Selecting an evaluation metric is a difficult process that, by all accounts, needs to be part of the model itself. After all, we have no reason to think a model is teaching us anything at all if there is no way to assess its efficacy. Even so, it is common to see bogus evaluation metrics used, particularly when the poor choice happens to be profitable.
For example: "Do our advertisements result in sales we otherwise wouldn't have seen?" This is what advertisers want to know. Because it is difficult to determine directly without mind-reading, they rely on proxies such as "did that individual buy the product after clicking on the ad?". And because even that data is scarce, the question frequently degrades to "did that individual click on the ad?", or even "did they notice the advertisement at all?". Everything in this chain can be interesting and helpful for tracking and monitoring the stats, but the real challenge is whether the main question is being addressed: do these ads get people to buy the product who would not have bought it anyway?
Even split testing (commonly known as A/B testing) is a mess: people clear their cookies, and it becomes hard to keep track of the monitoring data. Without A/B testing there are sometimes explicit confounders, which is worse; as in the example above, people who see ads for perfume on a site for high-end goods are already more likely to buy perfume. Additionally, the team at randomwebsite.com has uncovered a world of phantom clicks, generated essentially at random, that are clearly useless and never result in a purchase. If advertisers accepted that click-through rates are a poor gauge of success, you might assume they would stop using them. But that is not what you see, because the alternatives are constrained, habits are hard to break, and plenty of bonuses are calculated using these inflated metrics.
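Here is a minimal sketch of the gap between the proxy (click-through rate) and the question that matters (incremental purchases), using invented counts for an exposed group and a control group. The ad looks healthy by CTR while the true lift is tiny.

```python
# A sketch of the difference between a proxy metric (click-through rate) and
# the question that actually matters (incremental purchases). The counts are
# invented: the ad gets plenty of clicks, but the purchase rate with and
# without the ad is nearly the same, so the true lift is tiny.

exposed = {"users": 100_000, "clicks": 2_500, "purchases": 1_060}
control = {"users": 100_000, "clicks": 0,      "purchases": 1_000}

ctr = exposed["clicks"] / exposed["users"]
lift = exposed["purchases"] / exposed["users"] - control["purchases"] / control["users"]

print(f"click-through rate: {ctr:.2%}")          # looks healthy: 2.50%
print(f"incremental purchase rate: {lift:.3%}")  # the real answer: 0.060%
```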
Let's take a look at a couple of common problem-framing illustrations:
1. The Netflix Prize-winning solution was never put into practice: it was so complex that the engineers gave up on it. This was a straightforward framing problem, where the question should have put a cap on the complexity of the model itself rather than rewarding accurate ratings alone.
2. Now for an illustration of how people cling to a false sense of accuracy. In noisy data environments where the error bars are larger than the absolute values, people will frequently insist on knowing the r^2 to three decimal places, even though you cannot determine the general direction of the outcome, let alone an exact answer. Or you run into people fixated on accuracy for a rare-event model, where the most accurate model is the one that assigns everything a probability of zero; in that situation, a good model may well be a less accurate one (see the sketch below).
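The sketch below illustrates the rare-event point from item 2: with an event rate of about 1%, the trivial all-zero model scores roughly 99% accuracy while catching nothing. The event rate is an assumption made up for illustration.

```python
# A sketch of the accuracy paradox: for a rare event, the trivial model that
# predicts "never happens" scores extremely well on plain accuracy while being
# useless. The event rate and labels are simulated.

import random

random.seed(0)
actual = [1 if random.random() < 0.01 else 0 for _ in range(10_000)]  # ~1% rare event

always_zero = [0] * len(actual)   # the "most accurate" useless model
accuracy = sum(p == a for p, a in zip(always_zero, actual)) / len(actual)
caught = sum(p and a for p, a in zip(always_zero, actual))

print(f"accuracy of the all-zero model: {accuracy:.1%}")   # around 99%
print(f"rare events it actually caught: {caught}")          # 0
```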
There are several things at stake here. A data team frequently works on something optimised for a particular definition of accuracy when the real objective is to stop losing money. I think the hardest part of being a successful data scientist is framing the question effectively.
A given optimisation technique's default loss function frequently ignores the type of error being made, even though in real-world situations a false negative may be significantly worse than a false positive. Is the company's primary concern really the same as your success metric? Politics often plays out here, and we occasionally see deception in the naming or appraisal of a project or model when a challenge is framed. Because progress is difficult to quantify, we calculate GDP instead; why not consider the quality of life of the median household, or of the least fortunate? Because determining the worth of our labour is difficult, we look at titles and pay instead. Because influence is hard to measure, we count followers instead, a proxy biased toward young people without jobs rather than toward people with real influence. Consider who stands to gain from a poorly chosen success metric or a poorly defined progress bar.
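As a sketch of weighting errors by what they actually cost the business, the snippet below compares two hypothetical sets of predictions under an assumed 10:1 cost ratio between false negatives and false positives. All labels, predictions, and costs are invented.

```python
# A sketch of evaluating a model against the costs the business actually
# cares about rather than a symmetric default. The cost figures are assumed:
# here a missed case (false negative) is taken to be ten times worse than a
# false alarm (false positive).

COST_FALSE_NEGATIVE = 10.0
COST_FALSE_POSITIVE = 1.0

def business_cost(predictions, actuals):
    cost = 0.0
    for p, a in zip(predictions, actuals):
        if a and not p:
            cost += COST_FALSE_NEGATIVE   # missed a real case
        elif p and not a:
            cost += COST_FALSE_POSITIVE   # false alarm
    return cost

actuals      = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
cautious     = [1, 1, 0, 1, 1, 0, 0, 1, 0, 1]  # extra false alarms, no misses
conservative = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # fewer alarms, two misses

print(business_cost(cautious, actuals))       # 3.0
print(business_cost(conservative, actuals))   # 20.0
```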
4. Ignoring the Perverse Incentives of the People Being Modelled:
As we have often observed, models, particularly high-stakes ones where participants' quality of life is on the line, beg to be gamed. Yet, for some reason, we frequently see modellers ignoring this feature of their models, especially when they stand to gain from the gaming. Even without direct gaming, faulty models or models with poor evaluation metrics can still produce negative feedback loops. As for gaming itself, it is crucial to remember that it is not always possible to game a model; the degree to which it is depends on the kinds of proxies used and how strong they are.
For example, the credit score model rates rather well on the gameability scale: we know that paying bills on time will raise our credit score, and most people wouldn't even call that gaming. More generally, a highly transparent, proxy-based, high-impact model will be gamed. It must keep working once the people being judged by it know how it operates; it is not enough to show that it worked on test data before deployment.
That is why most businesses use dashboard approaches rather than relying on a single imperfect metric. Behind this is the gaming loop sometimes referred to as Campbell's Law or Goodhart's Law: caution is needed because the act of measurement affects the very thing you are trying to quantify. (Note: 1. Campbell's Law - when a metric is used as a key indicator of success, its capacity to measure success effectively is usually reduced. 2. Goodhart's Law - as soon as you start using a measure to influence outcomes, that measure becomes less useful.)
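Here is a toy simulation of the Goodhart/Campbell dynamic described in the note above: once a proxy becomes the target and people put effort into gaming it, its correlation with the real underlying quality collapses. The "gaming effort" parameter and noise levels are invented.

```python
# A toy simulation of the Goodhart/Campbell dynamic: once a proxy becomes the
# target, effort shifts toward the proxy and its link to the real goal weakens.
# The gaming parameter and all numbers are invented.

import numpy as np

rng = np.random.default_rng(1)
real_quality = rng.normal(0, 1, size=500)

def observed_metric(gaming_effort):
    # Before targeting: the metric tracks quality.
    # After targeting: gaming adds signal unrelated to quality.
    return real_quality + gaming_effort * rng.normal(0, 3, size=500)

before = observed_metric(gaming_effort=0.0)
after = observed_metric(gaming_effort=1.0)

print("correlation with real quality before targeting:",
      round(np.corrcoef(real_quality, before)[0, 1], 2))   # 1.0
print("correlation with real quality after targeting: ",
      round(np.corrcoef(real_quality, after)[0, 1], 2))    # much lower
```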
Poorly Designed Tests of Data:
If one can imagine influence happening in real life, between people, then one can imagine it happening on a social medium; if it doesn't happen in real life, it doesn't magically appear on the Internet. For example, if a famous person tweeted that his fans should go and see a movie, many would follow his advice, just as they would if he were sitting next to them in their living room urging them to watch it. But if a celebrity were to advise someone to slim down while they were hanging out, that person would just feel horrible and awkward, because no one (not Oprah, not Dr. Oz, not the family doctor) has discovered a magic formula that reliably persuades ordinary people to make significant, long-term improvements to their weight. Well, perhaps surgery that removes part of the stomach, but even that strategy is debatable. The truth is that this is a tremendously difficult problem for our entire society, and it is unlikely to be solved simply by repurposing data; it would be unrealistic to suggest that it could.
There is a smell test here: putting something into mathematical form does not magically make it influence people. At best, a model is an echo of influence that is actually exerted in real life, and I have yet to come across a counterexample. Anyone claiming, as a data scientist, to do something that fails this smell test should stop right now, since it only adds to the hype and noise surrounding big data.
Having said that, there are interesting things you can observe using Twitter data, such as how information spreads and the fact that it does. Instead of expecting people to do things on social media that they would never do otherwise, let's focus on observing how people use it to accomplish things more quickly. (Note: Twitter is singled out here because, as one of the biggest social media platforms, it lets both consumers and brands loosen up, build relationships, and optimise engagement, and it is also one of the strongest low-cost marketing platforms next to Facebook.)
Not Trusting Data Enough:
At the opposite end of the spectrum we have issues with different and occasionally more severe implications: underestimating the strength of models and data. If you undervalue data and data science you may simply miss out on good business opportunities, which is the less tragic outcome; naturally, if you pass up chances to streamline using the data at hand, your competitor will, which is just a fact of life in a competitive market. Underestimating data has more tragic outcomes as well. Consider the unemployment rate and the housing situation, since that crisis is still with us: a large portion of it can be attributed to inadequate models of home prices and mortgage derivatives, as well as to meta-economic modelling of how much and how quickly middle-class families can reasonably take on debt, and of whether we need derivatives market regulation. You could say those were the results of financial and economic models, which differ somewhat from the models used by contemporary "big data" practitioners (although the more recent big data models might be even riskier).
People Don't Use Math to Estimate Value:
We hardly ever see models valued outside of finance, despite the fact that there are numerous ways to obtain rough estimates. How much data should be anticipated? What signal do we expect in the data? How much data do we have? What are the opportunities for applying this model, and what are those opportunities worth? If the model succeeds, at what scale does it operate? How likely is it that the concept is sound? Combine all of this and you get a number that represents a rough valuation of your model. The same back-of-the-envelope thinking can be used to assess business models. For example, take ice cream: what is the market for anticipating consumer interest in a particular flavour? What would customers be willing to pay for a reliable ice cream flavour predictor, and how much would I personally pay? How reliable is the ice cream preference data, how much of it can we collect to train our model, and what is the real cost of that data?
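Here is a back-of-the-envelope sketch of that valuation for the ice cream flavour predictor. Every figure (market size, price, reachable share, probability the model works, data cost) is a made-up assumption; the value is in the structure of the arithmetic, not the numbers.

```python
# A back-of-the-envelope sketch of the kind of estimate described above, for
# the ice-cream flavour predictor. Every number is an invented assumption.

market_size = 2_000          # shops that might buy a flavour predictor
price_per_customer = 500.0   # what one shop might pay per year
reachable_share = 0.05       # fraction we could realistically sign up
prob_model_works = 0.3       # chance the data supports a reliable model
data_cost = 40_000.0         # cost of collecting preference data

expected_value = (market_size * reachable_share * price_per_customer
                  * prob_model_works) - data_cost
print(f"rough expected value of the project: ${expected_value:,.0f}")
# 2,000 * 0.05 * 500 * 0.3 - 40,000 = 15,000 - 40,000 = -25,000
# A negative number here is exactly what this sketch is meant to catch
# before anything gets built.
```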
Putting the Quant in the Back Room:
Quants, or data scientists (terms I use interchangeably), are often treated almost like pieces of hardware instead of creative thinkers. This is a mistake, but it is understandable for a few reasons:
1. Their communication is nerdy, which is an obstacle to communicating with business people right off the bat.
2. They frequently lack the domain knowledge necessary to contribute fully to debates on corporate strategy. This is another barrier, but it is overstated, because mathematicians are skilled at picking up new information quickly and will do the same with domain knowledge.
3. They possess a seemingly magical skill set that makes it easy to pigeonhole their place in a company, especially since no one else can perform the same function. But that does not mean they couldn't do the job even better if given more context.
4. If the quants were actually given the business's data, they would probably figure out that it is a money game, because sometimes businesses don't actually want data people to do meaningful work; they hire them as ornaments, as marketing instruments to prove they are on the cutting edge of big data.
Interpreting Skepticism As Negativity:
The fact that quants don't always deliver positive news is another factor in their general underappreciation. If you give a data scientist a meaningful task and assume they are capable and engaged, they may well come to know the company's inner workings better than the owner does. Ignoring them is like a doctor disabling a heart monitor to avoid hearing unpleasant news about a patient.
Ignoring Cultural Consequences:
While many of the ways models can influence culture are complex and hard to evaluate, others are less subtle and easier to measure, yet still fall into the category of "not my problem." This phenomenon is known in economics as an externality: an indirect cost or benefit to an uninvolved third party that arises as an effect of another party's activity, and it is notoriously difficult to manage. How, for instance, can businesses be held accountable for pollution when the consequences are hard to quantify and spread out over a long period of time? Once we understand that models can cause problems and massive feedback loops, it amounts to treating the whole public as a stakeholder in the model's design. Modelling may have long-term effects, and we need to ensure either that those effects are benign or, where they are not, that the costs are outweighed by the benefits. For instance, people look for good public schools when deciding where to buy a house, so homes in those neighbourhoods cost more because the area's school score makes them desirable. This feedback loop potentially weakens the degree to which education can legitimately be called "public education" and widens the inequality gap: if you have to pay extra to live near decent schools, you are simply pricing the poor out of that resource. (I'd like to take a moment to thank EA™ for the SimCity game franchise; the game's whole simulation layout plays out this concept in real-time scenarios, and playing it helped me understand the whole thing.)
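Below is a toy simulation of that school-score feedback loop: scores push up nearby house prices, higher prices filter who can move in, and the gap widens over time. All coefficients are invented and are only meant to show the loop reinforcing itself.

```python
# A toy simulation of the feedback loop described above: good school scores
# raise nearby house prices, higher prices filter who can move in, and the
# scores drift further apart. All coefficients are invented.

score_a, score_b = 7.0, 5.0          # two neighbourhoods' school scores
price_a, price_b = 300.0, 300.0      # starting house prices (thousands)

for year in range(10):
    # prices respond to the score gap
    price_a += 10 * (score_a - score_b)
    price_b -= 10 * (score_a - score_b)
    # pricier neighbourhoods attract better-resourced families, nudging scores
    score_a += 0.02 * (price_a - price_b) / 100
    score_b -= 0.02 * (price_a - price_b) / 100

print(f"scores after 10 years: {score_a:.1f} vs {score_b:.1f}")
print(f"prices after 10 years: {price_a:.0f}k vs {price_b:.0f}k")
```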
Let's re-examine the "N = all" assumption we often make in modelling. Some people argue that part of the power of the big data revolution comes from the fact that, unlike in the past when we had to work with small samples and settle for approximate results, in today's world of GPS, tracking, and sensors we have all the data we could ever need. There is some truth in this, because we now know far more about how individuals behave. However, the degree to which the "N = all" assumption is violated is crucial to understanding how our models affect culture. Who is missing from the data? Whose voice was disregarded? Are we taking a model that has been tested on one group, say people in democratic nations like the USA, India, Sweden, and Norway, and applying it to a completely different demographic? What is the outcome?
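To make the "N = all" concern concrete, here is a hypothetical sketch in which a cutoff is tuned on one population and then silently applied to another whose distribution differs; the group that was missing from the data ends up flagged at a far higher rate than intended. Both distributions are invented.

```python
# A sketch of the "N = all" failure mode: a threshold tuned on one population
# is silently applied to another whose behaviour differs. All distributions
# are invented.

import numpy as np

rng = np.random.default_rng(2)

# Group A (the data we happen to have) and group B (missing from the data)
group_a = rng.normal(loc=10.0, scale=2.0, size=5_000)
group_b = rng.normal(loc=7.0, scale=2.0, size=5_000)

threshold = np.percentile(group_a, 20)   # "flag the bottom 20%", tuned on group A

flagged_a = np.mean(group_a < threshold)
flagged_b = np.mean(group_b < threshold)
print(f"share flagged in group A: {flagged_a:.0%}")   # ~20%, as designed
print(f"share flagged in group B: {flagged_b:.0%}")   # far higher, never intended
```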
The Quick Test for Big Data:
There is nearly always a predator and prey in modelling scenarios, and as a modeller you are the predator nine times out of ten. In other words, you are probably manipulating people while trying to convince them to do something: buy something, pay attention to something, commit to something, or hand over their data. That's not to say you aren't giving them something in return, but let's face it, most customers have no idea how much their contribution is worth, or even that a transaction has taken place, so it would be a stretch to call it a fair deal. And if you are making money but can't tell who the prey is, it may well be you.
Conclusion:
I hope I have made a strong case for what I have written in this article, and especially for data scepticism, which I consider worth prioritising. A healthy dose of scepticism supports productive creation rather than producing a negative, fearful atmosphere that paralyses and intimidates us. That does not mean it is easy to develop or maintain. We might look to data scepticism as it currently exists in academia or in DBs for examples of existing centres of scepticism. Unfortunately, my impression is that there is an excessive gap between the data culture practised by academics and that of actual practitioners: the gap between existing code and algorithms on one side and hands-on experience on the other is a severe barrier, at least for now, even though academics are asking the right questions about the cultural effects of modelling. Therefore, we must establish a place for doubt. Given the pressure to succeed that comes with the VC mentality of startups and the traditional cover-your-ass corporate culture of larger organisations, this is a difficult job. In other words, it is better to carve out a space for scepticism even though there is no simple formula for doing so (as I have said before), and that is no reason to give up trying. I believe the next stage is to compile examples of what works well.
Let's end on a happy note with some genuinely good news. First, great tools are being developed right now (using AI, machine learning, and so on) that should be very helpful in the pursuit of meaningful data storytelling, communication, and sharing. Products like the IPython Notebook, which lets data scientists not only share code and results with non-technical people but also build a narrative explaining their thought process along the way, are being improved by the open source community. The medium-term goal is to give that non-technical person the chance to interact and experiment with the model in order to develop understanding and intuition. This will go a long way toward establishing open communication within firms.
Second, data used properly is a powerful force for good, despite all the harm that improperly applied data can cause. Open data is not a magic bullet, but we are seeing more and more examples of this through projects like DataKind and other organisations. Data is a tool, and like all tools its effectiveness depends on how it is applied. Moreover, whether a data application is good or bad does not depend solely on whether the data was used correctly: bad actors can do excellent data analysis just as good actors can make mistakes (and most people believe they are the good actors). But if you cultivate a healthy scepticism, that is, a habit of mind that questions and insists on understanding the reasoning behind the findings, you are more likely to use data successfully and to understand how others are using it.