Why I built GetData.IO

When training a neural network, there will be a period, until enough training data has been collected, when its output is full of false positives and false negatives (aka junk).

The same could be said of the human brain (a biological neural network). Unless you have access to another human brain whose output you can totally trust and rely on, there will be a prolonged period when you fumble around while struggling to gather enough data to build a mental model of the new domain.

Based on my experience, the main challenge when breaking into new domains is that no pre-trained neural networks exist. In such situations, expect a prolonged period of confusion and fumbling around. Persistence (aka brute force iteration) is probably the only thing you can fall back on.

Thankfully, totally new domains seldom exist. Whatever “new domain” you think you are trying to break into, someone else is probably either doing it right now or has already done it.

That is why I built GetData.IO. It helps people who need data to make good decisions quickly find both the data they need and the people who might already have a trained model.

Thoughts on intuition and logic

The limbic and reptilian parts of the human brain have had more time to evolve. Compared to these two parts, the frontal lobe of the human brain, where logic resides, is a more recent phenomenon. Ironically, the logical functions carried out by this part of the brain, which make man distinct from other animals, are the ones most easily replicated by machines.

The entire human body, not just the deliberate thinking portion of it, should be considered a neural network. Using only the thinking portion of the human body for decision making is sub-optimal. This is especially true for a human who has actively engaged in calibrating his body for a specific purpose. Prime examples are the deliberate cultivation of, and heavy reliance on, muscle memory by professional athletes, chefs, actors, music composers and detectives.

Intuitive gut feel can be considered muscle memory cultivated over time for specific functions but not yet expressed as formalized equations. To free up time, individuals can actively convert what they “intuitively know” into formalized equations and have the corresponding functions delegated to machines. Thereafter they could either further compound the effects of this process by building up muscle memories in other domains or sit by the beach and do nothing.

Humans will always have a role available to play in the future regardless of society’s degree of automation.

Related readings

 

The AI economy, Roger Bootle

Paradoxes

  • Polanyi Paradox
  • Moravec’s paradox

Key skill sets for the AI era

  • complex communication
  • Creativity
  • Strategic thinking / critical thinking
  • Empathy / humanity

Key themes

  • AI as labor cost versus AI as capital expenditure
  • Taxes on AI development versus edge in global competition
  • Labor versus leisure
  • Global positioning
  • Population size as advantage for big data

Evening out watching Rambo: Last Blood

From Sujit on trading

  • it is not necessary to get numbers further back than six months
  • the stock market is subject to a fractal distribution
  • it is possible to generate returns of up to 140% per year by trading on stocks that are moving within a range
  • going all in on each position each time leads to a very low Sharpe ratio
  • Sharpe ratio should be calculated separately for the method and for the S&P, each benchmarked against US Treasury interest rates. The difference is the actual return
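
The last point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sharpe_ratio`, the monthly return figures and the 2% Treasury yield are all made up, and the annualization by the square root of the number of periods is one common convention rather than anything Sujit specified.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate, periods_per_year=12):
    """Annualized Sharpe ratio from per-period returns.

    risk_free_rate is the per-period risk-free return, for example a
    Treasury yield divided by the number of periods in a year.
    """
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.mean(excess)
    stdev_excess = statistics.stdev(excess)  # sample standard deviation
    if stdev_excess == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    return mean_excess / stdev_excess * math.sqrt(periods_per_year)


# Hypothetical monthly returns over six months, for illustration only.
method_returns = [0.04, -0.01, 0.03, 0.05, -0.02, 0.02]
benchmark_returns = [0.01, 0.02, -0.01, 0.015, 0.005, 0.01]
monthly_treasury_rate = 0.02 / 12  # assumed 2% annual Treasury yield

method_sharpe = sharpe_ratio(method_returns, monthly_treasury_rate)
benchmark_sharpe = sharpe_ratio(benchmark_returns, monthly_treasury_rate)

# The gap between the two ratios is what the method adds over the benchmark.
sharpe_difference = method_sharpe - benchmark_sharpe
print(round(method_sharpe, 2), round(benchmark_sharpe, 2), round(sharpe_difference, 2))
```

Both ratios use the same risk-free rate, so the comparison isolates the method from the general market.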

On Rambo: Last Blood

A movie is a reflection of the culture and attitude of an age. Rambo was a very popular cultural icon during the eighties and the early nineties, when memories of the Second World War and the Cold War against the communists were still very fresh in the minds of the people in America.

If you looked at the world today through the eyes of someone like Rambo, you would easily find facts to back the narrative painted by Trump before he was elected president.

When operating in an environment of uncertainty, a decision maker formulates multiple, often competing, narratives in his head that best explain the majority of the facts presented. He recalibrates the probability weight assigned to each narrative as new pieces of data become available. He simultaneously uses the narratives assigned high plausibility in his decision making, striving for the best possible expected outcome. It is a cognitively demanding, iterative activity that goes on indefinitely.
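
This recalibration resembles Bayesian updating. Here is a minimal Python sketch, assuming each narrative can be given a likelihood for every new piece of evidence. The narrative names, the starting weights and the likelihood numbers are all hypothetical.

```python
def update_narratives(priors, likelihoods):
    """Return posterior weights for competing narratives via Bayes' rule.

    priors: narrative -> current probability weight (weights sum to 1).
    likelihoods: narrative -> probability of the new evidence if that
    narrative were true.
    """
    unnormalized = {name: priors[name] * likelihoods[name] for name in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every narrative")
    return {name: weight / total for name, weight in unnormalized.items()}


# Hypothetical starting weights for three competing narratives.
weights = {"narrative_a": 0.5, "narrative_b": 0.3, "narrative_c": 0.2}

# Each new piece of data is scored by how well every narrative explains it.
evidence_stream = [
    {"narrative_a": 0.2, "narrative_b": 0.6, "narrative_c": 0.4},
    {"narrative_a": 0.1, "narrative_b": 0.7, "narrative_c": 0.5},
]

for likelihoods in evidence_stream:
    weights = update_narratives(weights, likelihoods)

# List the narratives from most to least plausible after the updates.
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name}: {weight:.2f}")
```

No narrative is ever discarded outright. Its weight simply shrinks, which is what allows several plausible narratives to inform a decision at the same time.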

  • common themes between movie and Trump’s narrative
    • Mexico drug cartels
    • Mexico prostitution rings
    • The world is a dark place
    • illegal border crossing
    • poor border fence
    • white male
    • Rust belt
    • Protagonist is in his 70s
    • Freedom fighter who fought the communists in Vietnam and Russia
    • guns and lots of blood
    • man of steel
    • manifest destiny
  • Cognitive biases
    • Narrative fallacy
    • Framing bias
    • Selection bias

Related readings

  • Expert political judgement, Philip Tetlock

Afternoon with Tomasso on the limitations of applying artificial intelligence

For us to be able to successfully apply artificial intelligence to any domain, the following need to be true

  • The behavior of the system to be modeled must not be stochastic
  • The state of the system must be decipherable by the data scientist
    • it should be possible to understand the state in which the system is at through interpretation of data gathered
  • The domain can be modeled
    • the parameters for modeling the domain must be well defined

Only when all three premises are true can we determine where the adjustment should be made when a model fails to predict an outcome.

Financial markets are stochastic in the short run.

The underlying parameters are constantly changing and thus hard to model, due to the emergent nature of the impacts caused by human activities. The data is qualitative and thus hard to convert into clean quantitative datasets.

While the price movements are obvious, it is hard to attribute impact to the various parameters.

As such, it requires human neural networks that have consumed all this qualitative data to perform the prediction/decision making.

The case for hastening the replacement of workers with AI

If an aging population is an inevitable affliction of all industrialized countries, and the majority of countries will become industrialized within the next 30 years, then we should expect our population to collapse by 2050. Based on this premise, rather than being worried that the majority of workers will get replaced by robots and made irrelevant, we should instead be worried that robots are not replacing the tasks handled by forthcoming retirees fast enough.

Related References

https://www.bloomberg.com/amp/news/articles/2019-07-24/u-s-truck-driver-shortage-is-on-course-to-double-in-a-decade

https://amp.businessinsider.com/elon-musk-reiterates-global-population-is-headed-for-collapse-2019-6

 

Book summary: AI Superpowers: China, Silicon Valley, and the New World Order by Kai-Fu Lee

The different waves of AI

  • Internet AI – Facebook, Netflix, Google search
  • Business AI – Palantir
  • Perception AI – Tesla cars
  • Autonomous AI – Tesla self driving, Google self driving

Key locations

  • Silicon Valley
  • Zhong Guan Cun – Beijing

State of the Union

  • We are in the stage of implementation/application as opposed to RnD
    • having access to more data is more important than having the expertise to do more RnD
    • having solid AI engineers is more important than AI researchers
  • We are still far from general AI
  • Key ingredients
    • data
    • computing
    • possibly the work of strong AI algorithm engineers

Key differences between eco-systems

  • Silicon Valley businesses are mission and core values driven while Chinese businesses are pragmatically focused on profitability.
  • Silicon Valley businesses stay in bits and binaries, offloading the brick and mortar to external vendors, while Chinese businesses extend their business model into the brick and mortar (online to offline)
  • Silicon Valley prefers a one-size-fits-all strategy, while Chinese businesses use localized solutions, often investing in or acquiring local startups
  • Americans treat search engines like the Yellow Pages (come and leave fast) while Chinese treat search engines like a shopping mall (come to linger)
  • Silicon Valley is averse to copying, preferring to be unique, while Chinese businesses copy the heck out of each other

Chinese Advantage

  • Abundant data – quality and quantity aided by their online to offline initiatives
  • hungry entrepreneurs
  • AI scientists
  • AI friendly policy environment – strong emphasis by Chinese government
  • Hardware manufacturing know how – Shen Zhen
    • unparalleled supply chain flexibility – XiaoMi

Silicon Valley Advantage

  • Microchip manufacturing know-how

Trends within the Chinese eco-system

  • Darwinian eco-system has led to extreme levels of competition
  • Chinese companies have already moved past the stage of cloning Silicon Valley business models
  • Businesses innovate to build a defensive moat around themselves. Local businesses have an advantage: with no timezone differences to deal with, decision making is relatively faster.
  • Online to offline
    • an essential ingredient to building strategic moats
    • caused the decline of cash use
  • Chinese government information systems will be able to leapfrog US government information systems

Policy approaches

  • Google – impeccable safety
  • Tesla / China – trial by fire
  • key to winning the Autonomous AI race
    • is the bottleneck technology (Silicon Valley) or policy (China)?

Key concerns

  • having cheap labor is no longer going to be a source of advantage in a world heavily powered by automation. Developing countries hoping to employ this well-tested strategy to progress will not be able to do so anymore
  • Estimated 60% potential job loss worldwide barring policy interventions
  • Job loss probability assessment
    • physical labor
      • environment – unstructured versus structured
      • task nature – low dexterity versus high dexterity
    • cognitive labor
      • social – high versus low
      • cognitive – optimization based versus creativity/strategy based
  • AI replacement approach
    • single tasks approach
    • ground-up rethink / re-imagination
  • A population of the irrelevant (no longer employable) as opposed to the unemployed

Tackling Key concerns

  • Silicon valley – reduce, retrain and redistribute
  • Kai Fu Lee – stipends for care, service, education

New promise

  • Humans freed up from repetitive tasks can now focus on becoming more human oriented

Related readings

  • Disruptor, Zhou
  • www.arXiv.org – an online repository of scientific papers
  • Folding Beijing – Hao JingFang

Mark Zuckerberg chats with Yuval Noah Harari on the Future of AI

Key takeaways

  • Spread of inequality where some countries have the ability to harness AI while others don’t
  • AI based recommendation systems moving from being just an oracle to becoming a sovereign
  • AI as a tool is an amplifier
    • concerns that it will benefit totalitarianism more than democracy leading to totalitarianism becoming a more favorable governance model worldwide
    • surveillance
    • psychological manipulation – the inability to know your true self through your thoughts
    • what happens if morality and expediency diverge when it comes to governance
  • Effectiveness of curbing the negative effects of AI by encoding values within policy frameworks governing these AI based systems
    • Companies based in democratic countries will encode democratic values within their systems, and vice versa for totalitarian countries
  • Personalization versus Fragmentation
    • when everyone in a country chooses his own community that is mainly online there is no longer a glue holding the local community together
  • Long term versus short term
    • The long term benefits might come sooner than expected when taking a short term trade off

 

Book summary: Everybody lies by Seth Stephens-Davidowitz

Signals from Search

  • What people search for is in itself a signal
  • The order of keywords in which they search is also a signal
  • The quality of Google search data is better than that of Facebook data because
    • You are alone with no fear of being judged
    • You have an incentive to be honest

On Big data

  • the needle is still the same size but the haystack has been getting bigger
  • Be judicious by cutting down the sample size of the data to be used

Data science

  • Trust your intuition as the initial signal but verify quantitatively to avoid narrative bias
  • Correlation is most often sufficient for utilization purposes – often the explanation of why the model works comes after the fact
  • Critically assess the actual data underlying the narrative. At times it might tell a very different story than the narrative presented
  • Clustering of groups of people helps predict behavior – Netflix and baseball
  • A/B testing to discover causation
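
To make the A/B testing point concrete, here is a minimal Python sketch of a two-proportion z-test, one common way to check whether a difference between two randomly assigned groups is likely to be real. The book does not prescribe this particular test; the function name and the visitor and conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: visitors are randomly assigned to version A or B.
z, p_value = two_proportion_z_test(
    conversions_a=200, visitors_a=5000,
    conversions_b=260, visitors_b=5000,
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The random assignment is what lets the result be read as causation rather than correlation: on average, the only systematic difference between the two groups is the version they were shown.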

Social Impact

  • great businesses are founded on:
    • secrets about nature
    • secrets about people
  • Modeling
    • Physics – utilize neat equation
    • Human behavior – probabilistically via Naive Bayes classification
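
As a rough illustration of modeling human behavior probabilistically, here is a tiny categorical Naive Bayes classifier in plain Python. It is a sketch rather than a production implementation. The class name, the two yes/no features (searched for reviews, visited the pricing page) and the training rows are all invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny categorical Naive Bayes classifier with add-one smoothing."""

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
        self.feature_values = defaultdict(set)      # feature index -> values seen in training
        for row, label in zip(rows, labels):
            for index, value in enumerate(row):
                self.feature_counts[(label, index)][value] += 1
                self.feature_values[index].add(value)
        return self

    def predict_proba(self, row):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for index, value in enumerate(row):
                seen = self.feature_counts[(label, index)][value]
                vocabulary = len(self.feature_values[index])
                score += math.log((seen + 1) / (count + vocabulary))
            log_scores[label] = score
        # Convert log scores back into probabilities that sum to 1.
        peak = max(log_scores.values())
        exp_scores = {label: math.exp(s - peak) for label, s in log_scores.items()}
        normalizer = sum(exp_scores.values())
        return {label: s / normalizer for label, s in exp_scores.items()}


# Hypothetical training data: (searched_reviews, visited_pricing_page) -> outcome.
rows = [("yes", "yes"), ("yes", "no"), ("no", "yes"),
        ("no", "no"), ("yes", "yes"), ("no", "no")]
labels = ["buy", "buy", "no_buy", "no_buy", "buy", "no_buy"]

model = NaiveBayes().fit(rows, labels)
probabilities = model.predict_proba(("yes", "yes"))
print({label: round(p, 3) for label, p in probabilities.items()})
```

The output is a probability for each outcome rather than a single certain answer, which is the contrast with the neat equations of physics.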

Managing angry people

  • Lecturing them will provoke their anger
  • Provoking their curiosity diverts their attention, causing the anger to subside

Related readings

  • Zero to one, Peter Thiel

Book summary – The Signal and the Noise

Risk versus uncertainty

  • Risk can be mathematically modeled to yield a probability
  • uncertainty cannot be mathematically modeled

Conditions for quality data

Why Google’s search data is better than Facebook profile data

  • subject feels she has privacy
  • subject feels she is not judged
  • subject sees tangible benefit from being honest

The hedgehog versus the fox

  • The hedgehog approaches reality through a narrative/ideology while the fox thinks in terms of probabilities
  • The hedgehog goes very deep in an area while the fox employs multiple different models
  • The fox is a better forecaster than the hedgehog
  • The fox is more tolerant of uncertainty

Big data

  • More data does not necessarily yield better results and predictions
  • Decide on the right kind of data from the abundance available
  • To do prediction it is important to start from intuition and to keep model simple
  • qualitative data should be weighted and considered
  • Be self aware of your own biases

Prediction

  • Similarity scores – clustering in Netflix and baseball
  • Be wary of confirmation biases
  • Be wary of overfitting using small sample size – Tokyo earthquakes and global warming
  • Correlation does not equal causation
  • short hand heuristics to reduce the computational space – for example chess
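
The similarity scores in the first point can be sketched with cosine similarity, one common way to measure how alike two profiles are. The book may use a different measure. The viewer names and rating vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero profile as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(target, candidates):
    """Return (name, score) pairs sorted from most to least similar to target."""
    scored = [(name, cosine_similarity(target, vector)) for name, vector in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical ratings of the same four titles by different viewers.
target_viewer = [5, 4, 1, 0]
other_viewers = {
    "viewer_a": [4, 5, 0, 1],
    "viewer_b": [1, 0, 5, 4],
    "viewer_c": [5, 5, 1, 1],
}

ranking = most_similar(target_viewer, other_viewers)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The tastes of the closest matches can then serve as a prediction for the target, which is the idea behind grouping similar viewers or similar baseball players.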

Related references

  • Irrational exuberance, Robert Shiller
  • Expert political judgement, Philip E. Tetlock
  • Future shock, Alvin and Heidi Toffler
  • Principles of forecasting, J Scott Armstrong
  • Predicting the unpredictable, Hough