Summary of readings and conversations for the week

Trends observed

  • Worsening income inequality
    • driven by increased globalization and automation, with the failure of re-education as the primary cause
    • Continued low interest rates worldwide as central banks struggle to lift inflation to their 2% target
  • Rise in protectionism around the world in response to income inequality
    • Slowing trade volumes around the world
  • Demand saturation at the upper income segments
    • Slowing demand for housing in South Bay
    • Too much money chasing too few deals

Related sources

Book summary – AI Superpowers: China, Silicon Valley, and the New World Order by Kai-Fu Lee

The different waves of AI

  • Internet AI – Facebook, Netflix, Google search
  • Business AI – Palantir
  • Perception AI – Tesla cars
  • Autonomous AI – Tesla self-driving, Google self-driving

Key locations

  • Silicon Valley
  • Zhong Guan Cun – Beijing

State of the Union

  • We are in the stage of implementation/application as opposed to R&D
    • having access to more data is more important than having the expertise to do more R&D
    • having solid AI engineers is more important than AI researchers
  • We are still far from general AI
  • Key ingredients
    • data
    • computing
    • possibly, the work of strong AI algorithm engineers

Key differences between eco-systems

  • Silicon Valley businesses are mission and core values driven while Chinese businesses are pragmatically focused on profitability.
  • Silicon Valley businesses stay in the world of bits, offloading brick and mortar operations to external vendors, while Chinese businesses extend their business models into brick and mortar (online to offline)
  • Silicon Valley prefers a one-size-fits-all strategy, while Chinese businesses use localized solutions, often investing in or acquiring local startups
  • Americans treat search engines like the Yellow Pages (come and leave fast) while Chinese users treat them like a shopping mall (come and linger)
  • Silicon Valley is averse to copying and prefers to be unique, while Chinese businesses copy the heck out of each other

Chinese Advantage

  • Abundant data – quality and quantity aided by their online to offline initiatives
  • hungry entrepreneurs
  • AI scientists
  • AI-friendly policy environment – strong emphasis from the Chinese government
  • Hardware manufacturing know-how – Shenzhen
    • unparalleled supply chain flexibility – Xiaomi

Silicon Valley Advantage

  • Microchip manufacturing know-how

Trends within the Chinese eco-system

  • The Darwinian ecosystem has led to extreme levels of competition
  • Chinese companies have already moved past the stage of cloning Silicon Valley business models
  • Businesses innovate to build a defensive moat around themselves. Local businesses have an advantage: with no time zone differences to deal with, decision making is relatively fast.
  • Online to offline
    • an essential ingredient to building strategic moats
    • caused the decline of cash use
  • Chinese government information systems will be able to leapfrog US government information systems

Policy approaches

  • Google – impeccable safety
  • Tesla / China – trial by fire
  • Key question for winning the autonomous AI race
    • is the bottleneck technology (Silicon Valley) or policy (China)?

Key concerns

  • Cheap labor will no longer be a source of advantage in a world heavily powered by automation. Developing countries hoping to follow this well-tested path to progress will no longer be able to do so
  • Estimated 60% potential job loss worldwide barring policy interventions
  • Job loss probability assessment
    • physical labor
      • environment – unstructured versus structured
      • task nature – low dexterity versus high dexterity
    • cognitive labor
      • social – high versus low
      • cognitive – optimization based versus creativity/strategy based
  • AI replacement approaches
    • one-to-one replacement of single tasks
    • ground-up rethinking and re-imagination of the work
  • A population of the irrelevant (no longer employable) as opposed to the merely unemployed
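The assessment above places each job on two axes. A minimal sketch for cognitive labor, assuming each axis is scored between 0 and 1 and split at a hypothetical threshold of 0.5; the scores, threshold and quadrant labels are illustrative assumptions rather than figures from the book:

```python
def cognitive_job_risk(social, creativity, threshold=0.5):
    """Place a cognitive job in one of four risk quadrants.

    social:     0 = asocial work, 1 = highly social work
    creativity: 0 = optimization-based, 1 = creativity/strategy-based
    The threshold and the quadrant labels are illustrative assumptions.
    """
    is_social = social >= threshold
    is_creative = creativity >= threshold
    if is_social and is_creative:
        return "low risk"
    if is_social:
        return "medium risk: optimization automated, human interface remains"
    if is_creative:
        return "medium risk: automation creeps in slowly"
    return "high risk"


print(cognitive_job_risk(social=0.1, creativity=0.2))  # high risk
print(cognitive_job_risk(social=0.9, creativity=0.8))  # low risk
```

The physical-labor case can be sketched the same way by swapping in the environment (structured versus unstructured) and dexterity axes.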

Tackling Key concerns

  • Silicon Valley – reduce, retrain and redistribute
  • Kai-Fu Lee – stipends for care, service and education work

New promise

  • Humans freed from repetitive tasks can focus on more human-oriented work

Related readings

  • Disruptor, Zhou
  • arxiv.org – an online repository of scientific preprints
  • Folding Beijing – Hao JingFang

Trends associated with GetData.IO

Alternative Data

Robotic Process Automation

Book summary – The Signal and the Noise by Nate Silver

Risk versus uncertainty

  • Risk can be mathematically modeled to yield a probability
  • uncertainty cannot be mathematically modeled

Conditions for quality data

Why Google search data is better than Facebook profile data

  • subject feels she has privacy
  • subject feels she is not judged
  • subject sees tangible benefit from being honest

The hedgehog versus the fox

  • The hedgehog approaches reality through a narrative/ideology while the fox thinks in terms of probabilities
  • The hedgehog goes very deep in an area while the fox employs multiple different models
  • The fox is a better forecaster than the hedgehog
  • The fox is more tolerant of uncertainty
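One standard way to check whether the fox really forecasts better is a proper scoring rule such as the Brier score (mean squared error of probability forecasts; lower is better). A minimal sketch with made-up forecasts and outcomes, where the fox averages several models and the hedgehog applies one confident view everywhere:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)


def average_forecast(*models):
    """Fox-style forecast: the plain average of several models' probabilities."""
    return [sum(ps) / len(ps) for ps in zip(*models)]


outcomes = [1, 0, 1, 0]  # hypothetical: did each event happen?

# Hedgehog: one confident narrative applied to every question.
hedgehog = [0.9, 0.9, 0.9, 0.9]

# Fox: three different (hypothetical) models, averaged.
fox = average_forecast(
    [0.8, 0.5, 0.7, 0.2],
    [0.6, 0.3, 0.5, 0.4],
    [0.7, 0.4, 0.6, 0.3],
)

print(brier_score(hedgehog, outcomes))  # about 0.41
print(brier_score(fox, outcomes))       # about 0.125
```

The hedgehog is punished heavily for being confidently wrong on half the questions, which is the behavior the score is designed to expose.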

Big data

  • More data does not necessarily yield better results and predictions
  • The key is choosing the right kind of data from the abundance available
  • When predicting, start from intuition and keep the model simple
  • Qualitative data should be considered and weighted too
  • Be aware of your own biases
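A minimal illustration of why a simple model is safer, using NumPy and hypothetical data (not Silver's earthquake or climate examples): a degree-5 polynomial fits six noisy points perfectly but extrapolates badly, while a straight line generalizes:

```python
import numpy as np

# Hypothetical data: an underlying linear trend y = 2x plus small, fixed noise.
x_train = np.arange(6, dtype=float)
noise = np.array([0.3, -0.4, 0.2, -0.3, 0.4, -0.2])
y_train = 2 * x_train + noise

# Held-out points just beyond the sample, where the true trend continues.
x_test = np.array([6.0, 7.0])
y_test = 2 * x_test


def mse(coeffs, x, y):
    """Mean squared error of the polynomial `coeffs` evaluated on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


simple = np.polyfit(x_train, y_train, deg=1)  # straight line
wiggly = np.polyfit(x_train, y_train, deg=5)  # passes through all six points

for name, coeffs in [("degree 1", simple), ("degree 5", wiggly)]:
    print(f"{name}: train MSE {mse(coeffs, x_train, y_train):.3f}, "
          f"test MSE {mse(coeffs, x_test, y_test):.3f}")
```

The degree-5 fit reports a near-zero training error and a test error in the thousands, because it has memorized the noise; the line's errors stay small on both.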

Prediction

  • Similarity scores – clustering in Netflix and baseball
  • Be wary of confirmation biases
  • Be wary of overfitting using small sample size – Tokyo earthquakes and global warming
  • Correlation does not equal causation
  • Use shorthand heuristics to reduce the search space, for example in chess
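Similarity scores can be sketched with cosine similarity, one common choice for comparing rating or stat vectors; the systems mentioned in the book may use different measures. The users and ratings below are hypothetical:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target, others):
    """Name of the profile in `others` whose vector is closest to `target`."""
    return max(others, key=lambda name: cosine_similarity(target, others[name]))


# Hypothetical 1-5 ratings for the same four titles.
alice = [5, 4, 1, 1]
others = {"bob": [4, 5, 1, 2], "carol": [1, 1, 5, 4]}
print(most_similar(alice, others))  # bob
```

For baseball comparables, stats on different scales (batting average versus home runs) would need to be standardized first, or the largest numbers dominate the score.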

Related references

  • Irrational Exuberance, Robert Shiller
  • Expert Political Judgment, Philip E. Tetlock
  • Future Shock, Alvin and Heidi Toffler
  • Principles of Forecasting, J. Scott Armstrong
  • Predicting the Unpredictable, Hough

Insights from dinner with Josh

Quid is increasingly using Grepsr in the workplace.

Business people who don’t know how to code use Grepsr to pull data.

There is increasing demand for DIFFs to identify thematic trends. Themes are extracted from articles using NLTK.
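A minimal sketch of a theme DIFF between two periods, assuming themes are approximated by non-stopword tokens; a regular-expression tokenizer and a small hypothetical stopword list stand in for NLTK so the example runs without extra downloads:

```python
import re
from collections import Counter

# Hypothetical stopword list; NLTK ships a much fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "are", "for"}


def theme_counts(articles):
    """Count candidate theme words across a batch of articles."""
    counts = Counter()
    for text in articles:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts


def theme_diff(before, after):
    """Change in mention count per theme, largest rise first (ties alphabetical)."""
    old, new = theme_counts(before), theme_counts(after)
    deltas = {t: new[t] - old[t] for t in set(old) | set(new)}
    return sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))


last_week = ["Automation is reshaping factories", "Interest rates stay low"]
this_week = ["Tariffs rise as trade slows", "New tariffs on steel", "Automation spreads"]
print(theme_diff(last_week, this_week)[:3])
```

Here "tariffs" surfaces as the fastest-rising theme (+2) while "automation" is flat; a production version would swap in proper tokenization, lemmatization and phrase detection.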

The proliferation of machine learning libraries and the maturing of the semantic web are democratizing access to insights.

The legalization of online sports betting has opened fertile ground for this trend toward democratization.

NBA predictive modeling should be done at the player level instead of the team level, because team-level data becomes too lossy.

Sportsbooks set the odds at the opening line to encourage even betting on both sides. The odds at the closing line are a weighted average of the bets (signals) from the crowd.
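The opening-line logic shows up in a sportsbook's exposure. A minimal sketch, assuming American odds and the common -110 price on both sides (stakes are hypothetical): with balanced action the book profits whatever the result, while lopsided action leaves it exposed:

```python
def implied_probability(american_odds):
    """Break-even win probability implied by American odds (e.g. -110, +150)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def winnings(stake, american_odds):
    """Profit paid on a winning bet, excluding the returned stake."""
    if american_odds < 0:
        return stake * 100 / -american_odds
    return stake * american_odds / 100


def book_profit(stake_a, stake_b, odds_a, odds_b, winner):
    """Sportsbook profit on a two-sided market once the winner ('a' or 'b') is known."""
    if winner == "a":
        return stake_b - winnings(stake_a, odds_a)
    return stake_a - winnings(stake_b, odds_b)


# Balanced action at -110 on both sides: the book keeps 10 either way.
print(book_profit(110, 110, -110, -110, "a"), book_profit(110, 110, -110, -110, "b"))
# Lopsided action: the book loses 200 if side "a" wins.
print(book_profit(220, 0, -110, -110, "a"))
```

The implied probabilities of the two -110 sides add up to about 1.048 rather than 1.0; that excess is the book's margin, which it only locks in when action is balanced.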

Insights from Klaren’s birthday

Conversations with Yi (EverString)

The forthcoming trend for engineering

Machine learning is becoming increasingly commoditized, so DevOps becomes more important. Demand for specialized services that encapsulate DevOps will keep increasing as demand for engineering work outstrips the supply of engineers.

On lead generation market

Companies in the lead generation space need scalable web crawlers. Buying crawling as a service helps them offset the cost of retaining three in-house engineers.

The lead generation space has consolidated: there were previously 120k such companies, and 7k are now in operation. The majority of players generate leads by scraping LinkedIn.

The consumer space requires constant development of new features. The enterprise space is service heavy: it requires not just lead generation but an entire channel marketing service suite (physical mail, online advertising, email marketing).

Lead generation customers are hard to retain because a list becomes less valuable once it has been used; 80% yearly churn is normal. One company reduced yearly churn to just 10% by cutting the second-year subscription from USD800/yr to USD200/yr, with a further discount to USD100/yr for customers who are unhappy with it. The recurring service grabs fresh leads from the same data source.
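To see why the discount pays off, a minimal sketch of the arithmetic, assuming a hypothetical cohort of 100 customers, a flat churn rate each year, and renewals that stay at the discounted USD200/yr (the USD100/yr tier is not modeled):

```python
def cohort_revenue(customers, first_year_price, renewal_price, churn, years):
    """Total revenue from one signup cohort over `years` years.

    Year 1 is billed at first_year_price; each later year is billed at
    renewal_price after `churn` (a fraction) of the remaining customers leave.
    """
    total = customers * first_year_price
    active = customers
    for _ in range(years - 1):
        active = round(active * (1 - churn))  # whole customers only
        total += active * renewal_price
    return total


# Hypothetical cohort of 100 customers over 3 years.
full_price = cohort_revenue(100, 800, 800, churn=0.80, years=3)
discounted = cohort_revenue(100, 800, 200, churn=0.10, years=3)
print(full_price, discounted)  # 99200 114200
```

With these assumptions the second year alone brings in 16,000 at full price versus 18,000 discounted, and the gap widens each year because the discounted cohort keeps 90% of its customers.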

On teleconferencing

Compared with UberConference, Zoom’s product team has developed a better understanding of its users’ true conferencing needs in various contexts, and has worked harder to make the product work seamlessly in the scenarios it identified. A typical example is the ability to join a conference with the press of a button on a mobile phone while driving, instead of having to type the typical 4-digit PIN.
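One way to get one-tap joining, sketched under assumptions: embed the PIN in a `tel:` link after pause characters, which many phone dialers interpret as short waits before sending the remaining digits. The phone number and PIN below are hypothetical, and this is not a description of how Zoom implements it:

```python
def one_tap_dial_link(phone_number, pin, pauses=2):
    """Build a tel: link that dials a conference bridge and then enters the PIN.

    Each comma asks many phone dialers to pause briefly before sending the
    next digits. '#' is percent-encoded as %23 because a raw '#' would start
    a URI fragment.
    """
    digits = "".join(ch for ch in phone_number if ch.isdigit() or ch == "+")
    return f"tel:{digits}{',' * pauses}{pin}%23"


print(one_tap_dial_link("+1 (555) 010-0199", "4821"))
# tel:+15550100199,,4821%23
```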

Tesla autonomy

https://www.youtube.com/watch?v=tbgtGQIygZQ

The Mission

Building and optimizing the entire infrastructure (hardware and software) from the ground up, with autonomous self-driving as the mission

Mission and decision making

Design decisions trade off functionality against cost, achieving the mission while keeping cost under control

  • Lidar is not useful when cameras are available
  • relying on HD maps to drive makes the entire operation brittle, since actual road conditions can change

Operating structure

  • Data Team
  • Hardware Team
  • Software Team

The Data model

  • Cars on roads are constantly collecting new data
  • New data is used to train and improve the neural network model
  • The improved model is continually deployed back to the cars to improve self-driving
  • Real-world data provides visibility into long-tail scenarios that simulated data cannot. Simulating long-tail scenarios is an intractable problem
  • Balancing between the data model and software
    • Neural network is suitable for problems that are hard to solve by defining functions / heuristics
    • Simple heuristics are better handled through coding in software

Future revenue model

A robo-taxi network that will disrupt the ride-sharing space.

  • Consumer car – USD 0.60 / mile
  • Ride sharing – USD 2–3 / mile
  • Tesla Network – USD 0.18 / mile
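To put the per-mile figures in perspective, a small sketch of yearly cost for a hypothetical 10,000 miles; the ride-sharing rate uses the midpoint of the USD 2–3 range, which is an assumption:

```python
# Per-mile costs as noted above; ride sharing uses the midpoint of USD 2-3 (an assumption).
COST_PER_MILE = {
    "consumer car": 0.60,
    "ride sharing": 2.50,
    "Tesla Network": 0.18,
}


def yearly_cost(miles):
    """Cost in USD of covering `miles` in a year under each option."""
    return {option: round(rate * miles, 2) for option, rate in COST_PER_MILE.items()}


print(yearly_cost(10_000))
# {'consumer car': 6000.0, 'ride sharing': 25000.0, 'Tesla Network': 1800.0}
```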

Main challenges:

  • Legal – more data and processing time are needed to get regulatory approval
  • Battery capacity
  • Social norms around robo-taxis

Insights from the week

From Connie (Edmodo)

  • The key to consulting is to organize data into high-level, mutually exclusive buckets so that decision makers can digest it easily
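Consultants usually pair mutual exclusivity with collective exhaustiveness (MECE). A minimal sketch of a check for both, using hypothetical findings and buckets:

```python
def check_mece(items, buckets):
    """Check that buckets are mutually exclusive and collectively exhaustive.

    Returns (overlaps, missing): items placed in more than one bucket, and
    items placed in no bucket. Both empty means the bucketing is MECE.
    """
    seen = set()
    overlaps = set()
    for members in buckets.values():
        for item in members:
            if item in seen:
                overlaps.add(item)
            seen.add(item)
    missing = set(items) - seen
    return overlaps, missing


# Hypothetical findings grouped for a decision maker.
findings = ["churn", "pricing", "onboarding", "support cost"]
buckets = {
    "revenue": ["churn", "pricing"],
    "cost": ["support cost", "pricing"],
}
print(check_mece(findings, buckets))  # ({'pricing'}, {'onboarding'})
```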

From Tim (Edmodo)

  • Kano model
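The Kano model classifies a feature by asking customers two questions: how they feel if the feature is present (functional) and if it is absent (dysfunctional). A minimal sketch of one common version of the Kano evaluation table; variants of the table and of the answer wording exist, and the short answer labels below are simplified:

```python
# Answer options, in the same order for both questions.
ANSWERS = ["like", "expect", "neutral", "tolerate", "dislike"]

# Rows: answer to "How do you feel if the feature is present?" (functional).
# Columns: answer to "How do you feel if it is absent?" (dysfunctional).
EVALUATION_TABLE = [
    "QAAAO",  # like
    "RIIIM",  # expect
    "RIIIM",  # neutral
    "RIIIM",  # tolerate
    "RRRRQ",  # dislike
]

CATEGORIES = {
    "A": "attractive",
    "O": "one-dimensional",
    "M": "must-be",
    "I": "indifferent",
    "R": "reverse",
    "Q": "questionable",
}


def kano_category(functional, dysfunctional):
    """Classify one respondent's answer pair for a feature."""
    row = ANSWERS.index(functional)
    col = ANSWERS.index(dysfunctional)
    return CATEGORIES[EVALUATION_TABLE[row][col]]


print(kano_category("like", "dislike"))    # one-dimensional
print(kano_category("expect", "dislike"))  # must-be
print(kano_category("like", "neutral"))    # attractive
```

In practice each respondent's pair is classified and the most frequent category across respondents is assigned to the feature.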

From Val (Totango)

  • The company is focused on increasing revenue and profitability, which will drive a higher valuation at exit

From Yip (ATT)

  • Analytics from Facebook page comments and Twitter hashtags
  • Need to balance customer support demand against the cost of running the department:
    • customer support hotline
    • direct comments from influencers which trigger negative sentiment to support staff
  • Business analysts read comments manually to capture qualitative needs and understand the business needs
  • Data scientists explore the data but might not know the business needs
  • Business analysts have trouble working with data scientists
  • Tools to help business analysts get directly at insights instead of going through data scientists
  • Build a model to predict support call volume by category
  • Build a model to quantify the level of demand for features
  • Correlation between weather and commodity prices

Insights on managing big data from a meetup with Dean and Ved

From Dean (Reputation.com)

  • Enterprise sales as an acquisition strategy is feasible because revenue per account is in the millions of USD, e.g. USD 70 million
  • Once an auto company like Ford or GM signs up, they will start bringing their dealerships in
  • The infrastructure needs to be able to support the size of the data which can be up to billions of rows
  • Scaling the infrastructure to handle an ever-increasing data load becomes critical to the continued growth of a data company
  • The data product will appear broken when a user attempts to generate a report while data is still being written into the database
  • The key challenge is that different solutions suit different operations
  • Types of data operation include
    • writing into the database
    • reading from the database
    • map-reduce to generate custom views of the data, supporting different types of reporting for different departments in client companies
  • Successful data companies will create different layers of data management solutions to cater to the different data needs
    • MongoDB
      • good for storing relatively unstructured data
      • querying is slow
      • writing is slow
      • good for performing map reduce
    • Elasticsearch
      • good for custom querying for data
  • DevOps becomes a very important role
    • migrating data between different systems can take up to weeks to complete
    • bad map-reduce queries in the code will start causing read and write bottlenecks, causing the data product to fail
    • DevOps engineers familiar with the infrastructure might occasionally have to flush out all queries to reset
    • The key challenge is finding the bandwidth to flush out bad queries within the codebase
  • Mistakes in hindsight
    • Lumping all the data from different companies into the same index on MongoDB does not scale very well
    • It might make more sense to create separate database clusters for different clients
  • Day to day operations
    • Hired a large, 100-strong web scraping company in India to make sure the web scrapers for customer reviews are constantly up
    • Clients occasionally provide data, which an internal engineer (Austin) needs to look through before importing it into the relevant database
  • Need to increase revenue volume to gear up for an IPO
  • The Catholic Church has 10 times more money than Apple and owns a lot of health care companies.
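The map-reduce step Dean described (custom views per department) can be sketched in plain Python. This is a single-process illustration of the pattern with hypothetical review rows, not MongoDB's API; a real cluster would shuffle the mapped pairs across machines before reducing:

```python
from collections import defaultdict

# Hypothetical review rows: (client, department, rating).
reviews = [
    ("ford", "service", 4),
    ("ford", "service", 5),
    ("ford", "sales", 2),
    ("gm", "service", 3),
]


def map_phase(rows):
    """Emit ((client, department), (rating, 1)) for every row."""
    for client, department, rating in rows:
        yield (client, department), (rating, 1)


def reduce_phase(pairs):
    """Sum ratings and counts per key, then turn them into averages."""
    totals = defaultdict(lambda: [0, 0])
    for key, (rating, count) in pairs:
        totals[key][0] += rating
        totals[key][1] += count
    return {key: total / count for key, (total, count) in totals.items()}


view = reduce_phase(map_phase(reviews))
print(view)
# {('ford', 'service'): 4.5, ('ford', 'sales'): 2.0, ('gm', 'service'): 3.0}
```

Because the view is computed from a batch of rows, running it against a consistent snapshot rather than a table that is still being written avoids the half-finished reports mentioned above.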

From Dan (Dharma.AI), a classmate of Ved

  • The company currently has 15 customers
  • Customers prefer their solution over open source software because it scales with the volume of data to be digested and comes with an SLA
  • The company provides web, mobile and tablet solutions that client companies’ staff can use in the field to collect demographic and research data in developing countries
  • The key challenge is balancing between building features for the platform and building features for specific verticals:
    • Fields differ between industries: fields in the survey document for a healthcare company will be very different from fields in the survey document for an auto company
    • Fields differ across company sizes: the survey format for one company might differ from another in the same industry but of a different size
    • The required interface differs between companies
  • The original CEO has been forced to leave the company; a new CEO was hired by the PE firm to increase revenue volume to gear up for an IPO

From Ved

  • As the number of layers in the hierarchy increases, it becomes increasingly challenging for management to stay up to date on the actual situation in the market
  • The entry of a large, established competitor can sometimes serve as an opportunity to ride the wave
  • When Google decided to repackage Google Docs for Education, it was a perfect opportunity for Edmodo to integrate more tightly with Google and ride that trend rather than be left behind
  • Failure to ride the wave will result in a significant loss of market share
  • It takes a lot of discipline to focus on just the core use case and keep doubling down on it.
  • Knowing that a critical problem, which could potentially kill the company, exists versus successfully convincing everyone in the company that it is important to address it are two different things.

Insights from a visit to Far West Fungi Farm

On mushrooms

  • Get wood chips from Petco
  • Alder or aspen shavings
  • 6 inches to a foot at the bottom
  • Every few months, add 2 inches on top
  • Use sundew, a carnivorous plant, to get rid of insects
  • Go to a Blue Bottle cafe for burlap sacks
  • Don’t soak spawn for more than 12 hours at a time
  • If too dry, soak, then leave out in the air for a day or two
  • Drop some clay in water to detect chlorine in the water used for soaking wood and spawn

Meetups

Useful resources

  • LibGen.IO – site where free books can be downloaded
  • sci-hub.tw – site where free research papers can be downloaded
  • www.bloodhorse.com/horse-racing – site for horse racing statistics
  • Far West Fungi Farm