Key takeaways from dinner with Josh


  • CHTR – good company to buy; 18% YoY revenue growth expected over the next 3 years
    • friend Jay buys the dip in the stock market

Machine Learning companies

  • Quid’s engineering team
    • majority of efforts focused on DevOps
    • has been losing a lot of engineers lately
      • main reason: they feel there is not enough investment in core technology
    • Core engineering tasks
      • zero downtime deployment
      • Monitor service to expose API calls
      • Elasticsearch
      • daily caching of data from LexisNexis into S3 in case Elasticsearch goes down
    • uses IBM named entity extraction for news articles
      • Data not even that good
    • users search for articles related to company names
      • wants to look into a knowledge graph to help with autocomplete; talked with DiffBot
      • Not sure why they don’t use their in-house entity database
    • User profile: digital marketers trying to do competitive analysis
      • search and curate articles about companies into clusters
    • He thinks they should focus on a venture arm that invests in companies instead
    • CEO is strong business person and CTO ex very technical guy
      • prefers procuring external solutions instead of building critical solutions in-house
      • the technical person handles third-party solution procurement
      • they even talked with DiffBot
    • Targets for this year: wants to focus on building out targeted interfaces for specific personas like marketers
    • revenue
      • 50% consulting
      • 50% SAAS service
      • 30 million USD last year
    • feels they could do so much more with the data than just cobbling together third-party solutions targeted at existing use cases
    • marketing team not that good
    • sales team acquires customers via an enterprise sales approach
    • just rolled out a solution similar to what Vincent’s company is doing
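The LexisNexis-to-S3 caching noted above is a primary-with-fallback pattern. Below is a minimal Python sketch of the read path. It assumes the live index fails with a `ConnectionError`. The names `search_with_fallback`, `healthy_index` and `unreachable_index` are hypothetical, and no real Elasticsearch or S3 client is involved.

```python
from typing import Callable, List


def search_with_fallback(
    query: str,
    live_search: Callable[[str], List[str]],
    daily_snapshot: List[str],
) -> List[str]:
    """Query the live index; if it is unreachable, scan the daily cached copy instead.

    `live_search` stands in for an Elasticsearch query and `daily_snapshot` for the
    documents cached to S3 once a day; both are hypothetical placeholders.
    """
    try:
        return live_search(query)
    except ConnectionError:
        # Degraded mode: naive substring match over data that may be up to a day old.
        needle = query.lower()
        return [doc for doc in daily_snapshot if needle in doc.lower()]


def healthy_index(query: str) -> List[str]:
    return [f"live hit for {query}"]


def unreachable_index(query: str) -> List[str]:
    raise ConnectionError("search cluster is down")


snapshot = ["Acme raises Series B", "Globex quarterly earnings"]
print(search_with_fallback("acme", healthy_index, snapshot))      # ['live hit for acme']
print(search_with_fallback("acme", unreachable_index, snapshot))  # ['Acme raises Series B']
```

The fallback trades freshness and search quality for availability: users still get results, just from yesterday's data.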

On Sales

  • Being able to define the business case really fast, to frame the cost of the problem for the prospect
  • Sell the prospect a solution that costs less than what the problem is currently costing them
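The two points above boil down to simple arithmetic. Below is a minimal sketch with made-up figures. `business_case` is a hypothetical helper, not a standard sales formula.

```python
def business_case(annual_cost_of_problem: float, annual_price: float) -> dict:
    """Compare what the problem costs the prospect today with what the solution costs."""
    if annual_price >= annual_cost_of_problem:
        raise ValueError("the pitch only works if the price is below the current cost")
    savings = annual_cost_of_problem - annual_price
    return {
        "annual_savings": savings,
        "savings_pct": round(100 * savings / annual_cost_of_problem, 1),
    }


# Hypothetical prospect: two full-time staff at $150k/year tied up by the problem.
print(business_case(annual_cost_of_problem=2 * 150_000, annual_price=120_000))
# {'annual_savings': 180000, 'savings_pct': 60.0}
```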

Reflections on the decentralized multi-sided marketplace – GetData.IO

The beginning

The concept of GetData.IO was first conceived back in November 2012. I was rewriting one of my side projects in NodeJS back then. Part of the rewrite required that I write two separate crawlers, one for each of the sites I was getting data from.

Very soon after I was done with the initial rewrite, I was once again compelled to write a third crawler when I wanted to buy some stocks on the Singapore stock exchange. I realized that while the data for the shares was available on the site, it was not presented in a way that facilitated my decision-making process. In addition, the other part of the data I needed was on a separate site and, unsurprisingly, not presented the way I needed either.

I was about to write my fourth crawler when it occurred to me that if I structured my code to cleanly decouple the declaration of the desired data from the underlying implementation details, I could achieve a high level of code reuse.

Two weekends of tinkering and frenzied coding later, I had completed the first draft of the Semantic Query Language and the engine that would interpret it. I was in love. Using just simple JSON, it allowed anybody to declare the data they wanted from any part of the web. This included data scattered across multiple pages on the same site, as well as data scattered across multiple domains that could be joined using unique keywords.
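The actual Semantic Query Language syntax isn't shown here, so the following is only a hypothetical Python sketch of the idea. A JSON declaration names the fields wanted from each source and the key to join them on. A small interpreter then applies it to rows a crawler has already fetched. The field names, the `DECLARATION` layout and the `run` function are all assumptions for illustration.

```python
import json
from typing import Dict, List, Set

# Hypothetical declaration: the real Semantic Query Language syntax may differ.
DECLARATION = json.loads("""
{
  "sources": [
    {"name": "prices",  "fields": ["ticker", "price"]},
    {"name": "reports", "fields": ["ticker", "dividend_yield"]}
  ],
  "join_on": "ticker"
}
""")


def run(declaration: dict, fetched: Dict[str, List[dict]]) -> List[dict]:
    """Project each source onto its declared fields, then inner-join on the shared key.

    `fetched` maps a source name to rows some crawler already extracted; how they
    were fetched is deliberately kept out of the declaration.
    """
    key = declaration["join_on"]
    names = {source["name"] for source in declaration["sources"]}
    merged: Dict[str, dict] = {}
    seen: Dict[str, Set[str]] = {}
    for source in declaration["sources"]:
        for row in fetched.get(source["name"], []):
            if key not in row:
                continue  # a row without the join key cannot be matched
            k = row[key]
            merged.setdefault(k, {}).update(
                {f: row[f] for f in source["fields"] if f in row}
            )
            seen.setdefault(k, set()).add(source["name"])
    # Inner join: keep only keys that every declared source contributed.
    return [merged[k] for k in merged if seen[k] == names]


fetched = {
    "prices": [
        {"ticker": "AAA", "price": 25.1, "volume": 900},
        {"ticker": "BBB", "price": 11.2},
    ],
    "reports": [{"ticker": "AAA", "dividend_yield": 4.5}],
}
print(run(DECLARATION, fetched))
# [{'ticker': 'AAA', 'price': 25.1, 'dividend_yield': 4.5}]
```

Because the declaration never says how a page is crawled, the same interpreter can be reused for a new site by writing a new declaration rather than a new crawler.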

The Journey

Five years have passed since then. During this time, I brought this project through an incubator in Singapore with my former co-founder and tore out and rewrote major parts of the codebase that did not scale well. I banged my head against the wall countless times in frustration over problems with the code and with product-market fit, and watched a bunch of well-funded entrants come and go. To be honest, quite a few times I threw in the towel. Each time, my love for this idea would call out to me and draw me back to it. I picked up the towel and kept ploughing on.

It’s now June 2018. Though it has taken quite a while, I am now in the Bay Area, the most suitable home for this project given the density of tech startups in the region. My green card was finally approved last month, and I have accumulated enough runway to give this project my full attention for the next 10 years. It’s time to look forward.

The vision

The vision of this project is a multi-sided marketplace enabled by a Turing-complete Semantic Query Language. The language will be interpreted and executed by a fully decentralized data harvesting platform that will have the capacity to gather data from more than 50% of the world’s websites on a daily basis.

Members can choose to participate in this data sharing community by playing one or more of the 4 roles:

  • Members who need data
  • Members who maintain the data declarations
  • Members who run instances of the Semantic Query Language interpreter on their servers to mine for data
  • Members who sell their own proprietary data

From this vantage point, given the platform’s highly decentralized nature, it feels appropriate to use blockchains. The final piece that needs to be sorted out before a blockchain can be deployed to run the platform in fully decentralized mode is the “proof of work”.
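For reference, the classic proof of work used by Bitcoin-style blockchains is a hash puzzle that is expensive to solve and cheap to verify. The minimal Python sketch below shows that generic hashcash idea, using SHA-256 and a simplified leading-zeros target. It does not answer the open question here of how to tie the proof to useful data harvesting. The payload and difficulty are arbitrary.

```python
import hashlib


def _digest(payload: bytes, nonce: int) -> str:
    return hashlib.sha256(payload + str(nonce).encode()).hexdigest()


def proof_of_work(payload: bytes, difficulty: int) -> int:
    """Brute-force the smallest nonce whose SHA-256 hex digest starts with `difficulty` zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        if _digest(payload, nonce).startswith(target):
            return nonce
        nonce += 1


def verify(payload: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a claimed nonce costs one hash, however long the search took."""
    return _digest(payload, nonce).startswith("0" * difficulty)


nonce = proof_of_work(b"harvested batch #1", difficulty=4)
print(nonce, verify(b"harvested batch #1", nonce, difficulty=4))  # prints the nonce, then True
```

Each extra hex zero makes the search about 16 times longer on average, while verification stays a single hash.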

Operations available in other database technologies will be ported over where appropriate, as and when our community members surface relevant use cases.

Why now and how is it important?

The more I dwell in this space, the more clearly I see why having this piece of infrastructure in place will only become increasingly important. There are three reasons for this.

Leveling the playing field

The next phase of computing will rely very heavily on machine learning, which is a very data-intensive activity. Established data sirens like Facebook, Google, Amazon and Microsoft have aggregated huge amounts of data over the past years. This gives them a huge unfair advantage that is not necessarily good for the ecosystem. We need to level the playing field by making it easy for other startups to access training data for their machine learning work.

Concerns about data ownership

GDPR is the culmination of concerns about data ownership that have been building over the past 10 years. People will increasingly want to establish ownership and control over their own data, independent of the data sirens used to house it. This calls for a decentralized infrastructure that people can trust to manage their own data.

Increasing world-wide need for computing talents

Demand for engineering talent will only continue to increase as computing becomes more pervasive in our lives. The supply of engineering talent does not look likely to catch up, and the shortfall is projected to keep widening until 2050. A good signal of this is the increasingly high premium paid for engineering talent in the form of salaries in recent years. It’s just plain stupidity, as a civilization, to devote major portions of this precious engineering resource to writing and rewriting web crawlers for the same data sources over and over again. That time should be freed up for more important things.

The first inning

Based on historical observation, I believe we are on the cusp of the very first inning in this space. A good comparison to draw upon is the early days of online music streaming.

The lay of the land 5 years ago, when Craigslist successfully sued 3Taps, resembled Napster versus the music publishers.

Last year, LinkedIn lost the lawsuit against folks who were scraping public data. This is a momentous inflection point in this space. Even the government is starting to come to the conclusion that public data is essentially public. Data sirens like the big tech companies should have no monopoly over data that essentially belongs to the users who generated it.

Drawing further on the music industry analogy, the future of this space should look like how Spotify and iTunes operate in today’s online music scene.

What about incumbents?

Further readings

Key takeaways from “Gut Feelings”

  • Gut feelings definition
    • appears quickly in consciousness
    • underlying reasons we are not fully aware of
    • strong enough to act upon
  • Laws in the real world are different from those in the logical idealized world
  • Benjamin Franklin’s Moral Algebra
    • when in doubt, list the pros and cons, strike out items on opposite sides that weigh the same, and see which side has weight left over
  • Our brain
    • adaptive forgetting: data is destroyed with the aggregation of information into actionable insights
    • We can only decipher the output generated by our brain (a neural network) but cannot decipher the series of logical steps taken to derive this output
    • Deliberate thinking about reasons seems to lead to decisions that make us less happy
    • Thinking too much can slow down and disrupt performance
      • The gaze heuristic for catching a baseball
    • the more complex a species, the longer the period of infancy
    • The short-term memory rule of 7 ± 2
    • Intelligence means making bets, taking risks, seeing more than what the eye sees
  • Satisficers versus Maximizers
    • the former report more optimism, higher self-esteem and greater life satisfaction
  • Rule: Create scarcity and develop systematically
    • is a viable alternative in human and organizational development
  • Less is more
    • Picking familiar stocks (partial ignorance) still outperforms complex analysis (extensive knowledge)
  • Man and his environment
    • Herbert Simon: “Man, viewed as a behaving system, is quite simple. The apparent complexity of his behavior over time is largely a reflection of the complexity of the environment in which he finds himself.”
    • Steve Jobs structured the workplace to maximize chance conversations
    • In an uncertain environment good intuitions must ignore information
    • Quality heuristic: we equate recognition with quality – Goldstein and Gigerenzer, 2002
      • why marketing might work in the short run despite shitty products
    • One-reason decision making – a shortcut people use despite official guidelines
      • implies a fast and frugal decision tree
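A fast and frugal tree asks one yes/no question at a time. Each question can end the decision on one of its answers, so only a single reason ends up deciding. Below is a minimal Python sketch. The meeting-invite cues are hypothetical and not from the book.

```python
from typing import Callable, List, Tuple

# (question, answer that triggers an exit, decision taken on that exit)
Cue = Tuple[Callable[[dict], bool], bool, str]


def fast_frugal_tree(case: dict, cues: List[Cue], default: str) -> str:
    """Ask the cues in order and stop at the first one whose answer is its exit answer.

    Each decision rests on a single reason; later cues are never consulted, and
    `default` is the second exit of the last cue.
    """
    for question, exit_answer, decision in cues:
        if question(case) == exit_answer:
            return decision
    return default


# Hypothetical tree for "should I accept this meeting invite?"
MEETING_CUES: List[Cue] = [
    (lambda m: m["from_manager"], True, "accept"),
    (lambda m: m["has_agenda"], False, "decline"),
    (lambda m: m["minutes"] <= 30, True, "accept"),
]

print(fast_frugal_tree({"from_manager": True, "has_agenda": False, "minutes": 90}, MEETING_CUES, "decline"))  # accept
print(fast_frugal_tree({"from_manager": False, "has_agenda": True, "minutes": 60}, MEETING_CUES, "decline"))  # decline
```

The order of the cues matters: the first invite is accepted even though it has no agenda and runs 90 minutes, because the first cue already settled it.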


  • Simple rules for a complex world, Epstein
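Franklin’s Moral Algebra from the notes above fits in a few lines of Python. The weights are whatever subjective scores you assign. The 1 to 3 scale and the job-offer example are made up, and `moral_algebra` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict


def moral_algebra(pros: Dict[str, int], cons: Dict[str, int]) -> str:
    """Strike out pro/con pairs of equal weight, then see which side still has weight.

    Franklin also struck one reason against two or three whose weights summed to
    the same; that case is covered by comparing leftover totals, because striking
    equal weight from both sides never changes which side is heavier.
    """
    pro_weights = Counter(pros.values())
    con_weights = Counter(cons.values())
    struck = pro_weights & con_weights  # equal weights cancel one for one
    pro_left = sum(w * n for w, n in (pro_weights - struck).items())
    con_left = sum(w * n for w, n in (con_weights - struck).items())
    if pro_left == con_left:
        return "tie"
    return "pros" if pro_left > con_left else "cons"


# Hypothetical job-offer decision, weights on a made-up 1 to 3 scale.
pros = {"higher salary": 3, "new tech stack": 2, "shorter commute": 1}
cons = {"less equity": 3, "smaller team": 1, "relocation hassle": 1}
print(moral_algebra(pros, cons))  # pros
```

Striking items out is mostly a way to make the comparison manageable on paper. Arithmetically it gives the same verdict as comparing the two totals.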

Brian Rothenberg – Building marketplaces from zero to billions

Key points for building marketplaces

  • The only thing that matters: Build liquidity – the first one there wins
    • sellers can expect their items will sell
    • buyers can expect to buy items at prices they can accept
    • cross side network effects – one side drives the other side
  • Always start by building supply first
    • seek out inefficient, fragmented markets that you can roll up
    • look for fragmented + high friction for supply sign up
    • pick method that works for your target market
    • Methods
      • Trojan horse strategy – provide immediate value by offering a tool – SAAS model
        • can help bootstrap the project
        • Eventbrite was an event management SAAS before it became a marketplace for events
      • Manual hack #1 – Yelp
        • Web crawling
        • Clean data and categorize
        • pull out key insights and distribute
        • upload data into directory and invite service providers to claim their profiles
      • Brute force
        • hire a 300+ person remote team
      • Plug into existing liquid network
      • PR + timing
      • get customers to promote your brand
        • TripAdvisor
  • Challenges
    • In the short run supply is always the challenge
    • In the long run demand is the bottleneck
  • Network effects create virtually impenetrable barriers to entry
    • retention grows
    • increasingly attractive economics
    • reaching scale as fast as possible matters
  • Competition
    • horizontals are going vertical
    • horizontals that fail to go vertical lose market share – Craigslist
    • moving forward focus on building niche networks
    • technology shifts sometimes open new windows to attack horizontal market positions
    • Winner takes most or takes all
  • Operations
    • track satisfaction with a separate NPS for the supply and demand sides
    • more satisfied demand side drives growth
    • very powerful dynamic:
      • look for overlap between buyers and sellers
        • Uber / Lyft
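Tracking NPS separately per side, as the operations point above suggests, keeps an unhappy supply side from being masked by a happy demand side. Below is a minimal Python sketch using the standard NPS definition: the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 to 6). The survey numbers are invented.

```python
from typing import Dict, List


def nps(scores: List[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not scores:
        raise ValueError("need at least one response")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be on the 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def nps_by_side(responses: Dict[str, List[int]]) -> Dict[str, float]:
    """Score each side of the marketplace separately instead of blending them."""
    return {side: nps(scores) for side, scores in responses.items()}


# Hypothetical survey responses for each side.
survey = {"supply": [10, 9, 7, 3], "demand": [9, 8, 6, 6, 10]}
print(nps_by_side(survey))  # {'supply': 25.0, 'demand': 0.0}
```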

Book Summary: The driver in the driverless car

Conditions that presage the leap into the future in any specific economic segment or type of service

  • Systemic requisite:
    • Widespread dissatisfaction – latent or overt with the status quo
  • Technology requisite:
    • Moore’s Law
      • Cheap computers
      • Cheap sensors – IOT
      • increase in connection speed
      • Hand hosted AI
    • IOT
      • software
      • data connectivity
      • Handheld computing
    • Artificial Intelligence and Automation
      • Shift of discrete analog tasks into networked digital ones
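Moore’s Law is usually stated as computing capacity (originally, transistors per chip) doubling roughly every two years. The growth multiple after t years is then 2^(t/2). Below is a minimal Python sketch. The two-year period is the commonly cited figure and is left as a parameter.

```python
def moores_law_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth multiple after `years` if capacity doubles every `doubling_period_years`."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2 ** (years / doubling_period_years)


print(moores_law_factor(10))  # 32.0
```

Compounding is what makes the sensors, connectivity and AI requisites above cheap enough to spread: a decade of doubling is a 32x change, not a 5x one.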

Five paradigms of computing

  • Electromechanical
  • Relay
  • Vacuum tube
  • Discrete transistor
  • Integrated circuits – Moore’s Law

Current Concerns

  • Speed of technology evolution versus speed of regulation – codified ethics
  • Equality, Risks and Dependency versus Autonomy
    • Does the technology have the potential to benefit everyone?
    • What are the risks and rewards?
    • Does the technology more strongly promote autonomy or dependence?
      • cheap software based technologies inexpensively scaled to reach millions-billions
      • the more revenue generated, the more motivated developers would be to share it broadly

Future Concerns

  • Biometric theft
  • Merging of humans with computers
  • Extent of gene alteration that is socially acceptable – new class of humans differentiated by genetic differences
    • mitigating health risk
    • higher intelligence
    • better looks
    • greater strength
  • Privacy will be a thing of the past
  • Navigating technology trends as a navigator instead of a passenger
  • Large scale drone attacks

Artificial Intelligence

  • Definition: a cheap, reliable, industrial-grade digital smartness running behind everything, Kevin Kelly, Editor of WIRED magazine
  • Types
    • Narrow AI
    • Strong/General AI
      • Watson
  • Impact on existing human occupations
    • Doctors in health care
    • Lawyers

Education

  • Ancient Greece:
    • Socratic process whereby teacher guided students through the learning process by asking them questions
    • Education was a privilege reserved for the elites
  • Middle Ages/Renaissance
    • Remained a privilege
    • process of learning became more rote
    • more memorization
  • Online Education
    • Example: Khan academy
    • Researchers found people most likely to take advantage of online courses were those who need the least help
    • LA Unified: giving each student a tablet failed to move the needle
  • Minimally Invasive Education, Mitra, New Delhi
    • NIIT building, Kalkaji slums
    • Key component of the learning process was the group dynamic
    • Self-taught scholars learned as quickly as their school-bound peers
  • Self-directed learning – flipped model of education
    • teachers no longer broadcast information, write lesson plans or stand in front of classes lecturing
    • teachers became coaches and guides to students needing additional help
    • students consumed recorded lectures or videos online at their own pace and in their own time
    • Teachers focus on judgment, nuances and emotional intelligence

Moore’s Law and poverty

  • Comparatively poorer parts of the world will be able to leapfrog into a more modern and efficient era
    • wireless mobile phones
    • drones for delivery
    • Solar energy power plants
    • driverless cars
      • no need for traffic lights
      • freeways
      • parking spaces
  • USA has no monopoly on innovation

Driverless Cars

  • Access versus ownership
  • Baidu, Google, Tesla
  • China
    • Beijing, Wuhu and Anhui
  • Singapore
  • city layouts become more flexible
  • commuting is less a hassle

Current trends

  • Plasma based water purification technology: kills 100% of bacteria and viruses
  • Energy

Further readings

  • How to Create a Mind: The Secret of Human Thought Revealed, Ray Kurzweil
  • The Inevitable, Kevin Kelly
  • The Internet of Things: Mapping the Value Beyond the Hype, McKinsey Global Institute
  • Infinite Resource: The Power of Ideas on a Finite Planet, Ramez Naam
  • Abundance: The Future Is Better Than You Think, Peter Diamandis

Deep learning and Machine Learning resources

A compiled list of sites that are useful for learning the fundamentals of artificial intelligence from a coder’s perspective:

Deep learning

  • –
    • Utilizes a breadth-first approach to teaching
    • Focus on code first
    • Human learning flourishes when operating in different context.

Machine Learning


Business Applications