Reflections on the decentralized multi-sided marketplace – GetData.IO

The beginning

The concept of GetData.IO was first conceived back in November 2012. I was rewriting one of my side projects in NodeJS back then. Part of the rewrite required me to write two separate crawlers, one for each site I was getting data from.

Very soon after I was done with the initial rewrite, I was once again compelled to write a third crawler when I wanted to buy some stocks on the Singapore stock exchange. I realized that while the data for the shares was available on the site, it was not presented in a way that facilitated my decision-making process. In addition, the other part of the data I needed was presented on a separate site and, unsurprisingly, not in the way I needed.

I was on my way to writing my fourth crawler when it occurred to me that if I structured my code to cleanly decouple the declaration from the underlying implementation details, it would be possible to achieve a high level of code reuse.

Two weekends of tinkering and frenzied coding later, I had completed the first draft of the Semantic Query Language and the engine that would interpret it. I was in love. Using just simple JSON, it allowed anybody to declare the desired data from any part of the web. This includes data scattered across multiple pages on the same site, or data scattered across multiple domains which could be joined using unique keywords.
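
To make the idea of decoupling the declaration from the implementation concrete, here is a minimal sketch in Python. It is not the actual Semantic Query Language or its engine, neither of which is shown in this post. The declaration format, the run_declaration function, the .example URLs, the regex-based row patterns and the sample tickers and figures are all made up for illustration. The declaration is plain JSON-compatible data that says what is wanted, and a small generic interpreter decides how to collect it and joins records from two sites on a shared key.

```python
import re
from typing import Callable, Dict, List

# Hypothetical declaration: it states WHAT data is wanted and how records
# from different sites relate, and contains no crawling logic. It uses only
# strings, lists and dicts, so it could be stored as JSON.
DECLARATION = {
    "join_on": "ticker",
    "sources": [
        {
            "url": "https://prices.example/list",
            "row": r"<tr><td>(?P<ticker>\w+)</td><td>(?P<price>[\d.]+)</td></tr>",
        },
        {
            "url": "https://dividends.example/list",
            "row": r"<li>(?P<ticker>\w+): (?P<dividend>[\d.]+)</li>",
        },
    ],
}


def run_declaration(
    declaration: dict, fetch: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Generic interpreter: extract rows from every source and merge them
    by the declared join key. Knows nothing about any particular site."""
    key = declaration["join_on"]
    joined: Dict[str, Dict[str, str]] = {}
    for source in declaration["sources"]:
        page = fetch(source["url"])
        for match in re.finditer(source["row"], page):
            record = match.groupdict()
            joined.setdefault(record[key], {}).update(record)
    return list(joined.values())


# Made-up page contents stand in for real HTTP responses so the sketch
# runs offline. A real engine would fetch the URLs over the network.
FAKE_PAGES = {
    "https://prices.example/list": (
        "<tr><td>AAA</td><td>1.50</td></tr>"
        "<tr><td>BBB</td><td>2.75</td></tr>"
    ),
    "https://dividends.example/list": "<li>AAA: 0.05</li><li>BBB: 0.10</li>",
}


def fake_fetch(url: str) -> str:
    return FAKE_PAGES[url]


rows = run_declaration(DECLARATION, fake_fetch)
print(rows)
```

Supporting another site in this sketch means writing another declaration, not another crawler, which is the kind of code reuse described above.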

The Journey

Five years have passed since. During this time, I brought this project through an incubator in Singapore with my ex-co-founder, tore out and rewrote major parts of the code base that did not scale well, banged my head against the wall countless times in frustration over problems with the code and with product-market fit, and watched a bunch of well-funded entrants come and go. To be honest, I threw in the towel quite a few times. But the love for this idea would always call out to me and draw me back. Each time, I picked up the towel and continued ploughing on.

It’s now June 2018. Though it has taken quite a while, I am now here in the Bay Area, the most suitable home for this project given the density of technology startups in this region. My green card was finally approved last month. I have accumulated enough runway to give this project my full attention for the next 10 years. It’s time to look forward.

The vision

The vision of this project is a multi-sided marketplace enabled by a Turing-complete Semantic Query Language. The Semantic Query Language will be interpreted and executed by a fully decentralized data harvesting platform that will have the capacity to gather data from more than 50% of the world’s websites on a daily basis.

Members can choose to participate in this data sharing community by playing one or more of the 4 roles:

  • Members who need data
  • Members who maintain the data declarations
  • Members who will run instances of the Semantic Query Language interpreter on their servers to mine for data
  • Members who sell their own proprietary data

From this vantage point, given its highly decentralized nature, it feels appropriate to use a blockchain. The final part that needs to be sorted out before deploying a blockchain to operate in fully decentralized mode is to figure out the “proof of work”.

Operations available in other database technologies will get ported over where appropriate as and when we encounter relevant use cases surfaced by our community members.

Why now and how is it important?

The more I dwell in this space, the more clearly I see why it is only going to become increasingly important to have this piece of infrastructure in place. There are three reasons for this.

Leveling the playing field

The next phase of computing will rely very heavily on machine learning, which is a very data-intensive activity. Established data silos like Facebook, Google, Amazon and Microsoft have aggregated huge amounts of data over the past years, and this has given them a large unfair advantage which might not necessarily be good for the ecosystem. We need to level the playing field by making it possible for other startups to gain easy access to training data for their machine learning work.

Concerns about data ownership

GDPR is a culmination of concerns about data ownership that have been building for the past 10 years. People will increasingly want to establish ownership and control over their own data, independent of the data silos used to house it. This calls for a decentralized infrastructure which people can trust to manage their own data.

Increasing worldwide need for computing talent

Demand for engineering talent will only continue to increase as computing becomes more pervasive in our lives. The supply of engineering talent does not seem to be catching up, and the shortfall is projected to continue widening till 2050. A good signal is the increasingly high premium paid to engineering talent in the form of salaries over recent years. It’s just plain stupidity for us as a civilization to devote major portions of this precious engineering resource to writing and rewriting web crawlers for the same data sources over and over again. Engineers’ time should be freed up to do more important things.

The first inning

Based on historical observation, I believe we are on the cusp of the very first inning in this space. A good comparison to draw upon is the early days of online music streaming.

Napster versus the music publishers is similar to the lay of the land five years ago, when Craigslist was able to successfully sue 3Taps.

Last year, LinkedIn lost its lawsuit against folks who were scraping public data. This is a momentous inflection point in this space. Even the government is starting to come to the conclusion that public data is essentially public, and that data silos like the big tech companies should have no monopoly over data that essentially belongs to the users who generated it.

Drawing further on the music industry analogy, the future of this space should look like how Spotify and iTunes operate in the modern-day online music scene.

What about incumbents?

Further readings

Book Summary: The Driver in the Driverless Car

Conditions that presage the leap into the future in any specific economic segment or type of service

  • Systemic requisite:
    • Widespread dissatisfaction – latent or overt with the status quo
  • Technology requisite:
    • Moore’s Law
      • Cheap computers
      • Cheap sensors – IOT
      • Increase in connection speed
      • Hand hosted AI
    • IOT
      • software
      • data connectivity
      • Handheld computing
    • Artificial Intelligence and Automation
      • Shift of discrete analog task into networked digital one

Five paradigms of computing

  • Electromechanical
  • Relay
  • Vacuum tube
  • Discrete transistor
  • Integrated circuits – Moore’s Law

Current Concerns

  • Speed of technology evolution versus speed of regulation – codified ethics
  • Equality, Risks and Dependency versus Autonomy
    • Does the technology have the potential to benefit everyone?
    • What are the risks and rewards?
    • Does the technology more strongly promote autonomy or dependence?
      • cheap software based technologies inexpensively scaled to reach millions-billions
      • the more revenue generated the more motivated developers would want to share it broadly

Future Concerns

  • Biometric theft
  • Merging of humans with computers
  • Extent of Gene alteration that is socially acceptable – new class of humans differentiated by genetic differences
    • mitigating health risk
    • higher intelligence
    • better looks
    • greater strength
  • Privacy will be a thing of the past
  • Navigating technology trends as a navigator instead of a passenger
  • Large scale drone attacks

Artificial Intelligence

  • Definition: “a cheap, reliable, industrial-grade digital smartness running behind everything” (Kevin Kelly, Editor of WIRED magazine)
  • Types
    • Narrow AI
    • Strong/General AI
      • Watson
  • Impact on existing human occupations
    • Doctors in health care
    • Lawyers

Education

  • Ancient Greece:
    • Socratic process whereby teacher guided students through the learning process by asking them questions
    • Education was a privilege reserved for the elites
  • Middle Ages/Renaissance
    • Remained a privilege
    • process of learning became more rote
    • more memorization
  • Online Education
    • Example: Khan academy
    • Researchers found people most likely to take advantage of online courses were those who need the least help
    • LA Unified: giving each student a tablet failed to move the needle
  • Minimally invasive Education, Mitra, New Delhi
    • NIIT building, Kalkaji slums
    • Key component of the learning process was the group dynamic
    • Self-taught scholars learned as quickly as school-bound peers
  • Self directed learning – flipped model of education
    • teachers no longer broadcast information, write lesson plans or stand in front of classes lecturing
    • teachers became coaches and guides to students needing additional help
    • students consumed recorded lectures or videos online at their own pace and in their own time
    • Teachers focus on judgment, nuances and emotional intelligence

Moore’s Law and poverty

  • Comparatively poorer parts of the world will be able to leapfrog into a more modern and efficient era
    • wireless mobile phones
    • drones for delivery
    • Solar energy power plants
    • driverless cars
      • no need for traffic lights
      • freeways
      • Parking spaces
  • USA has no monopoly on innovation

Driverless cars

  • Access versus ownership
  • Baidu, Google, Tesla
  • China
    • Beijing, Wuhu and Anhui
  • Singapore
  • city layouts become more flexible
  • commuting is less of a hassle

Current trends

  • Plasma based water purification technology: kills 100% of bacteria and viruses
  • Energy

Further readings

  • How to Create a Mind: The Secret of Human Thought Revealed, Ray Kurzweil
  • The Inevitable, Kevin Kelly
  • The Internet of Things: Mapping the Value Beyond the Hype, McKinsey Global Institute
  • The Infinite Resource: The Power of Ideas on a Finite Planet, Ramez Naam
  • Abundance: The Future Is Better Than You Think, Peter Diamandis