Interesting, having heard James Maltby of YarcData (Cray) and Stephen Brobst of Teradata last week at Semantic Days, to read this piece on NLP at LinguaFranca (via BifRiv).
Essentially, armed with massive number-crunching power and statistics (courtesy of Google and/or Cray), one has much less incentive to invest in semantic solutions.
Just don’t mix up two areas that are both called “semantics”!
Semantic analysis, as a part of a particular family of NLP technologies, is in danger.
Semantic technologies in data modelling (what we’re involved in) will get a boost from cheaper and bigger formal models of natural language texts. It does not matter to us whether such models are derived via parsing and ontology mapping, via statistical methods, or via totally obscure genetic algorithms, as long as they are of good quality.
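To make that concrete, here is a minimal sketch (Python with rdflib; the namespace and triples are invented for illustration, not taken from any real pipeline) of what such a formal model of a text fragment could look like once derived, whichever method produced it:

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

# Hypothetical namespace for an extracted domain model
EX = Namespace("http://example.org/model#")

g = Graph()
g.bind("ex", EX)

# Triples that a parser plus ontology mapping, a statistical extractor,
# or any other pipeline might emit from a text fragment
g.add((EX.DeepwaterHorizon, RDF.type, EX.DrillingRig))
g.add((EX.DeepwaterHorizon, EX.operatedAt, EX.MacondoProspect))
g.add((EX.MacondoProspect, RDFS.label, Literal("Macondo Prospect")))

# Once the model is in a formal representation, it can be queried
# uniformly, regardless of how it was derived
for rig in g.subjects(RDF.type, EX.DrillingRig):
    print(rig)
```

The point is that the value sits in the formal model itself; the derivation method only matters insofar as it affects quality.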
Understood, Victor.
But as you say, the question concerns quality. Statistical semantics is “good enough” for finding knowledge in general consumer business, but as the article says, you still need intelligence to spot the howlers. For those you do need actual semantics: intended semantics, not just historically inferred semantics. Such howlers may be hilarious or disastrous, depending on your context.
We want to avoid another Macondo / Deepwater Horizon, not analyse it post-mortem.
Reality will be hybrid: formal, pre-definable semantics where you need them, whatever works where you don’t.
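A toy sketch of what I mean by hybrid (plain Python; the relation names, type assignments, and candidate facts are all invented for illustration): statistically inferred facts are only kept if they pass a formally pre-defined domain/range check, the kind of check that catches a howler before it does damage.

```python
# Formally pre-defined (intended) semantics: which types each relation may connect.
SCHEMA = {
    "operates":  ("Company", "DrillingRig"),
    "locatedIn": ("DrillingRig", "Field"),
}

# Types we already know for each entity, e.g. from a master data model.
TYPES = {
    "BP": "Company",
    "DeepwaterHorizon": "DrillingRig",
    "MacondoProspect": "Field",
}

def accept(subject, relation, obj):
    """Accept a statistically inferred triple only if it respects
    the pre-defined domain and range of the relation."""
    domain, range_ = SCHEMA[relation]
    return TYPES.get(subject) == domain and TYPES.get(obj) == range_

# Candidate facts proposed by some statistical extractor
candidates = [
    ("BP", "operates", "DeepwaterHorizon"),   # plausible
    ("MacondoProspect", "operates", "BP"),    # a howler
]

for s, r, o in candidates:
    verdict = "keep" if accept(s, r, o) else "reject"
    print(f"{verdict}: {s} {r} {o}")
```

The statistical side proposes, the formal side disposes; where nothing critical is at stake, you can simply skip the check.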
It is indeed interesting to see people trying to find the balance. Theoretically, machine learning should combine all the other approaches; IBM Watson is a good example.
I was rather puzzled looking at https://www.posccaesar.org/export/3874/pub/SemanticDays/2013/Presentations/DAY%203/1130%20Craig%20Trim%20%20-%20IBM%20-%20Semantic%20Days.pdf . I was expecting a broader look from IBM at Semantic Days.