The definition of Big Data and Semantics
According to SAS, Big Data can be described as a large volume of data, both structured and unstructured, that floods businesses on a daily basis. The Economist has described this explosion of information as a “data deluge”. Although the term “Big Data” implies a large volume of data, volume alone is not what matters. What matters is what companies decide to do with this data. It can be analysed so that companies gain a better understanding of their organisation, which in turn leads to more effective strategies.
Semantics can be described as the branch of linguistics and logic that concerns itself with meaning. The field has two main categories: logical semantics and lexical semantics. Logical semantics concerns itself with topics such as implication, reference, presupposition and sense. Lexical semantics, on the other hand, concerns itself with analysing the meanings of words as well as the relationships between them.
Challenges of Big Data and Semantics
In the past few years, large companies have started to show interest in combining Big Data and semantic web technology in order to gain added value. This combination, although it comes with several benefits, is not without its challenges. Let’s take a look at the challenges facing Big Data and semantics.
One challenge that presents itself when dealing with data is that most data is initially unstructured, i.e. written in a natural language. This means that the data is limited in the degree to which it can be interpreted by machines. Examples of this “unstructured data” are text messages, e-mails, reports, blog articles, social network feeds and so on.
A second challenge that faces users of Big Data is time management. Because data is eliminated so quickly, we are forced to restrict the time we spend working with it. To deal with this, one needs to obtain the most suitable software, capable of analysing the data as quickly as possible. Such software can transform unstructured information into structured information as soon as possible or, to put it more eloquently, it can extract value from chaos.
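To make the idea of turning unstructured information into structured information more concrete, here is a minimal sketch in Python. It assumes a social network post as input and pulls out a few machine-readable fields with regular expressions. The function name `structure_post` and the chosen fields are illustrative assumptions, not part of any specific product; real tools do far more than this.

```python
import re


def structure_post(raw_post):
    """Turn a free-text social post into a small structured record.

    Hypothetical helper for illustration only: the fields below are
    an assumption about what a marketer might want to extract.
    """
    return {
        # Account names mentioned with "@", without the "@" sign
        "mentions": re.findall(r"@(\w+)", raw_post),
        # Hashtags, without the "#" sign
        "hashtags": re.findall(r"#(\w+)", raw_post),
        # Anything that looks like an http(s) link
        "urls": re.findall(r"https?://\S+", raw_post),
        # Number of whitespace-separated tokens
        "word_count": len(raw_post.split()),
    }


post = "Loving my new shoes from @acme! #running #gear https://example.com/shoes"
record = structure_post(post)
print(record)
```

The record can then be stored, filtered or counted like any other structured data, which is what makes fast analysis possible.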
A third challenge that people are confronted with when dealing with Big Data is the sheer amount of “hidden” information that is still far from being analysed and understood, the so-called “Dark Data”. Google’s chief economist, Hal Varian, claims that “data are widely available; what is scarce is the ability to extract wisdom from them”.
How can we transform information into knowledge?
The challenges presented by Big Data make it clear that data must not only be processed in real time but must also be analysed for its semantic content. One way to avoid being smothered by the “data deluge” is to attempt to make sense of this vast amount of digital information.
The semantic information hidden in the data poses several challenges to Natural Language Processing (or NLP). NLP is a field of computer science, artificial intelligence and linguistics that concerns itself with the interactions that occur between computers and human (natural) language.
One of the NLP tasks that involves natural language understanding is topic extraction (or topic recognition). This means that a piece of text is separated into segments, each of which is devoted to one specific topic. That topic is then identified and analysed.
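The steps described above (segment the text, then identify the topic of each segment) can be sketched in a few lines of Python. This is a deliberately simple illustration: it assumes that paragraphs are the segments and that topics are recognised by counting matches against a small keyword list. The topic names and keywords in `TOPIC_KEYWORDS` are invented for the example, and production systems typically rely on statistical or machine-learned models instead.

```python
import re
from collections import Counter

# Hypothetical topic lexicon: both the topics and their keywords
# are made up for this illustration.
TOPIC_KEYWORDS = {
    "delivery": {"shipping", "delivery", "courier", "arrived"},
    "price": {"price", "cost", "expensive", "cheap"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def label_segments(text):
    """Split text into paragraph segments and label each with a topic."""
    results = []
    for segment in (s.strip() for s in text.split("\n\n")):
        if not segment:
            continue
        # Count how many keywords of each topic occur in the segment
        counts = Counter()
        for token in tokenize(segment):
            for topic, keywords in TOPIC_KEYWORDS.items():
                if token in keywords:
                    counts[topic] += 1
        # Pick the topic with the most matches, or "unknown" if none match
        topic = counts.most_common(1)[0][0] if counts else "unknown"
        results.append((segment, topic))
    return results


review = (
    "The courier was late and the delivery arrived damaged.\n\n"
    "The price is too expensive for what you get."
)
for segment, topic in label_segments(review):
    print(topic, "->", segment)
```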
A second NLP task that concerns itself with natural language understanding is sentiment analysis. This kind of analysis extracts the subjective opinions people have about a specific brand or product. It does so by analysing the polarity of the sentiment (positive vs. negative) in the language they use to describe their experience with it, for example in reviews of the brand or product.
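As a rough illustration of polarity analysis, the Python sketch below scores a review by counting positive and negative words. The word lists `POSITIVE` and `NEGATIVE` are tiny, hypothetical lexicons assumed for this example. The approach ignores negation, irony and context, so it should be read as a toy version of the idea rather than as how any particular sentiment tool works.

```python
import re

# Hypothetical polarity lexicons, assumed for illustration only.
POSITIVE = {"great", "love", "excellent", "good", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "disappointed"}


def polarity(review):
    """Classify a review as positive, negative or neutral by word counts."""
    tokens = re.findall(r"[a-z']+", review.lower())
    # Each positive word adds one point, each negative word removes one
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in [
    "I love this phone, the camera is excellent",
    "Terrible battery and poor support",
    "It arrived on Tuesday",
]:
    print(polarity(text), "->", text)
```

A sentence such as “not good at all” would be scored as positive here, which shows why real sentiment analysis needs linguistic knowledge beyond word lists.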
Both of these tasks are especially useful for marketing purposes and customer experience management: they allow us to understand what people are talking about on social media, which in turn allows us to identify trends and gain new insights into public opinion. Deriving value from the data doesn’t depend on how “big” the data is, but rather on how efficiently our natural language processing tools can analyse the content, spot patterns and extract useful information.
How do Big Data and Semantics integrate into a company’s workflow?
The use of Big Data and Semantics has paved the way for a new partnership between marketing management and linguistics, and has created a new role in the organisation process. With regard to the organisation process, someone (in this case a computational linguist) has to make sense of the black box of nonsense that is Big Data by finding meaningful patterns through semantic analysis, which will in turn drive marketing goals.
Although Big Data and Semantics do face certain challenges, one cannot deny that combining the two can benefit companies, and the more advanced technology becomes, the more we’ll be able to get out of this partnership.
Daniela Guglielmo is the Linguistic Project Manager at Buzzoole Holdings. She attained a Ph.D. in Linguistics in 2013 and is also a postdoctoral researcher at the University of Salerno. She is the author of several peer-reviewed publications on General Linguistics and NLP.