The analysis of modern social networks is widely used in many business areas, such as marketing, forecasting, and financial and stock markets. Marketing, predictive analytics, and risk management are parts of business intelligence (BI). BI provides reporting and analysis that help to make business decisions and show what happened and why. We would like to consider how data mining methods applied to unstructured data streams can be used in BI solutions. Such data can be gathered from different sources, e.g. social network streams, specialized forums, and RSS channels. We especially study how this type of analysis can be applied to predictive analytics and risk management. Let us consider the basics of these areas.
Predictive analytics is an area of data mining that deals with extracting information from data and using it to predict trends and behavior patterns. Predictive models analyze past performance to assess how likely a customer is to exhibit a specific behavior, in order to improve marketing effectiveness. With the number of competing services available, businesses need to focus their efforts on maintaining continuous consumer satisfaction and rewarding consumer loyalty. It is therefore important to analyze users' opinions, which can be retrieved from users' messages in social networks. Predictive analytics can also predict such customer behavior, so that the company can take proper actions to increase customer activity. Apart from identifying prospects, predictive analytics can help to identify the most effective combination of product versions, marketing material, communication channels, and timing for targeting a given consumer. Predictive analytics and the mining of social network streams can also be used to identify high-risk fraud candidates in business or the public sector.
Another area where social network streams can be applied is risk management, which deals with the identification, assessment, and prioritization of risks.
Social network streams make it possible to reveal quantitative characteristics of the background factors for the processes under analysis. Monitoring the indicators of social network streams helps to control the probability and/or impact of unfortunate events or to maximize the realization of opportunities. Missing knowledge can be retrieved from the semi-structured data of message streams in social networks. This additional knowledge can also help to optimize the analyzed processes and minimize overall risk. Text stream mining enables us to reveal the dynamics of different risk sources by analyzing quantitative indicators retrieved from social network data. One of the important factors in risk management is users' opinion about some entity, e.g. a process or a service. Such opinions can be retrieved by applying a sentiment mining approach to the informational streams of social networks. Modern business intelligence systems widely use analytical methods for unstructured and semi-structured data gathered from different sources.
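To illustrate how users' opinion can be turned into a quantitative indicator, here is a minimal sketch of a lexicon-based daily sentiment index. It is only an assumption about one possible approach: the tiny word lists, the function names `message_score` and `daily_sentiment`, and the (day, text) input layout are hypothetical and are not taken from our software.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons; a real study would use a much larger resource.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def message_score(text):
    """Count positive words minus negative words in one message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return positive - negative


def daily_sentiment(messages):
    """messages: iterable of (day, text) pairs.

    Returns a dict {day: mean message score}, ordered by day.
    """
    totals = defaultdict(lambda: [0, 0])  # day -> [score sum, message count]
    for day, text in messages:
        totals[day][0] += message_score(text)
        totals[day][1] += 1
    return {day: s / n for day, (s, n) in sorted(totals.items())}
```

The resulting dictionary is a time series of mean opinion per day, which can be monitored alongside other indicators of the stream.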
We would like to show the possibility of analyzing economic and financial indicators using streams of textual data and the informational streams of social networks, special-purpose forums, and RSS channels. Consider the data mining of social network streams. To receive information streams, we used the Twitter API and custom Python software for web scraping of special-purpose forums. The theoretical basis for the analysis was the theory of semantic fields, formal concept analysis, the theory of frequent sets and association rules, and sentiment mining methods. For predictive analytics we used ARIMA and VAR models. The Granger test was used to find causality between time series. As a result of the data mining of text messages, we obtain time series of various quantitative characteristics of blog messages, e.g. the support and confidence of association rules. The next step is to find correlations between the time series that result from social network data mining and the time series that represent real stock markets. At this step, we need to find time series of social media trends that not only correlate with stock market series but also have predictive potential. Data visualization and infographics, on the basis of which an expert makes decisions, are very important for decision-making in risk management. That is why we attached great importance to the various ways of representing our results. As our previous studies show, it is very important to detect and remove anomalous communities that are dynamically formed in tweet streams. We also showed that it is very important to single out the tweets of competent users and main influencers. We can find them using different methods of graph theory.
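The time series of support and confidence mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not our actual software: each message is assumed to be already reduced to a (day, keywords) pair, and the function name `rule_series` is hypothetical. For a rule A -> B, the daily support is the share of messages containing both A and B, and the daily confidence is the share of messages containing A that also contain B.

```python
from collections import defaultdict


def rule_series(messages, antecedent, consequent):
    """Daily support and confidence of the rule antecedent -> consequent.

    messages: iterable of (day, keywords) pairs (assumed input layout).
    Returns {day: (support, confidence)}, ordered by day.
    """
    ante = frozenset(antecedent)
    both = ante | frozenset(consequent)
    # day -> [all messages, messages with antecedent, messages with both]
    counts = defaultdict(lambda: [0, 0, 0])
    for day, keywords in messages:
        words = set(keywords)
        row = counts[day]
        row[0] += 1
        if ante <= words:
            row[1] += 1
        if both <= words:
            row[2] += 1
    series = {}
    for day in sorted(counts):
        total, n_ante, n_both = counts[day]
        support = n_both / total
        # Confidence is undefined without the antecedent; report 0.0 then.
        confidence = n_both / n_ante if n_ante else 0.0
        series[day] = (support, confidence)
    return series
```

For example, `rule_series(messages, {"apple"}, {"iphone"})` would give one (support, confidence) pair per day, and these pairs can then be compared with a stock market series over the same days.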
As an example, consider the dynamics of the popularity of some cosmetic brands, based on the downloaded tweet streams. Fig. 1-4 show the results obtained. We used various types of graphs to visualize our results. Such graphs may be used in business intelligence dashboards. They may also provide additional business information for experts in marketing, predictive analytics, and risk management.
Let us consider the dynamics of the chosen brands, based on the analysis of messages from economic forums. Those messages were downloaded from the forums using the corresponding Python software. Fig. 5-7 show the obtained results in different graphical presentations.
Now we consider the dynamics of the quantitative characteristics of one company, based on the analysis of downloaded tweet streams. We chose Apple as an example. Fig. 8-9 show the dynamics of keyword frequent itemsets and the dynamics of users' opinion. These results reflect the dynamics of the popularity of Apple products and users' opinion of them.
Our next step is to consider whether it is possible to predict Apple stock prices on the basis of the obtained time series of keyword frequent itemsets. In our previous studies, we conducted the Granger test for the time series of frequent itemsets and Apple stock prices. This test showed that the time series of frequent itemsets of the analyzed tweet stream Granger-cause features of the stock price dynamics. We use the VAR model to analyze the possibility of predicting stock prices. This model takes into account both the dynamics of stock prices and the dynamics of some chosen frequent itemsets. Fig. 10-12 show the calculation results for different sets of frequent itemsets. The bold points are the predicted values, calculated on the basis of previous historical data. Fig. 10-11 show the predictions for three days ahead, and Fig. 12 shows the prediction for one day ahead. The confidence interval is marked in grey.
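The idea of a VAR forecast can be sketched in a few lines. This is a simplified illustration, not the code behind the figures: it assumes a first-order model y_t = c + A y_(t-1) fitted by ordinary least squares in plain Python, with each observation holding, for example, a stock price and the support of one frequent itemset. The lag order and the names `solve_linear`, `fit_var1`, and `forecast` are assumptions, and no confidence interval is computed here.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_var1(series):
    """Fit y_t = c + A y_(t-1) by least squares, one equation per variable.

    series: list of observations, each a list of k values.
    Returns (c, A): intercept list and k x k coefficient matrix.
    """
    k = len(series[0])
    X = [[1.0] + list(prev) for prev in series[:-1]]  # regressors: 1, y_(t-1)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k + 1)]
           for i in range(k + 1)]
    c, A = [], []
    for eq in range(k):
        y = [obs[eq] for obs in series[1:]]
        xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k + 1)]
        beta = solve_linear(xtx, xty)  # normal equations X'X beta = X'y
        c.append(beta[0])
        A.append(beta[1:])
    return c, A


def forecast(series, c, A, steps):
    """Iterate the fitted model forward from the last observation."""
    last = list(series[-1])
    k = len(last)
    path = []
    for _ in range(steps):
        last = [c[i] + sum(A[i][j] * last[j] for j in range(k))
                for i in range(k)]
        path.append(last)
    return path
```

Calling `forecast(series, c, A, 3)` corresponds to a prediction three days ahead and `forecast(series, c, A, 1)` to one day ahead. For real data, an established statistics package with lag-order selection and interval estimates would be a more reasonable choice than this hand-written fit.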
In the VAR model, we included the time series of frequent itemsets of keywords and of users' opinions. The obtained data show that, on some of the analyzed intervals, the VAR model proved effective as a predictive analytics approach to stock market forecasting. In our further studies, we are going to concentrate on algorithms for effectively selecting the sets of time series of frequent itemsets, with the aim of reducing the confidence interval and achieving more accurate prediction for longer time periods. Our previous similar investigations can be found at:
Granger Causality Test for Frequent Itemsets of Keywords in Financial Tweets
Forecasting of the winners and favorites of Eurovision Song Contest 2013
We also list our selected scientific e-prints, in which we described the theoretical basis of the social network mining used in our studies:
Tweets Miner for Stock Market Analysis
In this paper, we present a software package for the data mining of Twitter microblogs with the purpose of their usage in the stock market analysis. The package is written in R language using appropriate R packages. We considered the model of tweets and then compared stock market charts with frequent sets of keywords in Twitter microblog messages.
Can Twitter Predict Royal Baby's Name?
We analyze the existence of possible correlation between the public opinion of Twitter users and the decision-making of persons who are influential in society. In our study, we use the methods of quantitative processing of natural language, the theory of frequent sets, and algorithms for the visual display of users' communities. It was revealed that the structure of dynamically formed users' communities participating in the discussion is determined by only a few leaders who significantly influence the viewpoints of other users.
Forecasting of Events by Tweet Data Mining
This paper describes the analysis of quantitative characteristics of frequent sets and association rules in the posts of Twitter microblogs related to different event discussions. For the analysis, we used a theory of frequent sets, association rules and a theory of formal concept analysis. We revealed the frequent sets and association rules which characterize the semantic relations between the concepts of analyzed subjects. The support of some frequent sets reaches its global maximum before the expected event but with some time delay. Such frequent sets may be considered as predictive markers that characterize the significance of expected events for blogosphere users. We showed that the time dynamics of confidence in some revealed association rules can also have predictive characteristics. Exceeding a certain threshold may be a signal for corresponding reaction in the society within the time interval between the maximum and the probable coming of an event. In this paper, we considered two types of events: the Olympic tennis tournament final in London, 2012 and the prediction of Eurovision 2013 winner.