Predicting the future with social media - Asur - Cited by 264
Predicting tie strength with social media - Gilbert - Cited by 335
Predicting the present with google trends - Choi - Cited by 185
- www.grbj.com/articles/75926-social-media-pros-predict-the-future Jan 14, 2013 – What will the social media landscape look like throughout 2013? The Grand Rapids Business Journal asked local public relations, marketing ...
Note: There are many scholarly articles on this subject listed above. I found the box-office predictions in the one quoted below impressive. If you can predict box-office outcomes, you can likely also predict, to a greater or lesser degree, which countries will rise or fall through social media. The range of things that could potentially be discovered through social media is, at least at this point, likely unlimited. End note. However, this quote didn't come through very well here because of the type of formatting used in the abstract, so going to this link directly might be more useful:
Predicting the future with social media

arXiv:1003.5699v1 [cs.CY] 29 Mar 2010

Predicting the Future With Social Media

Sitaram Asur, Social Computing Lab, HP Labs, Palo Alto, California
Bernardo A. Huberman, Social Computing Lab, HP Labs, Palo Alto, California

Abstract: In recent years, social media has become ubiquitous and important for social networking and content sharing. And yet, the content that is generated from these websites remains largely untapped. In this paper, we demonstrate how social media content can be used to predict real-world outcomes. In particular, we use the chatter from Twitter.com to forecast box-office revenues for movies. We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors. We further demonstrate how sentiments extracted from Twitter can be utilized to improve the forecasting power of social media.

I. INTRODUCTION

Social media has exploded as a category of online discourse where people create content, share it, bookmark it and network at a prodigious rate. Examples include Facebook, MySpace, Digg, Twitter and JISC listservs on the academic side. Because of its ease of use, speed and reach, social media is fast changing the public discourse in society and setting trends and agendas in topics that range from the environment and politics to technology and the entertainment industry.

Since social media can also be construed as a form of collective wisdom, we decided to investigate its power at predicting real-world outcomes. Surprisingly, we discovered that the chatter of a community can indeed be used to make quantitative predictions that outperform those of artificial markets. These information markets generally involve the trading of state-contingent securities, and if large enough and properly designed, they are usually more accurate than other techniques for extracting diffuse information, such as surveys and opinion polls.
Specifically, the prices in these markets have been shown to have strong correlations with observed outcome frequencies, and thus are good indicators of future outcomes.

In the case of social media, the enormity and high variance of the information that propagates through large user communities presents an interesting opportunity for harnessing that data into a form that allows for specific predictions about particular outcomes, without having to institute market mechanisms. One can also build models to aggregate the opinions of the collective population and gain useful insights into their behavior, while predicting future trends. Moreover, gathering information on how people converse regarding particular products can be helpful when designing marketing and advertising campaigns.

This paper reports on such a study. Specifically, we consider the task of predicting box-office revenues for movies using the chatter from Twitter, one of the fastest growing social networks on the Internet. Twitter¹, a micro-blogging network, has experienced a burst of popularity in recent months, leading to a huge user base consisting of several tens of millions of users who actively participate in the creation and propagation of content.

We have focused on movies in this study for two main reasons.
• The topic of movies is of considerable interest among the social media user community, characterized both by a large number of users discussing movies and by a substantial variance in their opinions.
• The real-world outcomes can be easily observed from box-office revenue for movies.

Our goals in this paper are as follows. First, we assess how buzz and attention are created for different movies and how they change over time. Movie producers spend a lot of effort and money in publicizing their movies, and have also embraced the Twitter medium for this purpose.
We then focus on the mechanism of viral marketing and pre-release hype on Twitter, and the role that attention plays in forecasting real-world box-office performance. Our hypothesis is that movies that are well talked about will be well-watched.

Next, we study how sentiments are created, how positive and negative opinions propagate and how they influence people. For a bad movie, the initial reviews might be enough to discourage others from watching it, while on the other hand, it is possible for interest to be generated by positive reviews and opinions over time. For this purpose, we perform sentiment analysis on the data, using text classifiers to distinguish positively oriented tweets from negative ones.

Our chief conclusions are as follows:
• We show that social media feeds can be effective indicators of real-world performance.
• We discovered that the rate at which movie tweets are generated can be used to build a powerful model for predicting movie box-office revenue. Moreover, our predictions are consistently better than those produced by an information market such as the Hollywood Stock Exchange, the gold standard in the industry.
• Our analysis of the sentiment content in the tweets shows that they can improve box-office revenue predictions based on tweet rates only after the movies are released.

¹ http://www.twitter.com

This paper is organized as follows. Next, we survey recent related work. We then provide a short introduction to Twitter and the dataset that we collected. In Section 5, we study how attention and popularity are created and how they evolve. We then discuss our study on using tweets from Twitter for predicting movie performance. In Section 6, we present our analysis on sentiments and their effects. We conclude in Section 7. We describe our prediction model in a general context in the Appendix.

II. RELATED WORK

Although Twitter has been very popular as a web service, there has not been considerable published research on it. Huberman and others studied the social interactions on Twitter to reveal that the driving process for usage is a sparse hidden network underlying the friends and followers, while most of the links represent meaningless interactions. Java et al. investigated community structure and isolated different types of user intentions on Twitter. Jansen and others have examined Twitter as a mechanism for word-of-mouth advertising, and considered particular brands and products while examining the structure of the postings and the change in sentiments. However, the authors do not perform any analysis on the predictive aspect of Twitter.

There has been some prior work on analyzing the correlation between blog and review mentions and performance. Gruhl and others showed how to generate automated queries for mining blogs in order to predict spikes in book sales. And while there has been research on predicting movie sales, almost all of it has used meta-data information on the movies themselves to perform the forecasting, such as the movie's genre, MPAA rating, running time, release date, the number of screens on which the movie debuted, and the presence of particular actors or actresses in the cast. Joshi and others use linear regression from text and metadata features to predict earnings for movies. Sharda and Delen have treated the prediction problem as a classification problem and used neural networks to classify movies into categories ranging from 'flop' to 'blockbuster'. Apart from the fact that they are predicting ranges rather than actual numbers, the best accuracy that their model can achieve is fairly low. Zhang and Skiena have used a news aggregation model along with IMDB data to predict movie box-office numbers. We have shown how our model can generate better results when compared to their method.

III. TWITTER

Launched on July 13, 2006, Twitter² is an extremely popular online microblogging service. It has a very large user base, consisting of several millions of users (23M unique users in Jan³). It can be considered a directed social network, where each user has a set of subscribers known as followers. Each user submits periodic status updates, known as tweets, that consist of short messages of maximum size 140 characters. These updates typically consist of personal information about the users, news or links to content such as images, video and articles. The posts made by a user are displayed on the user's profile page, as well as shown to his/her followers. It is also possible to send a direct message to another user. Such messages are preceded by @userid indicating the intended destination.

A retweet is a post originally made by one user that is forwarded by another user. These retweets are a popular means of propagating interesting posts and links through the Twitter community.

Twitter has attracted lots of attention from corporations for the immense potential it provides for viral marketing. Due to its huge reach, Twitter is increasingly used by news organizations to filter news updates through the community. A number of businesses and organizations are using Twitter or similar micro-blogging services to advertise products and disseminate information to stakeholders.

² http://www.twitter.com

IV. DATASET CHARACTERISTICS

The dataset that we used was obtained by crawling hourly feed data from Twitter.com. To ensure that we obtained all tweets referring to a movie, we used keywords present in the movie title as search arguments. We extracted tweets over frequent intervals using the Twitter Search API⁴, thereby ensuring we had the timestamp, author and tweet text for our analysis. We extracted 2.89 million tweets referring to 24 different movies released over a period of three months. Movies are typically released on Fridays, with the exception of a few which are released on Wednesday.
Since an average of 2 new movies are released each week, we collected data over a time period of 3 months from November to February to have sufficient data to measure predictive behavior. For consistency, we only considered the movies released on a Friday and only those in wide release. For movies that were initially in limited release, we began collecting data from the time they became wide. For each movie, we define the critical period as the time from the week before it is released, when the promotional campaigns are in full swing, to two weeks after release, when its initial popularity fades and opinions from people have been disseminated.

Some details on the movies chosen and their release dates are provided in Table 1. Note that some movies that were released during the period considered were not used in this study, simply because it was difficult to correctly identify tweets that were relevant to those movies. For instance, for the movie 2012, it was impractical to segregate tweets talking about the movie from those referring to the year.
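The selection rules described above can be sketched in Python. The rules are: title keywords as search terms, Friday wide releases only, and a critical period from one week before release to two weeks after. This is a hypothetical illustration, not the authors' code. The `Tweet` record, the rule that every title keyword must appear, and the exact window boundaries (midnight to midnight, end excluded) are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class Tweet:
    # Hypothetical minimal record; the paper kept timestamp, author and text.
    created_at: datetime
    author: str
    text: str


def is_friday_release(release: date) -> bool:
    # date.weekday() returns 0 for Monday, so Friday is 4.
    return release.weekday() == 4


def critical_period(release: date) -> tuple[datetime, datetime]:
    # Assumed boundaries: midnight 7 days before release up to
    # (but not including) midnight 14 days after release.
    start = datetime.combine(release - timedelta(days=7), datetime.min.time())
    end = datetime.combine(release + timedelta(days=14), datetime.min.time())
    return start, end


def matches_title(text: str, keywords: list[str]) -> bool:
    # Assumed rule: every title keyword must occur, case-insensitively.
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)


def select_tweets(tweets: list[Tweet], keywords: list[str], release: date) -> list[Tweet]:
    # Keep tweets that mention the title and fall inside the critical period.
    start, end = critical_period(release)
    return [
        t for t in tweets
        if matches_title(t.text, keywords) and start <= t.created_at < end
    ]


if __name__ == "__main__":
    release = date(2009, 12, 25)  # Sherlock Holmes, from Table I
    sample = [
        Tweet(datetime(2009, 12, 20, 10), "a", "Can't wait for Sherlock Holmes!"),
        Tweet(datetime(2010, 1, 20, 9), "b", "Sherlock Holmes was great"),
        Tweet(datetime(2009, 12, 26, 18), "c", "Holmes sweet home"),
    ]
    print(is_friday_release(release))  # True
    print(len(select_tweets(sample, ["sherlock", "holmes"], release)))  # 1
```

The 2012 problem shows up directly with a rule like this. `matches_title("Happy 2012 everyone!", ["2012"])` returns True even though the tweet is about the year. That is why such titles need extra disambiguation or must be left out.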
We have taken care to ensure that the data we have used was disambiguated and clean by choosing appropriate keywords and performing sanity checks.

³ http://blog.compete.com/2010/02/24/compete-ranks-top-sites-for-january-2010/
⁴ http://search.twitter.com/api/

TABLE I
NAMES AND RELEASE DATES FOR THE MOVIES WE CONSIDERED IN OUR ANALYSIS.

Movie                             Release Date
Armored                           2009-12-04
Avatar                            2009-12-18
The Blind Side                    2009-11-20
The Book of Eli                   2010-01-15
Daybreakers                       2010-01-08
Dear John                         2010-02-05
Did You Hear About The Morgans    2009-12-18
Edge Of Darkness                  2010-01-29
Extraordinary Measures            2010-01-22
From Paris With Love              2010-02-05
The Imaginarium of Dr Parnassus   2010-01-08
Invictus                          2009-12-11
Leap Year                         2010-01-08
Legion                            2010-01-22
Twilight : New Moon               2009-11-20
Pirate Radio                      2009-11-13
Princess And The Frog             2009-12-11
Sherlock Holmes                   2009-12-25
Spy Next Door                     2010-01-15
The Crazies                       2010-02-26
Tooth Fairy                       2010-01-22
Transylmania                      2009-12-04
When In Rome                      2010-01-29
Youth In Revolt                   2010-01-08

[Fig. 1. Time-series of tweets over the critical period for different movies. Annotations: release weekend, weekend 2.]

The total data over the critical period for the 24 movies we considered includes 2.89 million tweets from 1.2 million users.

Fig. 1 shows the time-series trend in the number of tweets for movies over the critical period. We can observe that the busiest time for a movie is around the time it is released, following which the chatter invariably fades. The box-office revenue follows a similar trend, with the opening weekend generally providing the most revenue for a movie.

Fig. 2 shows how the number of tweets per unique author changes over time. We find that this ratio remains fairly consistent, with a value between 1 and 1.5 across the critical period. Fig. 3 displays the distribution of tweets by different authors over the critical period. The X-axis shows the number of tweets in the log scale, while the Y-axis represents the corresponding frequency of authors in the log scale. We can observe that it is close to a Zipfian distribution, with a few authors generating a large number of tweets. This is consistent with observed behavior from other networks. Next, we examine the distribution of authors over different movies. Fig. 4 shows the distribution of authors and the number of movies they comment on. Once again we find a power-law curve, with a majority of the authors talking about only a few movies.

[Fig. 2. Number of tweets per unique author for different movies. Axes: days vs. tweets per author; release weekend marked.]
[Fig. 3. Log distribution of authors and tweets. Axes: log(tweets) vs. log(frequency).]
[Fig. 4. Distribution of total authors and the movies they comment on. Axes: number of movies vs. authors (×10^5).]

V. ATTENTION AND POPULARITY

We are interested in studying how attention and popularity are generated for movies on Twitter, and the effects of this attention on the real-world performance of the movies considered.

A. Pre-release Attention:
Prior to the release of a movie, media companies and producers generate promotional information in the form of trailer videos, news, blogs and photos. We expect the tweets for movies before the time of their release to consist primarily of such promotional campaigns, geared to promote word-of-mouth cascades. On Twitter, this can be characterized by tweets referring to particular urls (photos, trailers and other promotional material) as well as retweets, which involve users forwarding tweet posts to everyone in their friend-list. Both these forms of tweets are important to disseminate information regarding movies being released.

First, we examine the distribution of such tweets for different movies, following which we examine their correlation with the performance of the movies.

[Fig. 5. Percentages of urls in tweets for different movies. Axes: movies vs. tweets with urls (percentage); series: Week 0, Week 1, Week 2.]

TABLE II
URL AND RETWEET PERCENTAGES FOR CRITICAL WEEK

Features   Week 0   Week 1   Week 2
url        39.5     25.5     22.5
retweet    [illegible]

Table 2 shows the percentages of urls and retweets in the tweets over the critical period for movies. We can observe that there is a greater percentage of tweets containing urls in the week prior to release than afterwards. This is consistent with our expectation. In the case of retweets, we find the values to be similar across the 3 weeks considered. In all, we found the retweets to be a significant minority of the tweets on movies. One reason for this could be that people tend to describe their own expectations and experiences, which are not necessarily propaganda.

We want to determine whether movies that have greater publicity, in terms of linked urls on Twitter, perform better in the box office. When we examined the correlation between the urls and retweets with the box-office performance, we found the correlation to be moderately positive, as shown in Table 3. However, the adjusted R² value is quite low in both cases, indicating that these features are not very predictive of the relative performance of movies. This result is quite surprising since we would expect promotional material to contribute significantly to a movie's box-office income.

TABLE III
CORRELATION AND R² VALUES FOR URLS AND RETWEETS BEFORE RELEASE.

Features   Correlation   R²
url        0.64          0.39
retweet    0.5           0.20

B. Prediction of first weekend Box-office revenues
Next, we investigate the power of social media in predicting real-world outcomes.

TABLE IV
COEFFICIENT OF DETERMINATION (R²) VALUES USING DIFFERENT PREDICTORS FOR MOVIE BOX-OFFICE REVENUE FOR THE FIRST WEEKEND.

Features                         Adjusted R²   p-value
Avg Tweet-rate                   0.80          3.65e-09
Tweet-rate timeseries            0.93          5.279e-09
Tweet-rate timeseries + thcnt    0.973         9.14e-12
HSX timeseries + thcnt           0.965         1.030e-10
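Tables III and IV summarize fit quality with a correlation coefficient, R² and adjusted R². The plain-Python sketch below computes all three; it is not the authors' code. It assumes an ordinary least-squares fit that includes an intercept. In that case R² for a single predictor equals the squared Pearson correlation. Adjusted R² then penalizes R² for the number of predictors p relative to the number of observations n (for example, n = 24 movies in this study). The toy numbers are made up.

```python
import math


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_one_predictor(xs: list, ys: list) -> float:
    # With one predictor and an intercept, least-squares R^2 equals r^2.
    return pearson_r(xs, ys) ** 2


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for p predictors (plus intercept) fitted on n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    # Made-up toy data, not the paper's measurements.
    url_share = [1.0, 2.0, 3.0, 4.0, 5.0]
    revenue = [2.0, 4.0, 5.0, 4.0, 5.0]
    r = pearson_r(url_share, revenue)
    r2 = r_squared_one_predictor(url_share, revenue)
    print(round(r, 4), round(r2, 4), round(adjusted_r_squared(r2, n=5, p=1), 4))
    # prints 0.7746 0.6 0.4667
```

The adjustment matters most when n is small. With only a handful of movies, each added predictor can raise plain R² even if it carries no real signal.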
Our goal is to observe if the knowledge that can be extracted from the tweets can lead to reasonably accurate prediction of future outcomes in the real world.

The problem that we wish to tackle can be framed as follows. Using the tweets referring to movies prior to their release, can we accurately predict the box-office revenue generated by the movie in its opening weekend?

[Fig. 6. Predicted vs. actual box-office scores using tweet-rate and HSX predictors. Axis labels: Predicted Box-office Revenue, Actual revenue (both ×10^7); series: Tweet-rate, HSX.]

To use a quantifiable measure on the tweets, we define the tweet-rate, as the number of tweets referring to a particular [...]

While in this study we focused on the problem of predicting box-office revenues of movies for the sake of having a clear metric of comparison with other methods, this method can be extended to a large panoply of topics, ranging from the future rating of products to agenda setting and election outcomes. At a deeper level, this work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.

VIII. APPENDIX: GENERAL PREDICTION MODEL FOR SOCIAL MEDIA

Although we focused on movie revenue prediction in this paper, the method that we advocate can be extended to other products of consumer interest.

We can generalize our model for predicting the revenue of a product using social media as follows. We begin with data collected regarding the product over time, in the form of reviews, user comments and blogs. Collecting the data over time is important, as it can measure the rate of chatter effectively. The data can then be used to fit a linear regression model using least squares. The parameters of the model include:
• A: rate of attention seeking
• P: polarity of sentiments and reviews
• D: distribution parameter

Let y denote the revenue to be predicted and ε the error.
The linear regression model can be expressed as:

$$ y = \beta_a A + \beta_p P + \beta_d D + \epsilon \qquad (4) $$

where the β values correspond to the regression coefficients. The attention parameter captures the buzz around the product in social media. In this article, we showed how the rate of tweets on Twitter can capture attention on movies accurately. We found this coefficient to be the most significant in our experiments. The polarity parameter relates to the opinions and views that are disseminated in social media. We observed that this gains importance after the movie has been released and adds to the accuracy of the predictions. In the case of movies, the distribution parameter is the number of theaters a particular movie is released in. In the case of other products, it can reflect their availability in the market.

IX. ACKNOWLEDGEMENT

This material is based upon work supported by the National Science Foundation under Grant #0937060 to the Computing Research Association for the CIFellows Project.

REFERENCES

Jure Leskovec, Lada A. Adamic and Bernardo A. Huberman. The dynamics of viral marketing. In Proceedings of the 7th ACM Conference on Electronic Commerce, 2006.
Bernardo A. Huberman, Daniel M. Romero and Fang Wu. Social networks that matter: Twitter under the microscope. First Monday, 14(1), Jan 2009.
B. Jansen, M. Zhang, K. Sobel and A. Chowdury. Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology, 2009.
D. M. Pennock, S. Lawrence, C. L. Giles and F. Å. Nielsen. The real power of artificial markets. Science, 291(5506):987-988, Jan 2001.
Kay-Yut Chen, Leslie R. Fine and Bernardo A. Huberman. Predicting the Future. Information Systems Frontiers, 5(1):47-61, 2003.
W. Zhang and S. Skiena. Improving movie gross prediction through news analysis. In Web Intelligence, pages 301-304, 2009.
Akshay Java, Xiaodan Song, Tim Finin and Belle Tseng. Why we twitter: understanding microblogging usage and communities. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 56-65, 2007.
Ramesh Sharda and Dursun Delen. Predicting box-office success of motion pictures with neural networks. Expert Systems with Applications, 30:243-254, 2006.
Daniel Gruhl, R. Guha, Ravi Kumar, Jasmine Novak and Andrew Tomkins. The predictive power of online chatter. SIGKDD Conference on Knowledge Discovery and Data Mining, 2005.
Mahesh Joshi, Dipanjan Das, Kevin Gimpel and Noah A. Smith. Movie Reviews and Revenues: An Experiment in Text Regression. NAACL-HLT, 2010.
Rion Snow, Brendan O'Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. Proceedings of EMNLP, 2008.
Fang Wu, Dennis Wilkinson and Bernardo A. Huberman. Feedback Loops of Attention in Peer Production. Proceedings of SocialCom-09: The 2009 International Conference on Social Computing, 2009.
Bo Pang and Lillian Lee. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135, 2008.
Namrata Godbole, Manjunath Srinivasaiah and Steven Skiena. Large-Scale Sentiment Analysis for News and Blogs. Proc. Int. Conf. Weblogs and Social Media (ICWSM), 2007.

end quote from: http://arxiv.org/pdf/1003.5699.pdf
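The Appendix's general model, y = β_a·A + β_p·P + β_d·D + ε, is an ordinary least-squares regression. The sketch below fits it in plain Python by solving the normal equations (XᵀX)β = Xᵀy. It illustrates the model under stated assumptions and is not the authors' implementation. Following equation (4) as written, it has no intercept term. The feature columns stand for A (attention, e.g. a tweet rate), P (sentiment polarity) and D (distribution, e.g. theater count). The numbers are made-up toy data, and the function names are hypothetical.

```python
from __future__ import annotations


def solve_linear_system(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's lists are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows: list[list[float]], ys: list[float]) -> list[float]:
    """Least-squares betas for y = beta_a*A + beta_p*P + beta_d*D (no intercept)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta: list[float], row: list[float]) -> float:
    # Linear combination of the fitted coefficients and one feature row.
    return sum(b * v for b, v in zip(beta, row))


if __name__ == "__main__":
    # Columns: A (attention), P (polarity), D (distribution); toy numbers only.
    rows = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
            [1.0, 1.0, 1.0], [2.0, 1.0, 0.0]]
    ys = [2.0, 0.5, 3.0, 5.5, 4.5]  # generated from betas (2.0, 0.5, 3.0), no noise
    beta = fit_ols(rows, ys)
    print([round(b, 6) for b in beta])  # [2.0, 0.5, 3.0]
```

Because the toy targets contain no noise, the fit recovers the generating coefficients exactly. With real data the coefficients are estimates. Comparing feature sets, such as tweet rate alone versus tweet rate plus theater count, is then a matter of comparing adjusted R² values, as in Table IV.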