Wednesday, March 27, 2013

Predicting the Future by Studying Social Media

Scholarly articles for predicting the future with social media

Search Results

  1. [PDF] 

    Tech Report: predicting the Future With Social Media - HP Labs

    www.hpl.hp.com/research/scl/papers/socialmedia/socialmedia.pdf
    File Format: PDF/Adobe Acrobat - Quick View
    by S Asur - Cited by 255 - Related articles
    Predicting the Future With Social Media. Sitaram Asur. Social Computing Lab. HP Labs. Palo Alto, California. Email: sitaram.asur@hp.com. Bernardo A.
  2. Predicting the Future with Social Media - ACM Digital Library

    dl.acm.org/citation.cfm?id=1914092
    by S Asur - 2010 - Cited by 254 - Related articles
    In recent years, social media has become ubiquitous and important for social networking and content sharing. And yet, the content that is generated from these ...
  3. Predicting the Future with Social Media

    arxiv.org › cs
    by S Asur - 2010 - Cited by 255 - Related articles
    Mar 29, 2010 – Abstract: In recent years, social media has become ubiquitous and important for social networking and content sharing. And yet, the content that ...
  4. Social media pros predict the future - Grand Rapids Business Journal

    www.grbj.com/articles/75926-social-media-pros-predict-the-future
    Jan 14, 2013 – What will the social media landscape look like throughout 2013? The Grand Rapids Business Journal asked local public relations, marketing ...
  5. The Future of Social Media: 50+ Experts Share Their 2013 Predictions

     

    Note: Many scholarly articles on this subject are listed above. I found the box-office predictions in the paper below impressive. If you can predict box-office outcomes through social media, you can likely also predict, to a greater or lesser degree, which countries will rise or fall. At this point, there seems to be no clear limit to what could potentially be discovered through social media. End note. However, the quote below didn't come through very well because of the formatting used in the abstract, so going to the link directly might be more useful:
    Predicting the future with social media

    arXiv:1003.5699v1 [cs.CY] 29 Mar 2010
    Predicting the Future With Social Media
    Sitaram Asur
    Social Computing Lab
    HP Labs
    Palo Alto, California
    Email: sitaram.asur@hp.com
    Bernardo A. Huberman
    Social Computing Lab
    HP Labs
    Palo Alto, California
    Email: bernardo.huberman@hp.com
    Abstract
    In recent years, social media has become ubiquitous and important for social networking and content sharing. And yet, the content that is generated from these websites remains largely untapped. In this paper, we demonstrate how social media content can be used to predict real-world outcomes. In particular, we use the chatter from Twitter.com to forecast box-office revenues for movies. We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors. We further demonstrate how sentiments extracted from Twitter can be further utilized to improve the forecasting power of social media.
    I. INTRODUCTION
    Social media has exploded as a category of online discourse where people create content, share it, bookmark it and network at a prodigious rate. Examples include Facebook, MySpace, Digg, Twitter and JISC listservs on the academic side. Because of its ease of use, speed and reach, social media is fast changing the public discourse in society and setting trends and agendas in topics that range from the environment and politics to technology and the entertainment industry.
    Since social media can also be construed as a form of collective wisdom, we decided to investigate its power at predicting real-world outcomes. Surprisingly, we discovered that the chatter of a community can indeed be used to make quantitative predictions that outperform those of artificial markets. These information markets generally involve the trading of state-contingent securities, and if large enough and properly designed, they are usually more accurate than other techniques for extracting diffuse information, such as surveys and opinion polls. Specifically, the prices in these markets have been shown to have strong correlations with observed outcome frequencies, and thus are good indicators of future outcomes [4], [5].
    In the case of social media, the enormity and high variance of the information that propagates through large user communities presents an interesting opportunity for harnessing that data into a form that allows for specific predictions about particular outcomes, without having to institute market mechanisms. One can also build models to aggregate the opinions of the collective population and gain useful insights into their behavior, while predicting future trends. Moreover, gathering information on how people converse regarding particular products can be helpful when designing marketing and advertising campaigns [1], [3].
    This paper reports on such a study. Specifically we consider the task of predicting box-office revenues for movies using the chatter from Twitter, one of the fastest growing social networks on the Internet. Twitter (see footnote 1), a micro-blogging network, has experienced a burst of popularity in recent months leading to a huge user-base, consisting of several tens of millions of users who actively participate in the creation and propagation of content.
    We have focused on movies in this study for two main reasons.
      1. The topic of movies is of considerable interest among the social media user community, characterized both by a large number of users discussing movies, as well as a substantial variance in their opinions.
      2. The real-world outcomes can be easily observed from box-office revenue for movies.
    Our goals in this paper are as follows. First, we assess how buzz and attention are created for different movies and how that changes over time. Movie producers spend a lot of effort and money in publicizing their movies, and have also embraced the Twitter medium for this purpose. We then focus on the mechanism of viral marketing and pre-release hype on Twitter, and the role that attention plays in forecasting real-world box-office performance. Our hypothesis is that movies that are well talked about will be well-watched.
    Next, we study how sentiments are created, how positive and negative opinions propagate and how they influence people. For a bad movie, the initial reviews might be enough to discourage others from watching it, while on the other hand, it is possible for interest to be generated by positive reviews and opinions over time. For this purpose, we perform sentiment analysis on the data, using text classifiers to distinguish positively oriented tweets from negative.
    Our chief conclusions are as follows:
      1. We show that social media feeds can be effective indicators of real-world performance.
      2. We discovered that the rate at which movie tweets are generated can be used to build a powerful model for predicting movie box-office revenue. Moreover our predictions are consistently better than those produced by an information market such as the Hollywood Stock Exchange, the gold standard in the industry [4].
      3. Our analysis of the sentiment content in the tweets shows that they can improve box-office revenue predictions based on tweet rates only after the movies are released.
    Footnote 1: http://www.twitter.com
    This paper is organized as follows. Next, we survey recent related work. We then provide a short introduction to Twitter and the dataset that we collected. In Section 5, we study how attention and popularity are created and how they evolve. We then discuss our study on using tweets from Twitter for predicting movie performance. In Section 6, we present our analysis on sentiments and their effects. We conclude in Section 7. We describe our prediction model in a general context in the Appendix.
    II. RELATED WORK
    Although Twitter has been very popular as a web service, there has not been considerable published research on it. Huberman and others [2] studied the social interactions on Twitter to reveal that the driving process for usage is a sparse hidden network underlying the friends and followers, while most of the links represent meaningless interactions. Java et al. [7] investigated community structure and isolated different types of user intentions on Twitter. Jansen and others [3] have examined Twitter as a mechanism for word-of-mouth advertising, and considered particular brands and products while examining the structure of the postings and the change in sentiments. However the authors do not perform any analysis on the predictive aspect of Twitter.
    There has been some prior work on analyzing the correlation between blog and review mentions and performance. Gruhl and others [9] showed how to generate automated queries for mining blogs in order to predict spikes in book sales. And while there has been research on predicting movie sales, almost all of it has used meta-data information on the movies themselves to perform the forecasting, such as the movie's genre, MPAA rating, running time, release date, the number of screens on which the movie debuted, and the presence of particular actors or actresses in the cast. Joshi and others [10] use linear regression from text and metadata features to predict earnings for movies. Sharda and Delen [8] have treated the prediction problem as a classification problem and used neural networks to classify movies into categories ranging from ’flop’ to ’blockbuster’. Apart from the fact that they are predicting ranges over actual numbers, the best accuracy that their model can achieve is fairly low. Zhang and Skiena [6] have used a news aggregation model along with IMDB data to predict movie box-office numbers. We have shown how our model can generate better results when compared to their method.
    III. TWITTER
    Launched on July 13, 2006, Twitter (see footnote 2) is an extremely popular online microblogging service. It has a very large user base, consisting of several millions of users (23M unique users in Jan; see footnote 3). It can be considered a directed social network, where each user has a set of subscribers known as followers. Each user submits periodic status updates, known as tweets, that consist of short messages of maximum size 140 characters. These updates typically consist of personal information about the users, news or links to content such as images, video and articles. The posts made by a user are displayed on the user’s profile page, as well as shown to his/her followers. It is also possible to send a direct message to another user. Such messages are preceded by @userid indicating the intended destination.
    A retweet is a post originally made by one user that is forwarded by another user. These retweets are a popular means of propagating interesting posts and links through the Twitter community.
    Twitter has attracted lots of attention from corporations for the immense potential it provides for viral marketing. Due to its huge reach, Twitter is increasingly used by news organizations to filter news updates through the community. A number of businesses and organizations are using Twitter or similar micro-blogging services to advertise products and disseminate information to stakeholders.
    Footnote 2: http://www.twitter.com
    IV. DATASET CHARACTERISTICS
    The dataset that we used was obtained by crawling hourly feed data from Twitter.com. To ensure that we obtained all tweets referring to a movie, we used keywords present in the movie title as search arguments. We extracted tweets over frequent intervals using the Twitter Search API (see footnote 4), thereby ensuring we had the timestamp, author and tweet text for our analysis. We extracted 2.89 million tweets referring to 24 different movies released over a period of three months.
    Movies are typically released on Fridays, with the exception of a few which are released on Wednesday. Since an average of 2 new movies are released each week, we collected data over a time period of 3 months from November to February to have sufficient data to measure predictive behavior. For consistency, we only considered the movies released on a Friday and only those in wide release. For movies that were initially in limited release, we began collecting data from the time it became wide. For each movie, we define the critical period as the time from the week before it is released, when the promotional campaigns are in full swing, to two weeks after release, when its initial popularity fades and opinions from people have been disseminated.
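As a rough illustration of the critical period defined above (this sketch is not from the paper), the window can be computed from a release date. The function names are hypothetical, and the sketch assumes that "the week before release" means 7 days before the release date and "two weeks after release" means 14 days after it; the paper does not state the exact day boundaries.

```python
from datetime import date, timedelta


def released_on_friday(release_date: date) -> bool:
    """True if the release date is a Friday (Monday is 0, so Friday is 4)."""
    return release_date.weekday() == 4


def critical_period(release_date: date) -> tuple[date, date]:
    """Return (start, end) of the critical period for a release date.

    Assumption: the window runs from 7 days before release to
    14 days after release, both ends inclusive.
    """
    start = release_date - timedelta(days=7)
    end = release_date + timedelta(days=14)
    return start, end


def in_critical_period(tweet_day: date, release_date: date) -> bool:
    """True if a tweet's date falls inside the critical period."""
    start, end = critical_period(release_date)
    return start <= tweet_day <= end


# Avatar is listed with release date 2009-12-18 in Table I.
avatar_release = date(2009, 12, 18)
print(released_on_friday(avatar_release))
print(critical_period(avatar_release))
```

A filter like `in_critical_period` would be applied to each tweet's timestamp before any counting is done.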
    Some details on the movies chosen and their release dates are provided in Table 1. Note that some movies that were released during the period considered were not used in this study, simply because it was difficult to correctly identify tweets that were relevant to those movies. For instance, for the movie 2012, it was impractical to segregate tweets talking about the movie from those referring to the year. We have taken care to ensure that the data we have used was disambiguated and clean by choosing appropriate keywords and performing sanity checks.
    Footnote 3: http://blog.compete.com/2010/02/24/compete-ranks-top-sites-for-january-2010/
    Footnote 4: http://search.twitter.com/api/
    Movie                              Release Date
    Armored                            2009-12-04
    Avatar                             2009-12-18
    The Blind Side                     2009-11-20
    The Book of Eli                    2010-01-15
    Daybreakers                        2010-01-08
    Dear John                          2010-02-05
    Did You Hear About The Morgans     2009-12-18
    Edge Of Darkness                   2010-01-29
    Extraordinary Measures             2010-01-22
    From Paris With Love               2010-02-05
    The Imaginarium of Dr Parnassus    2010-01-08
    Invictus                           2009-12-11
    Leap Year                          2010-01-08
    Legion                             2010-01-22
    Twilight : New Moon                2009-11-20
    Pirate Radio                       2009-11-13
    Princess And The Frog              2009-12-11
    Sherlock Holmes                    2009-12-25
    Spy Next Door                      2010-01-15
    The Crazies                        2010-02-26
    Tooth Fairy                        2010-01-22
    Transylmania                       2009-12-04
    When In Rome                       2010-01-29
    Youth In Revolt                    2010-01-08
    TABLE I. NAMES AND RELEASE DATES FOR THE MOVIES WE CONSIDERED IN OUR ANALYSIS.
    Fig. 1. Time-series of tweets over the critical period for different movies. (Plot annotations: release weekend, weekend 2.)
    The total data over the critical period for the 24 movies we considered includes 2.89 million tweets from 1.2 million users.
    Fig 1 shows the timeseries trend in the number of tweets for movies over the critical period. We can observe that the busiest time for a movie is around the time it is released, following which the chatter invariably fades. The box-office revenue follows a similar trend with the opening weekend generally providing the most revenue for a movie.
    Fig 2 shows how the number of tweets per unique author changes over time. We find that this ratio remains fairly consistent with a value between 1 and 1.5 across the critical period. Fig 3 displays the distribution of tweets by different authors over the critical period. The X-axis shows the number of tweets in the log scale, while the Y-axis represents the corresponding frequency of authors in the log scale. We can observe that it is close to a Zipfian distribution, with a few authors generating a large number of tweets. This is consistent with observed behavior from other networks [12]. Next, we examine the distribution of authors over different movies. Fig 4 shows the distribution of authors and the number of movies they comment on. Once again we find a power-law curve, with a majority of the authors talking about only a few movies.
    Fig. 2. Number of tweets per unique authors for different movies. (Axes: Days vs. Tweets per authors; release weekend marked.)
    Fig. 3. Log distribution of authors and tweets. (Axes: log(tweets) vs. log(frequency).)
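The two quantities discussed above, tweets per unique author and the log-log distribution of tweets over authors, can be sketched as follows. This is only an illustration on made-up data, not the authors' code; the function names are hypothetical, and the use of the natural logarithm is an assumption since the text does not state the base.

```python
from collections import Counter
from math import log


def tweets_per_author(authors: list[str]) -> float:
    """Ratio of tweets to unique authors; `authors` holds one entry per tweet."""
    return len(authors) / len(set(authors))


def log_log_distribution(authors: list[str]) -> list[tuple[float, float]]:
    """Return sorted (log(tweets), log(frequency)) points.

    For each tweet count k, the frequency is the number of authors who
    posted exactly k tweets. Natural log is assumed here.
    """
    per_author = Counter(authors)        # author -> number of tweets
    freq = Counter(per_author.values())  # tweet count -> number of authors
    return sorted((log(k), log(n)) for k, n in freq.items())


# Hypothetical toy data: one author name per tweet.
sample = ["ann", "bob", "ann", "cat", "dan", "ann", "bob", "eve"]
print(tweets_per_author(sample))
print(log_log_distribution(sample))
```

On real data, points that fall roughly on a straight line with negative slope in this log-log view are what one would expect from a Zipf-like distribution.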
    V. ATTENTION AND POPULARITY
    We are interested in studying how attention and popularity are generated for movies on Twitter, and the effects of this attention on the real-world performance of the movies considered.
    A. Pre-release Attention:
    Prior to the release of a movie, media companies and producers generate promotional information in the form of trailer videos, news, blogs and photos. We expect the tweets for movies before the time of their release to consist primarily of such promotional campaigns, geared to promote word-of-mouth cascades. On Twitter, this can be characterized by tweets referring to particular urls (photos, trailers and other promotional material) as well as retweets, which involve users forwarding tweet posts to everyone in their friend-list. Both these forms of tweets are important to disseminate information regarding movies being released.
    First, we examine the distribution of such tweets for different movies, following which we examine their correlation with the performance of the movies.
    Fig. 4. Distribution of total authors and the movies they comment on. (Axes: Number of Movies vs. Authors, x 10^5.)
    Features    Week 0    Week 1    Week 2
    url         39.5      25.5      22.5
    retweet     12.1      12.1      11.66
    TABLE II. URL AND RETWEET PERCENTAGES FOR CRITICAL WEEK
    Fig. 5. Percentages of urls in tweets for different movies. (Axes: Movies vs. Tweets with urls (percentage); series: Week 0, Week 1, Week 2.)
    Table 2 shows the percentages of urls and retweets in the tweets over the critical period for movies. We can observe that there is a greater percentage of tweets containing urls in the week prior to release than afterwards. This is consistent with our expectation. In the case of retweets, we find the values to be similar across the 3 weeks considered. In all, we found the retweets to be a significant minority of the tweets on movies. One reason for this could be that people tend to describe their own expectations and experiences, which are not necessarily propaganda.
    Features    Correlation    R^2
    url         0.64           0.39
    retweet     0.5            0.20
    TABLE III. CORRELATION AND R^2 VALUES FOR URLS AND RETWEETS BEFORE RELEASE.
    Features                         Adjusted R^2    p-value
    Avg Tweet-rate                   0.80            3.65e-09
    Tweet-rate timeseries            0.93            5.279e-09
    Tweet-rate timeseries + thcnt    0.973           9.14e-12
    HSX timeseries + thcnt           0.965           1.030e-10
    TABLE IV. COEFFICIENT OF DETERMINATION (R^2) VALUES USING DIFFERENT PREDICTORS FOR MOVIE BOX-OFFICE REVENUE FOR THE FIRST WEEKEND.
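A minimal sketch of how url and retweet percentages of the kind reported in Table II might be computed from raw tweet text. The paper does not say how urls and retweets were detected, so both regular expressions below are assumptions: a url is taken to be anything starting with http:// or https://, and a retweet is taken to be any tweet containing the conventional "RT @user" marker. The function name and sample tweets are hypothetical.

```python
import re

# Assumed detection rules; the paper does not specify them.
URL_PATTERN = re.compile(r"https?://\S+")
RETWEET_PATTERN = re.compile(r"\bRT @\w+")


def feature_percentages(tweets: list[str]) -> dict[str, float]:
    """Percentage of tweets that contain a url and that look like retweets."""
    total = len(tweets)
    if total == 0:
        return {"url": 0.0, "retweet": 0.0}
    with_url = sum(1 for t in tweets if URL_PATTERN.search(t))
    retweets = sum(1 for t in tweets if RETWEET_PATTERN.search(t))
    return {
        "url": 100.0 * with_url / total,
        "retweet": 100.0 * retweets / total,
    }


# Hypothetical sample tweets for one movie in one week.
sample = [
    "New trailer is out http://example.com/trailer",
    "RT @fan: can't wait for opening night",
    "Saw it last night, loved it",
    "RT @studio: photos from the premiere http://example.com/photos",
]
print(feature_percentages(sample))
```

Running this per movie and per week of the critical period would give one row of percentages per week, in the same shape as Table II.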
    We want to determine whether movies that have greater publicity, in terms of linked urls on Twitter, perform better in the box office. When we examined the correlation between the urls and retweets with the box-office performance, we found the correlation to be moderately positive, as shown in Table 3. However, the adjusted R^2 value is quite low in both cases, indicating that these features are not very predictive of the relative performance of movies. This result is quite surprising since we would expect promotional material to contribute significantly to a movie’s box-office income.
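To make the relationship between a correlation and an adjusted R^2 value concrete, here is a small sketch. It relies on two standard facts: for a regression with a single predictor, R^2 equals the square of the Pearson correlation, and adjusted R^2 is 1 - (1 - R^2)(n - 1)/(n - p - 1) for n observations and p predictors. The function names and toy numbers are hypothetical. If all 24 movies were used (an assumption, since the paper excerpt does not state n for this test), a correlation of 0.64 would give an adjusted R^2 of roughly 0.38, in the neighborhood of the value reported in Table III.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def adjusted_r2(r2: float, n: int, p: int = 1) -> float:
    """Adjusted R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical toy data: url percentage vs. revenue (arbitrary units).
url_pct = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(url_pct, revenue)
print(r, r ** 2, adjusted_r2(r ** 2, n=len(url_pct)))

# Assumed n = 24 movies with the correlation reported for urls.
print(adjusted_r2(0.64 ** 2, n=24))
```

The gap between a moderately positive correlation and a low adjusted R^2 is simply the effect of squaring: a correlation of 0.5 explains at most a quarter of the variance.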
    B. Prediction of first weekend Box-office revenues
    Next, we investigate the power of social media in predicting
    real-world outcomes. Our goal is to observe if the knowledge
    that can be extracted from the tweets can lead to reasonably
    accurate prediction of future outcomes in the real world.
    The problem that we wish to tackle can be framed as
    follows.
    Using the tweets referring to movies prior to their
    release, can we accurately predict the box-office revenue
    generated by the movie in its opening weekend?
    Fig. 6. Predicted vs Actual box office scores using tweet-rate and HSX predictors. (Axes: Predicted Box-office Revenue vs. Actual revenue, both x 10^7; series: Tweet-rate, HSX.)
    To use a quantifiable measure on the tweets, we define the tweet-rate, as the number of tweets referring to a particular [...]
    While in this study we focused on the problem of predicting box office revenues of movies for the sake of having a clear metric of comparison with other methods, this method can be extended to a large panoply of topics, ranging from the future rating of products to agenda setting and election outcomes. At a deeper level, this work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.
    VIII. APPENDIX: GENERAL PREDICTION MODEL FOR SOCIAL MEDIA
    Although we focused on movie revenue prediction in this paper, the method that we advocate can be extended to other products of consumer interest.
    We can generalize our model for predicting the revenue of a product using social media as follows. We begin with data collected regarding the product over time, in the form of reviews, user comments and blogs. Collecting the data over time is important as it can measure the rate of chatter effectively. The data can then be used to fit a linear regression model using least squares. The parameters of the model include:
    A: rate of attention seeking
    P: polarity of sentiments and reviews
    D: distribution parameter
    Let $y$ denote the revenue to be predicted and $\epsilon$ the error. The linear regression model can be expressed as:
    $$y = \beta_a A + \beta_p P + \beta_d D + \epsilon \tag{4}$$
    where the $\beta$ values correspond to the regression coefficients.
    The attention parameter captures the buzz around the product in social media. In this article, we showed how the rate of tweets on Twitter can capture attention on movies accurately. We found this coefficient to be the most significant in our experiments. The polarity parameter relates to the opinions and views that are disseminated in social media. We observed that this gains importance after the movie has been released and adds to the accuracy of the predictions. In the case of movies, the distribution parameter is the number of theaters a particular movie is released in. In the case of other products, it can reflect their availability in the market.
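The general model in the appendix, revenue as a linear combination of attention A, polarity P and distribution D fitted by least squares, can be sketched in plain Python. This is an illustrative sketch on synthetic numbers, not the authors' implementation: the function names, the sample values and the "true" coefficients are all made up, and the fit has no intercept term because Eq. (4) as written has none. It solves the normal equations (X^T X) beta = X^T y with Gaussian elimination.

```python
def solve_linear_system(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(rows: list[tuple[float, ...]], ys: list[float]) -> list[float]:
    """Fit y = beta_a*A + beta_p*P + beta_d*D via the normal equations.

    `rows` holds one (A, P, D) tuple per product; no intercept is fitted.
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data: (attention rate, polarity, theater count).
rows = [(10.0, 1.0, 100.0), (20.0, 2.0, 300.0), (15.0, 0.5, 200.0),
        (30.0, 3.0, 250.0), (5.0, 1.5, 400.0)]
true_betas = (2.0, 0.5, 0.01)  # made-up coefficients used to generate y
ys = [sum(b * x for b, x in zip(true_betas, row)) for row in rows]

betas = fit_least_squares(rows, ys)
print([round(b, 6) for b in betas])
```

Because the synthetic data has no noise, the fitted coefficients should match the made-up ones up to floating-point error; with real data a numerically sturdier solver than the plain normal equations would usually be preferable.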
    IX. ACKNOWLEDGEMENT
    This material is based upon work supported by the National Science Foundation under Grant #0937060 to the Computing Research Association for the CIFellows Project.
    REFERENCES
    [1] Jure Leskovec, Lada A. Adamic and Bernardo A. Huberman. The dynamics of viral marketing. In Proceedings of the 7th ACM Conference on Electronic Commerce, 2006.
    [2] Bernardo A. Huberman, Daniel M. Romero, and Fang Wu. Social networks that matter: Twitter under the microscope. First Monday, 14(1), Jan 2009.
    [3] B. Jansen, M. Zhang, K. Sobel, and A. Chowdury. Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology, 2009.
    [4] D. M. Pennock, S. Lawrence, C. L. Giles, and F. Å. Nielsen. The real power of artificial markets. Science, 291(5506):987–988, Jan 2001.
    [5] Kay-Yut Chen, Leslie R. Fine and Bernardo A. Huberman. Predicting the Future. Information Systems Frontiers, 5(1):47–61, 2003.
    [6] W. Zhang and S. Skiena. Improving movie gross prediction through news analysis. In Web Intelligence, pages 301–304, 2009.
    [7] Akshay Java, Xiaodan Song, Tim Finin and Belle Tseng. Why we twitter: understanding microblogging usage and communities. Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 56–65, 2007.
    [8] Ramesh Sharda and Dursun Delen. Predicting box-office success of motion pictures with neural networks. Expert Systems with Applications, vol 30, pp 243–254, 2006.
    [9] Daniel Gruhl, R. Guha, Ravi Kumar, Jasmine Novak and Andrew Tomkins. The predictive power of online chatter. SIGKDD Conference on Knowledge Discovery and Data Mining, 2005.
    [10] Mahesh Joshi, Dipanjan Das, Kevin Gimpel and Noah A. Smith. Movie Reviews and Revenues: An Experiment in Text Regression. NAACL-HLT, 2010.
    [11] Rion Snow, Brendan O’Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. Proceedings of EMNLP, 2008.
    [12] Fang Wu, Dennis Wilkinson and Bernardo A. Huberman. Feedback Loops of Attention in Peer Production. Proceedings of SocialCom-09: The 2009 International Conference on Social Computing, 2009.
    [13] Bo Pang and Lillian Lee. Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1-2), pp. 1–135, 2008.
    [14] Namrata Godbole, Manjunath Srinivasaiah and Steven Skiena. Large-Scale Sentiment Analysis for News and Blogs. Proc. Int. Conf. Weblogs and Social Media (ICWSM), 2007.
     
    end quote from:
     http://arxiv.org/pdf/1003.5699.pdf

     

     
