One of the most frequent demands business executives make of marketing is for normalized, predictable results: do xyz so many times, or at such a volume, and you get abc as a result. My observation is that things are no longer that predictable, if they ever were.
For companies that think mainly in terms of putting it all on a spreadsheet and a PowerPoint presentation, social media has been a major source of consternation. It has more variables and dimensions than a 2D medium, and it needs more people who can think on their feet.
Watching this short (less than six minutes) and funny TED Talk by Sebastian Wernicke, I found two very interesting observations the speaker himself made in the comments, both addressing statistics and their analysis, which may help you think about your next social media project.
Responding mainly to the assertion that the statistical correlations shown in the video say nothing about causation, Wernicke writes:
Never trust statistics - ever! However, I think there are at least three parts in the analysis where causation is actually plausible (then again, maybe I have spent too much time with the subject matter and start to see patterns everywhere):
1. The picture shown at 1:11 in the video is an actual correlation mapping between audience ratings. I think it makes sense that the general direction of the topic (rational vs. emotional, actions vs. ideas) should spark specific audience reactions.
2. The picture shown at 1:56 is derived from a semantic analysis (where words are automatically grouped into topics by a software tool). I think it makes sense that there is a tendency to rate those talks as your favorite that you can easily connect with emotionally.
3. Regarding the four word phrases at 3:03, it seems to me that those appearing in the most favorited TED talks are much more audience-centric than those in the least favorited TED talks.
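Out of curiosity about point 1, here is roughly how a correlation mapping between audience ratings might be computed. This is a minimal sketch in Python with pandas, not Wernicke's actual pipeline; the talks, rating labels, and counts below are invented for illustration.

```python
import pandas as pd

# Invented counts of TED-style audience rating labels, one row per talk.
ratings = pd.DataFrame(
    {
        "inspiring":  [1200, 300, 850, 90],
        "persuasive": [400, 150, 600, 40],
        "funny":      [80, 700, 120, 30],
        "ingenious":  [300, 90, 500, 20],
    },
    index=["talk_a", "talk_b", "talk_c", "talk_d"],
)

# Pearson correlation between rating labels across talks: labels that tend
# to be awarded to the same talks come out strongly correlated, which is
# the kind of mapping shown at 1:11 in the video.
correlation_map = ratings.corr(method="pearson")
print(correlation_map.round(2))
```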
Yesterday we talked about how comments, ratings, endorsements, likes, and votes are part of collaborative filtering. As you go about creating content, uploading customer success stories, and building communities, you will also be providing reports that quantify the results of those activities.
Could you in fact achieve better results by tweaking the language to please the community?
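To make the collaborative filtering connection concrete, here is a toy sketch of the mechanics: likes from community members become item-to-item similarities, which then drive what gets surfaced next. The likes matrix is invented, and this is not how TED or any particular platform actually does it.

```python
import numpy as np

# Invented likes matrix: rows are community members, columns are pieces
# of content (posts, customer stories); 1 = liked, 0 = not liked.
likes = np.array([
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

# Item-item cosine similarity: two pieces of content are similar when
# largely the same people liked them.
norms = np.linalg.norm(likes, axis=0)
similarity = (likes.T @ likes) / np.outer(norms, norms)

# Score items for member 0 by how similar they are to what that member
# already liked, then surface the top unseen one.
member = likes[0]
scores = similarity @ member
scores[member == 1] = -1.0  # exclude items the member already liked
print("recommend item", int(scores.argmax()))
```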
If you want to learn how the data was gathered, Wernicke explains a little further down in the comments:
Most parts of data gathering and analysis were a combination of Linux scripts and a (large) spreadsheet. However, two parts of the text analysis required special linguistic analysis software:
1) The top-10 word list (you need the tool to "normalize" words so that, e.g., different verb forms will be counted as the same word).
2) The most-favorite and least-favorite topics. This is based on a so-called "semantic analysis", where words are automatically grouped into a (manually curated) topic structure.
Text analysis in relation to stock price movements is in fact already being done by several financial institutions, with computers automatically interpreting and trading on news they receive via agency tickers (e.g., see http://en.wikipedia.org/wiki/Algorithmic_trading#Issues_and_developments).
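Wernicke doesn't name the linguistic software he used, but the "normalization" step in his first point is essentially lemmatization, which open source toolkits handle. Here is a rough sketch using NLTK's WordNet lemmatizer on an invented list of words:

```python
from collections import Counter

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # one-time download of the WordNet data

lemmatizer = WordNetLemmatizer()

# Toy input; real input would be full TED Talk transcripts.
words = "speak speaks speaking spoke spoken say says saying said".split()

# Reduce each verb form to its dictionary form so that, as Wernicke
# describes, different forms of the same verb are counted as one word.
normalized = [lemmatizer.lemmatize(w, pos="v") for w in words]
print(Counter(normalized))  # e.g. Counter({'speak': 5, 'say': 4})
```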
He used open source tools to compile the data. Is this something we could start leveraging to run more complex analyses of linguistic data? I'm thinking about sentiment analysis in particular. As an aside, I found it fascinating that a whole conversation ensued in the comments about TED Talk comment ratings. Is anyone capturing that feedback?
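If someone were capturing that feedback, a first pass at sentiment scoring would not require much: open source NLP libraries ship with lexicon-based scorers. A minimal sketch using NLTK's VADER analyzer, with invented comments standing in for real community feedback:

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

analyzer = SentimentIntensityAnalyzer()

# Invented community comments; in practice these would come from your
# own comment, rating, or review feeds.
comments = [
    "This talk completely changed how I think about our customers.",
    "Too long, and the examples felt forced.",
]

# The compound score runs from -1 (most negative) to +1 (most positive).
for comment in comments:
    scores = analyzer.polarity_scores(comment)
    print(f"{scores['compound']:+.2f}  {comment}")
```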