An analysis of 1 month of tweets for event detection

Although I’ve worked in a variety of industries doing data science and machine learning projects, I’d never actually worked directly with Twitter data before today, so this was a new challenge.  I found it an exciting exercise and am guilty of diving down the rabbit hole while exploring unfamiliar territory.

I knew immediately a major challenge was going to be preprocessing the data, as social media data is notoriously noisy.  How noisy is it?  Noisier than Gimli’s gastrointestinal tract after eating way too much lembas bread.  As A Bhoi (2017) described it succinctly: “Identification of named entities (NEs) from microblog contents like twitter is a challenging task due to their noisy and short nature and lack of contextual information.”  Couldn’t have said it better myself.

So the first thing to do is take a look at the raw data.  Let’s open the chicago.csv file in Excel and LibreOffice Calc.  It turns out Excel and LibreOffice handle the data differently, particularly with regard to non-ASCII characters (e.g. emojis).
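For what it’s worth, loading the file in Python sidesteps the spreadsheet quirks entirely. Here’s a minimal sketch, assuming the file is UTF-8 encoded (the columns are whatever the actual chicago.csv header says):

```python
# Minimal, encoding-aware load of the raw tweets. Reading as UTF-8 keeps emojis
# and other non-ASCII characters intact instead of mangling them.
import pandas as pd

tweets = pd.read_csv("chicago.csv", encoding="utf-8")
print(tweets.shape)    # rows x columns
print(tweets.head())   # eyeball a few raw tweets, emojis and all
```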

First things first: create a virtual environment. I’ve found from past experience that keeping your projects isolated from each other is paramount to preventing the chaos of unintended package versions being used with a given project. Virtual environments are a key tool for keeping your sanity and some semblance of organization. So too is a package manager. The old way was virtualenv & pip. Unfortunately, this requires the unacceptably tedious task of manually updating the requirements.txt file each time a package is added to or updated in your project. As Frank Costanza from Seinfeld said, “There had to be another way!!” https://youtu.be/cFmEYOnpEkc

Fortunately, within the last year we have “pipenv”. I’m a fan and have found this to be the ideal tool for managing versions of packages and virtual environments.  I’m not the only one; it’s now the recommended packaging tool for the Python community from Python.org.

Previous Work

I’m a firm believer in not reinventing the wheel, for the sake of time-efficiency.  So the first thing I did was a pretty thorough literature review of the scientific journals on extracting events from Twitter data.  Turns out there’s a fair amount of prior research.  This will come in handy.

Zhou et al (2016) wrote an article “Real world city event extraction from Twitter data streams” that outlines an unsupervised method to extract real world events from Twitter streams.  This is a great starting point, as Zhou delineates several of the previous attempts at event detection and their shortcomings:

There is also some work which focuses on open domain events. Becker et al. [3] provide a combination of online clustering and classification to distinguish real world events and non-events, but the work does not provide detailed classifications nor any explanation of the detected events.

Ritter et al. [7] develop an open-domain event extraction and categorization system for Twitter. The system applies an LDA-based algorithm to detect topic clusters but requires manual inspection of the cluster types.

Unfortunately, Zhou’s solution focuses more on real world events that affect city services (e.g. traffic flow, weather, natural disasters), and he uses the same framework of seven categories that Ritter et al. used.

He explains this choice:

Several similar event types are also subsumed into a categorization that encompasses those types, e.g. concert, festival, parade into ‘culture’. Since these events will result in a similar influence on the city, it is unnecessary to classify them into separate types.

Okay, so they group all events of culture type into a single parent class “Culture”.  We actually want to identify these child class events (e.g. concert, festival, parade, sports game, protest), but again, this is a good starting point.

The great attribute of an unsupervised approach is that you don’t have to worry about pre-labeling a ton of events for use in your training set.  This is good because, in the real world (and in your test set), events can and will be myriad.  So an LDA approach allows a much more flexible, dynamic way of identifying potential events.

To design a generic solution and avoid the need of creating a training keyword set for each city, an unsupervised method based on Twitter-LDA (Twitter Latent Dirichlet Allocation) is proposed.

 

Twevent (Li et al, 2012)

It’s over five years old, but the “Twevent: Segment-based Event Detection from Tweets” paper is nearly tailor-made for this task of event detection and evaluation.

In summary, Twevent solves the event detection problem with three components:

  1. tweet segmentation,
  2. event segment detection, and
  3. event segment clustering

Twevent basically approaches event detection as a clustering problem, with burstiness as the most important attribute for detecting an event.
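Twevent’s actual burstiness score is more involved (it models a segment’s expected frequency with a binomial distribution), but the intuition fits in a few lines. A toy sketch, not the paper’s formula:

```python
# Toy burstiness: how much more often does a term appear in the current time
# window than it did, on average, in past windows? (Twevent's real scoring
# models expected segment frequency with a binomial distribution; this is
# just the intuition.)
from collections import Counter

def burstiness(term, current_tokens, past_windows):
    """Ratio of the term's rate in the current window to its historical rate."""
    current_rate = Counter(current_tokens)[term] / max(len(current_tokens), 1)
    past_count = sum(Counter(w)[term] for w in past_windows)
    past_total = sum(len(w) for w in past_windows)
    past_rate = past_count / max(past_total, 1)
    return current_rate / past_rate if past_rate > 0 else float("inf")

# "reds" appears twice as often (relative to window size) in the current hour.
past = [["reds", "coffee", "work", "rain"], ["coffee", "bus", "traffic", "reds"]]
current = ["reds", "reds", "game", "coffee"]
print(burstiness("reds", current, past))  # -> 2.0
```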

Feature-Rich Segment-Based News Event Detection on Twitter (Y Qin et al, 2013)

Qin et al. built upon the Twevent model, but added a richer set of features for detecting news events.

Event Detection in Twitter: A machine-learning approach based on term pivoting (F Kunneman, 2014)

This Dutch team expanded upon existing work that uses Twevent (2012) for event detection.  They build upon the work of Qin et al. (2013) and focus on training a classifier on several features of an event to distinguish significant events from mundane, insignificant ones.

But rather than clustering based on segments, as Qin (2013) did, the Dutch team based their clustering model on unigrams.  When I read “unigrams”, it gave me pause, as most NLP-literate individuals recognize that with an n-gram model, setting n=1 (i.e. a unigram) is suboptimal for English text.  Rather, you can often get much better predictive performance by using bigrams or even trigrams.  Kunneman et al. explain their rationale, however:

…in Dutch…, word formation is characterized by compounding, which means that Dutch unigrams…capture the same information as English bigrams.  Compare, for instance, ‘home owner’ to ‘huizenbezitter‘…

This is a classic teaching moment for aspiring data scientists working on an NLP problem: always understand the language you are dealing with, and question your assumptions.  English may be the de facto language in the US, but Twitter and social media are global platforms, and you will often find that an NLP approach that works in one language (e.g. English) may completely fall apart in another.  Moving on.

I also liked that Kunneman defined explicitly what they meant by “significant”:

As a definition of what makes an event significant, we follow the definition given by [8]: ’Something is significant if it may be discussed in the media.’ As a proxy, we borrow the idea of [7] to include the presence of a certain name or concept as an article on Wikipedia as a weight in determining the significance of the candidate cluster of terms.

 

Location-Specific Tweet Detection and Topic Summarization in Twitter — V Rakesh – 2013

Rakesh’s team argue (rightly) that the geolocation of users does not necessarily correspond with the location specificity of the event they are tweeting about.  They

classify a tweet to be location-specific “not only based on its geographical information, but also based on the relevancy of its content with respect to that location. In this paper, we aim to discover such location-specific tweets by combining the tweets’ content and the network information of the user.”

They built a weighting scheme called Location Centric Word Co-occurrence (LCWC) that uses both the content of the tweets and the network information of tweeters’ friends to identify tweets that are location specific.  Their LCWC uses the following to build a likelihood score:

  1. the mutual information (MI) score of tweet bi-grams;
  2. the tweet’s inverse document frequency (IDF);
  3. the term frequency (TF) of the tweets; and
  4. the user’s network score, to determine the location-specific tweets.

Why the use of bi-grams?  In their own words:

users tend to use a combination of hash-tags and words to describe the event; therefore, relying simply on a uni-gram model cannot provide the much needed information about the event.

Now, one drawback to Rakesh’s approach is that it relies heavily upon the network interactions of Twitter users to infer the geographic location of events.  This works well when a majority of your users have friends tweeting about the same event, but that is not always the case.
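To make the bi-gram point concrete, here’s a rough sketch of just one LCWC ingredient, the mutual information of tweet bigrams, in plain Python. The real scheme also folds in TF, IDF, and the user-network score, which I’m omitting here:

```python
# Pointwise mutual information (PMI) of tweet bigrams: pairs of words that
# co-occur more often than chance (e.g. "salsa dancing") score high. This is
# only one piece of LCWC; TF, IDF, and the network score are left out.
import math
from collections import Counter

def bigram_pmi(tweets):
    """Return {(w1, w2): PMI} over whitespace-tokenized, lowercased tweets."""
    unigrams, bigrams = Counter(), Counter()
    for text in tweets:
        tokens = text.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (w1, w2): math.log((c / n_bi) / ((unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)))
        for (w1, w2), c in bigrams.items()
    }

scores = bigram_pmi([
    "great salsa dancing at #fountainsquare tonight",
    "salsa dancing downtown tonight was great",
])
print(sorted(scores.items(), key=lambda kv: -kv[1])[:5])
```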


“Geoburst — Real-time local event detection in geo-tagged tweet streams” — C Zhang et al, 2016

Geoburst was considered “state of the art” not even two years ago, but unfortunately it has pretty terrible precision (~30%), which simply will not do.

The other big issue with Geoburst, and with most pre-2016 event detection models, is that it’s really hard to make a universal ranking function for accurate candidate event filtering.


“Finding and Tracking Local Twitter Users for News Detection” — H Wei – 2017

 

Enter the best (as of early 2018) event detection system I’ve discovered:  Chao Zhang et al’s 2017 “TrioVecEvent: Embedding-Based Online Local Event Detection in Geo-Tagged Tweet Streams”.

Zhang’s team has successfully addressed and solved the big issues with existing social media event detection systems, notably:

  1. capturing short-text semantics, and
  2. filtering uninteresting activities

 

They use a two-step detection scheme:

  1. divide tweets in the query window into coherent geo-topic clusters
    1. learn multimodal embeddings of the location, time, and text
    2. cluster the tweets with a Bayesian mixture model
  2. extract a feature set for classifying the candidate events

What I found really cool about this approach was that they ranked the features of their model by importance.

It turns out that latitude and longitude concentration, what they term “spatial unusualness” and “temporal unusualness”, and “burstiness” are the most important features in the model.
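The paper’s machinery (multimodal embeddings plus a Bayesian mixture) is beyond a blog snippet, but a crude stand-in conveys the shape of step one: embed the text somehow, bolt on scaled location and time, and cluster. This is purely illustrative, not the authors’ method, and the function name is mine:

```python
# Crude stand-in for geo-topic clustering: TF-IDF + SVD as a fake "text
# embedding", concatenated with standardized lat/lon/hour, clustered with an
# off-the-shelf Gaussian mixture. NOT the paper's Bayesian mixture over
# learned multimodal embeddings; just the general shape of the idea.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

def geo_topic_clusters(texts, lats, lons, hours, n_clusters=5):
    """Assign each tweet to one of n_clusters geo-topic clusters."""
    text_vecs = TruncatedSVD(n_components=10, random_state=0).fit_transform(
        TfidfVectorizer(stop_words="english").fit_transform(texts))
    space_time = StandardScaler().fit_transform(np.column_stack([lats, lons, hours]))
    features = np.hstack([text_vecs, space_time])
    return GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(features)
```

Each resulting cluster then becomes a candidate event, which is where the second step (classifying candidates on features like burstiness and spatial/temporal unusualness) would take over.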


Zhou & Chen (2015) wrote an excellent paper, “An Unsupervised Framework of Exploring Events on Twitter: Filtering, Extraction and Categorization”.

In particular, Alan Ritter wrote two papers: “Named Entity Recognition in Tweets: An Experimental Study” (2011) and “Open Domain Event Extraction from Twitter” (2012).  I also found Ritter’s GitHub repo intriguing.

But the big win I discovered was “A Deep Multi-View Learning Framework for City Event Extraction from Twitter Data Streams” from Farajidavara et al (2017).  This team from the UK quite literally wrote the paper on extracting events from Twitter data.

After finishing the lit review I started doing some basic EDA (exploratory data analysis).  I always love this part of a project: getting familiar with the new dataset, analyzing relationships between the variables, and building a high-level understanding of what the data is showing.
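A first pass might look something like this (the column names are my guesses at the schema; swap in whatever the real header uses):

```python
# Quick EDA: size, users, tweets per day, and spatial extent. The column names
# ("timestamp", "user_id", "lat", "lon") are assumptions about the schema.
import pandas as pd

tweets = pd.read_csv("chicago.csv", encoding="utf-8", parse_dates=["timestamp"])

print(tweets.shape)                                              # rows, columns
print(tweets["user_id"].nunique())                               # unique users
print(tweets["timestamp"].dt.date.value_counts().sort_index())   # tweets per day
print(tweets[["lat", "lon"]].describe())                         # spatial extent
```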

The task is to detect (and classify) events (e.g. concert, sports game, festival, etc) from Twitter data.  As with much of data science, there are many ways of skinning the proverbial cat.

How do we infer that an event is occurring based on the data?  The first and simplest approach is to naively search each tweet caption for specific keywords that allude to or explicitly mention an event type.  This basically amounts to creating a specific event-name dictionary.  The problem is that people aren’t robots, and few people would tweet something like, “I’m heading down to watch the sports game tonight!!”  Instead they’d probably substitute “Reds” or “#CincinnatiReds” for the generic “sports game”.  We might get away with this approach for certain events, as most people would refer to a concert as a “concert”.  Same for a festival.  But even that is complicated by the myriad instances of the Concert object.  For example, someone tweets “I’m excited to see The War on Drugs tonight at Bogarts!”  Our model would fail to infer that “The War on Drugs” is a band, and that the context of the tweet is that they are playing a concert.  Indeed, the word “concert” is never explicitly mentioned.
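Here’s what that naive dictionary lookup amounts to, and exactly how it whiffs on the Bogarts example (the keyword lists are made up for illustration):

```python
# Naive event-keyword dictionary: flag a tweet if it contains any keyword.
# Catches literal mentions ("festival"), misses "The War on Drugs at Bogarts".
EVENT_KEYWORDS = {
    "concert": ["concert", "gig", "live music"],
    "sports game": ["game", "match", "stadium"],
    "festival": ["festival", "fest", "parade"],
}

def keyword_events(text):
    text = text.lower()
    return [event for event, words in EVENT_KEYWORDS.items()
            if any(w in text for w in words)]

print(keyword_events("I'm excited to see The War on Drugs tonight at Bogarts!"))  # []
print(keyword_events("Heading down to the festival at Fountain Square tonight"))  # ['festival']
```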

So how to solve this conundrum?  As usual, there are several possible approaches, each with strengths and weaknesses.

One straightforward way is the classic Named Entity Recognition (NER).  There are three excellent papers on this very task:

  1. Vavliakis (2013) wrote a decent paper, “Event identification in web social media through named entity recognition and topic modeling”, that is highly relevant to our use case.
  2. Abinaya (2014) also has a great paper on this topic, “Event identification in social media through latent dirichlet allocation and named entity recognition”.
  3. Derczynski (2015) offers an “Analysis of named entity recognition and linking for tweets”.
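None of those papers prescribe a particular library, but for a quick sanity check spaCy’s off-the-shelf NER is easy to try (my choice, not theirs). It needs the en_core_web_sm model downloaded, and, being trained on newswire, it often stumbles on noisy tweet text, which is exactly Bhoi’s point:

```python
# Quick NER pass with spaCy (one readily available option, not something the
# papers above mandate). Prerequisite: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I'm excited to see The War on Drugs tonight at Bogarts!")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. band and venue, if the model catches them
```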

 

From LDA (Latent Dirichlet Allocation) to LSI (Latent Semantic Indexing) to HDP (Hierarchical Dirichlet Process), there is no shortage of methods for grouping texts into a set of topics.

One of the many challenges in using LDA or LSI is tuning the hyperparameters, namely “How do I choose how many topics to classify?”  It’s not an easy question to answer: choose too few and your model won’t find enough variety of topics; choose too many and your model will be unusably complex.  For example, instead of basketball_game it identifies topics at a much more granular level of detail (e.g. college_basketball_game, high_school_basketball_game, basketball_game_being_played_by_elves_versus_dwarves, etc.).

Peter Ellis writes about how to solve this using good ol’ cross validation in his blog post “Determining the number of “topics” in a corpus of documents”.  It’s unfortunately written in R rather than Python, but we won’t hold that against him 😉

Zhao et al. (2015) describe the issue:

“Lacking such a heuristic to choose the number of topics, researchers have no recourse beyond an informed guess or time-consuming trial and error evaluation. For trial and error evaluation, an iterative approach is typical based on presenting different models with different numbers of topics, normally developed using cross-validation on held-out document sets, and selecting the number of topics for which the model is least perplexed by the test sets… Using the identified appropriate number of topics, LDA is performed on the whole dataset to obtain the topics for the corpus. We refer to this as the perplexity-based method. Although the perplexity-based method may generate meaningful results in some cases, it is not stable and the results vary with the selected seeds even for the same dataset.”

Basically, we train a bunch of models with a variety of values of k (where k is the number of latent topics to identify) and pick the k whose model is least perplexed by the held-out test set.
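A sketch of that sweep using gensim (one option among several); `tokenized_tweets` is assumed to already be a list of token lists:

```python
# Train an LDA model per candidate k, then compare held-out perplexity (lower
# is better) and topic coherence (higher is better). `tokenized_tweets` is
# assumed: a list of token lists, one per tweet.
from gensim import corpora, models
from gensim.models.coherencemodel import CoherenceModel

dictionary = corpora.Dictionary(tokenized_tweets)
corpus = [dictionary.doc2bow(toks) for toks in tokenized_tweets]
held_out, train = corpus[: len(corpus) // 10], corpus[len(corpus) // 10:]

for k in (5, 10, 20, 40):
    lda = models.LdaModel(train, num_topics=k, id2word=dictionary,
                          passes=5, random_state=0)
    perplexity = 2 ** (-lda.log_perplexity(held_out))
    coherence = CoherenceModel(model=lda, texts=tokenized_tweets,
                               dictionary=dictionary, coherence="c_v").get_coherence()
    print(k, perplexity, coherence)
```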

 

There are also several deep learning approaches to topic modeling.  In today’s age of new libraries and tools being released daily, it’s easy to get caught up in the latest shiny new toy.  It’s important to remember, however, to always start with a stupid model.  A wise guy named Al once said, “Everything should be made as simple as possible, but not simpler.”

 

We could use a dictionary of event keywords to use for direct event detection, and that certainly could grab some low-hanging fruit, but many if not most events will not be explicitly spelled out in such a dictionary.  Thus we need a more flexible, dynamic approach to event detection.

Michael Kaisser presented a pretty solid and relevant talk way back in 2013 at Berlin Buzzwords called “Geo-spatial Event Detection in the Twitter Stream”.  While somewhat dated, the talk offered several techniques I borrowed to generate a score for a Twitter event.  One of the many challenges of identifying Twitter events is determining how likely it is that a potential event “candidate” is, in fact, an actual event.  From a Bayesian perspective, we can never be fully confident that a candidate event is an actual event, but we can greatly increase our confidence as we find more evidence to support it.

One way of addressing this is to count the number of unique users who are tweeting about a given candidate event.  If only a single user is tweeting about a candidate event, we probably shouldn’t place much confidence in it being an actual event.

In contrast, if, say 7 unique users all tweet about a related topic within a certain time window and perhaps even within a certain geolocation radius, we should (in true Bayesian fashion) update our prior beliefs and increase our confidence that said candidate event is an actual event.

If you have lots of different people posting from the same geographic location, then that indicates a high probability of an event.

So the factors that should increase our confidence in a positive event detection include:

  1. the number of unique users tweeting about the candidate event
  2. the geographical proximity of the candidate event’s tweets
  3. the temporal proximity of the candidate event’s tweets
  4. the lexical similarity of the text content of the candidate event’s tweets

So in general, if a set of tweets contains similar topics and similar words, being tweeted from roughly the same location, clustered over a similar time, from different users, we should have high confidence that there exists an Actual Event concerning that set of tweets.
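Rolled into code, a toy confidence score over those four factors might look like this (the weights are arbitrary placeholders, and `text_vectors` is whatever vectorization of the tweet text you have on hand):

```python
# Toy candidate-event confidence: more unique users and higher text similarity
# push the score up; wider spatial and temporal spread push it down. Weights
# are placeholders, not tuned values.
import numpy as np

def candidate_event_score(user_ids, lats, lons, hours, text_vectors):
    n_users = len(set(user_ids))
    spatial_spread = np.std(lats) + np.std(lons)       # rough, in degrees
    temporal_spread = np.std(hours)                    # in hours
    v = np.asarray(text_vectors, dtype=float)
    v = v / np.clip(np.linalg.norm(v, axis=1, keepdims=True), 1e-12, None)
    mean_similarity = float((v @ v.T).mean())          # mean pairwise cosine
    return (1.0 * np.log1p(n_users)
            + 2.0 * mean_similarity
            - 1.0 * spatial_spread
            - 0.5 * temporal_spread)
```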

We can further leverage toponyms in tweets and use them to extrapolate likely geolocations (e.g. “Having a beer tonight at @16BitBar” or “Great salsa dancing tomorrow evening down at #fountainsquare”).  Ajao (2017) actually has a great paper on location inference techniques for Twitter.  But since all the tweets in this particular dataset contain geotagged coordinates, we will ignore the issue of inferring location from non-geotagged tweets.

Geographic proximity is probably a slightly stronger factor than temporal proximity, since events could potentially occur over hours, days, even weeks (e.g. the Cincinnati Reds baseball season, taken in totality, could be considered an event, though it occurs over several months).

To determine which set of features comprises the optimal model (as measured by Precision and Recall), we could apply a random forest model to try various feature sets.
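A hedged sketch of that idea, assuming we hand-label a small set of candidate events as real/not-real (`X`, `y`, and `feature_names` below are placeholders for that labeled feature matrix):

```python
# Cross-validated precision/recall for a candidate-event classifier, plus the
# random forest's ranking of feature importance. X, y, feature_names are
# assumed to come from a small hand-labeled set of candidate events.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import cross_val_predict

clf = RandomForestClassifier(n_estimators=200, random_state=0)
pred = cross_val_predict(clf, X, y, cv=5)
print("precision:", precision_score(y, pred), "recall:", recall_score(y, pred))
print(dict(zip(feature_names, clf.fit(X, y).feature_importances_)))
```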

Evaluating the Model

So how do we evaluate our unsupervised model?  Since we don’t have a set of labeled training data, how do we know the model is valid?

There are a couple of approaches to evaluating an unsupervised LDA model, and they both keep a “human in the loop”.

 

Pretty much every approach to event detection heretofore has used k-fold cross-validation (usually with k=10) to evaluate the model.  This is a sound approach, and with little reason to deviate from a solid method of evaluation, it is how I evaluated my model as well.

 

The problem with evaluating an unsupervised model like LDA is that there is no ground truth, and hence no straightforward cross-validation (Nikolenko 2014).  So one solution is to hold out a subset of documents (tweets) and then check their likelihood under the resulting model.

Another solution is to keep a human in the loop and judge by hand whether the resulting topics (and the events they suggest) are coherent and meaningful.

 


Summarizing a Detected Event

This is more of a secondary task compared to actual event detection, but once an event has been detected, we need to summarize the content of that event, at least for labeling purposes.  This too is a non-trivial task, and Rudrapal et al. (2018) wrote a 20-page article on the various ways to summarize an event.  Since I’ve read their article, I’ll spare you the details and summarize their summary: there are two main branches of Twitter topic summarization:

  1. based on summary content
    1. extractive summaries
    2. abstractive summaries
  2. based on event category
    1. generic summaries
    2. domain-specific summaries

 

The difference within the former being that extractive summaries stitch together actual tweets concerning a given event, while abstractive summaries generate new text describing the event rather than quoting the tweets verbatim.

There’s also a great summarization framework called Sumblr, which summarizes tweet streams with timeline generation.

Lin (2004) established the de facto way of evaluating text summarization in his paper “ROUGE: A Package for Automatic Evaluation of Summaries”, which defines several scores.
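In Python, one way to compute these scores is the rouge-score package (my choice here, not something Lin’s paper ships):

```python
# ROUGE-1/2/L between a reference summary and a generated one.
# Prerequisite: pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
print(scorer.score(
    "Reds beat the Cubs at Great American Ball Park on Friday night",     # reference
    "Reds win against the Cubs Friday night at Great American Ball Park"  # candidate
))
```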

US Election 2016: Get Ready to be Disappointed

The sad state of politics in this country is so beyond repair. It’s telling how disillusioned our generation is with the two parties and each of their candidates. People are slowly recognizing that the illusion of choice is just that: an illusion, and we are tired of being fed bottom-of-the-barrel politicians who simply say what people want to hear, make claims they never intend to keep, and are only in the running because they have the money to fund their campaign/advertising efforts. I hope the American people are finally understanding that the wealthiest candidate has a statistically much better chance of winning office than those who are less wealthy.
 
Furthermore, imagine how many problems we could solve or abate by directing all the campaign funds to providing clean water, food, education, dental care, health care, etc for people. Instead, we squander those same funds for TV time and billboards. What a waste. It’s a rampant and systematized misallocation of resources. To wit: https://www.opensecrets.org/orgs/list.php
 
As it has been for many decades, America is a country of the lobbyists, by the corporate lobbyists, for the corporations. The fact that US citizens continue to take our country’s political machinations seriously is evidence of just how deeply misled and/or oblivious we all are of reality.
 
I cannot and will not buy into a system that is rigged in such a way and funded via de facto bribes from lobbyists, and that’s why I too will not vote in November.  “Mickey Mouse for President! He’s just as arbitrary as the other current candidates.”  Regardless, the established system has very powerful entities that are incentivized to keep their power intact.  That is why change will not happen under such a system.  That is why we will all be (or should be) disappointed in November.  Womp womp.

Slavery Alive and Well: We are All Debt-Slaves

I was reading an article about the recent Dallas police shootings, which came in response to the shootings of Alton Sterling and others.  The police abuse of power is out of control, no doubt.  Especially with regard to treatment of blacks.  It’s terrible and I’m not sure how to go about fixing it.  The racial issues in this country are so deep that they may never be fully fixed.
 
However, this issue is but a symptom of a much more widespread and insidious problem that we face: the global debt enslavement. People think slavery is in the past and no longer relevant. In fact it’s just as common today as it was 200 years ago, and has actually increased its reach to encompass much more than the color of one’s skin. Today, the vast majority of us are slaves to the global monetary system, mainly via banks and the Federal Reserve. It does not care what color your skin is. It does not care where you were born or where you are from. It does not discriminate on age, race, gender, ethnicity, or sexual orientation. Debt only discriminates on the numbers in your bank account, your FICO score, your credit score. No, today’s enslavement is much more ubiquitous than that of the 19th century.
 
Only today’s debt-enslavement is sneakier.  More evil.  It’s invisible, and worse, there are fewer physical signs to detect it.  Today’s global slavery has exchanged shackles for personal finances.  Its subtlety and persistence are its power.  At least if we were all trapped behind physical bars, we could acknowledge our situation and plot to escape imprisonment.  But debt enslavement works so well because most people have no idea how entrapped they are.  Indeed, it’s hard to tunnel out of a prison cell if you do not even realize you’re living in one.
If we fix the much wider issue of global slavery via debt, then we can begin to address issues of past racial enslavement and other social issues.  Until then, it is akin to cutting your garden weeds rather than pulling them up fully from the root; racial issues of inequality will continue to “grow back”.  However, if we work together and address the banks as our common foe, racial inequality stands a much better chance of being addressed and fixed.  But to get there we need to address the fundamental issues wrong with our global economic state of enslavement.  We are all in this fight together.  I would argue that this issue is at the core of what we ought to be concerned with in our lifetime, and in our everyday lives.  For it is nothing short of our very innate right to freedom and value as human beings that is at stake.  As a great man once said, “Injustice anywhere is a threat to justice everywhere.”

How to not give a fuck

Mark Manson presents a compelling argument in a blog post called “The Art of Not Giving a Fuck”.  I’ve adopted a similar philosophy over the past two years and can attest that it has resulted in markedly more happiness and life satisfaction.  There are certain aspects of life that genuinely require caring about, but we tend to care waaaay too much about things that don’t matter all that much.  Pro sports games.  Reality TV (actually, most TV of any kind).  Kardashians.  Miley Cyrus.  What our neighbor/coworker/significant other may think of us or how they judge us if we do/say a certain thing.  How we are perceived by others around us.  Unless you actually buy into George Berkeley’s nonsense idea of mind-dependent reality (“Esse est Percipi”), none of these things actually matter.  I advocate giving a f**k about the small number of things that actually matter (e.g. your health, your happiness, engaging meaningfully with people who are important to you like your close friends and/or family, your passions/goals/“purpose”), and for the rest of the trivial detritus and gossip (c.f. Heidegger’s “das Gerede”), just let go of it; it’s meaningless.  I recommend the 12-minute read.

Too Big to Fail: A Review and Reaction

So I’ve been reading Andrew Ross Sorkin’s book “Too Big to Fail”, about the 2008 financial crisis.  I have never been one to find the world of high finance particularly interesting, so I wasn’t expecting much.  But I picked up the book because I wanted to understand and analyze the root causes of the near financial collapse.  I have not yet finished it, but overall it’s been incredibly interesting and enlightening.  It reads less like a vanilla financial statement and more like a spicy, saucy soap opera.  There are so many characters whose choices affect each other in myriad ways.  A theme I’ve deduced from the book overall is the profound impact of human hubris, particularly from otherwise smart individuals with a past history of “success”.  This hubris, combined with strong egos and aggressive, stubborn personalities, ultimately led to the events we experienced in 2008.

One thing I found fascinating was just how…shall we say “creative” the top financial firms were in both creating “wealth” as well as preserving that wealth and their own financial well-being.

For example, when Bear Stearns tanked in early 2008, the other big investment banking firms could see the proverbial writing on the wall.  Rather than allow the big firms to collapse of their own accord, in a harsh acting-out of the survival-of-the-fittest paradigm, the federal government deemed them “too big to fail”.  But as investment banks, Goldman Sachs and Morgan Stanley were not commercial banks and therefore were not protected by the FDIC.  The solution?  No problem: on September 21, 2008, both Goldman and Morgan Stanley simply refiled as traditional “bank holding companies”, and suddenly they were eligible for the now-infamous “bailout” money.  As Goldman CEO Lloyd Blankfein put it so cryptically, “…our decision to be regulated by the Federal Reserve is based on the recognition that such regulation provides its members with full prudential supervision and access to permanent liquidity and funding.”  Hmmm.  Full prudential supervision.  Access to permanent liquidity and funding?  Sounds like a pretty good deal to me.  Especially when you get to receive $12.9 billion from AIG counterparty payments via the Federal Reserve and then another $10 billion in TARP funds for “Troubled Asset Relief”.  Wait, “troubled asset relief”?  They’re getting $13+10 billion, for Christ’s sake; “troubled asset” seems a bit of an understatement.

So basically Goldman Sachs just changed their status to that of a bank holding company and thus qualified for federal bailout money?

In all, Goldman misled and even flat-out lied to its investors and profited from the collapse of the mortgage market.  And how were they punished for this injustice?  A congressional investigation that amounted to essentially a slap on the wrist, and a soft one at that.  The motif at play here is the same one that has recurred throughout history: crime pays and money talks.  If you are wealthy enough, you have enough power, even implicitly, to change your environment to suit your self-preservation.  Goldman Sachs and Morgan Stanley shouldn’t even exist now.  They should have been wiped out in the 2008 crisis, like the dinosaurs, in a fit of financial, Darwinian natural selection.  The question is, “Why didn’t they die off?”  The answer is that had the US government allowed such massive institutions to fail, the natural way the free market would dictate in such a scenario, the American economy would have truly collapsed.  This is scary of course, but even more troubling is that our financial system is predicated upon such a shaky foundation that a few financial institutions are so incredibly powerful/valuable that their continued existence supersedes prudent fiscal policy.

To put it in perspective, if I, as a single investor, made some (in hindsight, of course) poor investment choices and lost 95% of my net worth, the US government would give absolutely zero shits and I would essentially be told, “Tough.  Deal with it.  No excuses, blah blah blah…” and similar rhetoric.  That is because my demise would be inconsequential to the overall global system.  But if a massive firm like JP Morgan Chase, Citigroup, Bank of America, Morgan Stanley, or Goldman Sachs were to deteriorate, they would suffer zero consequences.  What precedent does this establish for similar future situations?  Simply this: if you are big enough and important enough, there is no penalty or ill consequence for taking exorbitant risks, since the federal government has gotcha covered.  So risk away, rich man!  You are protected.

On the interconnectedness and interdependency of complex systems

I was listening to a great podcast called Numbers and Narratives while biking into work the other week; it interviewed Yaneer Bar-Yam, founding president of the New England Complex Systems Institute.  What he said further confirmed this belief.  To wit: “When things happen that affect everybody, they happen because of how things depend on each other…”

I’ve long espoused the belief that we are intimately connected to others and the world around us. We in the Western world in particular value the Individual with autonomy and control over our fate and like to see ourselves as largely independent of our surroundings. We hear stories like, “Well despite my poor upbringing I was able to rise above the challenge and pull myself up by my bootstraps, etc.”

Yet what we forget is how our actions, our outcomes, indeed our very identities are closely linked to those of other people and our environment. You cannot separate the two. We like to think we are wholly in control of our lives and have autonomy over our actions.  But do we really?

Nowhere is this more clearly illustrated than in, say, a viral disease outbreak.
When the wealthy CEO complains about the annual flu epidemic, he forgets that the actions he has taken have had a substantive financial impact on the population, which in turn affects that population’s ability to a) take steps to prevent acquiring said flu, and b) limit the spread of said flu once it has taken hold, both of which, ironically, come full circle in affecting that very same person’s probability of contracting the flu.  In summary, his actions have, indirectly through a causal chain of events, caused him to contract the flu.  Public health is inextricably linked to the health of the least healthy subpopulation.  The more unhealthy populations pose a risk to the healthy populations.  Thus, when our system votes down the option to support the poorest of the poor, we are ironically shooting ourselves in the foot; showing no concern for the health of those lowest on the socioeconomic ladder will inevitably cause a spread of that pandemic to those higher on the socioeconomic ladder.  Our lives are interdependent upon each other.

Another example is in how greatly our driving behavior affects each other.  I spend a lot of time riding on the road, and being an inquisitive and observant person, I’ve learned a lot about driver behavior.  Would you believe that drivers who have a heavy foot on the gas pedal are killing people?  Allow me to explain: 4,486 U.S. soldiers died in Iraq and 2,345 U.S. soldiers died in Afghanistan according to the Huffington Post (http://www.huffingtonpost.com/h-a-goodman/4486-american-soldiers-ha_b_5834592.html).  When Americans drive in an aggressive manner (surging when a red light turns green, driving 15-20 mph over the speed limit, not letting off the gas when they see a traffic light turn red up ahead, etc), they consume far more gasoline than someone driving more efficiently and intelligently.  Those aggressive drivers must replenish the fuel at a higher rate at the pump, which collectively increases the market demand for gasoline/oil in the US.  With a natural resource such as oil in relatively limited supply domestically, our government must get its oil sources abroad.  This reliance on foreign oil increases our chances of having to go to war, when diplomacy fails, to take by force the oil that Americans use and rely upon so heavily.  In order to enforce this action, our government backs up its strong words with military force.  Inevitably, soldiers are placed into harm’s way and lo and behold: several thousand of them die as a result of IEDs and bullets.  Once again: our lives are interdependent upon each other.

Thus there is a direct causal link between flooring the “pedal to the metal” when you drive, and some mother in Michigan whose son in the US Army is dead.  Complex systems like the world in which we live are partially interdependent.

So let us consider the ramifications: that same person who drag-races from a stop whenever the light turns green is the very same person who is causing soldiers to die.  How’s that for a heavy conscience?

I find this particularly fascinating because so many of the massive, gas-guzzling Ford F-350 heavy-duty pickup trucks have bumper stickers reading something like, “Proud to be an American” and “Support Our Troops”.  How ironic that their substantial dependence on gasoline, and thus oil, actually contributes to the killing of those very same troops that they purport to want to support.

There is no denying that this happens, that it happens on a daily basis, and on a massive scale.  And yet most (if not all) people are completely oblivious to the implications of their actions on others.  Let me put it another way: if I asked whether you would like an otherwise innocent mother to lose her son to combat wounds, how would you answer?  Unless you’re an inhuman sociopath, you’d answer with a vehement “No way!”  Yet knowing that your driving habits could get that very same soldier killed, would you knowingly continue with those same habits?  A rational person, understanding this logic, would also reply no.  And yet the overwhelming majority of American drivers continue to behave in a way that will inevitably get more American soldiers killed.  I call that irresponsible.  I call that sad.  It is a testament to just how little most people actually THINK.  Most people drive in a zombie-like trance on autopilot, never thinking about how their actions affect the lives (and deaths) of others.  Sad.  Then they weep like babies when their 18-year-old is sent home from Iraq in a casket with an American flag neatly folded over his body.  Sigh: if only they understood the interdependency of complex systems.

First Blog

Hello 21st century.  Yes, the rumors are true.  I have begun a blog.  After dismissing the notion of blogging as a “passing fad” and declaring, “I don’t need to make public my thoughts and ideas for any kind of validation”, I have finally seen the utility and benefits of blogging.

There are more than enough blogs occupying the blogospace these days, so what is the purpose of this one?

Well, to be honest, I’ve been thinking of implementing some sort o’ blog, mainly for myself, as an outlet for the myriad ideas and thoughts I generate on the daily.  It’s partially a space to reflect, to organize thoughts.  But I do also have a plan to turn this into something of more value to you, o Reader On The Interwebs.  They say you should write what you know.  And I know a few things (mama didn’t raise no fool).  I know quite a lot about bike racing, about personal training, about nutrition and eating healthy.  I know quite a bit about data visualization, data science, and predictive analytics.  I figure there’s some solid potential at the intersection of those two seemingly disparate realms.  The real issue here is to narrow down what I want to focus on for blog material.  There is obviously plenty to delve into further.

The second reason for this blog is to re-center, something that I have found to be increasingly difficult as a busy 21st-century American.  The reasons for this drifting are numerous, and a subject for future blog posts, but I have found myself over the past few years drifting further from living the way I feel we “ought” to live.  I have recognized this and decided to make a concerted effort to redress the issue.

A third reason for creating this blog is that I rather enjoy writing, and blogging provides a conducive medium upon which to spill the contents of this here ol’ brain.  The only way to get better at writing is to write.

I aim to write about a variety of areas, from things that I think can be designed/done better or more efficiently, to the most energy and time efficient method of commuting from one’s residence to one’s place of work, to the importance of nutrition in its role in preventive medicine, to solid music, to technology, human behavior, data-driven decision-making, modern fashion trends vs timeless style, interpersonal relationships, bicycle racing, time management, observations about this crazy, hectic world in which we are plunged, and of course, the big (or at least bigger) questions of an existential nature.

Motivation:

So I’ve been working in the Real World for 18 months thus far.  To label it a prison…would be going too far.  But after a year and a half, I’ve been doing some self-analysis, and I realize the office environment has produced in this man one of the most unhappy periods of his life, defined by a near-perpetual state of anxiety, heightened cortisol levels, and a general feeling of emptiness.  Yes, that’s probably the most accurate way to describe it: empty.  Not necessarily good or bad, but mostly just empty.  I am not entirely sure why this is, but I can attest with great certainty, based upon my own observations, that offices appear to be a most unconducive environment for happy people.  The people who aren’t walking around like quasi-living, empty husks of human beings are so heavily caffeinated and caught up in the rat race, overly concerned with perception, that they aren’t even aware that they are unhappy.

I feel that I am capable of more in life.  Much, much more.  I also think a lot of people get complacent in their current state and become reluctant to change or, more to the point, evolve and grow.  The past year or so has been some of the greatest stagnation I’ve ever personally experienced.  Is this really what awaits the average educated adult American male?

I say there is more to be had from life!  Do not surrender to complacency and faux-comfort so readily.  Resist the temptation to ask yourself what kind of dining set defines you as a person.

I was re-reading parts of Ernest Hemingway’s The Sun Also Rises tonight and a passage from chapter 14 spoke directly to me.  I shall transcribe it here for effect:

Ernie is rife with sage advice…

“You could get your money’s worth. The world was a good place to buy in. It seemed like a fine philosophy. In five years, I thought, it will seem just as silly as all the other fine philosophies I’ve had.  Perhaps that wasn’t true, though. Perhaps as you went along you did learn something. I did not care what it was all about. All I wanted to know was how to live in it. Maybe if you found out how to live in it you learned from that what it was all about.”

So I guess that’s what I’m saying.  I want to start by knowing how to live in the world.  Hopefully by doing so, and doing it well, I can deduce some deeper meaning from it.  Good night, invisible readers, and human on…