“We have built a way to harvest Twitter data while people are on the roads. It’s a controversial idea, because people shouldn’t be tweeting while they’re driving,” says Professor Richard Sinnott, director of the Melbourne eResearch Group and a member of the Melbourne School of Engineering’s Department of Computing and Information Systems.
Professor Sinnott and his team base their analysis on Australia’s road system data, available through the Australian Urban Research Infrastructure Network (AURIN) Portal.
“Just as measuring a person’s vital signs (pulse, temperature, breathing rate and blood pressure) tells you a lot about their health, measuring signs of activity, life and movement in a city can tell you how functional or dysfunctional it is,” says Dr Serryn Eagleson, an urban researcher and Manager Data and Business Applications at AURIN.
Collectively, the AURIN community has curated and integrated vast amounts of urban data (over 2,000 datasets from 40 providers), mapped it and developed tools to analyse it, so that researchers like Professor Sinnott can put it to use.
“We’ve written a set of algorithms that identify tweets that originate on the road system. You can write an algorithm that moves along CityLink or the West Gate Bridge in Melbourne, or the Southern Expressway south of Adelaide, checking for tweets in each small section of the road,” Professor Sinnott explains.
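The idea of moving along a road and checking each small section for tweets can be sketched roughly as follows. This is a minimal illustration, not the team's actual software: the function names, the 30-metre buffer, the flat-earth projection and the sample coordinates are all assumptions made up for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def to_xy(lat, lon, ref_lat):
    """Project lat/lon to local metres (equirectangular approximation)."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y

def distance_to_section(p, a, b):
    """Shortest distance in metres from point p to the road section a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection so the nearest point stays on the section.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def tweets_per_section(road, tweets, buffer_m=30.0):
    """Count tweets lying within buffer_m of each section of the road.

    road is a list of (lat, lon) waypoints; each consecutive pair is one
    section. tweets is a list of (lat, lon) tweet locations.
    """
    if len(road) < 2:
        raise ValueError("a road needs at least two waypoints")
    ref_lat = road[0][0]
    pts = [to_xy(lat, lon, ref_lat) for lat, lon in road]
    counts = [0] * (len(pts) - 1)
    for lat, lon in tweets:
        p = to_xy(lat, lon, ref_lat)
        dists = [distance_to_section(p, pts[i], pts[i + 1])
                 for i in range(len(counts))]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        if dists[nearest] <= buffer_m:
            counts[nearest] += 1
    return counts

# Made-up waypoints and tweet locations, for illustration only.
road = [(-37.8200, 144.9400), (-37.8200, 144.9500), (-37.8200, 144.9600)]
tweets = [(-37.8201, 144.9450), (-37.8199, 144.9550), (-37.8300, 144.9450)]
print(tweets_per_section(road, tweets))  # [1, 1]
```

The third sample tweet sits more than a kilometre from the road, so it is ignored. Real road geometry would come from a road network dataset rather than hand-typed waypoints.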
Clusters of tweets
In this instance, Professor Sinnott and his algorithms aren’t reading the actual tweets.
“We’re not interested in what they’re tweeting about, whether it’s ‘I’m stuck in a traffic jam’ or ‘I had bacon and eggs for breakfast’,” he says. “The fact that they’re tweeting and, for example, they’re on CityLink is meaningful.”
The assumption is that people will tweet while on our road network if they’re in a traffic jam, if they’re stopped, or if they’re a passenger.
“We write software that can identify clusters of tweets in space and in time. Several tweets over 15 minutes from different people in the same location could indicate that something is going on there—traffic building up or an accident. From this, we can identify accident black spots and traffic jams in real time.”
“Our results compare well with official data for accident black spots from VicRoads.”
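One simple way to look for "several tweets over 15 minutes from different people in the same location" is to bucket tweets into small grid cells and slide a time window over each cell. The sketch below assumes that approach; the cell size, the three-user threshold, the function name and the sample tweets are hypothetical, and the researchers' own clustering method is not described in detail here.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def find_clusters(tweets, cell_deg=0.001, window=timedelta(minutes=15),
                  min_users=3):
    """Return (cell, window_start) for each grid cell in which at least
    min_users different users tweeted within one time window.

    tweets is a list of (user, timestamp, lat, lon) tuples.
    """
    by_cell = defaultdict(list)
    for user, when, lat, lon in tweets:
        cell = (round(lat / cell_deg), round(lon / cell_deg))
        by_cell[cell].append((when, user))

    clusters = []
    for cell, events in by_cell.items():
        events.sort()  # chronological order
        start = 0
        for end in range(len(events)):
            # Shrink the window until it spans no more than `window`.
            while events[end][0] - events[start][0] > window:
                start += 1
            users = {u for _, u in events[start:end + 1]}
            if len(users) >= min_users:
                clusters.append((cell, events[start][0]))
                break  # report each cell once
    return clusters

# Made-up tweets, for illustration only.
t0 = datetime(2016, 1, 1, 8, 0)
tweets = [
    ("alice", t0, -37.8201, 144.9451),
    ("bob", t0 + timedelta(minutes=5), -37.8202, 144.9452),
    ("carol", t0 + timedelta(minutes=12), -37.8199, 144.9449),
    ("alice", t0 + timedelta(minutes=40), -37.9000, 145.0000),
]
clusters = find_clusters(tweets)
print(clusters)
```

A fixed grid can split a real cluster that straddles a cell boundary, so a production system would more likely use a density-based clustering method over both space and time.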
A tweet-ful of useful data
A typical 140-character tweet has nine kilobytes of data and metadata, such as your profile information, who you follow, who follows you, what language you’re tweeting in, what device you’re tweeting with and, if geolocation services are turned on, where you’re tweeting from. Professor Sinnott says this data has huge potential.
“For example, census data will tell you Vietnamese-speaking people lived in Melbourne’s inner eastern suburbs, like Richmond and Burnley, in 2011. Twitter data might show there are a significant number of Vietnamese-speaking people in Point Cook (in Melbourne’s western suburbs) right now!”
“You can use this data to capture what is really happening, so that policy decisions can be driven by what we know, rather than what we think we know. If you’re going to build a new roadway, buildings, or large scale infrastructure, you want to know which people are there and that they will be well serviced.”
An international perspective
World-renowned British urban planner and geographer Professor Michael Batty — winner of the Vautrin Lud Prize — also uses the vast amounts of data generated in modern cities in his research. His work was presented in Melbourne as the inaugural AURIN Lecture, and can be viewed online.
He keeps his finger on the pulse of London’s Tube system by measuring Oyster card ‘tap-ins’ and ‘tap-outs’, similar to those recorded by Melbourne’s myki and Sydney’s Opal public transport smartcards.
“We’re embedding computers for the first time into the built environment,” Professor Batty says, describing transport smartcard sensors as fixed computers and smartphones as mobile computers.
“The ‘exhaust’ from all this technology use is big data.”
The beeps of smartcard ‘touch-ons’, the tweets on social media and credit card and EFTPOS transactions are signs of activity in our modern, technology-assisted lives. Just as loyalty cards give retailers a lot of powerful information about your shopping habits and favourite products, mapped transaction data and social media activity can reveal a lot about city systems.
Professor Batty uses Oyster card data from London’s transport system to understand and improve flows through the Tube, including during network disruptions and major events such as the London 2012 Olympics.
Who owns the data?
In 2015 prospective myki operators were asked to pay $50,000 to Public Transport Victoria for access to past usage data. Should researchers trying to improve transport systems and inform policy debates be asked to pay for data?
“Cities are huge data generators, but who owns that data?” Dr Eagleson asks. “As government services are increasingly outsourced to the private sector, it’s important that we consider the ownership and use of data generated by these services, and potentially build access for future research into service provider contracts.”
Wearing your heart on your tweet
In Melbourne, Professor Sinnott and his colleagues are also exploring whether or not analysis of the Twittersphere can help define the mood of the electorate and predict election results.
“We can do sentiment analysis, looking at whether people are happy or not. You can correlate public sentiment regarding particular politicians or election issues. And you can look at specific electorates: the swinging seats are interesting,” Professor Sinnott says.
“We’ve previously used mapped sentiment analysis to predict the outcome of the 2015 United Kingdom general election pretty accurately.”
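As a rough illustration of mapped sentiment analysis, the sketch below scores each tweet against a tiny word list and averages the scores per electorate. It is a toy under stated assumptions, not the researchers' method: the word lists, function names and sample tweets are invented, and it assumes each tweet has already been assigned to an electorate (for example, by matching its location against electoral boundaries).

```python
from collections import defaultdict

# Tiny illustrative word lists; real sentiment lexicons are far larger.
POSITIVE = {"good", "great", "happy", "love", "support"}
NEGATIVE = {"bad", "angry", "hate", "terrible", "worst"}

def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    words = [w.strip(".,!?'\"").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def mean_sentiment_by_electorate(tweets):
    """Average tweet score per electorate from (electorate, text) pairs."""
    scores = defaultdict(list)
    for electorate, text in tweets:
        scores[electorate].append(sentiment_score(text))
    return {e: sum(s) / len(s) for e, s in scores.items()}

# Made-up electorates and tweets, for illustration only.
tweets = [
    ("Seat A", "Great speech, love the new policy!"),
    ("Seat A", "Happy with the result"),
    ("Seat B", "Worst debate ever, terrible answers"),
]
result = mean_sentiment_by_electorate(tweets)
print(result)  # {'Seat A': 1.5, 'Seat B': -2.0}
```

Word counting of this kind misses sarcasm and context, which is one reason such analysis is "not an exact science".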
For research on Australian people and places, Professor Sinnott and his colleagues and students import their own (BYO) Twitter data into the AURIN Portal, where they can interrogate it in more detail and compare it with existing datasets that AURIN has cleaned, mapped and integrated for use in research. These include electoral boundaries, road maps, areas of high income, education levels, and cultural and language information.
“You can even ask silly questions like ‘do rich people swear more than poor people’ by comparing maps of bad language on Twitter with income distributions across suburbs.”
Studying the sentiment expressed in social media means that researchers can effectively conduct polling without people knowing they’re being polled.
“We can spot where people feel disenfranchised. It’s not an exact science, but you get such large quantities of data that you can infer patterns.”
Banner Image: Melbourne in rush hour. Picture: Erin Geary/Flickr