The Secret Life of (One) Professor: Two Years In

June 24th, 2010 by naaman

Matt Welsh of Harvard recently wrote on the Secret Lives of Professors, a post that stirred a lot of discussion and struck a chord with a somewhat less experienced professor (that would be me; two years on the job vs. Matt’s seven). I found myself nodding at many of Matt’s well-framed observations.

Matt’s main “surprises” and lessons that he offers to grad students in his post include:

Lots of time spent on funding requests. I have had a similar experience, because (like Matt) I enjoy working with, and leading, a large group of researchers. Of course, the batting averages are low for funding requests (Matt downplays his success rate but I bet it’s better than average). In my first two years, I submitted 3 NSF proposals, 2 of which were declined and one of which is still outstanding (a good sign); I am currently working on two more. Each of these took significant effort, in one case at least (an estimated) two full months of my time. In addition, I submitted a number of smaller-scale proposals, most of them quick and easy to write, and was fortunate enough to get a Google Research Award (thanks again Goog!), and to be assigned as a faculty mentor to a superstar two-year postdoc, Nick Diakopoulos. Together with some other odds and ends (thanks SC&I!) I feel pretty happy after two years regarding the group and resources I have amassed; but the cost on my time is still substantial. On the bright side, as Sam Madden points out in the comments to Matt’s article, some of the grant proposal process is actually useful in helping me think about future work and research agendas, even if the specific proposal does not get funded.

The job is never done. Even as I write this, I could (and feel that I should!) be editing a paper, or looking at some data, or catching up on email, or working on one of the two said proposals. Matt admits:

For years I would leave the office in the evening and sit down at my laptop to keep working as soon as I got home.

I can’t say my experience is far from that, although I still insist on taking good vacations. And a 2-year-old kid certainly makes for a compelling reason to stop working at any time.

Can’t get to “hack”. True enough, most of the interesting work is delegated to students, and Matt complains that he doesn’t find time to write code. However, that is partially the decision that Matt (and I) knowingly make when we decide to work with (and try to fund) a large group of students. Managing fewer or no students might allow more individual research work, which is certainly a path taken by some faculty who skip the funding requests and the resultant student meetings. However, I am no Ayman, do not miss writing code, and am happy to farm that out to students. I do enjoy thinking about the intellectual and research issues, and often get to do that with the students. I would like to have fewer meetings and less email, but unlike Matt I feel involved enough in the intellectual work, at least so far. Nevertheless, I can’t dive into it like the grad students who indeed “have it good”.

Working with students. Matt writes:

The main reason to be an academic is… to train the next generation.

I see it the same way (the intellectual pursuit is also up there, but it could be claimed that you can perform similar intellectual pursuits in other settings like research labs). The students are why I am in academia, and the advising is by far my favorite activity: from solving someone else’s problems (e.g. a student not sure how to approach X or Y) to, more substantially, showing students a path from first-year confusion to being an experienced researcher who understands how to ask (and answer) research questions, and communicate them effectively. Well, I am clearly not quite there yet, having just recently started doing it (and just started funding my first PhD student). But I am enjoying it already. Like Matt, for me it is not just working with the PhD and Masters students; the undergrads play a big role. I started working with several star undergrads, some of whom had never SSH’ed into a server before, and most of whom had never seen how research is done. Their wide-eyed excitement is an energy source, an inspiration and a cause of constant enjoyment.

So, the bottom line?

It is certainly not for everybody. It remains to be seen if it is even for me.

I will buy that, Matt. At the end of the day, for me, it’s the students, and the freedom to carve my own path. This summer I am lucky enough to be working with my group at SC&I consisting of one postdoc, 2-3 PhD students, 3 Masters students, and 1-3 undergrads (at any given time). With teaching (more on this topic later) out of the way, I spend two full days a week with this gang talking about research, writing papers or grants, having other “good” meetings, or playing Rock Band on our Wii. It’s definitely one of the best work summers I have had, much like my summers at Yahoo! Research Berkeley where we had most of our fantastic interns join in on the fun.

Speaking of the defunct Y!RB, and regarding that path-carving freedom, I feel a lot less constrained in academia compared to industry research. I had a fantastic experience at Yahoo!, and was lucky to have a great team at the Berkeley lab. However, starting my own project at Yahoo!, one that followed my own personal vision and involved multiple people, would have taken a lot of convincing (and would have needed to be ultimately tied to the corporate agenda). I know Ayman does not agree, so maybe this is just a false sense that I have, that moving a bunch of people towards a vision that I choose and craft is easier in academia. To do that with the students might be, as Matt put it, “the coin of the realm”.

Apple Does Migrations (Almost) Perfectly

June 17th, 2010 by naaman

Just got a new MacBook Pro. I’ve been on Mac for about 5 years now, and the number one most impressive feature to me is the migration. As someone lucky enough to be in a place with a fantastic IT department (yes, I know that’s unlikely, but our IT people are superstars) it means just dropping off my old Mac, and, voila! A few hours later I have all the setup I had before (down to the browser history items), reproduced on a lovely new machine.

Just a few things went wrong, most of which are Apple’s fault, and some of which are quite annoying.

First, the Mac didn’t recognize the iPhone. Luckily I was clever enough to think of checking for a Mac software update, and sure enough, the only update available was a fix to this bug. +1 point, Apple.

But it got worse once the iPhone was recognized. Soon enough I got this notice right here:
Annoying.

OK, a little scary, and totally wrong (not getting into the DRM discussion here) but not so bad as a user experience: the dialog allowed me to continue and gave me options, and I can live with that (but why didn’t the migration carry forward my authorization?). Anyway, I asked to authorize, only to get another prompt: something like “sorry, you already have 5 authorized computers”. This time, I was offered no way out other than acknowledging that lovely, yet curious fact (which 5 machines had I authorized? Ayman certainly didn’t get my permission for any content!). I was too shocked to take a screen grab of that pesky dialog. Still, this wasn’t a big deal, because I knew what to do: de-authorize all my computers (the only one I knew I had authorized was not with me, since I had migrated from it, so I couldn’t just de-authorize it). But that’s wrong, Mr. Jobs. Why would a “normal” (i.e., not 6’8″) user know how to de-authorize their other computers? Instead, I would like to have seen this process:

1. “Hey, it seems like you already reached the maximum number of computers allowed to access your licensed content! Would you like to fix that?”

Options: But of course! / No, I’ll just curl up in the corner and cry

2. “Here are the details of your 5 authorized computers. Which one(s) would you like to de-authorize?”

Options: Select any number of computers to de-authorize.

3. Done!

Easy, Steve? -gazillion points, Apple!

Another thing that didn’t migrate properly was my screensaver (although my desktop picture preferences were kept). I guess that’s because in Snow Leopard you need to use iPhoto albums to choose screensaver photos. But why would the desktop background work and the screensaver break? Slightly bizarre.

The wifi was also a mild annoyance, forgetting all my preferences (but at least remembering the networks’ credentials for secure networks).

Finally (geek/grad student topic alert), I lost my LaTeX (MacTeX) installation in the migration to the new Mac. I mean, the files were still there but the migration broke a few symbolic links and tampered with the folder structure just enough to make my various LaTeX editors not find the MacTeX installation. MacTeX has a several-step solution, but you know me, I take my shortcuts: I just upgraded to MacTeX 2009, which fixed all these issues.

So, Apple could have made this really close to a perfect game, but allowed a couple of walks in there late in the innings, just to have Naaman complain. Well, what would I do without them.

Mor : Kanye :: Ayman : ______

May 26th, 2010 by ayman

In case you missed the Poster Madness track at ICWSM 2010, Naaman helped me out with a stellar performance.


Ayman: Taylor, Naaman: Kanye from Mor on Vimeo.

Many thanks to Eric Gilbert for being a great sport too!

Conversation Shadows and Social Media

May 25th, 2010 by ayman

If you find yourself at ICWSM this week, say hi to us. I know I’ve been introduced to Naaman at least twice so far; I believe he still writes here. So far it’s been a nice mix, from the standard social network analysis to S. Craig Watkins’s talk on Investigating What’s Social about Social Media (he’s from UT Austin’s Radio TV and Film department and gives a great perspective on personal motivations and behaviors). Yahoo!’s Jake Hofman gave a great tutorial on Large-scale social media analysis with Hadoop.

Tonight, I’ll be presenting my work on Conversational Shadows. In this work we look at how people tweeted during the inauguration and show some analytical methods for discovering what was important in the event, all based off of the shadow their Twitter activity casts upon the referent event. Let me give a clear example.

Ever go to a movie? Did you notice people chat with their friends through the previews? Once the lights go down and the movie starts, they stop chatting. Sure, they might say “this will be good” or “yay” but the conversation stops. I began to wonder: shouldn’t this occur on Twitter while people are watching something on TV? Does the conversation slow down at that moment of onset, when the show starts?

During Obama’s Inauguration, we sampled about 600 tweets per minute from a Twitter push stream. The volume by minute varied insignificantly. However, “a conversation” on Twitter is exhibited via the @mention convention. The mention is highlighted. It calls for attention from the recipient. Our dataset averaged about 160 tweets per minute with an @ symbol. Curiously, there were 3 consecutive minutes where the number of @ symbols dropped significantly to about 35 @s per minute. We still sampled about 600 tweets, just there was a general loss of @s. People hushed their conversation. Perhaps even gasped. Here’s a graph to give you a better feel:

During those minutes where the @ symbols dropped, Obama’s hand hit the Lincoln bible and the swearing in took place. People were still shouting “Hurray!” but they weren’t calling to others via the @ symbol. Following the human-centered insight (as we found by studying video watching behaviors), we can examine the @ symbols to find the moment of event onset. We call this a conversational shadow: the event has a clear interaction with the social behaviors to be found on the Twitter stream. We’ve found other shadows too; come by the poster session tonight to see them or, if you can’t attend, check out my paper.
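To make the idea concrete, here is a minimal Python sketch of how one might spot such a drop. It is not the analysis from the paper: the `(minute, text)` input shape, the function names, and the "below half the median" threshold are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def mentions_per_minute(tweets):
    """Count tweets containing an '@' for each sampled minute.

    `tweets` is an iterable of (minute, text) pairs; this input shape is
    an assumption of the sketch, not the format of the original dataset.
    """
    counts = Counter()
    for minute, text in tweets:
        counts[minute] += 0  # make sure every sampled minute gets an entry
        if "@" in text:
            counts[minute] += 1
    return dict(counts)


def find_shadow(counts, ratio=0.5):
    """Return the minutes whose @ count falls below ratio * median.

    The ratio is a hypothetical threshold chosen for illustration.
    """
    baseline = median(counts.values())
    return sorted(m for m, c in counts.items() if c < ratio * baseline)
```

With the numbers from the post (a baseline of about 160 @ tweets per minute dropping to about 35), any ratio between roughly 0.25 and 1 would flag the three quiet minutes.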

HTTP and the 5th Beatle

May 5th, 2010 by ayman

Have you seen HTML5/CSS/JavaScript lately? It’s crazy. More than a simple markup for a web page layout, we now design for interaction with the web. You can query the GPS for your location or even play Quake using Canvas. With all these amazing advancements, one thing does trouble me: HTTP/1.1 was last modified in 1999! Why is that? HTTP provides 7 verbs of which our browser uses two (GET/POST) or maybe three (if you are one of the few fancy people to use PUT). It’s asynchronous and transactional. You want a push update? Forget it. You want a stream to listen to? Not happening. Many of the things I build need synchronous interaction, which we can’t really do if we keep polling for information via GET requests. Is it ok just to fake it? After many a conversation with my colleagues, Elizabeth and Prabhakar, who happened to be the panel co-chairs for ACM WWW 2010, I really wanted to start thinking about what the next generation of the web should be and how we can build it.

So, last week at WWW 2010, I ran a panel entitled “What the Web Can’t Do” to address these issues. On the panel were: Adam Hupp (head engineer of the Facebook News Feed), Joe Gregorio (from Google and long time supporter of httplib2, REST, and web technologies), Ramesh Jain (UC Irvine and video guru), Seth Fitzsimmons (now at SimpleGeo with a past life at Flickr and Fire Eagle), and Kevin Marks (at BT, former Googlite and Technorati).

Adam pointed out quickly that HTTP Long Poll works best for most applications. WebSockets might solve the rest of the gaps but one must consider this balance of latency and experience. The problem becomes how we handle notifications: you can’t wake someone’s browser. To follow up, Joe reminded us (well, me) that the web is more than HTTP; it is a stack of technologies that becomes the whole browser experience. Furthermore, he cited HTTP to be Turing Complete (by the way, Turing began that definition with the phrase “Assuming the limitations of Time and Space are overcome…” which is the very nature of the problem with a transactional protocol and synchronous interactions).
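For readers who have not met long polling, here is a rough Python sketch of the idea, with an in-process object standing in for the server so that no real HTTP is involved. The class and method names are hypothetical; the point is only that the "request" is held open until something new arrives or a timeout passes, after which the client immediately asks again.

```python
import threading


class LongPollChannel:
    """In-process stand-in for a long-poll endpoint (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._messages = []

    def publish(self, message):
        """Server side: record a new message and wake any waiting pollers."""
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()

    def poll(self, since, timeout):
        """Client side: block until a message newer than index `since`
        exists, or until `timeout` seconds pass.

        Returns (new_messages, next_since). An empty list means the poll
        timed out and the client should simply poll again.
        """
        with self._cond:
            self._cond.wait_for(lambda: len(self._messages) > since,
                                timeout=timeout)
            new = self._messages[since:]
            return new, since + len(new)
```

A client would loop on `poll`, passing back the index it last received. Compared with polling via plain GET requests on a timer, the update is delivered as soon as it is published, but each waiting client still ties up a connection on the server.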

Ramesh led us from there into problems with interactive video online, and that we have no way to represent events in the stack (short of some SemanticWeb efforts which haven’t really taken full force, or as Naaman said are dead). Echoing this, Seth pointed out that the current collection of the stack leads to privacy problems. If I delete a tweet from Twitter, it’s still in the search index of Google and Bing. Maybe the crawler will come to understand it should be deindexed and decached. Maybe!

Finally, Kevin Marks, who recorded the whole session using QIK [1, 2, 3] with a little help from his friend, pointed out the mess we are creating with no convergence. Pick an HTML5 browser and he’ll point out what video won’t work due to codec support. More so, we traditionally handle these streaming connections through hidden Flash objects on the page; if we leave the plugin architecture in defense of open technologies, what will fill the gap?

At this point, I asked Joe Gregorio:

How come there is no FOLLOW or LISTEN verb in HTTP?

To which he responded:

You mean MONITOR. Actually it was in the original spec. It was abandoned because nobody could agree or figure out how it should work exactly. [NOTE: I'm paraphrasing here, cue the tape for the exact quote]

This floored me: an 8th verb! It’s like discovering there was a 5th Beatle or a 32nd flavor of Baskin Robbins! There was a verb at one point which would facilitate highly synchronous HTTP connections but it never made it to production. This all spoke to what Kevin was getting at: the issue is a combination of politics and technology. Seth and others on the panel started to conclude that OPEN technologies follow CLOSED innovation. In the case of the Web, we are 2-12 years behind what should happen.

This is why now I can run Quake from 1997 in Chrome. HTML5 and Canvas do what Flash was doing years ago. As much as I would love it, running Dark Castle in a web browser doesn’t help me very much. Furthermore, these technologies have never duplicated each other: HTML5 and Flash are mutually beneficial; they always have been. I’m not interested in building what I could have built years ago. If you want to build highly interactive Web apps, then you have to sit closer to the metal, the chips, and the hardware. And while my panel was happy to tell me to wait because it will happen (perhaps MONITOR will make a comeback), where will the deep innovation occur while the web plays catchup? We need to think about the whole ecosystem of the WWW and innovate it faster. Till then, I’m likely to be resigned to writing Objective-C iPhone apps against web services and other arbitrary sockets. Research shouldn’t ride shotgun; it should be behind the wheel.

How many characters do you tweet?

April 21st, 2010 by ayman

I’m fresh back from CHI2010, unlike our friends from the EU who were left stranded in Atlanta (surviving off the good graces of GVU faculty, students and staff; several trapped students received some extra travel assistance from SIGCHI…if you’re not a SIGCHI member, you should join). On the Sunday, there was a great workshop on microblogging where I had a great conversation with many people, one of whom was Michael Bernstein. We began to wonder: yes, there’s a 140-character limit, but how many characters do people actually type? Since I happened to have about 1.5 million tweets on hand and a little bit of R knowledge, I did a quick investigation at the coffee break.

This is really not the distribution either of us expected. Clearly the bulk of tweets are around 40 characters long. But it’s really curious to see the large set of tweets that are verbose. More so, the exactly 140 count is high. I’d imagine the >135 character spike results from people trimming down verbose tweets to fit into the post size limit.
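The original investigation was done in R and its code is not shown here. As a rough stand-in, this Python sketch buckets tweet lengths into fixed-width bins; the function name, the 5-character bin width, and the list-of-strings input are assumptions for illustration.

```python
from collections import Counter


def length_histogram(tweets, bin_width=5, limit=140):
    """Bucket tweet lengths (in characters) into bins of `bin_width`.

    Returns a dict mapping the lower edge of each bin to a count.
    Lengths above `limit` are clamped, so tweets at the limit land in
    their own final bin.
    """
    hist = Counter()
    for text in tweets:
        n = min(len(text), limit)
        hist[(n // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))
```

Note that `len` counts Unicode code points in Python, which may not match exactly how Twitter counts characters.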

Are you a tweeter that walks the line or are your tweets short and concise? I wonder if Naaman’s meformers tweet a different distribution when compared to the informers.

Annotations (Twitter reads the Ayman and Naaman Show?)

April 16th, 2010 by naaman

Hey, Naaman’s back for my favorite type of activity: the “I told you so / I called it”. Twitter just announced Annotations, here are some technical details, and here’s the New York Times coverage:

Another new tool is called annotations. Already, individual posts show which app someone used to write the post and the date, time and (if users choose to make it public) location. With annotations, software developers will be able to add other material, which Twitter calls metadata, to Twitter posts.

This could significantly expand the amount of information a post includes, beyond its 140 characters, and could enhance the way Twitter is used.

Posts could include the name of the restaurant where a post was written and its star rating on Yelp, for instance. Then, someone could find Twitter posts about restaurants nearby with five stars. Or developers could add a way to make a payment and purchase, so retailers could sell items from within a post.

Twitter does not know what developers will decide to do with the tool, said Ryan Sarver, who manages the Twitter platform. “The underlying idea is think big, push yourself.”

Sounds very close to what I asked for. Of course, there are the Machine-tag skeptics but they just need a good moment alone with Aaron Cope, Clay Shirky and a machete. Free the information hierarchies!

Quake, Rumble, Tweet

March 19th, 2010 by ayman

Months ago, a 4.1 quake shook up San Francisco. Most people barely felt it, but it did make more of a rumble in the south bay, closer to the epicenter. Twitter became a flood of quake tweets. My follower/following friend @tomcoates sent out a tweet asking about the lack of geo-coded quake bots.

Startled by this, I began a little investigation of ‘how hard could it be’. By the end of the night, I had made the little python bot @sfusgs. It received quite a few followers and made the TC. Here’s a quick walkthrough of how I put it all together:

From there I made @lausgs and @earthusgs (which is really popular in Chile). Naaman, it’d be pretty easy to filter the @earthusgs feed with a pipe to get a stream for the NYC area. You can see a map of all the world’s quakes on my usgs quake page.
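For the curious, the filtering step could look something like this Python sketch. It is not the bot's code and does not use a pipe: the quake records are assumed to be dicts with `lat` and `lon` keys, and the 200 km radius around New York is an arbitrary choice for illustration.

```python
import math

NYC = (40.7128, -74.0060)  # approximate latitude, longitude of New York City
EARTH_RADIUS_KM = 6371.0   # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def quakes_near(quakes, center=NYC, radius_km=200.0):
    """Keep only quakes within `radius_km` of `center`.

    Each quake is assumed to be a dict with 'lat' and 'lon' keys; that
    record shape is hypothetical, not the format of the USGS feed.
    """
    return [q for q in quakes
            if haversine_km((q["lat"], q["lon"]), center) <= radius_km]
```

A bounding box would be simpler still, but a distance check avoids having to pick box corners by hand.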

Statler at CSCW 2010

February 3rd, 2010 by ayman

Statler & Waldorf: do you know these guys? They are described as “two ornery, disagreeable old men who…despite constantly complaining about the show and how terrible some acts were, they would always be back the following week in the best seats in the house.” Looking at their snark in aggregate, one finds them to be particularly noisy when Fozzie Bear performed. Early last summer, I began to wonder if nowadays they would be tweeting snark during a show.

Fortunately, people have stepped up to fill the void and tweet while they watch tv. So last year I began investigating people tweeting during live events/performances in order to discover interesting moments, people’s sentiment, what people are talking about, and media segmentation. The Statler prototype embodies most of my findings to date:

Statler Screenshot

The prototype has two modes: Debate 2008 & Inauguration 2009. Based on a sample of tweets from the first debate of 2008, Statler automatically identified 9 topic segments which align to CSPAN’s editorial slices with an accuracy of 93%. You can also see the trending tweets in comparison to top terms from the debate speakers (taken from the closed captioning). For the Inauguration, Statler uses 50,000+ tweets taken from the public timeline to give a more ‘real-time’ feel to how the crowd is moving as the tweets, the tweet structures and terms change over the course of the swearing in and the speech. Of note here is that Statler identified the moment of swearing in as the most interesting point during the 30-minute Inauguration video, and also identified the messing up of the oath as something which was conversationally interesting. The latter would not emerge as a salient term using a conventional vector-space approach.

Feel free to try out the demo and be sure to say hi if you’re at CSCW. Look for me in the Horizons and Demo programs. If you can’t find me, look for Naaman who has a good line of sight to spot people in the crowd.

Milgram to TagMaps like Lynch to Flickr Alpha Shapes

December 24th, 2009 by naaman

After we came up with Tag Maps at Yahoo! Research Berkeley, Morgan Ames (then one of our star interns) pointed out the surprising similarities to a study that was done 30 years earlier, by Stanley Milgram, the famous social psychologist. In his study, Milgram asked 30+ participants to list names of attractions in Paris. He then visualized these on a map, in a size according to the number of times each was mentioned. Here are the automatically-created, Flickr-based TagMap of Paris (based on geotagged photos taken in that area), and the same exact area as represented by Milgram’s visualization.

tagmaps paris

milgram paris

I have been showing both these images in my talk for a while now — can’t seem to get sick of them, even if my audience might just be…

I’ve also been talking for a while on how we can use the aggregate contributions on Flickr to mark boundaries of geographical objects, such as, say, neighborhoods, using all the photos tagged with a neighborhood name. Talk is cheap, but the smart people at Flickr not only figured out how to do it (with slightly different data than tags) but also released the data and the source code for anyone to use. Blame Aaron Cope and Rev Dan Catt.

Well, here’s the thing: turns out a famous scholar also beat Flickr to it, some 40 years ago. Kevin Lynch, in his groundbreaking essay/book The Image of the City, collected people’s descriptions and hand-drawn maps of three cities (Boston shown here, also Jersey City and downtown LA). In one study, he extracted the “maximum boundaries” for each neighborhood as drawn by all the interviewees, and plotted them on the map.

Here are the automatically-created, Flickr-based map of Boston Neighborhoods, visualized using the excellent Tom Taylor’s Neighborhood Boundaries, and Lynch’s maximum boundaries of neighborhoods in the same area.

Neighborhood Boundaries for Boston

Lynch's neighborhood boundaries

I have pre-ordered Milgram’s book of essays, to arrive in February. Might as well find out what’s there before we re-invent something else!