Archive for January, 2009

NIN, Look here!

Saturday, January 24th, 2009

Ayman has disappeared so it seems like there’s nothing to stop Naaman from blabbering some more. And this time: yet-yet-another-another things-that-happened-to-be-on-my-browser-at-the-same-time. The difference is that they are even more related this time around…

First of all, congratulations to Mr. Kennedy! And to me, Naaman! Yay for our recently-accepted WWW’09 paper, awesomely named “Less Talk, More Rock: Automated Organization of Community-Contributed Collections of Concert Videos” (Lyndon is a Rock Star). Just to make clear, that latter part about Lyndon is not part of the title, but it’s a fact nonetheless.

I will write more about the paper soon (and upload the paper as well), but here is part of the abstract:

We describe a system for synchronization and organization of user-contributed content from live music events. We start with a set of short video clips taken at a single event by multiple contributors, who were using a varied set of capture devices. Using audio fingerprints, we synchronize these clips such that overlapping clips can be displayed simultaneously. Furthermore, we use the timing and link structure generated by the synchronization algorithm to improve the findability and representation of the event content…

In other words, this work builds on social multimedia, those videos and photos that everybody now takes, and some share online, when they go to music shows, concerts, and any other public or private event. Like at an inauguration:

Capturing the moment
AP Photo/J. Scott Applewhite*

Yes, that’s the perfect photo to demonstrate the “Everybody capturing content” idea. And I just discovered it today.

In our paper, as the abstract hints, we show how we can take those captures (in this case, videos) and use their audio tracks to synchronize them, creating a much-improved presentation and also improving on the metadata and organization of the content. In short, a perfect technology for Nine Inch Nails to use in their newly-launched (or re-launched?) website devoted to content from fans, from their years of concerts, which I re-discovered today. 10 videos from each concert? In 5 years, you will have 100s. We’ll take care of them. How? I promised Sagee I will give him the details… coming soon.
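To give a rough feel for how audio can line up two clips, here is a minimal sketch. It is not the method from the paper. It assumes each clip has already been reduced to a list of (time in seconds, fingerprint hash) pairs by some fingerprinting step that is not shown, and the function name `estimate_offset` is made up for illustration. Every hash that appears in both clips votes for a time offset, and the offset with the most votes wins.

```python
from collections import Counter


def estimate_offset(clip_a, clip_b):
    """Estimate how many seconds after the start of clip_a clip_b begins.

    clip_a, clip_b: lists of (time_seconds, fingerprint_hash) pairs.
    These are assumed inputs; computing the hashes is out of scope here.
    Returns (offset_seconds, votes), or None if the clips share no hash.
    """
    # Map each hash in clip_a to the times at which it occurs.
    times_by_hash = {}
    for t_a, h in clip_a:
        times_by_hash.setdefault(h, []).append(t_a)

    # Each matching hash pair votes for one candidate offset.
    votes = Counter()
    for t_b, h in clip_b:
        for t_a in times_by_hash.get(h, []):
            votes[round(t_a - t_b, 1)] += 1

    if not votes:
        return None
    return votes.most_common(1)[0]


# Toy data: clip_b starts two seconds into clip_a, plus one spurious match.
clip_a = [(0.0, "h1"), (1.0, "h2"), (2.0, "h3"), (3.0, "h4"), (4.0, "h5")]
clip_b = [(0.0, "h3"), (1.0, "h4"), (2.0, "h5"), (3.0, "h1")]
print(estimate_offset(clip_a, clip_b))
```

The voting step is what makes this tolerant of noise: a single accidental hash match (like "h1" above) is outvoted by the consistent ones.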

* Photo reproduced in thumbnail size to maintain fair use and will be removed on demand (via the awesome Boston.com Big Picture).


Lessig, Times, Colbert

Tuesday, January 13th, 2009

Here’s another post in the series “things that are related mostly because they were on Naaman’s browser tabs at the same time”.

First tab: I do not agree with Ben the Practicalist saying that Lessig did a good job when he was talking with Stephen Colbert about the hybrid information economy, aka “read/write culture” or “remix culture”. I personally think that Lessig’s strategy of handling Colbert’s musing was not effective; the major points did not come through. Still, maybe more people will actually buy the book (I will).

So, which other tabs were open on my browser? As is often the case, tabs that reflected the move to the new information economy:

  • A 1989 article from the New York Times about Ehud Banai was referenced in a documentary I was watching last week. Ehud is my favorite Israeli artist, and I looked up the article (a review of one of Ehud’s early shows on his first US tour) when I got back home. The amazing thing here is the Times opening their full archive for free access. Probably one of the most comprehensive collections of information, available for remix (and linking) for free. The Times gets it (still at the Times, Nick dug up a couple of really old articles – I mean really old – for a story about the death of print for the Radar).
  • The excellent Hype Machine, already aggregating mp3 content from blogs around the world, has an also-excellent Top 10 album list – including full album listen for each. Whoever thinks this is bad for music and for the bands, didn’t read Lessig recently.

What am I saying? That we are already in a Lessig economy. More socialist, as Ben hopes? I don’t know yet.

Naaman Editing Wikipedia

Thursday, January 8th, 2009

As part of my class I am going to have my students edit a Wikipedia page of their choice. In preparation, I decided to do so myself – for the first time ever.

Why did I deserve this, then?

MySQL/php

Ouch!

It happened when I tried to preview my first edit ever (no, it was not Ayman’s Wikipedia entry). I am not sure how many of my students will have the nerve to handle this kind of error message! (OK, I did go back and click “preview” again, and then it worked).

Otherwise, editing was kind of ok. It took me a little while to get the linking/link text model (simple) and to understand the references format – both of which will not be trivial for a non-CS or inexperienced person, I am afraid. Well, let’s see how my students end up doing (and which edits they choose to do!)[1].

[1] I would have told you mine but I’d like to keep my editing persona private for now. Or can someone figure it out otherwise?

Teaching retrieval of information: What do you leave out?

Sunday, January 4th, 2009

Hello, academia! One of Naaman’s new responsibilities is, of course, teaching. In the long run, the teaching load will consist of courses that are driven by my own research interests (Social Media class? Mobile Information?), as well as core courses in information science. I had tons of fun teaching Research Methods to a great group of MLIS students last semester. This spring, I will be teaching Retrieving And Evaluating Electronic Information to undergrads:

“In this course, students examine and analyze the information retrieval process in order to more effectively conduct electronic searches, assess search results, and use information for informed decision making. Major topics include search engine technology, human information behavior, evaluation of information quality, and economic and cultural factors that affect the availability and reliability of electronic information.”

Now there’s a topic that could have launched a thousand PhD theses… how do I pack it into one semester? As I see it, the class should be a combination of “how to” and “how it works”: both understanding how the technologies work and how to use them best (these are of course interrelated).

For now, I have the class set up in the following way (with thanks to Marie, Nina and Nick who taught this class before me):

First I will spend time discussing the basics of how to search, starting from the very basics of how to choose and iterate on keywords, through boolean operators and advanced search functions. Then, I will spend a few sessions talking about search technology, or how search engines work (you know, crawling/indexing/ranking). I believe that everyone should have an understanding of how search works in order to realize the bias and limitations inherent in the process. In the middle I will discuss the presentation of search results as well as the topic of browsers.

All this will take me 8-9 sessions (1.15 hours each) out of a total of about 25.
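For the “how it works” part, a toy example helps connect indexing to boolean operators. What follows is a minimal sketch and not how any real search engine is built: it assumes a handful of made-up documents, a naive tokenizer, and helper names (`build_index`, `search_and`, `search_or`, `search_not`) invented for illustration. Crawling and ranking are left out entirely.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> set of ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Docs that contain every term (boolean AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Docs that contain at least one term (boolean OR)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, all_ids, term):
    """Docs that do not contain the term (boolean NOT)."""
    return set(all_ids) - index.get(term.lower(), set())


# Made-up documents, for illustration only.
docs = {
    1: "Concert videos from fans",
    2: "Search engines crawl and index the web",
    3: "Fans search for concert photos",
}
index = build_index(docs)
print(search_and(index, "concert", "fans"))  # docs 1 and 3
print(search_or(index, "crawl", "photos"))   # docs 2 and 3
print(search_not(index, docs, "concert"))    # doc 2
```

Even this toy shows one of the limitations worth discussing in class: the index only knows exact terms, so a search for “fan” would miss both documents that mention “fans”.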

Then, beyond generic search, there are other sources for retrieving information, even on the web: directories, reference sites (e.g., dictionaries, but many more) and business databases. Of course, specialized databases like, say, academic libraries and other digital libraries play a major role in this world, especially for university undergrads.

This concludes the very basic “what you need to know” about retrieving information. And about half the class sessions. But we’re only getting warmed up. Here is a dump of additional topics I am planning to cover: news, breaking information and tracking topics (alerts and feeds); Web Reference Tools (from Wikipedia to Yahoo Answers), which of course leads to the topic of information reliability; publishing information on the Web; economics of information (here’s another topic that can last a few semesters); legal aspects of information use (e.g., copyright issues, Creative Commons); bookmarking and knowledge collections; social media and blogs; and Multimedia search (of course).

This, together with student presentations and exams, will pretty much conclude the class. But there’s so much else one can cover… here’s what I left out for now: ethical and cultural aspects of information; information overload; mobile information retrieval; the semantic web (ok, “semantics on the web” may be a better title); personal information management (e.g., Stuff I’ve Seen); non-text retrieval (e.g., location-driven information); the hidden web; Web of Data; phew!

Now, our undergraduate program at SCILS offers classes that touch in depth on many of these issues. But can I possibly leave these topics out of any basic retrieval of information class? Doesn’t everyone need to know about these? Is there anything else I left out that must be covered?

Whatever form the class takes, I am excited. Mostly, I am curious to see what undergrads these days know about search, and how their perceptions can change in 14 short weeks.