Library of Congress Preserving Our Social Media Legacy

The way of all Tweets?
By Steve Zalusky

Social networking is a growing activity in libraries. With online access available to virtually anyone who passes through library doors, patrons of all ages can be seen expressing themselves on such social networking sites as Twitter.

But have you ever wondered what happens to all those tweets? Does posterity have a place for them? The answer, ironically enough, came in the form of a tweet from a library, the Library of Congress (LOC).

It read, "Library to acquire ENTIRE Twitter archive -- ALL public tweets, ever, since March 2006! Details to follow."

That tweet arrived in April of 2010.

A more official announcement was delivered in a blog post on the Library of Congress website.

The tweets will be kept among the library's holdings of historical documents, but the collection will not be unlimited. Excluded will be those tweets that Twitter users have declared protected or private.

Since tweets are capped at 140 characters, it would not seem on the surface that there is a lot here to store. But, according to the LOC, Twitter processes more than 50 million tweets every day, with the total numbering in the billions.

The archive contains "billions and billions of tweets" and is "a unique record of our time," he said.

LOC cited a few examples of significant tweets, including the first-ever tweet from Twitter co-founder Jack Dorsey, President Obama’s tweet about winning the 2008 election and tweets from a photojournalist who was arrested in Egypt but then freed because of a series of events set into motion by his use of Twitter.

This foray into digital preservation is not unique for LOC. Preservation, its blog points out, is about more than "just books."

The blog reads, "The Library has been collecting materials from the web since it began harvesting congressional and presidential campaign websites in 2000. Today we hold more than 167 terabytes of web-based information, including legal blogs, websites of candidates for national office, and websites of Members of Congress. We also operate the National Digital Information Infrastructure and Preservation Program, which is pursuing a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations. In other words, if you’re looking for a place where important historical and other information in digital form should be preserved for the long haul, we’re it!"

So where do things stand now?

Recently, Bill Lefurgy, digital initiatives program manager at the library's digital information infrastructure and preservation program, said, "We have an agreement with Twitter where they have a bunch of servers with their historic archive of tweets, everything that was sent out and declared to be public."

Lefurgy spoke further about LOC's mission on Federal Drive with Tom Temin and Amy Morris.

"We were excited to be involved with acquiring the Twitter archives because it's a unique record of our time," Lefurgy said. "It's also a unique way of communication. It's not so much that people are going to be interested in what you or I had for lunch, which some people like to say on Twitter."

According to the article, researchers will be able to data-mine the archive for interesting information.

"There have been studies involved with what are the moods of the public at various times of the day in reaction to certain kinds of news events," Lefurgy said. "There's all these interesting kinds of mixing and matching that can be done using the tweets as a big set of data."

"It's been difficult at times," Lefurgy added. "But we firmly believe that we have to do this kind of thing because we anticipate that we'll be bringing in large data sets again into the future. We don't know specifically what, but certainly there's no sign of data getting smaller or less complicated or less interesting."

The article also stated that the move to archive tweets fits within the context of a renewed push by the administration and the National Archives and Records Administration for federal agencies to better archive their own social media postings and emails as potential government records.

"We're basically in the same situation as the National Archives, only on a much larger scale," Lefurgy said. "We tend to have a much larger perspective in terms of what we collect."

According to a December article, "The US government has a digital team which reads 5 million tweets per day and scours other social networks to build intelligence reports for President Obama and other key personnel."

It adds, "Twitter is often criticized for its lack of storage, with user history stretching back less than two weeks on the service. At present, Twitter does not have an archiving system but a host of third party service can perform the task instead."

Creative Commons License

