Research projects

Here you'll find some brief summaries of the research I have assisted with. It's usually not up-to-date...

Projects

CrowdLogger 2.0 (2012–2015)
CrowdLogging: Distributed, private, and anonymous Search Logging (2010–2012)
Video retrieval (2012)
Search task assistant (2012)
Microsoft Research Internship (2012)
How users search for books (2012)
Book retrieval (2011)
Investigating Searcher Frustration (2009–2010)
Popularity Ranking using Query Logs (2009)
Detecting Task Boundaries in Domain-Specific Query Logs (2008)
Spell Correction using Domain-Specific Query Logs (2008)
Query Log Mining (2008)
Retrieving Document "Hot Spots" (2007–2008)
Testing of Text Retrieval Test Collection Generation Algorithms (2007)
Qauality Assessment using Language Processing (QALP) (2004–2007)
Slicing Research (2004)

CrowdLogger 2.0

Fall 2011–2015

Note: I've stopped maintaining CrowdLogger. Feel free to fork the CrwodLogger GitHub project.

We've been working to extend CrowdLogger into a platform for conducting Web interaction studies. CrowdLogger, available as a Chrome and Firefox extension, logs users' interactions with their browser and stores the data on users' computers—nothing is uploaded. Researchers can develop extensions to CrowdLogger, called apps (or CrowdLogger Remote Modules—CLRMS), which study participants can install within CrowdLogger. These apps can interact with a rich API, giving them access to user data and other useful resources, and they can, pending consent of the user, upload data to a server. The privacy mechanisms that were part of the original CrowdLogger (see below) are still there, but are now available as an API function.

Development of this new CrowdLogger has gone pretty well, thought not without its rough spots. As the codebase grows in complexity, we find that things break more often than before. In addition, Google has announced that Windows Chrome users will no longer be able to install extensions from outside of the Chrome Web store. We are thinking of some alternative ways to make CrowdLogger more available and more robust. If you have any thoughts or would like to contribute, shoot me an email!

If you want to give developing apps for CrowdLogger a try, see this tutorial.

Resources

Publications/Talks

H. Feild. "CrowdLogger: A platform for conducting remote web interaction studies," Invited talk at the University of North Carolina Chapel Hill, November 16, 2013, Chapel Hill, NC. [slides (pptx)] [slides (pdf)]
H. Feild and J. Allan. "Using CrowdLogger for in situ information retrieval system evaluation," In the proceedings of the 2013 CIKM Workshop on Living Labs for Information Retrieval Evaluation (LivingLab'13), San Francisco, CA, p. 15–18, 2013. [ACM w/ free pdf] [code] [UMass CrowdLogger page] [slides (pptx)] [slides (pdf)]
H. Feild. "CrowdLogger as a Community Platform for Searcher Behavior Experiments," DC-area Information Retrieval Experts (DIRE) workshop, 2012. [pptx] [pdf] [abstract]

CrowdLogging: Distributed, private, and anonymous Search Logging

Spring 2010–Fall 2011

This joint work with James Allan and Joshua Glatt focuses on a system for collecting search logs in a distributed manner—storing the data on users' computers rather than in a central database. This gives each user full control over his or her search data. Mining jobs, or experiments as we call them, can be run on each user's computer over their data. Experiments produce search artifacts—pieces of information, which can be something like a query or a query pair. These artifacts are then encrypted and uploaded to a server via a bank of anonymizers. However, they encryption is special; it uses a secret sharing scheme, which only allows a particular artifact to be decrypted by the server if it has been uploaded by at least k different users. Using a secret sharing scheme is not new for search logging; Eytan Adar suggested using such a scheme in this paper. On the server side, we can aggregate the encrypted artifacts and decrypt the ones that have sufficient support (that is, the artifact has been uploaded by at least k distinct users) and then get counts for each unique artifact.

You can see a demo of a Javascript implementation of Shamir's Secret Sharing Scheme in action here. You can download it here.

Details about this system can be found in our SIGIR 2011 paper, CrowdLogging: Distributed, private, and anonymous search logging. We have implemented CrowdLogging into a system called CrowdLogger. The live version of CrowdLogger can be found here ; the open source code can be found here; and a Google group can be found here.

Publications

H. Feild, J. Allan, and J. Glatt. "CrowdLogging: Distributed, Private, and Anonymous Search Logging," In the proceedings of the 34th International ACM Conference on Research and Development in Information Retrieval (SIGIR'11), July 24–28, 2011, Beijing, China. [ACM w/ free pdf] [slides (pdf)] [slides (pptx)]

Research projects

Projects

CrowdLogger 2.0

Resources

Publications/Talks

CrowdLogging: Distributed, private, and anonymous Search Logging

Publications

Video retrieval

Search Task Assistant

Publications

Microsoft Research Internship

Publications

How users search for books

Publications

Book Retrieval

Publications

Investigating Searcher Frustration

Publications

Popularity Ranking using Query Logs

Detecting Task Boundaries and Tasks in Query Logs

References

Spell Correction using Medical Domain Query Logs

References

Query Log Mining

Retrieving Document "Hot Spots" (a.k.a. Passage Retrieval)

References

Testing of Text Retrieval Test Collection Generation Algorithms

Quality Assessment using Language Processing (QALP)

QALP Scores I

Identifier Splitting

Sources

Identifier Makeup Study

QALP Scores II

Slicing Research