Projects
Contents
- EntiTies (Spring 2018–present)
- Alice (Spring 2017–present)
- CodeAnnotator (Spring 2015–present)
- Older projects
EntiTies
Spring 2018–present
EntiTies is a web application that allows a user to extract entity networks from a text file (e.g., character networks from a novel) using one or more workflows: manual, semi-automated, or fully automated annotation. In the manual workflow, the user annotates entity mentions throughout the text and the ties (relationships) between them (such as when the entities speak to one another or interact). In the fully automated workflow, the user chooses from a list of algorithms to automatically identify entities, their mentions throughout the text, and the relationships between them. In the semi-automated workflow, the user runs the fully automated algorithm and then corrects the annotations. Other features of EntiTies include a built-in study interface to assist with user studies and dataset creation; the ability to fork annotations; and private and public sharing of texts and of specific annotations of a text.
In all cases, a list of entities is shown on one side of the annotated text, and a network diagram, which is updated as the annotations are modified, is shown on the other side.
EntiTies currently offers a single algorithm for annotating entity mentions that is geared toward novels (via the BookNLP library). It also offers a single relationship annotation algorithm that uses a naive sliding window approach (two entity mentions that occur within n words of each other are deemed to be associated with each other).
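The sliding window idea described above can be sketched in a few lines. This is only an illustration, not the EntiTies implementation (which is written in Java): the function name `sliding_window_ties`, the `(word_position, entity)` input format, and the choice to treat "within n words" as a position difference of at most n are all assumptions made for the example.

```python
from collections import Counter


def sliding_window_ties(mentions, n):
    """Count ties between entities whose mentions fall within n words.

    mentions: iterable of (word_position, entity) pairs (hypothetical format).
    n: maximum distance in words; a difference of exactly n still counts
       (an assumption, since the exact boundary is not specified).
    Returns a Counter mapping a sorted (entity_a, entity_b) pair to the
    number of mention pairs that fell inside the window.
    """
    ordered = sorted(mentions)  # sort by word position
    ties = Counter()
    for i, (pos_a, ent_a) in enumerate(ordered):
        for pos_b, ent_b in ordered[i + 1:]:
            if pos_b - pos_a > n:
                break  # later mentions are even farther away
            if ent_a != ent_b:  # two mentions of one entity are not a tie
                ties[tuple(sorted((ent_a, ent_b)))] += 1
    return ties


# Made-up example: word positions and character names are illustrative only.
example_mentions = [
    (0, "Alice"), (3, "Bob"), (4, "Alice"), (20, "Carol"), (22, "Bob"),
]
example_ties = sliding_window_ties(example_mentions, n=5)
```

With these made-up mentions and n = 5, the Alice/Bob pair is counted twice and the Bob/Carol pair once. The counts could serve as edge weights in a network diagram, though whether EntiTies weights edges this way is not stated here.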
The core of the EntiTies backend is written in PHP, with the automated annotation and text processing algorithms written in Java. The frontend is written in JavaScript, with heavy use of jQuery and of d3.js for the network diagrams. Data is stored in a PostgreSQL database and on the file system (Linux).
For students who are interested, there are lots of things to work on:
System implementation
- implementing state-of-the-art entity and relationship identification algorithms from the literature
- improving UI/UX during manual and semi-automated annotation workflows
- improving the network diagram
- handling fork histories
- beta testing
- creating tests (unit, integration, UI, etc.)
- refactoring code
- expanding to other use cases beyond books
- creating dashboards of use (basic usage + detailed analysis of user studies)
Research
- improving state-of-the-art entity and relationship identification algorithms from the literature
- creating test sets
- designing/running/analyzing user studies
- literature review
This is joint work with Phil Lombardo and is inspired by the scholarship interests of Sam Alexander.
Students
- Tim Amello ('21)
Resources
Publications/talks/publicity
- "EntiTies: An Interface for Annotating Ties between Entities in Text" (CHIIR'20 demo), March 2020
- "A Sabbatical with Little Rest", part of the "2019: A Technology-Forward Year for the School of Arts & Sciences" recap on Endicott's website (December, 2019)
- Endicott Faculty Forum slides about Fall 2019 Sabbatical work, March 11, 2020 (pdf)
- Invited talk at the University of New Hampshire Computer Science Department, December 6, 2018, Durham, NH (pdf)
Alice
Spring 2017–present
Alice is a website built with Ruby on Rails for tracking datasets, analyses, software that supports those analyses, and assignments that use all three. It was created as part of a two-year Davis Foundation grant awarded to Endicott College in 2016 and carried out between summer 2017 and spring 2019.
For students who are interested, here are the current priorities:
System implementation
- test and merge pending UI changes
- analyze UX and identify ways to improve it
- add features that restrict some content to logged-in users with certain email domains
- automate configuration and deployment (e.g., with Docker)
Students
- Ross Theroux ('18)
- Makennah Henry ('21)
Resources
CodeAnnotator
Spring 2015–present
CodeAnnotator is a web application motivated by my need to give students feedback on their code in a digital format. It began as a class project in the first run of CSC270 Software Engineering in 2015. Built with Ruby on Rails, CodeAnnotator allows users to upload code organized in projects, mark up that code by adding comments and alternative code snippets, and share projects with other users (for viewing, annotating, or managing) or publicly (view only). I've used CodeAnnotator in at least one class most semesters since creating it. The nice thing about public sharing is that I can share links with students without requiring them to sign up.
For students who are interested, here are the current priorities:
System implementation
- analyze UX and identify ways to improve it
- automate configuration and deployment (e.g., with Docker)
- add support for Markdown in comments
- add support for comment lists (a way of saving and using common comments)
- improve code-comment alignment
Resources
- CodeAnnotator overview (video)
- CodeAnnotator homepage
- CodeAnnotator GitHub page
- Mock Senior Thesis (used for teaching Senior Thesis I & II, but gives a nice overview)