
Blog

I'm in the process of re-uploading my old blog posts one by one. Apologies for any disruptions along the way.



Malayalam Pronominal Anaphoric Coreference Resolution

I recently dusted off my undergraduate degree project from 2020 and put it on GitHub with a proper DOI. This project won Best Paper at ABACon'20, but I never formally published it beyond the conference presentation. Five years later, it felt important to preserve this work because it was my first research project, and working on Malayalam NLP in 2020 meant operating in a profoundly low-resource environment. The challenges I faced as a beginner researcher mirrored the challenges facing the entire field: limited tools, no benchmark datasets, and little prior work to build on.

The Research Question

In early 2020, I began exploring coreference resolution as the focus of my undergraduate final project. Advised by Prof. Mathews Abraham, I surveyed the benchmarks and methods available at the time. Given the limited resources for Malayalam NLP, I settled on Hobbs' algorithm, a classic rule-based approach to pronominal anaphora resolution originally designed for English, as a starting point. The central research question became: could this algorithm be successfully adapted for Malayalam text?

Coreference resolution addresses a fundamental challenge in natural language understanding: identifying when different expressions in a text refer to the same entity. Consider the sentence pair "The cat sat on the mat. It was sleeping." Human readers immediately recognize that "it" refers to "the cat," but teaching computers to make these inferences was a non-trivial computational problem in 2020.
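To make the idea concrete, here is a minimal Python sketch of how the output of coreference resolution can be represented: mentions are spans of text, and mentions that refer to the same entity are grouped into a cluster. The `Mention` class and the `antecedent_of` helper are hypothetical names chosen for illustration; they are not taken from the project code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """A span of tokens that refers to some entity (illustrative type)."""
    sentence: int  # index of the sentence containing the mention
    start: int     # index of the first token of the span
    end: int       # index one past the last token of the span
    text: str


# "The cat sat on the mat. It was sleeping."
cat = Mention(sentence=0, start=0, end=2, text="The cat")
mat = Mention(sentence=0, start=4, end=6, text="the mat")
it = Mention(sentence=1, start=0, end=1, text="It")

# Each cluster lists, in textual order, the mentions of one entity.
clusters: List[List[Mention]] = [[cat, it], [mat]]


def antecedent_of(mention: Mention, clusters: List[List[Mention]]) -> Optional[Mention]:
    """Return the closest earlier mention in the same cluster, if any."""
    for cluster in clusters:
        if mention in cluster:
            position = cluster.index(mention)
            return cluster[position - 1] if position > 0 else None
    return None


print(antecedent_of(it, clusters).text)  # The cat
```

A resolver's job is to produce these clusters automatically; pronominal anaphora resolution is the narrower task of finding the antecedent for each pronoun.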

The Malayalam Challenge

Malayalam differs from English in structure, not just in script. As a Dravidian language, Malayalam exhibits rich morphological inflection, relatively free word order, and agglutinative properties. The resource constraints of 2020 compounded these linguistic challenges. Malayalam lacked annotated coreference corpora, benchmark datasets, and mature computational tools. The available infrastructure consisted primarily of a small number of research papers and limited open-source libraries supporting basic tokenization and part-of-speech tagging.

Implementation

I adapted Hobbs' algorithm using the available Malayalam NLP infrastructure, specifically leveraging Anoop Kunchukuttan's Indic NLP Library for morphological analysis and Devadath's shallow parser for syntactic processing.
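For readers unfamiliar with Hobbs' algorithm, the sketch below illustrates one of its core steps: searching the parse trees of earlier sentences, most recent first, in left-to-right breadth-first order, and proposing the first noun phrase that passes an agreement check. This is a simplified fragment under stated assumptions, not the code in the repository. It omits the search inside the pronoun's own sentence, the toy tree is hand-written English rather than parser output, and the names `np_candidates`, `resolve_pronoun`, and `agrees` are hypothetical.

```python
from collections import deque

# A tree node is a (label, children) tuple; a leaf is a (POS tag, word) tuple.
# The tree below is a hand-written assumption, not the output of a real parser.


def np_candidates(tree):
    """Yield NP subtrees in left-to-right, breadth-first order."""
    queue = deque([tree])
    while queue:
        label, children = queue.popleft()
        if isinstance(children, str):
            continue  # leaf node, nothing to expand
        if label == "NP":
            yield (label, children)
        queue.extend(children)


def words(tree):
    """Collect the words under a node, left to right."""
    label, children = tree
    if isinstance(children, str):
        return [children]
    collected = []
    for child in children:
        collected.extend(words(child))
    return collected


def resolve_pronoun(previous_trees, agrees=lambda np: True):
    """Return the first agreeing NP, searching the most recent sentence first."""
    for tree in reversed(previous_trees):
        for np in np_candidates(tree):
            if agrees(np):  # placeholder for gender/number/animacy checks
                return " ".join(words(np))
    return None


sentence_1 = ("S", [
    ("NP", [("DT", "The"), ("NN", "cat")]),
    ("VP", [
        ("VBD", "sat"),
        ("PP", [("IN", "on"), ("NP", [("DT", "the"), ("NN", "mat")])]),
    ]),
])

antecedent = resolve_pronoun([sentence_1])
print(antecedent)  # The cat
```

The breadth-first order is what makes the subject noun phrase ("The cat") win over the more deeply nested one ("the mat"). Adapting this to Malayalam would mean replacing the hand-written tree with parser output and filling in `agrees` with language-specific agreement rules.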

Usage
from MalayalamCorefResolver import MalayalamCorefResolver

resolver = MalayalamCorefResolver()
text = "പൂച്ച മേശയ്‌ക്ക് മുകളിൽ ഇരിക്കുന്നു. അത് ഉറങ്ങുന്നു."
result = resolver.find_coref(text)

Document: പൂച്ച മേശയ്‌ക്ക് മുകളിൽ ഇരിക്കുന്നു. അത് ഉറങ്ങുന്നു.
In Sentence 2: 'അത്' → ['പൂച്ച']

Results

When tested on a sample of sentences from Wikipedia, the system achieved 65% accuracy. I presented this work at ABACon'20, a national conference on innovations in computing organized by Sahrdaya College of Engineering and Technology, where it received the Best Paper Award.

Closing Notes

While my research has since moved on to other very interesting questions in English NLP, Malayalam remains close to my heart, both as my native language and as a field with rich linguistic complexity and the exciting opportunity to build from the early days of development. The challenges of Malayalam NLP continue to draw me in.

Code and Citation

The code is archived at Zenodo and available on GitHub.

If you use this work, please cite:

@software{enfa_fane_2025_17508334,
  author       = {Enfa Fane and
                  Abraham, Mathews},
  title        = {beingenfa/malayalam-coreference-hobbs: public archive},
  month        = nov,
  year         = 2025,
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.17508334},
  url          = {https://doi.org/10.5281/zenodo.17508334},
}

My Experience as a GPSC Grants Reviewer (And What It Taught Me)

GPSC Grants are one of my most recommended resources on campus for graduate students. I've had the privilege of being part of GPSC in an employment capacity and as an awardee for their travel grant. The latter supported my attendance at NAACL 2025. I'd wanted to volunteer as a reviewer for the longest time to give back to this service, and I finally received the opportunity this October (2025).

Training as a Reviewer

The grants team has a very detailed and thorough training module, and I especially appreciated the sample exercise they included. If the training video is available to students, I highly recommend watching it and reading the rubric before you apply. I especially appreciated the training's heads-up about potential cognitive and emotional biases, since we reviewers are graduate students ourselves. It's a good reminder to stay objective and focus on the rubric.

Screenshot from the GPSC Grants Reviewer training module

My Recommendations After Reviewing

I was assigned four reviews this cycle, and I took on three emergency reviews as well. All the grants I reviewed were travel grants, which isn't surprising given how competitive they are. Beyond the obvious advice, here are my specific takeaways and recommendations.

  • Align your proposal directly to the rubric. Details, details, details—the easier it is to see that you match the criteria, the higher your chances of receiving the grant. Force yourself to map your narrative to the rubric items and use the provided examples as inspiration.
  • Organization matters. Remember that long paragraphs are hard to read. Organize your information clearly for flow and readability. Remove repetitive or unnecessary background context and focus on the main substance of your proposal.
  • Space allocation should match the rubric, not your personal priorities. While certain aspects of your research may be extremely important to you personally, distribute space based on what the rubric weighs most heavily. Mention what matters to you, but prioritize what matters to the grant.
  • Specificity is key. Generic information and filler language work against you. If you've already attended the conference, get as specific as possible: name the people you networked with, describe how connections have impacted your work (even in small ways), and mention specific workshops or talks you attended. Avoid technical jargon that reviewers may not understand; it takes up valuable space without adding value to the review.
  • Honesty is critical. Anything that appears as an attempt to mislead the reviewer will sour the entire review, despite any reviewer's best attempts to remain objective.
  • Budget justification needs to be proactive and specific. Demonstrate multiple attempts to secure outside funding beyond easily accessible sources. If you're choosing more expensive options, justify the decision with specifics: for example, the conference ends rather late and there is no public transportation at that hour. Demonstrate cost-consciousness in your choices, and include all relevant expenses you are paying out of pocket to strengthen your case for financial need.

Closing notes

Overall, this was a rewarding experience. One unexpected pleasure of reading the travel grant proposals was learning about different people's research and the recognition their work is receiving. It filled me with a quiet joy :)

United States Supreme Court hearing on Content Moderation in Social Media Platforms

Disclaimer: This blog post is my simplified understanding of the cases.

Today, February 26th, 2024, the United States Supreme Court is set to hear Moody v. NetChoice and NetChoice v. Paxton, cases that spotlight content moderation on social media platforms and free speech in the US.

How did we get here?

The series of events that led to this hearing started in 2021 when Facebook, Twitter, and YouTube barred President Donald Trump from their platforms following the Jan 6th attack at the Capitol. Following this, the state of Florida prohibited large internet platforms from banning a candidate for office or a journalistic enterprise from their sites. Florida's brief says that this imposes neutrality provisions, hosting provisions, and disclosure obligations on these large internet platforms.

The state of Texas later passed a law prohibiting platforms from taking down political content in a way that discriminates against any viewpoint. It also required social media companies to explain when and why they moderate content.

Following this, two tech industry groups, NetChoice and the Computer & Communications Industry Association, sued to block the laws from taking effect.

Two Sides

On one side, proponents of the Texas and Florida rules argue that the content-moderation actions of these companies fall outside the protection of the First Amendment of the United States Constitution because those actions are censorship, not speech. Supporters also argue that social media companies are silencing viewpoints they disagree with, which is wrong and dangerous. They also want these platforms to be classified as common carriers: a special class of business that dominates a market, provides a public service, and is regulated by local and state governments.

Conversely, the tech industry groups argue that under the First Amendment, companies have the right to make decisions about their own platforms and that the government shouldn't force them to carry content they don't want to. Supporters argue for moderation since, without it, these platforms are likely to circulate hateful content, such as Neo-Nazi posts or misinformation. This content has far-reaching consequences both for individuals and for society as a whole.

Conclusion

In conclusion, Moody v. NetChoice and NetChoice v. Paxton are significant because they can set a precedent for the future of content moderation on social media platforms in the United States. It will be interesting to see how the Supreme Court rules on these cases and what that means for the tech industry, social media users, and the future of free speech in the digital age.

Update: Feb 12, 2025

  • The Supreme Court rejected the states' claim that social media content moderation has no First Amendment implications and can be state-regulated for "viewpoint neutrality."
  • The Court affirmed that platforms, like traditional publishers (newspapers, bookstores), have First Amendment rights to choose which content to host and how to organize it.
  • Traditional First Amendment rules apply to these platforms' content moderation practices.
  • The Court left further analysis to the lower courts, instructing them to examine how specific laws apply to individual platforms based on their unique characteristics.

Sources