
Adekunle A. Ajibode

My expertise lies in applying Data Mining techniques to research questions in domains such as Software Engineering, Education, and Health Care. Most of my Data Mining replication packages are available on both my Bitbucket and GitHub profiles, while the replication packages specific to Software Engineering research are available only on my GitHub profile. All of my published papers are freely accessible on academic platforms such as Google Scholar and ResearchGate, as well as other digital libraries. Outside academia and research, I am seeking a position as an IT support specialist, technical support specialist, IT technician, or a similar role where I can apply my technical skills, problem-solving abilities, and strong customer-service orientation to assist users with their IT needs.


ACM SIGUCCS 2021 Conference Attendance

I attended the ACM SIGUCCS 2021 Annual Conference, held virtually from March 15 to April 30, 2021. Because of my schedule, I unfortunately did not participate until April 7, 2021. I attended the following sessions:

Lightning Talks

In this series, Kevin Chapman presented “Slacking On: Organizing 50 Student Staff Remotely” and “Too Much Too Soon vs Too Little Too Late”; Mo Nishiyama presented “Good Migrations: Finding a New Home for Support Articles, Done with Minimal Resources” and “Everything Counts: Making a Difference with Inclusive Words”; and Kevin Tschopik presented “Board Games As Relationship Building Tools”.

Slacking On: Organizing 50 Student Staff Remotely by Kevin Chapman

As with all IT departments, remote work brought a surge of support needs from our community. Faced with the increased demands on our time, the thought of trying to run the Helpdesk with a reduced or eliminated student staff was not a pleasant one. However, we were allowed to employ our staff remotely, provided we had specific plans for the type of work, the scheduling of shifts, and, in particular, communication and tracking.

Fortunately, a number of our existing practices lent themselves to adaptation. With the judicious use of Slack, softphones, and a drive to maintain existing working relationships, we modified our service approach and student focus so that we appeared to meet the campus's needs. We thought we'd share what changes we made, how we implemented them, and a little of what went well and what might still need some work.

Good Migrations: Finding a New Home for Support Articles, Done with Minimal Resources by Mo Nishiyama

Is it possible to plan, schedule, and execute a migration of over 350 support articles in a six-month timeframe with one dedicated full-time staff member and a part-time student worker assigned to the task? And is it possible to do the migration work while weathering a pandemic that no one had planned for in our lifetime?

Learn how the Help and How articles at Oregon Health & Science University's Information Technology Group were migrated from an aging Content Management System (CMS) to a modern platform, all with very minimal resources allocated. In this presentation, we will cover the importance of planning, placing trust in colleagues, adopting a Skunkworks mindset, and staying resilient to ensure a successful project outcome.

Board Games As Relationship Building Tools by Kevin Tschopik

The modern board game scene has exploded in popularity in recent years. Many games are released every year, with a wide range of sizes and rules complexity. I have found that keeping a small supply of these modern games is an invaluable relationship-building tool. As an IT consultant for a mid-sized department, I have an open policy to play games over lunch with anyone who requests a game. I keep a curated supply of games that take 15-45 minutes in my office. These games have led to many sessions of playing and chatting with the professors and grad students in my department. As a new employee, these sessions helped me learn people's names and responsibilities, but they have also allowed me to form friendships and given me a welcome decompression time in the middle of the day.

Too Much Too Soon vs Too Little Too Late by Kevin Chapman

When the announcement was made to move to remote learning, schools across the country had to make fast and intense changes to curricula, teaching methods, technologies, and support practices. Carleton College was fortunate in that funds were available to cover the costs of some of the additional training and new technology needs: faculty workshops, computers for students, technology aids for faculty, implementation of remote labs, and remote-work equipment policies for staff, all of which will sound familiar.

While we got a number of things right, there were also some surprises. We initially overestimated the number of machines we would need to loan to students, while a number of students overestimated the capabilities of their own equipment. Our remote lab machines saw less use than expected. More faculty chose to fully flip their classes than expected, while some stuck to a strict lecture format. At the same time, there was a constant outflow of equipment and peripherals throughout the term as our community continued to adjust: webcams, headsets, monitors, iPads, Ethernet cables, wireless cards, and a handful of cellular hotspots.

We thought it might be interesting to talk about our hits and our misses, and why we think things landed as they did on our campus. Perhaps more importantly, we’d also like to talk about how we see this informing (or how it informed) our approach to the following term, with its more hybrid mix of remote and in-person demand.

This would be an interactive session, open to audience participation for comparing notes. It could even be a panel discussion if several presenters also wanted to talk about how they dealt with similar demands. This is all predicated on the notion that there will be much discussion of How We Survived COVID-19 in Higher Ed IT.

Everything Counts: Making a Difference with Inclusive Words by Mo Nishiyama

How would you feel if your workplace used everyday language that triggers, offends, or otherwise makes you feel unwelcome? Does uncomfortable terminology affect the level of trust within your team and among colleagues? And do you wonder why we don’t do anything about problematic language in professional workplaces?

In light of the anti-racism movement of 2020, many corporations and organizations took action to focus inward and examine systemic and institutionalized racism within their entities. The Information Technology Group at Oregon Health & Science University was no exception, as we took action on the oppressive, discriminatory, and exclusionary terminology we encountered in IT. Our efforts went beyond changing the language used in documentation: we also educated our colleagues and implored our vendors to update their documents when we found problematic language.

Summary of findings from my first PhD contribution: Towards semantic versioning of open pre-trained language model releases on Hugging Face

Published: 06 March 2025

Journal: Empirical Software Engineering

The naming conventions of Pre-trained Language Models (PTLMs) on Hugging Face (HF) exhibit significant diversity and complexity. Our analysis reveals that the naming convention of PTLMs on HF encompasses 12 segment types, with identifiers (70.8% of model names), base model (39.8%), and size (34.3%) being the most prevalent. Interestingly, PTLMs mentioning only model size in their name have an average download rate more than 11 times higher than those mentioning only the base model. Through manual analysis, we identified 148 distinct naming conventions across HF repositories, highlighting the wide range of model naming strategies adopted by practitioners. Notably, the naming patterns {identifier}{base-model}{size}, {identifier}{size}, and {identifier}{training-mechanism} are associated with the highest download rates among prevalent conventions. However, our findings indicate no significant relationship between the length of model names and download rates.
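To make the segment analysis concrete, below is a minimal Python sketch of how the three most prevalent segments might be recovered from a model name. The regular expressions and the base-model list are illustrative assumptions, not the taxonomy used in the paper:

```python
import re

# Illustrative, simplified patterns for two of the 12 segment types (assumed).
SIZE_PATTERN = re.compile(r"(?i)\b(\d+(?:\.\d+)?)[bm]\b")              # e.g. "7b", "1.3B"
BASE_MODEL_PATTERN = re.compile(r"(?i)(llama|bert|mistral|gemma|gpt2|t5)")

def parse_model_name(repo_id: str) -> dict:
    """Extract identifier, base-model, and size segments from an HF model name."""
    name = repo_id.split("/")[-1]
    # Simplification: treat the whole repo name as the identifier segment.
    segments = {"identifier": name}
    if match := BASE_MODEL_PATTERN.search(name):
        segments["base_model"] = match.group(1).lower()
    if match := SIZE_PATTERN.search(name):
        segments["size"] = match.group(0)
    return segments

print(parse_model_name("meta-llama/Llama-2-7b-chat-hf"))
# {'identifier': 'Llama-2-7b-chat-hf', 'base_model': 'llama', 'size': '7b'}
```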

Despite the diversity in naming conventions, only a small fraction of PTLMs incorporate version information in their names. Of the 52,227 PTLMs analyzed, merely 3,471 (6.64%) include a version segment. Among these, major versioning, using identifiers like v1, v2, etc., is the predominant strategy, reflecting a software development standard. Specifically, 67% of these PTLMs adopt the major versioning approach, indicating that when version information is present, it primarily reflects major versions only. This trend suggests that significant changes are often not communicated through version identifiers, as evidenced by the 1,282,874 changes observed across 52,227 PTLM repositories, with only 3,471 explicitly indicating changes through versioned model names.
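As a rough illustration of how rare explicit versioning is, the sketch below samples model names through the huggingface_hub client and counts those carrying a v1/v2-style segment. The regex and the sample size are assumptions for demonstration; the study itself covered all 52,227 PTLMs:

```python
import re
from huggingface_hub import HfApi  # pip install huggingface_hub

# Matches major (v2) and major.minor (v1.5) version segments in a model name.
VERSION_PATTERN = re.compile(r"(?i)[-_.]v(\d+)(?:\.(\d+))?\b")

api = HfApi()
total = versioned = 0
for model in api.list_models(limit=1000):  # small sample for illustration
    total += 1
    if VERSION_PATTERN.search(model.id):
        versioned += 1

print(f"{versioned}/{total} sampled models carry an explicit version segment")
```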

Changes within PTLM repositories are frequent, particularly in model weight files, where 524,419 changes were recorded across the analyzed models. The security-focused safetensors format is the most commonly used format for storing model weights on HF, averaging 7.88 changes per model. Additionally, 98% of the studied PTLMs include configuration files that identify the base models they were adapted from. Since 2022, the 52,227 PTLM variant releases on HF have all originated from just 299 base models. Among these, the top 15 base models account for 85.80% of releases, with Llama, Bert, and Mistral being the most frequently adapted. Notably, although Llama has seen the most rapid growth, the prevalence of Gemma and Mistral indicates that they are evolving more rapidly relative to their age.
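Since the study ties variant releases back to base models via configuration files, a sketch of that lookup might read a repo's config.json and report the declared architecture. The file name and the model_type field are standard in transformers-style repos, though, as noted above, not every repository ships one:

```python
import json
from huggingface_hub import hf_hub_download

def base_architecture(repo_id: str) -> str | None:
    """Return the model_type declared in a repo's config.json, if present."""
    try:
        path = hf_hub_download(repo_id=repo_id, filename="config.json")
    except Exception:
        return None  # no config file: roughly 2% of the studied PTLMs
    with open(path) as f:
        return json.load(f).get("model_type")  # e.g. "llama", "bert", "mistral"

print(base_architecture("bert-base-uncased"))  # "bert"
```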

Model names and tags on HF exhibit varied indications of variant types, though a substantial number of releases lack explicit variant type information. Specifically, only 12.76% of models indicate variant types in their names, and just 5.63% do so in their tags, leaving 70.72% of PTLM releases without clear variant type specification. Our analysis identified 14 distinct variant type indicators, categorized into Fine-tuning, Deduplication, Quantization, and Knowledge Distillation. Quantized PTLM releases constitute approximately 69.3%, while Fine-tuned PTLM releases account for around 29.6% of the 15,287 PTLM releases on HF.
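One way to approximate this kind of variant-type detection is simple keyword matching over names and tags. The keyword lists here are assumed examples, not the paper's full set of 14 indicators:

```python
# Assumed keyword lists, one per variant-type category from the study.
VARIANT_KEYWORDS = {
    "Quantization": ("gguf", "gptq", "awq", "4bit", "8bit", "quantized"),
    "Knowledge Distillation": ("distil",),
    "Deduplication": ("dedup",),
    "Fine-tuning": ("finetune", "fine-tuned", "sft", "instruct", "chat"),
}

def classify_variant(repo_id: str, tags: list[str]) -> str | None:
    """Return the first variant type whose keywords appear in the name or tags."""
    haystack = repo_id.lower() + " " + " ".join(tag.lower() for tag in tags)
    for variant, keywords in VARIANT_KEYWORDS.items():
        if any(keyword in haystack for keyword in keywords):
            return variant
    return None  # like the ~70% of releases with no explicit indicator

print(classify_variant("TheBloke/Llama-2-7B-GPTQ", ["text-generation"]))
# Quantization
```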

Training dataset information remains inconsistently documented. Our automatic method for identifying PTLMs with training dataset metadata revealed that 33,964 (65%) out of 52,227 PTLMs have dataset metadata. However, only 24% (8,157) explicitly state their training datasets, while a further manual analysis of model cards indicated that just 12% of the remaining 25,807 PTLMs mention the dataset names, and only 2% provide dataset links. Notably, models resulting from deduplication (24%), fine-tuning (22.3%), and quantization (16.4%) rarely include training datasets, while knowledge distillation has a slightly higher inclusion rate (32.9%).
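The structured-metadata side of this check can be sketched with huggingface_hub: ask the Hub for a model's card metadata and see whether it declares training datasets. This assumes a recent huggingface_hub release where model_info() exposes card_data, and it covers only the metadata case; the manual model-card analysis in the study has no simple code equivalent:

```python
from huggingface_hub import HfApi

def declared_datasets(repo_id: str) -> list[str]:
    """Return training datasets declared in a repo's card metadata, if any."""
    card = HfApi().model_info(repo_id).card_data  # None if no card metadata
    datasets = getattr(card, "datasets", None) if card else None
    if datasets is None:
        return []
    # The metadata field may hold a single name or a list of names.
    return [datasets] if isinstance(datasets, str) else list(datasets)

print(declared_datasets("distilbert-base-uncased"))  # e.g. ['bookcorpus', 'wikipedia']
```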

Moreover, model card documentation is insufficient: 33% of the 52,227 PTLMs lack model cards, impeding users’ ability to understand and responsibly use the models. Among variant types, deduped models most frequently include model cards (82.7%), followed by quantized (74.5%) and distilled models (74.1%), while fine-tuned models have the lowest rate (68.6%). Our findings also indicate that model cards change more frequently between major versions (71%) than other components, such as base models (13%). Even between minor version updates, model cards remain the most frequently altered attribute (80%).
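Checking model-card presence programmatically is straightforward, since a model card on HF is the repository's README.md; a minimal sketch using huggingface_hub's file listing:

```python
from huggingface_hub import HfApi

def has_model_card(repo_id: str) -> bool:
    """True if the repository ships a README.md model card."""
    return "README.md" in HfApi().list_repo_files(repo_id)

print(has_model_card("bert-base-uncased"))  # True
```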

When examining version changes, our investigation revealed that major version updates generally encompass a broader range of modifications, with an average of 28 unique changes per release, compared to 8 changes for minor updates. Although there is no statistically significant difference in the prevalence of changes between major and minor versions, specific differences are notable in configuration, licensing, and other associated attributes. Analyzing change patterns further, we observed one very strong association, four strong associations, and three moderate associations between pairs of change categories in PTLMs. This nuanced understanding of versioning and change patterns highlights the challenges practitioners face in maintaining consistency and clarity in version documentation and model management on HF.

Conference Attendance

I attended the 2025 International Conference on Software Engineering (ICSE), held from April 27 to May 3, 2025, in Ottawa, Canada.