Join us
Please do NOT email Prof. Chattopadhyay directly. She would be unable to respond promptly owing to the high volume of emails she receives.
Recent Publications
- Beyond the Page: Enriching Academic Paper Reading with Social Media Discussions
ACM Symposium on User Interface Software and Technology
Abstract: Researchers actively engage in informal discussions about academic papers on social media. They share insights, promote papers, and discuss emerging ideas in an engaging and accessible way. Yet, this rich source of scholarly discourse is often isolated from the paper reading process and remains underutilized. A natural question thus arises: What if we bring these peer discussions on social media into the reading experience? What might be the benefits of reading research papers alongside informal social insights? To explore the design space of such integration, we conducted a formative study with eight researchers. Participants recognized the value of social media in expanding their perspectives and connecting with fellow researchers. However, they also reported significant distraction and cognitive overload when confronted with streams of noisy, unstructured social media comments. Guided by the design goals derived from their feedback, we introduce SURF, a novel reading interface that enriches academic papers with Social Understanding of Research Findings. SURF organizes social media clutter into digestible threads and presents them contextually within the paper, allowing readers to seamlessly access peer insights without disrupting their reading process. In a within-subjects usability study (N=18), participants achieved significantly deeper comprehension and higher self-efficacy with SURF, while reporting lower cognitive load. They also noted SURF's various benefits beyond paper reading, such as facilitating literature review and fostering social engagement within the academic community. Some participants envisioned SURF and academic social media as a potential supplement to the traditional peer-review process.
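SURF's implementation is not shown on this page. As a rough, hypothetical illustration of the contextual placement the abstract describes, the sketch below groups comments into per-section threads by vocabulary overlap; the function names and the deliberately naive matcher are our assumptions, not the paper's method.

```python
# Hypothetical sketch (not SURF's real code): attach each social media
# comment to the paper section it most plausibly discusses, so threads can
# be shown in context instead of as one noisy stream.
from collections import defaultdict

def tokenize(text: str) -> set[str]:
    """Lowercase bag of words, ignoring very short tokens."""
    return {w for w in text.lower().split() if len(w) > 3}

def anchor_comments(sections: dict[str, str], comments: list[str]) -> dict[str, list[str]]:
    """Assign each comment to the section with the largest vocabulary overlap."""
    section_words = {name: tokenize(body) for name, body in sections.items()}
    threads: dict[str, list[str]] = defaultdict(list)
    for comment in comments:
        words = tokenize(comment)
        best = max(section_words, key=lambda name: len(section_words[name] & words))
        threads[best].append(comment)
    return threads

sections = {
    "Method": "we conducted a within-subjects usability study with researchers",
    "Results": "participants achieved deeper comprehension and higher self-efficacy",
}
print(anchor_comments(sections, ["Nice within-subjects study design!"]))
```

A production matcher would need semantic rather than lexical matching, but the shape of the problem, mapping free-form comments onto document anchors, is the same.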
- Exploring the Challenges and Opportunities of AI-assisted Codebase Generation
IEEE Symposium on Visual Languages and Human-Centric Computing
- ELI-Why: Evaluating the Pedagogical Utility of Language Model Explanations
Annual Meeting of the Association for Computational Linguistics
Abstract: Language models today are widely used in education, yet their ability to tailor responses for learners with varied informational needs and knowledge backgrounds remains under-explored. To this end, we introduce ELI-Why, a benchmark of 13.4K 'Why' questions to evaluate the pedagogical capabilities of language models. We then conduct two extensive human studies to assess the utility of language model-generated explanatory answers (explanations) on our benchmark, tailored to three distinct educational grades: elementary, high-school and graduate school. In our first study, human raters assume the role of an 'educator' to assess model explanations' fit to different educational grades. We find that GPT-4-generated explanations match their intended educational background only 50% of the time, compared to 79% for lay human-curated explanations. In our second study, human raters assume the role of a learner to assess if an explanation fits their own informational needs. Across all educational backgrounds, users deemed GPT-4-generated explanations 20% less suited on average to their informational needs, when compared to explanations curated by lay people. Additionally, automated evaluation metrics reveal that explanations generated across different language model families for different informational needs remain indistinguishable in their grade-level, limiting their pedagogical effectiveness.
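The abstract does not name its automated grade-level metrics. As a hedged illustration of what such a metric can look like, the sketch below estimates a U.S. grade level with the standard Flesch-Kincaid formula and a naive syllable counter; the benchmark's real evaluation is presumably more sophisticated.

```python
# Illustrative baseline, not ELI-Why's actual metric: estimate the U.S.
# grade level of an explanation with the Flesch-Kincaid grade formula.
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/word) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(round(fk_grade("The sky is blue because air scatters blue light more."), 1))
```

For the short example sentence this yields roughly grade 4.8, i.e., upper-elementary reading level.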
- Code Today, Deadline Tomorrow: Procrastination Among Software Developers
IEEE/ACM International Conference on Software Engineering
Abstract: Procrastination, the action of delaying or postponing something, is a well-known phenomenon that is relatable to all. While it has been studied in academic settings, little is known about why software developers procrastinate. How does it affect their work? How can developers manage procrastination? This paper presents the first investigation of procrastination among developers. We conduct an interview study with (n=15) developers across different industries to understand the process of procrastination. Using qualitative coding, we report the positive and negative effects of procrastination and the factors that triggered procrastination, as perceived by participants. We validate our findings using member checking. Our results reveal 14 negative effects of procrastination on developer productivity. However, participants also reported eight positive effects, four of which impacted their satisfaction. We also found that participants reported three categories of factors that trigger procrastination: task-related, personal, and external. Finally, we present 19 techniques, reported by our participants and by studies in other domains, that can help developers mitigate the impacts of procrastination. These techniques focus on raising awareness and task focus, help with task planning, and provide pathways to generate team support as a means of mitigation. Based on these findings, we discuss interventions for developers and recommendations for tool building to reduce procrastination. Our paper shows that procrastination has unique effects and factors among developers compared to other populations.
- Trust Dynamics in AI-Assisted Development: Definitions, Factors, and Implications
IEEE/ACM International Conference on Software Engineering
Abstract: Software developers increasingly rely on AI code generation utilities. To ensure that "good" code is accepted into the code base and "bad" code is rejected, developers must know when to trust an AI suggestion. Understanding how developers build this intuition is crucial to enhancing developer-AI collaborative programming. In this paper, we seek to understand how developers (1) define and (2) evaluate the trustworthiness of a code suggestion and (3) how trust evolves when using AI code assistants. To answer these questions, we conducted a mixed-method study consisting of an in-depth exploratory survey with (n=29) developers followed by an observation study (n=10). We found that comprehensibility and perceived correctness were the most frequently used factors to evaluate code suggestion trustworthiness. However, the gap in developers' definition and evaluation of trust points to a lack of support for evaluating trustworthy code in real-time. We also found that developers often alter their trust decisions, keeping only 52% of original suggestions. Based on these findings, we extracted four guidelines to enhance developer-AI interactions. We validated the guidelines through a survey with (n=7) domain experts and survey members (n=8). We discuss the validated guidelines, how to apply them, and tools to help adopt them.
- Generating Function Names to Improve Comprehension of Synthesized Programs
IEEE Symposium on Visual Languages and Human-Centric Computing
Abstract: Despite great advances in program synthesis techniques, they remain algorithmic black boxes. Although they guarantee that when synthesis is successful, the implementation satisfies the specification, they provide no additional information regarding how the implementation works or the manner in which the specification is realized. One possibility to answer these questions is to use large language models (LLMs) to construct human-readable explanations. Unfortunately, experiments reveal that LLMs frequently produce nonsensical or misleading explanations when applied to the unidiomatic code produced by program synthesizers. In this paper, we develop an approach to reliably augment the implementation with explanatory names. We recover fine-grained input-output data from the synthesis algorithm to enhance the prompt supplied to the LLM, and use a combination of a program verifier and a second language model to validate the proposed explanations before presenting them to the user. Together, these techniques massively improve the accuracy of the proposed names, from 24% to 79%. Through a pair of small user studies, we find that users significantly prefer the explanations produced by our technique (76% of responses indicating the appropriateness of the presented names) to the baseline (with only 2% of responses approving of the suggestions), and that the proposed names measurably help users in understanding the synthesized implementation.
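A rough sketch of the generate-then-validate loop described above: an LLM proposes a name from the recovered input-output examples, and the name is surfaced only if a verifier and a second model both endorse it. The callables here are hypothetical stand-ins, not the paper's actual interfaces.

```python
# Hypothetical sketch of the generate-then-validate loop described above.
# `propose_name`, `verifier_agrees`, and `second_model_agrees` stand in for
# the paper's LLM prompt, program verifier, and second language model.
from typing import Callable, Optional

def name_synthesized_function(
    code: str,
    io_examples: list[tuple[tuple, object]],
    propose_name: Callable[[str, list], str],
    verifier_agrees: Callable[[str, str], bool],
    second_model_agrees: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Propose an explanatory name; return it only if both checks pass."""
    for _ in range(max_attempts):
        candidate = propose_name(code, io_examples)  # LLM call, I/O data in the prompt
        # Accept only when the verifier and a second model both endorse the name.
        if verifier_agrees(code, candidate) and second_model_agrees(code, candidate):
            return candidate
    return None  # fall back to no name rather than a misleading one
```

Returning no name when validation fails mirrors the abstract's observation that misleading explanations are worse than none.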
- Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People
IEEE International Symposium on Robot and Human Interactive Communication
Abstract: Navigating unfamiliar environments presents significant challenges for blind and low-vision (BLV) individuals. In this work, we construct a dataset of images and goals across different scenarios such as searching through kitchens or navigating outdoors. We then investigate how grounded instruction generation methods can provide contextually-relevant navigational guidance to users in these instances. Through a sighted user study, we demonstrate that large pretrained language models can produce correct and useful instructions perceived as beneficial for BLV users. We also conduct a survey and interviews with 4 BLV users, gathering useful insights on their preferences for different instructions depending on the scenario.
- A Tale of Two Communities: Exploring Academic References on Stack Overflow
The ACM Web Conference
Abstract: Stack Overflow is widely recognized by software practitioners as the go-to resource for addressing technical issues and sharing practical solutions. While not typically seen as a scholarly forum, users on Stack Overflow commonly refer to academic sources in their discussions. Yet, little is known about these referenced academic works and how they intersect with the needs and interests of the Stack Overflow community. To bridge this gap, we conducted an exploratory large-scale study on the landscape of academic references in Stack Overflow. Our findings reveal that Stack Overflow communities with different domains of interest engage with academic literature at varying frequencies and speeds. These contrasting patterns suggest that some disciplines may have diverged in their interests and development trajectories from the corresponding practitioner community. Finally, we discuss the potential of Stack Overflow in gauging the real-world relevance of academic research.
- NomNom: Explanatory Function Names for Program Synthesizers
IEEE/ACM International Conference on Software Engineering
Abstract: Despite great advances in program synthesis techniques, they remain algorithmic black boxes. Although they guarantee that when synthesis is successful, the implementation satisfies the specification, they provide no additional information regarding how the implementation works or the manner in which the specification is realized. One possibility to answer these questions is to use large language models to construct human-readable explanations. Unfortunately, experiments reveal that LLMs frequently produce nonsensical or misleading explanations when applied to the unidiomatic code produced by program synthesizers. In this paper, we develop an approach to reliably augment the implementation with explanatory names. Experiments and user studies indicate that these names help users in understanding synthesized implementations.
- Designing adaptive interventions for human-aware autonomous systems
Abstract: Shared autonomy aims to have humans and autonomous agents work together on a shared goal to enhance the system's performance and safety. To provide a seamlessly safe experience, the AI agent needs to estimate human intentions to adaptively assist humans. We aim to let human collaborators work on their goals while monitoring them continuously for unsafe behavior and making interventions to enhance the safety and performance of the system. We plan to evaluate our proposed system using a within-subject design user study. We posit that adaptive interventions of varying intensities and modalities will enhance the interaction between autonomous systems and humans while improving the overall system and environment safety.
- Make It Make Sense! Understanding and Facilitating Sensemaking in Computational Notebooks
Abstract: Data scientists frequently reuse and build on other scientists' computational notebooks. However, making sense of existing notebooks is a struggle, as these reference notebooks are often exploratory, have messy structures, include multiple alternatives, and offer little explanation. To help mitigate these issues, we developed a catalog of cognitive tasks associated with the sensemaking process. Utilizing this catalog, we introduce Porpoise: an interactive overlay on computational notebooks. Porpoise integrates computational notebook features with digital design, grouping cells into labeled sections that can be expanded, collapsed, or annotated for improved sensemaking. We investigated data scientists' needs when reading unfamiliar computational notebooks and examined the impact of Porpoise's adaptations on their comprehension process. Our counterbalanced study with 24 data scientists found Porpoise enhanced code comprehension, making the experience more akin to reading a book, with one participant describing it as "It's really like reading a book."
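The overlay Porpoise describes can be pictured as a small data structure: labeled groups of cell indices that can be collapsed, expanded, or annotated. A minimal, hypothetical Python sketch follows; it models only the grouping, not the real tool's implementation or rendering.

```python
# Hypothetical sketch of a Porpoise-style overlay: group notebook cells into
# labeled sections that can be collapsed or annotated.
from dataclasses import dataclass, field

@dataclass
class Section:
    label: str                 # e.g. "Data cleaning"
    cell_indices: list[int]    # which notebook cells belong to this section
    collapsed: bool = False
    annotations: list[str] = field(default_factory=list)

    def toggle(self) -> None:
        """Collapse or expand the section in the overlay."""
        self.collapsed = not self.collapsed

    def annotate(self, note: str) -> None:
        self.annotations.append(note)

overlay = [
    Section("Load data", [0, 1]),
    Section("Alternative model B (exploratory)", [5, 6, 7]),
]
overlay[1].toggle()                         # hide an exploratory branch
overlay[0].annotate("Input CSV must be UTF-8.")
```

Keeping the grouping outside the notebook file means the reference notebook itself stays untouched, which matches the abstract's framing of Porpoise as an overlay.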
- Cognitive biases in software development
Communications of the ACM
Abstract: Cognitive biases are hardwired behaviors that influence developer actions and can set them on an incorrect course of action, necessitating backtracking. Although researchers have found that cognitive biases occur in development tasks in controlled lab studies, we still do not know how these biases affect developers' everyday behavior. Without such an understanding, development tools and practices remain inadequate. To close this gap, we conducted a two-part field study to examine the extent to which cognitive biases occur, the consequences of these biases on developer behavior, and the practices and tools that developers use to deal with these biases. We found about 70% of observed actions were associated with at least one cognitive bias. Even though developers recognized that biases frequently occur, they are forced to deal with such issues with ad hoc processes and suboptimal tool support. As one participant (IP12) lamented: "There is no salvation!"
- Developers Who Vlog: Dismantling Stereotypes through Community and Identity
ACM SIGCHI Conference on Computer-Supported Cooperative Work and Social Computing
Abstract: Developers are more than "nerds behind computers all day"; they lead normal lives, and not all take the traditional path to learning programming. However, the public still sees software development as a profession for "math wizards". To learn more about this special type of knowledge worker from their first-person perspective, we conducted three studies to learn how developers describe a day in their life through vlogs on YouTube and how these vlogs were received by the broader community. We first interviewed 16 developers who vlogged to identify their motivations for creating this content and their intention behind what they chose to portray. Second, we analyzed 130 vlogs (video blogs) to understand the range of the content conveyed through videos. Third, we analyzed 1176 comments from the 130 vlogs to understand the impact the vlogs have on the audience. We found that developers were motivated to promote and build a diverse community by sharing different aspects of life that define their identity and by creating awareness about learning and career opportunities in computing. They used vlogs to share many facets of how software developers work and live, showcasing often unseen experiences, including intimate moments from their personal lives. From our comment analysis, we found that the vlogs were valuable to the audience for finding information and seeking advice. Commenters sought opportunities to connect with others over shared triumphs and trials that were also shown in the vlogs. As a central theme, we found that developers use vlogs to challenge the misconceptions and stereotypes around their identity, work-life, and well-being. These social stigmas are obstacles to an inclusive and accepting community and can deter people from choosing software development as a career. We also discuss the implications of using vlogs to support developers, researchers, and beyond.
- Reel life vs. real life: how software developers share their daily life through vlogs
ACM International Conference on the Foundations of Software Engineering
Abstract: Software developers are turning to vlogs (video blogs) to share what a day is like to walk in their shoes. Through these vlogs, developers share a rich perspective of their technical work as well as their personal lives. However, do the activities portrayed in vlogs differ from the activities developers in industry perform? Would developers at a software company prefer to show activities to different extents if they were asked to share about their day through vlogs? To answer these questions, we analyzed 130 vlogs by software developers on YouTube and conducted a survey with 335 software developers at a large software company. We found that although vlogs present traditional development activities such as coding and code-peripheral activities (11%), they also prominently feature wellness- and lifestyle-related activities (47.3%) that have not been reflected in previous software engineering literature. We also found that developers at the software company were inclined to share more non-coding tasks (e.g., personal projects, time spent with family and friends, and health) when asked to create a mock-up vlog to promote diversity. These findings demonstrate a shift in our understanding of how software developers spend their time and what they find valuable to share publicly. We discuss how vlogs provide a more complete perspective of software development work and serve as a valuable source of data for empirical research.
- Mental Models of Mere Mortals with Explanations of Reinforcement Learning
ACM Transactions on Intelligent Systems and Technology
Abstract: How should reinforcement learning (RL) agents explain themselves to humans not trained in AI? To gain insights into this question, we conducted a 124-participant, four-treatment experiment to compare participants' mental models of an RL agent in the context of a simple Real-Time Strategy (RTS) game. The four treatments isolated two types of explanations vs. neither vs. both together. The two types of explanations were as follows: (1) saliency maps (an "Input Intelligibility Type" that explains the AI's focus of attention) and (2) reward-decomposition bars (an "Output Intelligibility Type" that explains the AI's predictions of future types of rewards). Our results show that a combined explanation that included saliency and reward bars was needed to achieve a statistically significant difference in participants' mental model scores over the no-explanation treatment. However, this combined explanation was far from a panacea: It exacted disproportionately high cognitive loads from the participants who received the combined explanation. Further, in some situations, participants who saw both explanations predicted the agent's next action worse than all other treatments' participants.
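Reward-decomposition bars rest on a simple identity in the common formulation: the agent's action value is the sum of per-reward-type components, Q(s, a) = sum over reward types c of Q_c(s, a), so each component can be drawn as one bar. A tiny tabular sketch with made-up numbers, not the study's RTS agent:

```python
# Made-up numbers, not the study's agent: the total action value decomposes
# as Q(s, a) = sum of per-reward-type Q_c(s, a), and each component maps to
# one bar in the explanation.
q_decomposed = {
    "attack":  {"enemy_destroyed": 4.0, "friendly_lost": -1.5, "town_damage": -0.5},
    "retreat": {"enemy_destroyed": 0.0, "friendly_lost": -0.2, "town_damage": -1.0},
}

def q_total(action: str) -> float:
    """Q(s, a) as the sum of its reward-type components."""
    return sum(q_decomposed[action].values())

best = max(q_decomposed, key=q_total)
print(f"chosen action: {best}  (Q = {q_total(best):+.1f})")
for reward_type, value in q_decomposed[best].items():
    print(f"  {reward_type:>15}: {value:+.1f}")  # one explanation bar per line
```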
- What's Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities
ACM Conference on Human Factors in Computing Systems
Abstract: Computational notebooks - such as Azure, Databricks, and Jupyter - are a popular, interactive paradigm for data scientists to author code, analyze data, and interleave visualizations, all within a single document. Nevertheless, as data scientists incorporate more of their activities into notebooks, they encounter unexpected difficulties, or pain points, that impact their productivity and disrupt their workflow. Through a systematic, mixed-methods study using semi-structured interviews (n=20) and a survey (n=156) with data scientists, we catalog nine pain points when working with notebooks. Our findings suggest that data scientists face numerous pain points throughout the entire workflow - from setting up notebooks to deploying to production - across many notebook environments. Our participants also report essential notebook requirements, such as support for data exploration and visualization. The results of our study inform and inspire the design of computational notebooks.
- A tale from the trenches: cognitive biases and software development
IEEE/ACM International Conference on Software Engineering
Abstract: Cognitive biases are hard-wired behaviors that influence developer actions and can set them on an incorrect course of action, necessitating backtracking. While researchers have found that cognitive biases occur in development tasks in controlled lab studies, we still don't know how these biases affect developers' everyday behavior. Without such an understanding, development tools and practices remain inadequate. To close this gap, we conducted a 2-part field study to examine the extent to which cognitive biases occur, the consequences of these biases on developer behavior, and the practices and tools that developers use to deal with these biases. About 70% of observed actions that were reversed were associated with at least one cognitive bias. Further, even though developers recognized that biases frequently occur, they routinely are forced to deal with such issues with ad hoc processes and sub-optimal tool support. As one participant (IP12) lamented: "There is no salvation!"
- Supporting Code Comprehension via Annotations: Right Information at the Right Time and Place
IEEE Symposium on Visual Languages and Human-Centric Computing
Abstract: Code comprehension, especially understanding relationships across project elements (code, documentation, etc.), is non-trivial when information is spread across different interfaces and tools. Bringing the right amount of information to the place where it is relevant and when it is needed can help reduce the costs of seeking information and creating mental models of the code relationships. While non-traditional IDEs have tried to mitigate these costs by allowing users to spatially place relevant information together, thus far, no study has examined the effects of these non-traditional interactions on code comprehension. Here, we present an empirical study to investigate how the right information at the right time and right place allows users, especially newcomers, to reduce the costs of code comprehension. We use a non-traditional IDE, called Synectic, and implement link-able annotations which provide affordances for the accuracy, time, and space dimensions. We conducted a between-subjects user study of 22 newcomers performing code comprehension tasks using either Synectic or a traditional IDE, Eclipse. We found that having the right information at the right time and place leads to increased accuracy and reduced cognitive load during code comprehension tasks, without sacrificing the usability of developer tools.
- Latent patterns in activities: a field study of how developers manage context
IEEE/ACM International Conference on Software Engineering
Abstract: In order to build efficient tools that support complex programming tasks, it is imperative that we understand how developers program. We know that developers create a context around their programming task by gathering relevant information. We also know that developers decompose their tasks recursively into smaller units. However, important gaps exist in our knowledge about: (1) the role that context plays in supporting smaller units of tasks, (2) the relationship that exists among these smaller units, and (3) how context flows across them. The goal of this research is to gain a better understanding of how developers structure their tasks and manage context through a field study of ten professional developers in an industrial setting. Our analysis reveals that developers decompose their tasks into smaller units with distinct goals, that specific patterns exist in how they sequence these smaller units, and that developers may maintain context between those smaller units with related goals.
- Explaining Reinforcement Learning to Mere Mortals: An Empirical Study
International Joint Conference on Artificial Intelligence
Abstract: We present a user study to investigate the impact of explanations on non-experts' understanding of reinforcement learning (RL) agents. We investigate both a common RL visualization, saliency maps (the focus of attention), and a more recent explanation type, reward-decomposition bars (predictions of future types of rewards). We designed a 124-participant, four-treatment experiment to compare participants' mental models of an RL agent in a simple Real-Time Strategy (RTS) game. Our results show that the combination of both saliency and reward bars was needed to achieve a statistically significant improvement in mental model score over the control. In addition, our qualitative analysis of the data reveals a number of effects for further study.
- Context in programming: an investigation of how programmers create context
International Conference on Cooperative and Human Aspects of Software Engineering
Abstract: A programming context can be defined as all the relevant information that a developer needs to complete a task. Context comprises information from different sources, and programmers interpret the same information differently based on their programming goal. In fact, the same artifact may create a different context when revisited. Context, therefore, by its very nature is a "slippery notion." To understand how people create context, we observed six programmers engaged in exploratory programming and performed a qualitative analysis of their activities. We observe that how one creates context is determined by one's interactions with artifacts and the meaning one maps from those artifacts onto the programming activity.
- What makes a task difficult? An empirical study of perceptions of task difficulty
IEEE Symposium on Visual Languages and Human-Centric Computing
Abstract: Estimating the difficulty of tasks is imperative for project planning, task assignment, and cost calculation. However, little is known about how and for what purpose software practitioners estimate task difficulty in their day-to-day work. In this paper, we interviewed 15 professionals to understand their needs and perceptions when estimating task difficulty. We find that practitioners do estimate the difficulty of tasks for scheduling and prioritizing their work. Additionally, performing such estimation requires more than one metric, drawn from more than one domain (i.e., code metrics, process metrics, and task metrics). The use of metrics that encapsulate different aspects of a task allows developers to gain a holistic view of the task and its potential difficulty.
- Context in exploratory programming: Towards a theoretical framework
IEEE Symposium on Visual Languages and Human-Centric Computing
Abstract: Creativity theory states that good designs are achieved by considering a multitude of candidate designs [1]. Exploratory Programming is the process of trying out designs while writing software. Programmers have to evaluate these alternative implementations in order to implement new ideas [2]. These alternatives often have multiple objectives, which might prompt a programmer to work towards multiple goals in episodes. Episodes are distinct periods when a programmer works towards a certain goal. These episodes may be interleaved, with programmers comparing work across episodes [3]. However, there is little research that focuses on how programmers abstract meaningful and appropriate information from their exploration of alternatives to integrate into their current work. We refer to these meaningful abstractions as context.
ace projects
Science in Stack Overflow
Exploring references to academic literature on Stack Overflow (SO) to understand how scientific knowledge diffuses into this practitioner-centric forum.
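The project description does not specify how references are detected. One common first approximation, sketched below under that assumption, is to pattern-match DOI and arXiv links in post bodies; the example post and identifiers are placeholders.

```python
# Hedged sketch: detect likely academic references (DOIs, arXiv IDs) in a
# Stack Overflow post body. The project's actual detection pipeline is not
# described here; these two patterns are only a first approximation.
import re

DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
ARXIV_PATTERN = re.compile(r"\barxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})", re.IGNORECASE)

def academic_references(post_body: str) -> list[str]:
    """Return DOI strings and arXiv IDs found in the post text."""
    refs = DOI_PATTERN.findall(post_body)
    refs += [f"arXiv:{m}" for m in ARXIV_PATTERN.findall(post_body)]
    return refs

post = "See https://arxiv.org/abs/2301.00001 and doi.org/10.1000/xyz123"
print(academic_references(post))  # ['10.1000/xyz123', 'arXiv:2301.00001']
```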
Sustainable Software Engineering
Redefining the notion of sustainability for software projects, investigating various open-source and industrial projects to derive metrics, and helping future engineers build sustainable software.
Trustworthy AI code generation
Developing a method to ensure trustworthy AI-generated code suggestions through statistical tools and program analysis techniques.
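As one deliberately simplified instance of the program-analysis side (the project's actual techniques are not described here), the sketch below gates an AI suggestion on parsing cleanly and avoiding a small, illustrative blocklist of risky calls.

```python
# Simplified sketch of a program-analysis gate for AI code suggestions:
# accept only suggestions that parse and avoid a small blocklist of risky
# calls. Not the project's actual method.
import ast

RISKY_CALLS = {"eval", "exec", "system"}  # illustrative blocklist, not exhaustive

def gate_suggestion(suggestion: str) -> tuple[bool, str]:
    """Return (accept, reason) for an AI-generated Python snippet."""
    try:
        tree = ast.parse(suggestion)
    except SyntaxError as err:
        return False, f"does not parse: {err.msg}"
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if name in RISKY_CALLS:
                return False, f"risky call: {name}"
    return True, "passed static checks"

print(gate_suggestion("eval(input())"))    # (False, 'risky call: eval')
print(gate_suggestion("total = sum(xs)"))  # (True, 'passed static checks')
```

A real gate would add statistical signals (e.g., model confidence) on top of static checks, which is the combination the project blurb points toward.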
Codebase Generation Model Evaluation
The project examines the performance of AI in creating codebases, assessing developer experience and code effectiveness.
Expanding the Scope of Changes Made by Code Prompts
Exploring the limitations of current automatic code suggestion models in generating complete code bases from abstract prompts and proposing a human-in-the-loop framework to extend their capabilities.
Cognitive control and intervention in autonomous systems
To design good collaboration in human-autonomous systems, we need to understand the cognitive barriers humans face and provide good explanation and intervention systems. In these projects, we look at different autonomous systems and study how human cognition needs support for seamless and safe operation.