The main goal of the conference is to foster discussion around the latest advances in Artificial Intelligence.

Note: the Programme is subject to change.

Talks

Rethinking AI Bias: Why Discrimination Concerns Are Overstated in New Zealand Law and Policy
Dr. Benjamin Liu, The University of Auckland

In the evolving landscape of artificial intelligence, legal and policy discussions in New Zealand often overemphasise bias and discrimination as inherent risks. This framing overlooks the fact that there are no concrete examples of AI discrimination, whether in finance (e.g., credit applications), insurance, university admissions, facial recognition, border control, or visa granting, beyond a few highly publicised incidents, such as the Google Photos app mislabelling, which lacked real economic consequences. In fact, initial research alleging bias in tools like the COMPAS recidivism predictor was rebutted by subsequent analyses showing no racial bias in predictive accuracy. Furthermore, AI discrimination, even if present, is easier to detect and rectify through algorithmic audits and data adjustments. Existing frameworks such as the Human Rights Act 1993 can sufficiently address potential discrimination issues if they do arise.

Planning for AI implementation success: key learnings from the dual-site pilot of AI-powered self-rostering in healthcare in NZ
Sharon Sitters, Unitec Institute of Technology

Investment in AI-powered healthcare technologies is increasing rapidly, yet early evidence suggests AI project failure rates are twice as high as those of non-AI technologies. Moving beyond technological novelty towards clinical fit is essential to avoid non-adoption and mal-adoption of AI technologies in healthcare settings. Importantly, there is a dearth of literature exploring health professionals’ perceptions of AI technologies in New Zealand, particularly in relation to the sociotechnical challenges posed by agentic AI during implementation. Here, we report qualitative findings relating to clinical fit and receptivity from a dual-site pilot of AI self-rostering in New Zealand, guided by the Consolidated Framework for Implementation Research (CFIR). Our findings demonstrate how contextual differences predispose different perceptions towards AI, and the strategies that can be used to overcome these challenges.

Leveraging Large Language Models for Low Resource Languages
Surangika Ranathunga, Massey University

In the field of Natural Language Processing (NLP), the availability of electronic data resources is usually considered the main descriptor of the ‘resourcefulness’ of a language. By this measure, the majority of the 7000+ languages in the world are classified as low-resource languages (LRLs). This lack of data has led to further problems in the era of Large Language Models (LLMs). Due to the lack of linguistic resources and the relative unfamiliarity of LRLs within the big tech companies building these technologies, most LLMs do not have adequate coverage for most of the LRLs. As such, there is a risk of LRLs continuing to be neglected, further widening the digital divide between communities. In this talk, I discuss different contributions made by my research to address this problem, using the LRL Sinhala as a case study. In particular, I will highlight the importance of data collection, annotation, language-specific LLM creation and robust evaluation. I will also highlight the importance of openness in NLP research.

AI-Driven Forensic Application for Detecting Students’ Mental Health Risks in New Zealand Institutions Using Keyword and Sentiment-Based Methods
Anastasia Mozhaeva, Eastern Institute of Technology

Student mental health risks in New Zealand institutions demand tools that are scalable, transparent, and safe. We present an AI-driven Python application for forensic text assessment that combines keyword tracking with lexicon-based sentiment scoring to triage case reports. On 15 anonymised cases, the system categorised 6 as high risk (6/15, 40.0%), 5 as medium risk (5/15, 33.3%), and 4 as low risk (4/15, 26.7%). Sentiment polarity spanned extremes from −0.9952 to 0.9998. Among high-risk cases, 3 of 6 also showed strongly negative sentiment (compound ≤ −0.9), whereas 3 of 6 exhibited highly positive compounds (≥ 0.998), indicating that lexical crisis cues can elevate risk even when overall tone is positive. The most frequent top tracked term was ‘help’, in 6 of 15 cases (40.0%), followed by ‘imposter’ in 3 of 15 (20.0%). Medium-risk cases had a mean compound of ≈0.931, and low-risk cases averaged ≈0.9984.
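The keyword-plus-sentiment triage described above can be sketched roughly as follows. The crisis terms, valence lexicon, and risk thresholds here are illustrative assumptions for exposition, not the tool's actual configuration:

```python
# Illustrative sketch of keyword tracking plus lexicon-based sentiment.
# The lexicons and cut-offs below are invented for demonstration only.

CRISIS_TERMS = {"help", "imposter", "hopeless", "overwhelmed"}
POSITIVE = {"good": 0.5, "great": 0.8, "support": 0.4}
NEGATIVE = {"hopeless": -0.9, "alone": -0.6, "fail": -0.5}

def sentiment_compound(text: str) -> float:
    """Crude lexicon score: sum of word valences, clipped to [-1, 1]."""
    words = text.lower().split()
    raw = sum(POSITIVE.get(w, 0.0) + NEGATIVE.get(w, 0.0) for w in words)
    return max(-1.0, min(1.0, raw))

def triage(text: str) -> str:
    """Risk category from crisis-keyword hits, independent of overall tone."""
    hits = sum(1 for w in text.lower().split() if w in CRISIS_TERMS)
    if hits >= 2:
        return "high"
    if hits == 1:
        return "medium"
    return "low"
```

Because the keyword channel is separate from the sentiment channel, a report can triage as high risk even when its sentiment score is positive, which mirrors the pattern reported above.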

A complementary safety analysis showed overlapping Student Safety Composite Scores between Unsafe and Safe categories around the 70 percent region, with a Safe outlier near 94.2 percent and a Neutral case at 66.55 percent. Median-centred testing detected significant distributional differences across risk groups (Kruskal–Wallis H = 6.84, p = 0.033), whereas mean-based ANOVA did not (p = 0.263), reinforcing the need for distribution-aware evaluation in high-stakes contexts. In a model comparison, GPT-4.0 delivered higher pastoral-care accuracy than GPT-4.1 Mini, with category-level advantages and a significant association with expert triage (Chi-square p = 0.018). Notably, close model agreement in 13 of 15 cases did not guarantee correct escalation in Unsafe scenarios (χ² = 0.27, p = 0.875), underscoring that agreement does not equal safety. This pilot study contributes a reproducible forensic AI framework for student wellbeing, demonstrating both the potential and the ethical safeguards required when deploying AI in sensitive educational and mental-healthcare settings.
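For readers unfamiliar with the rank-based test contrasted with ANOVA above, a minimal Kruskal-Wallis H statistic (assuming no tied values; the sample data here are invented, not the study's) can be written as:

```python
# Hand-rolled Kruskal-Wallis H: a rank-based test of whether groups come
# from the same distribution, robust to the outliers that dilute ANOVA.
# Assumes no tied values; real analyses should use a library implementation.

def kruskal_wallis_h(groups):
    """H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), with R_i the rank sums."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # 1-based global ranks
    n_total = len(pooled)
    sum_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)
```

Because H depends only on ranks, extreme compound scores cannot dominate the statistic the way they inflate within-group variance in a mean-based F-test.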

AI assisted mussel assessments
Julian Maclaren, Nelson AI Institute

Current mussel harvest decisions in New Zealand’s Greenshell™ mussel (Perna canaliculus) industry rely on subjective human assessment of mussel condition, leading to inconsistent evaluations and potentially suboptimal harvest timing. We compare traditional by-eye assessment methods with an artificial intelligence based approach. This study uses a dataset of Greenshell™ mussels collected over 18 months from multiple locations across the South Island of New Zealand. The AI system analyses smartphone photographs of shucked mussels and predicts scores corresponding to dry weight yield. The AI system proved more accurate than the traditional by-eye approach.

ALMA: Lessons from Aotearoa’s first population-level digital twin
Richard Dean, PHF Science

Since its launch in 2024, ALMA (Aotearoa’s Large Scale Multi-Agent simulator) has shown how custom AI systems can go far beyond generic chatbots or off-the-shelf assistants like Copilot—offering tailored, scalable support for complex public health and environmental challenges. Recently awarded the Trailblazer in AI Innovation at the 2025 Aotearoa AI Awards, ALMA integrates diverse national data sources—including census, health records, satellite-derived environmental indicators, and transport feeds—to create dynamic, AI-driven simulations of real-world systems.

This session will showcase ALMA’s current capabilities and near-term roadmap. We’ll demonstrate how the platform enables scenario testing for climate-related disruptions and supports faster, evidence-based decisions. Early work on LLM-augmented reporting tools shows how natural language can help translate complex outputs into clear, actionable summaries—augmenting, not replacing, expert interpretation.

ALMA’s architecture is designed to bring traditionally siloed datasets together, opening new ways to explore systemic issues like health inequities or gaps in pandemic preparedness. Guided by a vision of inclusive co-design, ALMA is being developed with future collaboration in mind—ensuring that AI tools serve communities by being shaped alongside them, not just deployed on their behalf.

By focusing on real data, local context, and practical utility, ALMA offers a grounded blueprint for AI that empowers public sector decision-makers. This talk will share what we’ve learned so far—and where this approach might lead next.

Interpretable Evolutionary Machine Learning for Healthcare and Cybersecurity Applications
Qurrat Ul Ain, Victoria University of Wellington

Artificial intelligence is transforming critical domains such as healthcare and cybersecurity, where decision-making requires not only high accuracy but also transparency and trust. This presentation will showcase recent advances in evolutionary machine learning, with a focus on genetic programming methods for feature construction, feature selection, and interpretable model design. Our research demonstrates how evolutionary computation can automatically generate novel, human-understandable features and models that deliver state-of-the-art performance while providing actionable insights.

In healthcare, we highlight applications in skin cancer detection, breast density classification, and multimodal biomedical image analysis, where interpretability directly supports clinical decision-making. In cybersecurity, we introduce emerging work on interpretable AI for advanced persistent threat (APT) detection, where evolving feature representations and models enable improved resilience against complex attacks.

The talk will further outline collaborative opportunities between academia and industry, emphasizing how interpretable AI methods can bridge the gap between performance and trustworthiness. By presenting both methodological contributions and applied case studies, this work underscores the potential of evolutionary machine learning to advance AI research while addressing pressing societal needs.

An AI framework for patient-specific 3D hip reconstruction and surgical planning from 2D X-rays
Chris Rapson, Formus Labs Ltd

Surgeons and patients achieve better outcomes when planning total hip arthroplasty surgery in 3D than in 2D. However, planning in 3D currently requires Computed Tomography (CT) imagery which introduces disadvantages in terms of cost, accessibility and radiation exposure. Reconstructing 3D models of a patient's anatomy from 2D X-ray images would achieve the best of both worlds. This paper demonstrates the training of an AI model for 3D reconstruction from X-rays and its intended application to 3D surgical planning. The training process makes use of Digitally Reconstructed Radiographs (DRRs) - synthetic images that are close approximations to X-rays, with the advantages of more flexible augmentation possibilities and perfect alignment to CT (and the associated 3D ground truth). Reconstruction results are shown and compared to ground truth. The resulting meshes can be seamlessly integrated into existing 3D surgical planning software, and are demonstrated to give only small differences when compared to plans generated using 3D segmentation of a CT scan.
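At its simplest, a DRR is a line integral of CT attenuation along the beam direction. The toy parallel projection below conveys the idea; it is our simplification, as real DRR generation models cone-beam geometry and X-ray physics:

```python
# Toy DRR: sum a voxel grid of attenuation values along the beam axis (z).
# Real DRR pipelines use perspective (cone-beam) ray casting, not this
# parallel projection, but the line-integral idea is the same.

def drr_parallel(volume):
    """volume: nested list [z][y][x] of attenuation values.
    Returns a 2D image [y][x] summing along z (the beam direction)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [[sum(volume[z][y][x] for z in range(depth))
             for x in range(width)] for y in range(height)]
```

Because the DRR is computed directly from the CT volume, every synthetic image is perfectly registered to its 3D ground truth, which is the training advantage the abstract highlights.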

Automated Detection of Highway Bridge Surface Cracks Leveraging Deep Learning Models
Michael J. Watts, Media Design School

This research investigated the automation of surface crack detection on highway bridges through the implementation of state-of-the-art deep learning models. While traditional manual inspections remain labour-intensive, time-consuming, and susceptible to human error, this research built upon previous work employing YOLOv8x by utilising the next-generation YOLOv12 architecture. The newly trained model demonstrated substantial improvements in detection precision, computational efficiency, and robustness under diverse environmental conditions. High-resolution images collected via drones and fixed monitoring systems underwent enhanced preprocessing and adaptive data augmentation to increase dataset diversity and reduce overfitting. Comparative analyses across multiple model generations, including YOLOv6, YOLOv8x, and YOLOv12, revealed that YOLOv12 achieved superior performance with a mean Average Precision (mAP) of 98.3%, significantly improving both detection speed and accuracy. Comprehensive evaluations—encompassing confusion matrices, precision-recall curves, and k-fold cross-validation—confirmed the model’s stability and generalization across various bridge surface conditions. By integrating advanced detection algorithms, optimized image enhancement techniques, and refined data acquisition strategies, this research contributed to the evolution of intelligent bridge inspection systems, offering a faster, more reliable, and scalable solution for ensuring the structural integrity and safety of highway bridges.

Towards Fair AI with a Multi-Phase Framework Tested on NZ Data
Hamid Sharifzadeh, Unitec Institute of Technology

Artificial Intelligence (AI) is increasingly adopted across sectors such as healthcare, finance, and employment. Despite their predictive power, AI systems remain vulnerable to bias, particularly when trained on data that reflect historical inequalities. Such biases often perpetuate systemic disparities through automated decisions, disproportionately affecting marginalised populations. In Aotearoa New Zealand, ensuring fairness in AI carries additional weight due to obligations under Te Tiriti o Waitangi, which upholds equity for the Indigenous Māori population. While global and national initiatives, regulatory policies, data interventions, and model-based fairness techniques exist, their implementation remains fragmented and inconsistently effective.

This research proposes a reproducible, multi-phase fairness framework for bias mitigation across the machine learning (ML) lifecycle. It integrates three strategies: Reweighing (pre-processing), Adversarial Debiasing (in-processing), and Calibrated Equalised Odds (post-processing). Fairness is evaluated using Statistical Parity Difference (SPD), Disparate Impact (DI), Equal Opportunity Difference (EOD), and Average Odds Difference (AOD), alongside Accuracy and Balanced Accuracy (BA), leveraging IBM’s AIF360 toolkit. The framework is tested on multiple ML models, including Random Forest, XGBoost, LightGBM, and TabNet, applied to both global publicly available datasets (U.S. Adult Census, U.S. Diabetes, Taiwan Credit) and Aotearoa New Zealand datasets (2023 Census and ACC claims from Stats NZ IDI DataLab).

We also compared data balancing techniques against the proposed framework. Results show that common approaches such as SMOTE and GAN-based augmentation often failed to improve fairness and, in some cases, amplified disparities by distorting group representation. In contrast, the proposed framework consistently enhanced fairness in both attribute-specific and intersectional settings. For example, in the US Adult dataset, LightGBM achieved a 99.82% improvement in DI and a 91.67% reduction in EOD, while in the New Zealand Census dataset, Random Forest achieved a 72.55% DI improvement and an 89.45% reduction in EOD. Importantly, these fairness gains did not compromise performance: Accuracy and Balanced Accuracy increased across models, in some cases by more than 15%.

The findings demonstrate the practical effectiveness of the proposed multi-stage framework in advancing fairness across equity-sensitive environments. The study offers a scalable and culturally informed methodology for AI fairness, with strong relevance to Aotearoa New Zealand and applicability to comparable global contexts.

AI-based synthetic population for Heat and Air Pollution Risk Assessment in Auckland
Mohammad Dehghan Shoar, PHF Science

Extreme heat and degraded air quality frequently co-occur in urban environments, yet most risk assessments assume static populations and uniform exposures. We present a dynamic, agent-based framework that integrates synthetic population modelling with large language model (LLM)–generated daily movement diaries and multi-sensor remote sensing data to capture time-resolved exposure patterns across Auckland.

An SA2-level synthetic population stratified by demographic (age, gender, ethnicity, household composition) and socioeconomic (occupation, education, income, commute) attributes is constructed and assigned to roles such as student, worker, or caregiver. Role-aware daily activity chains are generated using an LLM constrained by census anchors and commuting patterns, producing realistic diaries that are spatially linked to workplaces, schools, and amenities. Agent trajectories are then intersected with gridded remote sensing–derived environmental layers—including urban heat metrics (e.g., Land Surface Temperature) and pollutants (NO₂, SO₂, PM₂.₅, CO) from GHAP (Global High Air Pollutants), Sentinel-5P, MODIS, and Landsat 8/9—to compute path-integrated, microenvironment-adjusted exposures that account for infiltration and dwell time.
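A path-integrated, microenvironment-adjusted exposure of the kind described above reduces to a time-weighted sum over an agent's activity chain. The infiltration factors and diary format below are illustrative assumptions, not the study's calibrated values:

```python
# Sketch: daily exposure = sum over diary segments of
# ambient concentration x dwell time x microenvironment infiltration.
# Infiltration factors here are invented for demonstration.

INFILTRATION = {"outdoor": 1.0, "home": 0.6, "office": 0.4, "transit": 0.8}

def daily_exposure(diary):
    """diary: list of (microenvironment, hours, ambient_concentration).
    Returns a time-weighted exposure in concentration-hours."""
    return sum(conc * hours * INFILTRATION[env] for env, hours, conc in diary)
```

Summing these daily values per agent over a year gives the cumulative metrics against which the WHO guideline exceedances are assessed.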

Remote sensing datasets are interpolated to daily resolution, with spatial resolution varying by instrument, enabling exposure estimates to align directly with simulated movement patterns. These daily exposures are then aggregated into annual averages and cumulative metrics, capturing both short-term variability and long-term burdens. Preliminary results reveal pronounced diurnal patterns, with persistent co-exposure hotspots in dense urban centres and significant reductions in urban heat within vegetated and forested areas. Importantly, individual agents in Auckland exhibit highly heterogeneous annual exposures, with some substantially exceeding WHO health guidelines for pollutants such as NO₂ and PM₂.₅.

Applying WHO and HAPINZ-based guidelines, we estimate that these elevated exposures translate into measurable productivity losses and adverse health outcomes at both the individual and community scale. The framework also supports “what-if” analyses of targeted interventions, allowing planners to identify strategies that most effectively reduce combined heat and pollution burdens. This approach demonstrates the value of combining synthetic populations, LLM diaries, and remote sensing for exposure science, and is scalable, generalizable, and suitable for near-real-time scenario analysis during extreme heat and smoke events.

Leveraging Machine Learning to Integrate Income, Expenditure, and Wealth Data from Household Economic Survey
Sijin Zhang, The Treasury New Zealand

This study showcases the current exploration of applying machine learning techniques to enhance social system modelling within the Treasury. Specifically, we detail the development of integrated wealth and expenditure datasets using the Household Economic Survey (HES) income data and demographic variables from the Integrated Data Infrastructure (IDI). By employing a tailored eXtreme Gradient Boosting (XGBoost) model, we simulate annual wealth and expenditure patterns across diverse population groups. The resulting datasets offer valuable insights for a wide range of policy analyses, supporting evidence-based decision-making across the Treasury and the broader New Zealand public sector.
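The core idea behind XGBoost is gradient boosting: each round fits a weak learner to the current residuals. The didactic toy below boosts one-split "stumps" on a single feature; it is our sketch of the general technique, not the Treasury's tailored model:

```python
# Minimal gradient boosting for regression: start at the mean, then
# repeatedly fit a one-split stump to the residuals and add a shrunken
# copy of it to the ensemble. Single feature, squared-error loss.

def fit_stump(xs, residuals):
    """Best single threshold split minimising squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=50, lr=0.3):
    """Additive model: mean prediction plus lr-weighted stumps on residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    pred = [base] * len(ys)
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        s = fit_stump(xs, resid)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)
```

XGBoost adds regularisation, second-order gradients, and full decision trees over many features, but the residual-fitting loop is the same shape.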

Von: A Neuro-Symbolic System for Research Role Automation
Michael Witbrock, University of Auckland

In small countries like New Zealand, one of the limiting factors in research success is the ability to support the peripheral practices around research. These processes include identifying potential collaborators, summarising external research, finding and checking references, managing the process of preparing for and submitting grant applications, tracking PhD student applications and admissions, and many other activities not central to formulating and carrying out high-impact research. In this paper we describe a prototype system, Von, intended to greatly speed our own lab's research processes and to contribute to the productivity and effectiveness of New Zealand AI researchers more broadly.

Von uses a combination of explicit knowledge representation, in text and logical forms, and implicit knowledge representation, in neural networks, to support lab processes ubiquitously, and autonomously creates representations of relevant concepts, relationships, workflows and theories. Although still a prototype, Von is already being used to support some processes within our Strong AI Lab (SAIL) at the University of Auckland, and will be open-sourced by the time of the conference. We hope to encourage other AI researchers around New Zealand, and then researchers outside AI in New Zealand and more widely, to use and contribute to the Von system.

 
