27 OCT 2025 ISSUE 20
7. Department Summer Internship Programme

  • Census and Statistics Department, HKSAR
  • DFI Retail Group



Census and Statistics Department, HKSAR


Mr. Yu Chun Keung, Leo, JP, Commissioner for Census and Statistics, and students of the Department of Statistics and Data Science



Chung Cheuk Yau
BSc in Statistics



During my internship at the Census and Statistics Department (C&SD), I had the privilege of working in the General Household Survey Section (1) of the Labour Statistics Branch (4), under the guidance of my supervisor, Mr Stanley Tsang. My responsibilities included automating the process of generating contact lists for telephone interviews and improving the efficiency of survey operations. I also developed a program to create maps visualising address locations, streamlining field visit planning. Additionally, I studied field visit data to identify effective visit time slots and presented my findings.

Through these hands-on tasks, I put my theoretical knowledge into practice, honing my skills in data analysis and programming, while gaining a deeper understanding of the operational framework of the General Household Survey. To communicate my findings effectively, I developed skills in creating clear visualisations and presenting my work in a coherent and concise manner.

Alongside my internship at C&SD, I was supervised by Prof Phillip Yam at CUHK to explore the Comonotone-Independence Bayes Classifier (CIBer). I applied CIBer to various datasets for classification tasks, comparing its performance with other machine learning models, such as support vector machines, multilayer perceptrons, and decision trees. In addition, I extended the application of CIBer to a regression context and evaluated its performance in predicting continuous outcomes.

Tackling data problems required a critical approach to each task, including evaluating available data, selecting the most appropriate methods, and refining solutions through iterative testing. Unlike coursework with clearly defined instructions, this process demanded independent problem-solving and analytical thinking. It helped me develop a proactive mindset, which is essential for success in advanced studies or a future career in data-driven fields.

This internship was a rewarding and transformative experience. The programme has strengthened my technical expertise and professional skills, while sparking my passion for exploring tools and methodologies in statistics and data science. I am deeply grateful for the exceptional opportunity provided by C&SD and the Department of Statistics and Data Science. Above all, I would like to extend my heartfelt gratitude to Mr Tsang and Prof Yam for their insightful mentorship and invaluable guidance throughout this journey.



Ko Cheuk Him
BSc in Statistics




This summer, I had the honour of working as a summer intern at the Census and Statistics Department (C&SD) in the Trade Research and Analytics Branch (Section 1). Under the guidance of my supervisor, Mr Harry Luk, I helped develop a deep learning model to automatically classify imported commodities according to descriptions in trade declarations.

During my two-month internship, I concentrated on two key areas to support the development of a more effective classification model. My first task focused on explainable AI, a set of tools that allows us to understand the actions and decisions of a black box model. I explored explainable AI techniques, such as LIME and SHAP, using Python to visualise the behaviour of the trained model and identify the causes of classification errors, and then improved the model based on these insights. My second task focused on automatic hyperparameter tuning. Thanks to guidance from my mentors, I deepened my understanding of various hyperparameters, such as learning rate and batch size, and designed an automated workflow to search for the best-performing hyperparameters through cross-validation. The tuning process not only improved the model’s accuracy but also reduced the manual steps involved.

In addition to my internship at C&SD, I worked on a simulation project related to star formation under the supervision of Prof Fan at CUHK. My task was to investigate how to improve the accuracy of the Bayesian approach to estimating the relationship between the magnetic field strength and volume density of a molecular cloud. I initially struggled with astrophysics and Bayesian inference. Under Prof Fan’s expert guidance, I learnt useful statistical computing techniques such as MCMC algorithms, as well as coding skills in R and Linux, which enabled me to successfully complete the task.

I would like to express my deep gratitude to Mr Harry Luk, Prof Fan, and everyone who provided me with invaluable assistance and encouragement during my internship. This programme has enriched my understanding of statistics, data science, and physics. I believe that the research skills and work experience I have gained will play a key role in my future studies and career.


Kwok Tsun Yau
BSc in Statistics



This summer, I had the privilege of participating in the summer attachment programme at the Census and Statistics Department (C&SD), within the Trade Research and Analytics Branch (Section 1). Under the expert guidance of my supervisor, Mr Harry Luk, I embarked on a journey of statistical discovery and practical application.
 
My time at C&SD was marked by hands-on learning and the development of valuable skills. First, I carried out comprehensive hyperparameter tuning for the black box model using various methods, including Bayesian optimisation, hyperband optimisation, and random search, adjusting different hyperparameters to increase the model’s accuracy in HS code identification. Second, I applied explainable AI techniques, including LIME and SHAP, to interpret the behaviour of the BERT model and identify the reasons for incorrect HS code predictions.
 
During my time at CUHK, I had the privilege of working under the guidance of Dr Liu. I delved deeper into the concept of “data nuggets”, a technique used to manage excessively large datasets by sampling the data while preserving their underlying structure. Dr Liu gave me clear instructions and detailed insights, which enhanced my ability to use data nuggets effectively. Additionally, I conducted comparative analyses, evaluating the results against other sampling methods such as random sampling. I also used the clustering results derived from data nuggets to fit multinomial logistic regression and support vector machine models, demonstrating the efficacy of data nuggets in maintaining the integrity of the data structure.
 
I would like to express my sincere gratitude to my supervisors, Mr Harry Luk and Dr Liu, as well as to everyone who has supported me throughout this journey. The knowledge and experience I have gained are invaluable and have laid a solid foundation for my future studies and career. Overall, I had a great time during this internship programme.


Leong Chun Kit
BSc in Statistics




During my internship at the Census and Statistics Department (C&SD) under the Social Data Development Branch – Census Planning Section (1), I worked on the development of an address matching system for unstructured data. My tasks included vectorising English addresses at the word level and Chinese addresses at the character level, standardising inconsistent expressions, breaking down addresses into semantic components, and applying weighted similarity measures. I also researched address patterns for different housing types, which, once incorporated into the algorithm, significantly reduced mismatches. In addition, I conducted desktop research on the practices adopted by other National Statistical Offices for the geospatial presentation of census statistics, focusing on user interface design, colour schemes, and output formats.

Through these tasks, I improved my technical skills in data preprocessing, feature engineering, similarity computation, and optimisation for large datasets using multiprocessing. More importantly, I learnt how statistical reasoning, natural language processing, and domain knowledge can be combined to solve practical, large-scale data challenges.

Alongside this internship, I also engaged in research at CUHK under the supervision of Dr Chan Chun Man on voice recognition, which allowed me to further explore natural language processing and machine learning techniques.

Overall, the internship at C&SD gave me invaluable hands-on experience in applying statistical and computational methods to real-world data problems, while also broadening my perspective on how technological innovation can enhance statistical services for society.


Leung Ka Yu
BSc in Statistics



During my internship at the Census and Statistics Department (C&SD), I had the honour of working in the Construction and Miscellaneous Services Statistics Section of the Sectoral Economic Statistics Branch (4) under the guidance of my supervisor, Mr Benjamin Chan. I conducted desktop research to resolve missing data issues and to identify and summarise publicly available information on construction sites, such as average transaction prices, contract amounts, and building usage.

I strengthened my VBA coding skills for data manipulation. Moreover, as desktop research often involves significant manual work, I used generative AI to find publicly available information efficiently. Through iterative improvements, I leveraged AI tools to optimise the efficiency and accuracy of desktop research, and I learnt to validate AI outputs to ensure their accuracy and relevance. My study of AI tools deepened my understanding of their practical applications, allowing me to use AI as a versatile tool to achieve my research goals. The results improved the efficiency and quality of desktop research and laid the groundwork for further developments.

In addition, under the supervision of Dr Ho Kwak Wah at CUHK, I worked on the nowcasting of GDP using the Mixed-Data Sampling (MIDAS) model. With no prior knowledge of the MIDAS model, I gained a clear understanding of this model thanks to Dr Ho’s guidance. This project involved developing models to nowcast GDP by integrating high- and low-frequency data in R, which enhanced my ability to analyse real-time economic indicators. Through this experience, I expanded my knowledge of the MIDAS model and gained valuable insights into macroeconomic forecasting.

I would like to express my deepest gratitude to my supervisors, Mr Benjamin Chan and Dr Ho, for their patience, support, and guidance, and to the department for providing me with this internship opportunity. This internship allowed me to acquire practical expertise and a solid foundation, both academically and professionally, honing my skills and deepening my understanding. Overall, this programme was an invaluable experience.


DFI Retail Group

Ho Ka Ho
BSc in Statistics



I had the honour of joining DFI Retail Group as an Ecommerce & Retail Digitisation Intern. This was a meaningful journey, not only for my career but also for my understanding of how digitalisation is transforming the retail industry.

Unlike the data in school assignments, data in a professional setting are usually messy, unstructured, and constantly changing. I developed an automated commercial report that transforms raw data into structured reports and summary dashboards. The report shows all the relevant statistics with detailed calculations and formulas. I believe this will give the company a reliable reference for adjusting its market strategies and reduce the time spent on analysis whenever new data are collected.

Money is always the company’s main concern. I created a summary table detailing the cost savings, cost reductions, and return on investment to compare AI tools used for content generation. I believe this provides a clear financial comparison between different AI solutions, serving as a valuable guide for decision-making.

In addition, I forecasted both capital and operational expenditures for the ongoing Electronic Shelf Label project for the next five years using data provided by the DFI team. I believe this can help the company allocate its resources effectively, which will improve budget control and strategic decision-making.

The most important lesson I learnt during this journey is that not everything taught in school can be directly applied to the world of work. Continuous improvement is essential to maintain performance. I believe this experience will be a valuable and meaningful starting point for my future career.
