02 JAN 2025 ISSUE 19
8. Department Summer Internship Programme

  • Census and Statistics Department, HKSAR
  • DFI Retail Group



Census and Statistics Department, HKSAR

Chun Cheuk Yin
BSc in Risk Management Science



This summer, I had the privilege of participating in the summer attachment programme at the Census and Statistics Department (C&SD), within the Trade Research and Analytics Branch (Section 1). Under the expert guidance of my supervisor, Mr. Ian Ng, I embarked upon a journey of statistical discovery and practical application.

My time at C&SD was marked by hands-on learning and the development of valuable skills. I delved into the world of web scraping using Python, mastering techniques to efficiently extract large datasets. The sheer volume of data involved necessitated the implementation of multiprocessing methods to optimise processing time, which served as a crucial lesson in practical data management. Furthermore, I deepened my understanding of Bayesian probabilistic deep learning models by applying them to classify text descriptions into specific codes. This involved extensive experimentation and optimisation, culminating in the development of a robust model that achieved high classification accuracy. Building on this success, I explored the use of total uncertainty to identify outliers within text descriptions, further enhancing my understanding of data analysis.

My academic journey at the university was also enriched by the guidance of Prof. Sit, who has been instrumental in my exploration of deep learning methods for forecasting price changes in limit order books. This work not only broadened my perspective on the stock market but also introduced me to innovative approaches for analysing and forecasting financial data. The large volume of data involved presented a unique opportunity to use a computing cluster account for remote program execution, which is a privilege rarely afforded to undergraduate students. This hands-on research experience provided invaluable insights into the world of academic research and its practical applications.

In closing, I extend my heartfelt gratitude to both Mr. Ng and Prof. Sit for their exceptional mentorship and unwavering support throughout this enriching summer attachment programme. Their guidance has been instrumental in shaping my understanding of statistics and its diverse applications, inspiring me to pursue a career in this dynamic field.


Kan Chun Yu Matthew
BSc in Statistics



I am deeply appreciative of the invaluable opportunity I had to work as a summer intern in the Wages & Labour Costs Statistics Section (2) of the Census and Statistics Department (C&SD), under the guidance and supervision of Ms. Ka Yan Ip. The primary focus of this section is to conduct the Annual Earnings and Hours Survey for research on labour-related topics, including analyses pertaining to the statutory minimum wage.

Throughout the 2 months of my internship, I focused on examining the monthly wages, hourly wages and weekly working hours of the labour force in Hong Kong, which allowed me to apply my data analytic skills in practical situations. I also conducted an in-depth analysis of various imported labour schemes and successfully established a forecasting model that used neural network techniques and incorporated gradient descent and smoothing average methods, implemented in Excel and Python. Additionally, I enhanced my proficiency in data visualisation using SAS Viya.

Furthermore, I served as a research assistant for Dr. Chun Man Chan at CUHK, where I worked on a manuscript focused on Hadoop, SQL and Spark. Under Dr. Chan’s mentorship, I not only expanded my knowledge of parallel processing for handling large-scale data but also acquired skills in big data analytics. I am thankful for having had this opportunity to apply theoretical concepts to practical scenarios and actively participate in research endeavours.

In conclusion, I wish to express my sincere gratitude to both the C&SD and CUHK for granting me an invaluable internship experience. It enabled me to apply my skills in data visualisation, data analysis and computer programming in real-world scenarios, laying a solid foundation for my future academic and professional pursuits. I also extend my heartfelt appreciation to my supervisors, Ms. Ip and Dr. Chan, as well as my colleagues, for their meticulous guidance and support, which made my internship experience fulfilling and enjoyable.


Lee Ching Man Jenny
BSc in Statistics



I am profoundly grateful for the invaluable opportunity to intern at the Census and Statistics Department (C&SD). During my internship, I was pleased to be assigned to the Construction and Miscellaneous Services Statistics Section of the Sectoral Economic Statistics Branch (4), which gave me the opportunity to contribute to critical research on the construction market. Under the guidance of my supervisors, I was involved in conducting desktop research to address data gaps in construction project information, developing Python programs to extract and process extensive land sale data and utilising artificial intelligence (AI) tools to filter and identify useful online texts for high-quality research.

My primary task was to gather key information, such as average transaction prices, block numbers and expected completion dates of estates. Traditionally, desktop research for such surveys involves manual searches of online resources. However, with the support of my supervisors, I undertook a pioneering study to incorporate web scraping and AI tools, which enhanced the efficiency and quality of the survey. Moreover, the programs I used can be reused in future rounds of this annual exercise, which will significantly improve the overall process.

Additionally, I had the privilege of collaborating with Dr. Kin Yat Liu from my university on understanding a new survival analysis model. Under Dr. Liu’s guidance, I explored advanced analytical methods and conducted a simulation study. This experience deepened my understanding of models such as XGB, DeepSurvival and Enet-Cox. By applying our model to pseudo-random number sequences and validating its effectiveness, I gained insights into neural network architecture and the concept of explainable AI. One of the major challenges I faced was locating and resolving errors, as I was unfamiliar with the algorithms involved. However, with Dr. Liu’s excellent support, I was able to overcome those challenges and master various analytical techniques. This collaborative work increased my understanding of survival analysis and machine learning.

Through this internship, I honed my skills in Python tools, such as Tabula and Selenium, and I enhanced my knowledge of the statistics and data science fields. The well-structured programme at the C&SD balanced guided learning and practical application, fostering open communication and continuous feedback. I extend my deepest gratitude to Mr. Benjamin, Mr. Henry, my colleagues in the section, the C&SD, Dr. Liu and my school for this internship opportunity. It has laid a solid foundation for my future academic and professional pursuits, equipping me with a robust toolkit for any data-driven field.


Li Liangbang
BSc in Computational Data Science



I am honoured to have had the valuable opportunity to work in the Industrial Production section of the Sectoral Economic Statistics Branch (1) at the Census and Statistics Department (C&SD), under the guidance of my supervisors, Mr. Hiu Fung Lam and Mr. Kin Leung Chan. At the C&SD, I completed two main tasks. The first task involved crawling information on business firms from the websites of trade associations and then cross-checking this information against the records in the Central Register of Establishments to find inconsistencies. To perform this task, I developed a workflow, programs and reports to facilitate follow-up for the maintenance of the survey frame. My second task was to compile a web report in English and Chinese based on the Statistical Digest of the Services Sector 2024 Edition. I needed to re-screen, summarise and visualise important statistics of 14 major service industries/domains to condense the content from the 330 pages of the original report. During this process, I mastered the use of web crawlers and dynamic crawlers; gained a comprehensive understanding of important statistics in various industries; and applied my programming skills, especially in processing text data and calculating text similarity using natural language processing.
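The cross-checking step described above can be pictured with a small sketch. It is an illustration only: the essay does not say which similarity measure was used, so this version substitutes the standard library's `difflib.SequenceMatcher`, and the helper names `normalise`, `similarity` and `best_match` are hypothetical.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a firm name and drop full stops, commas and extra spaces."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the normalised names are identical."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(scraped_name, register_names):
    """Return (closest register name, score), or (None, 0.0) for an empty register."""
    best, best_score = None, 0.0
    for candidate in register_names:
        score = similarity(scraped_name, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

A scraped name whose best score falls below a chosen cut-off could then be listed for manual follow-up; the cut-off itself would need to be tuned on real records.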

During my time at CUHK, I worked under the supervision of Prof. Yingying Wei, who taught me patiently and guided me to start from the basic sampling method and model-based clustering. I also learnt Compute Unified Device Architecture (CUDA) parallel computing skills and used CUDA in clustering algorithms to improve their computing speed when applied to data with extremely large volumes and dimensions. Subsequently, I reproduced the batch effects correction with unknown subtypes model, which is a hybrid model formed by a location-and-scale adjustment model and model-based clustering. I implemented this model in R and C, optimised the original C-based model using CUDA parallel operations and increased the computing speed. For example, with CUDA, the overall algorithm computing time was reduced by 30%. I also read several articles Prof. Wei gave me, and I will continue my research in this area.

Finally, I would like to convey my appreciation to everyone I met during my work experience. My classmates, colleagues, supervisors and professors, as well as the office staff I met in the company and school, all gave me great help. In my future studies and career, I will follow the advice Prof. Wei gave me: walk slowly but steadily.


Wong Yin San
BSc in Statistics



I am honoured to have been given the opportunity to work as an intern in the Trade Research and Analytics Branch (Section 1) under the guidance of my supervisor, Mr. Ian Ng.

In real-world applications of statistics, it is crucial to obtain data from various sources. One of my major duties was to scrape data from the World Integrated Trade Solution website, which contains records of merchandise trading between different countries/regions across the globe. To scrape large amounts of data from various URLs, we had to write a Python script to perform the tasks automatically and efficiently. Eventually, we decided to use the multiprocessing package in Python to send web requests in parallel, thereby significantly reducing the scraping time.
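As a rough illustration of sending web requests in parallel, here is a minimal sketch. It is not the script used at the C&SD: it assumes the thread-backed `ThreadPool` from Python's `multiprocessing.pool` module, which suits I/O-bound requests, and the function names `fetch` and `scrape_all` are hypothetical.

```python
from multiprocessing.pool import ThreadPool
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Download one page and return (url, body bytes); body is None on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return url, response.read()
    except (OSError, ValueError):
        # Network errors and timeouts are OSError subclasses;
        # a malformed URL raises ValueError.
        return url, None


def scrape_all(urls, worker=fetch, workers=8):
    """Apply `worker` to every URL in parallel; results keep the input order."""
    with ThreadPool(workers) as pool:
        return pool.map(worker, urls)
```

Replacing `ThreadPool` with `multiprocessing.Pool` gives process-based workers through the same `map` interface, provided the worker function is defined at module level so that it can be pickled.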

Another major task was tuning a deep learning model called the unit value (UV) model. It is common for traders to make mistakes when filling in trade declarations, and the UV model aims to identify whether the unit values on declaration forms are erroneous. It uses information on a declaration form as predictors to predict the corresponding conditional distribution of the unit value of that commodity. To increase the performance of the UV model on particular commodity chapters, we tuned its hyperparameters and analysed the prediction results. The outcomes were then used to further improve the model.
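To show how a predicted conditional distribution can be turned into an error flag, here is a minimal sketch. The essay does not describe the model's output or decision rule, so everything here is assumed for illustration: the predicted distribution is taken to be log-normal, summarised by a mean and standard deviation of the log unit value, and the function name and the threshold of 3 are hypothetical.

```python
import math


def flag_unit_value(declared_uv, pred_log_mean, pred_log_std, z_threshold=3.0):
    """Return True when the declared unit value is implausible.

    Assumes (for illustration only) that the model predicts a log-normal
    distribution, so the log of the unit value is compared with the
    predicted mean in units of the predicted standard deviation.
    """
    if declared_uv <= 0 or pred_log_std <= 0:
        raise ValueError("unit value and predicted spread must be positive")
    z = (math.log(declared_uv) - pred_log_mean) / pred_log_std
    return abs(z) > z_threshold
```

Under these assumptions, a declared unit value of 10,000 against a predicted typical value of 100 with a log-scale spread of 0.5 lies about nine standard deviations out and would be flagged for checking.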

In addition to working at the C&SD, I was also supervised by Prof. Phillip Yam at CUHK, where I worked on modelling cyber breaches using different machine learning models, such as Poisson regression, the comonotone-independence Bayes classifier and the Hawkes process. I also made presentation slides containing illustrative and intuitive examples of technical concepts. This experience deepened my understanding of statistics and gave me a taste of research in statistics-related fields.

I would like to express my sincere gratitude to my supervisors, Mr. Ng and Prof. Yam, and everyone who supported me throughout this journey. The knowledge and experience I’ve gained are invaluable and have laid a solid foundation for my future studies and career. Overall, I had a great time in this internship programme.



DFI Retail Group

Wang Chengming
BSc in Statistics



Life is about learning lessons. We gain basic knowledge and techniques during college and equip ourselves with experience and the ability to deal with real situations during our post-college careers in the workplace. The 3 months I spent working at the DFI Retail Group as an e-commerce and digitalisation intern served as the most important ‘lecture’ of my career.

In the workplace, things are a lot different from college. There’s no tidy database, no clear historical trend and no simple set of estimators that lets you perform variable selection smoothly. The real business world is chaotic and filled with uncertainty. In my internship, I learnt to apply my theoretical knowledge in practical scenarios by combining it with business experience.

Things that happen every day have great effects on decisions in all businesses, from the retail industry to finance. For example, I learnt that a selling strategy in the DFI Retail Group’s industry needed to be designed according to the current market and to be adjusted periodically according to the reactions of our customers. Sometimes the statistical outcome may conflict with experience and the real situation. In these situations, competitiveness is maintained by knowing whether to rely on a statistical outcome, on experience or on a combination of both. For instance, unlike in the academic world, in the business world, a statistical model with a 50% rate of successful prediction will not satisfy your supervisor. However, if you build a statistical model based on historical data and add details based on your experience, your model will perform better.

You’ll never know how to solve problems or apply your novel ideas in real life unless you gain workplace experience. I obtained such knowledge during my internship at the DFI Retail Group, which served as a ‘workplace college’ for me.
