Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Headings are cool

You can have many headings

Aren’t headings cool?

Blog Post number 3

less than 1 minute read

Published:

Blog Post number 2

less than 1 minute read

Published:

Blog Post number 1

less than 1 minute read

Published:

blog

intern

leisure

news

projects

publications

SimLOB: Learning Representations of Limited Order Book for Financial Market Simulation

Published in , 2024

We are the first to use LOB data to calibrate the PGPS model (previously, most studies relied on the mid-price). Measuring similarity between time series is a key difficulty in calibration, so we took a representation-learning approach: a trained encoder maps the raw data into a latent space, and the similarity metrics used for calibration are computed in that space. We also investigated which encoder architectures are suitable, comparing widely used contemporary architectures with those common in financial calibration, and designed a novel Transformer-based architecture that significantly improved calibration performance.
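The calibration idea above (encode simulated and observed data, then pick the simulator parameters that minimize their distance in latent space) can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions: `make_encoder` is a hypothetical fixed random linear projection standing in for the trained encoder, `toy_simulator` is a hypothetical one-parameter noise model rather than the PGPS model, and a grid search stands in for whatever optimizer the actual work uses.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[float]], Vector]


def make_encoder(input_dim: int, latent_dim: int, seed: int = 0) -> Encoder:
    """Hypothetical stand-in for a trained encoder: a fixed random linear projection.

    It only mimics the interface (flattened series in, latent vector out).
    """
    rng = random.Random(seed)
    weights = [
        [rng.gauss(0.0, 1.0) / math.sqrt(input_dim) for _ in range(input_dim)]
        for _ in range(latent_dim)
    ]

    def encode(x: Sequence[float]) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return encode


def latent_distance(encode: Encoder, simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Euclidean distance between the latent vectors of two series (the calibration loss)."""
    zs, zo = encode(simulated), encode(observed)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(zs, zo)))


def toy_simulator(theta: float, length: int, seed: int = 1) -> Vector:
    """Hypothetical one-parameter simulator (NOT the PGPS model): small noise around level theta."""
    rng = random.Random(seed)
    return [theta + 0.001 * rng.gauss(0.0, 1.0) for _ in range(length)]


def calibrate(encode: Encoder, observed: Sequence[float], candidates: Sequence[float], length: int) -> float:
    """Grid search: return the candidate whose simulated output is closest in latent space."""
    return min(
        candidates,
        key=lambda theta: latent_distance(encode, toy_simulator(theta, length), observed),
    )


LENGTH = 20
encoder = make_encoder(input_dim=LENGTH, latent_dim=4)
observed = toy_simulator(0.5, LENGTH, seed=42)  # pretend this is the observed "market" data
best = calibrate(encoder, observed, [0.0, 0.25, 0.5, 0.75, 1.0], LENGTH)
print(best)
```

The design point is that the loss never compares raw series directly; swapping in a better encoder changes the similarity measure without touching the calibration loop.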

Abstract:

Financial market simulation (FMS) serves as a promising tool for understanding market anomalies and the underlying trading behaviors. To ensure high-fidelity simulations, it is crucial to calibrate the FMS model so that it generates data closely resembling the observed market data. Previous efforts primarily focused on calibrating the mid-price data, losing essential information about market activities and thus biasing the calibrated model. The Limit Order Book (LOB) data is the fundamental data that fully captures the market micro-structure, and it is adopted by exchanges worldwide. However, LOB data cannot be used with existing calibration objective functions, because its tabular structure does not suit their vectorized input requirement. This paper proposes to explicitly learn vectorized representations of LOB with a Transformer-based autoencoder. The latent vector, which captures the major information of LOB, can then be applied for calibration. Extensive experiments show that the learned latent representation preserves not only the non-linear auto-correlation along the temporal axis but also the precedence between successive price levels of LOB. We also verify that the performance of the representation learning stage is consistent with the downstream calibration tasks. Thus, this work is also the first to advance FMS on LOB data.
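To make the points about LOB data being tabular and about "precedence between successive price levels" concrete, here is a small sketch. It assumes each snapshot stores, for each of L levels, an ask price, ask volume, bid price and bid volume, so a series of T snapshots forms a T × L × 4 table. Naive flattening does yield a vector, but a generic distance on it treats every entry alike; the check spells out the level ordering that a good representation should respect. The layout and the helper names `flatten` and `respects_precedence` are illustrative assumptions, not the paper's data format or code.

```python
from typing import List, Tuple

# Assumed layout of one price level: (ask_price, ask_volume, bid_price, bid_volume).
Level = Tuple[float, float, float, float]
Snapshot = List[Level]      # levels ordered from the best quote outward
LobSeries = List[Snapshot]  # T snapshots -> a T x L x 4 table


def flatten(series: LobSeries) -> List[float]:
    """Naively flatten a T x L x 4 LOB table into one vector of length T * L * 4."""
    return [value for snapshot in series for level in snapshot for value in level]


def respects_precedence(snapshot: Snapshot) -> bool:
    """Check price-level ordering: ask prices rise and bid prices fall moving away
    from the best level, and the best bid sits below the best ask."""
    asks = [level[0] for level in snapshot]
    bids = [level[2] for level in snapshot]
    asks_rise = all(a < b for a, b in zip(asks, asks[1:]))
    bids_fall = all(a > b for a, b in zip(bids, bids[1:]))
    return asks_rise and bids_fall and bids[0] < asks[0]


snapshot = [
    (10.01, 300.0, 10.00, 200.0),
    (10.02, 150.0, 9.99, 400.0),
    (10.03, 500.0, 9.98, 100.0),
]
series = [snapshot, snapshot]  # T = 2 snapshots, L = 3 levels
vector = flatten(series)
print(len(vector), respects_precedence(snapshot))  # prints: 24 True
```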

Download Paper

An Automatic and Speech-based Cross-Lingual Classification Framework for Early Screening of Cognitive Impairment

Published in , 2024

In this paper, we construct a novel framework that leverages several AI methods to automatically screen for cognitive impairment (CI) based on the Cookie Theft picture description task, using a multilingual dataset. It holds high potential for clinical application in early Alzheimer's disease (AD) detection: it is fully automatic and achieved high performance, with 74% accuracy and 75% AUC in the external cross-lingual Chinese validation experiment. It also performs well in distinguishing CI and is beneficial for large-scale screening and self-testing, which can prompt potential AD patients to undergo timely hospital-based examinations and therapies.
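The framework described above chains several AI components. The sketch below shows one way such a chain could be wired together with injectable stages, so that real ASR, LLM and classifier components could be swapped in. The stage names, their order, and the idea of an LLM-based language-normalization step are assumptions for illustration, not the authors' implementation; the toy stand-ins in the usage lines are placeholders, not models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScreeningPipeline:
    """Hypothetical composition of the stages named in the abstract."""
    transcribe: Callable[[bytes], str]                # ASR: audio -> transcript
    normalize_language: Callable[[str], str]          # assumption: e.g. LLM translation to one language
    extract_features: Callable[[str], List[float]]    # transcript -> feature vector
    classify: Callable[[List[float]], float]          # features -> probability of CI

    def score(self, audio: bytes) -> float:
        text = self.transcribe(audio)
        text = self.normalize_language(text)
        features = self.extract_features(text)
        return self.classify(features)


# Toy stand-ins so the sketch runs; none of these are real components.
pipeline = ScreeningPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),    # pretend the bytes are already text
    normalize_language=str.lower,                      # placeholder for language normalization
    extract_features=lambda t: [float(len(t.split()))],  # word count as a single feature
    classify=lambda f: 1.0 if f[0] < 5 else 0.0,       # arbitrary toy rule, not a trained model
)
print(pipeline.score(b"The boy takes a cookie"))
```

Injecting each stage keeps the ablation question simple: replacing one callable with a trivial one shows how much that module contributes.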

Abstract:

INTRODUCTION

Using speech data to distinguish cognitive impairment (CI) is an efficient and convenient approach to early screening for potential Alzheimer's disease (AD). However, few studies have developed usable automated frameworks with external cross-lingual validation on Chinese data.

METHODS

This study used speech data from the Cookie Theft picture description task, drawn from the ADReSSo dataset and the local Chinese dataset of the STAR cohort. We constructed an automated framework for CI screening that leverages AI methods, including automatic speech recognition (ASR), large language models (LLMs), and multiple types of machine learning classifiers. We used datasets in multiple languages and addressed the resulting language inconsistency.

RESULTS

Our framework achieved 74% accuracy and 75% AUC in the external cross-lingual Chinese validation experiment. We conducted an ablation study to demonstrate the necessity of each module within the framework.

DISCUSSION

The proposed framework provides fully automated assessment for distinguishing CI, making it highly beneficial for large-scale early screening and self-testing.
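For readers less familiar with the two reported metrics, the sketch below shows how accuracy and ROC AUC are commonly computed for a binary classifier (1 = CI, 0 = healthy). The 0.5 decision threshold and the toy labels and scores are made up for illustration; they are not the study's data, and the study's exact evaluation code is not known.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded score matches the label (1 = CI, 0 = healthy)."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case outranks a random
    negative case, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]          # toy ground truth
scores = [0.9, 0.4, 0.6, 0.1]  # toy classifier probabilities
print(accuracy(labels, scores), roc_auc(labels, scores))  # prints: 0.5 0.75
```

The example shows why the two numbers can differ: accuracy depends on the chosen threshold, while AUC only depends on how well the scores rank CI cases above healthy ones.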

You can see our manuscript here.

Download Paper

research