About
I am an Assistant Professor at the Harvard John A. Paulson School of Engineering and Applied Sciences. I received my Ph.D. in Computer Science from Carnegie Mellon University. I am broadly interested in computer systems, with a particular focus on workload analysis and on efficient, reliable, and sustainable storage and machine learning systems. I perform in-depth measurement and analysis to gain a deep understanding of how systems and algorithms behave in the real world. Leveraging insights from these measurements, I design and build next-generation storage systems and distributed machine learning systems.
My work has received best-paper awards at NSDI'24, NSDI'21, SOSP'21, VALUETOOLS'24, and SYSTOR'16, and has been deployed in production at Google, VMware, Twitter, Redpanda, and Momento, with many open-source libraries contributed by the community. My research has been sponsored by Meta, Google Cloud, and AWS. I am a 2020 Meta Fellow, a 2023 Google Cloud Research Innovator, and a 2023 Rising Star in Machine Learning and Systems.
News
I am looking for highly motivated students to join my lab.
Please read this page if you are interested in working with me or asking for a recommendation letter.
Research Areas and Interests
Storage systems and machine learning systems with a focus on efficiency, scalability and robustness:
- Efficient and scalable cache management systems
- Robust and reliable cache/storage management and machine learning systems [OSDI'20][NSDI'22][VLDB'23]
- New approaches to make machine learning practical for storage systems (machine learning for systems) [FAST'23][SOCC'17]
- Performance optimization and sustainability of microservices and serverless architecture [SOCC'23]
- Reliable large-model inference on wimpy hardware (systems for machine learning)
Research Highlights
- SIEVE (NSDI'24): the first cache eviction algorithm that is simpler than LRU yet more effective than state-of-the-art algorithms for web caches. Adopted by software and systems such as Android API, BIND 9, ImmuDB, TiDB, and PostgREST. Implemented in open-source libraries across many languages, e.g., Golang, Python, JavaScript, Rust, Java, Swift, Ruby, Nim, and Zig. Find more details here.
- S3-FIFO (SOSP'23): a simple and scalable cache eviction algorithm composed of only FIFO queues. Implemented or deployed at companies including Google, VMware, and Redpanda, and in many open-source libraries. Find more details here.
- Segcache (NSDI'21): received a community best-paper award and has been deployed at Twitter and Momento.
Bio
Juncheng Yang is an Assistant Professor at the Harvard John A. Paulson School of Engineering and Applied Sciences. He received his Ph.D. in Computer Science from Carnegie Mellon University in 2024. His research interests broadly cover the efficiency, performance, reliability, and sustainability of large-scale data systems.
Juncheng's work has received best-paper awards at VALUETOOLS'24, NSDI'24, NSDI'21, SOSP'21, and SYSTOR'16. His OSDI'20 paper was recognized as one of the best storage papers at the conference and was invited to ACM TOS'21. Juncheng received a Facebook Ph.D. Fellowship in 2020 and, in 2023, was recognized as a Rising Star in Machine Learning and Systems and as a Google Cloud Research Innovator.
His work Segcache has been adopted in production at Twitter and Momento. The two eviction algorithms he designed (S3-FIFO and SIEVE) have been adopted in production at Google, VMware, Redpanda, and several others, with over 20 open-source libraries available on GitHub. Moreover, the open-source cache simulation library he created, libCacheSim, has been used by almost 100 research institutes and companies.