Takuya Yoshioka @_ty274

Speech technology researcher/manager @AssemblyAI · linkedin.com/in/ty274/ · Bellevue, WA · Joined November 2016

Tweets: 912
Followers: 546
Following: 57
Likes: 3K

Shyam Gollakota @ShyamGollakota

2 years ago

Want to hear a friend in a noisy café? We designed deep learning-based headphones that let you isolate the speech from a specific person just by *looking* at them for a few seconds. CHI'24 honorable mention award. Paper: arxiv.org/abs/2405.06289 Code: github.com/vb000/LookOnce…

15 48 275 120K 106

Jeff Dean @JeffDean

2 years ago

I got an early demo of this when I visited @uwcse a couple months ago and the ability to isolate sounds in your environment was pretty great. Nice work, @b_veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, and @ShyamGollakota!

9 33 361 103K 80

Takuya Yoshioka @_ty274

3 years ago

@JonathanLeRoux @IEEEsps @IEEEorg Congrats!

0 0 1 231 0

Shinji Watanabe @shinjiw_at_cmu

3 years ago

Hi all, please let me know if you know of any large-scale speech data that can be used for training our Whisper reproduction (OWSM) model (arxiv.org/abs/2309.13876). We plan to move to OWSM v4.

13 27 96 15K 25

Takuya Yoshioka @_ty274

3 years ago

The code and project page are here. Code: github.com/uw-x/AcousticS… Project page: acousticswarm.cs.washington.edu

0 0 1 436 0

Takuya Yoshioka @_ty274

3 years ago

Creating speech zones with self-distributing acoustic swarms. Our latest paper in Nature Communications unveils distributed microphones based on an autonomous acoustic robotic swarm, creating "speech zones" in real-world settings. Paper: nature.com/articles/s4146…

1 15 47 7K 12

Takuya Yoshioka @_ty274

3 years ago

@rdesh26 Congrats!

0 0 2 237 0

Takuya Yoshioka @_ty274

3 years ago

Last Friday marked the end of my 7-year journey at Microsoft, filled with rewarding challenges, both in research & production, and incredible colleagues. I'll be starting something new very soon. I have left Microsoft. I will still be in the Seattle area for a long time to come.

3 5 43 6K 2

AK @_akhaliq

3 years ago

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer paper page: huggingface.co/papers/2308.06… Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks.

4 87 310 74K 136

Takuya Yoshioka @_ty274

3 years ago

SpeechX from our new paper is a single generative model that edits, enhances & creates speech, enabling zero-shot TTS, spoken content editing (while preserving ambience), speaker extraction & speech/noise removal. Demo: aka.ms/speechx Paper: arxiv.org/abs/2308.06873

0 16 72 6K 12

Jonathan Le Roux @JonathanLeRoux

3 years ago

To everyone booking their @IEEE_WASPAA trip: please consider attending #SANE2023, which will take place at NYU on Thursday October 26, the day after #WASPAA2023. Register at saneworkshop.org/sane2023/

IEEE WASPAA 2025 @IEEE_WASPAA

3 years ago

Dear #WASPAA2023 authors, the review results are out now. Please check them at cmt3.research.microsoft.com/WASPAA2023/. We appreciate your valuable contributions and kind interest regardless of the acceptance decision!

0 1 17 6K 0

0 7 21 4K 1

Desh Raj @rdesh26

3 years ago

@ieeeICASSP Are there poster printing facilities at/near the conference venue?

1 2 0 994 0

Takuya Yoshioka @_ty274

3 years ago

Real-time target sound extraction with Waveformer (to appear in ICASSP). Joint work with UW researchers. Paper (updated): arxiv.org/abs/2211.02250 Demo: waveformer.cs.washington.edu Code (both causal and non-causal): github.com/vb000/Waveform…

1 26 143 36K 52

IEEE WASPAA 2025 @IEEE_WASPAA

3 years ago

WASPAA 2023 calls for papers! The traditional intimate Mohonk Mountain House with exciting changes: double-blind review, an unprecedented amount of travel grants, and more. More information: waspaa.com/call-for-paper… #waspaa2023

0 15 34 6K 1

Shinji Watanabe @shinjiw_at_cmu

3 years ago

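Amazing! The world's largest speech corpus, at 19,000 hours, and a high-accuracy Japanese speech recognition model have been released as open source - 窓の杜 (Mado no Mori) forest.watch.impress.co.jp/docs/news/1471… via @madonomori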

0 8 29 2K 1

Takuya Yoshioka @_ty274

4 years ago

@shinjiw_at_cmu Congratulations, Watanabe-san!

0 0 1 0 0

IEEE ICASSP @ieeeICASSP

4 years ago

The #ICASSP2023 paper submission site is now open! Submit your papers by 19 October 2022 to be considered. Learn more about the paper guidelines and submission requirements here: hubs.la/Q01nmxt_0

0 5 20 0 0

Takuya Yoshioka @_ty274

4 years ago

@SamueleCornell Yep, conventional ASR models should be good for the headset recordings.

0 0 0 0 0

Takuya Yoshioka @_ty274

4 years ago

How can we do streaming multi-talker ASR by best combining speech separation and overlap-robust ASR? t-SOT-VA does that and works for real meeting audio with any # of mics, achieving the best published WERs of 13.7%/15.5% for AMI-MDM dev/eval. Paper: arxiv.org/abs/2209.04974

2 4 27 0 4

Takuya Yoshioka @_ty274

4 years ago

@SamueleCornell Good question! We focused on the distant mic setup and didn't do headset experiments in such a way that the distant-mic vs. headset numbers can be directly compared. Let us consider how to do the experiment and report the additional result.