
AI Engineer: Speech/Audio

Centific · Washington, United States

Full-time · Lead · Python


About the Role

About Centific AI Research

Centific AI Research is at the forefront of developing cutting-edge AI solutions that bridge the gap between research innovation and real-world applications. Our team of scientists and engineers works collaboratively to create impactful technologies across speech, audio, and multimodal AI domains. We are committed to building responsible AI systems that deliver measurable impact while maintaining the highest standards of research quality.

About the role

We are seeking an AI Engineer: Speech/Audio to join our growing team and drive innovation in next-generation audio AI technologies. This role focuses on Large Audio Language Models (LALMs), Large Audio Reasoning Models, and Speech-to-Speech (S2S) systems that can understand, reason over, and generate audio with human-like capabilities.

You will work at the intersection of cutting-edge research and production systems, developing Spoken Language Models (SLMs) that perform complex audio reasoning and engage in natural speech-based interactions. This position offers the opportunity to shape our technical direction in audio-native AI while collaborating with world-class researchers and engineers.

Key Responsibilities

• Design, develop, and deploy Large Audio Language Models (LALMs) capable of native audio understanding, reasoning, and generation.
• Build Large Audio Reasoning Models that perform complex chain-of-thought reasoning over speech and audio inputs, including medical, technical, and conversational domains.
• Contribute to Speech-to-Speech (S2S) system development, including speech understanding, dialogue management, and speech synthesis components.
• Research and implement alignment mechanisms between speech encoders and LLM backbones using lightweight adapters, LoRA, and efficient fine-tuning strategies.
• Design efficient speech tokenization and temporal compression techniques suitable for long-form audio reasoning and multi-turn spoken dialogue.
• Build comprehensive evaluation frameworks for audio reasoning capabilities, including benchmarks for speech QA, audio understanding, and reasoning accuracy.
• Optimize inference pipelines for low-latency, streaming applications in speech systems.
• Collaborate with cross-functional teams to transfer research innovations into production systems and customer-facing applications.
• Contribute to technical documentation, research write-ups, and publications at top-tier venues (NeurIPS, ICML, ACL, Interspeech).

Minimum Qualifications

• Master's degree (required) or Ph.D. (preferred) in Computer Science, Electrical Engineering, or a related field with a focus on speech, audio ML, or multimodal learning.
• 2+ years of industry or applied research experience in speech/audio AI, Large Language Models, or multimodal systems.
• Demonstrated applied research contributions through publications, patents, or shipped products in speech/audio AI or LLMs.
• Strong proficiency in Python and PyTorch, with hands-on experience in GPU-accelerated training for large-scale models.
• Solid understanding of speech and audio signal processing, acoustic modeling, and audio representations.
• Working knowledge of modern LLM architectures (Transformers, SSMs) and training paradigms, including instruction tuning and alignment methods.
• Familiarity with modality alignment techniques: adapter-based integration, cross-modal attention, or audio-text fusion methods.
• Strong experimentation habits: clean code, systematic ablations, reproducibility, and clear technical communication.

Preferred Qualifications

• Publication record at top-tier venues (NeurIPS, ICML, ICLR, ACL, Interspeech, ICASSP) in audio language models, speech reasoning, or multimodal learning.
• Hands-on experience building or fine-tuning Large Audio Language Models (e.g., Qwen-Audio, SALMONN, LTU, Gemini Audio).
• Experience with speech representation pretraining (HuBERT, Wav2Vec 2.0, Whisper, WavLM) and discrete speech tokenization.
• Familiarity with Speech-to-Speech components: neural audio codecs (EnCodec, SoundStream), vocoders, or speech synthesis systems.
• Experience with audio reasoning benchmarks (AIR-Bench, MMAU, AudioBench) or building evaluation harnesses for audio QA.
• Hands-on experience with distributed training (FSDP, DeepSpeed) and inference optimization (ONNX, TensorRT, quantization).
• Familiarity with speech frameworks such as ESPnet, SpeechBrain, NVIDIA NeMo, or Fairseq.
• Experience with multilingual speech systems, code-switching, or domain adaptation for specialized applications (medical, legal, technical).
• Background in evaluating safety, bias, hallucination, or adversarial robustness in audio language models.

Technical Environment

• Core: PyTorch, CUDA, torchaudio/librosa, Hugging Face Transformers
• LLM Stack: Large language model backbones, lightweight adapters (LoRA, Q-Former), instruction tuning pipelines
• Audio Models: Neural audio codecs, speech encoders, vocoders, discrete speech tokenizers
• Infrastructure: Modern
