A talk by Anna Connolly & Vanessa Yan. This talk will explore considerations around optimizing and preparing transformers and newly emerging large language models (LLMs) for deployment. Many companies struggle to productionize the cutting-edge work of their data science and ML teams for two simple reasons: high inference costs and increasing deployment complexity as applications scale up. Vanessa and Anna will discuss techniques MLOps teams are using to overcome both of these obstacles, and how to begin extending these solutions to the LLMs just coming online.