Stepping into a new environment always feels a little disorienting at first. I remember the new watch that felt strange on my wrist for days, or the restless first nights after moving to Canada when even the bed and the air felt foreign. Yet, more often than not, these new changes bring a wave of profound learning with them. This internship wasn't just a much-awaited experience—it was a much-needed catalyst in my PhD journey.

(I am so hesitant to use these double dashes — ChatGPT has misused them and now they seem to give away LLM vibes. But, this blog is all human my friends!)

As my PhD progressed, the rapid pace of LLM research felt less like a wave and more like a firehose. I went through technical reports, config files from Hugging Face models, and the kind tutorials from the lord himself, Andrej Karpathy, but I started feeling disconnected from actual industrial progress.

And, thanks to the cosmic superpowers of MITACS approvals [if you know, you know :)], my internship began at the right moment! I found myself in an environment that didn't just discuss these concepts; it lived them. The academic firehose was replaced by a team-based toolkit, complete with access to the kind of compute that opens up a world of learning in itself. The hesitation I felt from reading papers alone was quickly replaced by hands-on experience, and the real lessons began to take shape.

1. From Theory to Reproducibility

In my PhD, I believe I was way more focused on Section 3, the Methodology, than Section 4, the Experiments. I would get frustrated with methods that ran expensive updates for marginal performance gains, and I made a point of including a computational analysis in my own work. At the internship, that balance flipped. Every paper I touched, every single one, began with the same step: reproduce its claims about efficiency, throughput, and loss, with efficiency being my team's main focus. We weren't just checking if the loss numbers matched; we were chasing specific promises. If a Mamba-based paper claimed higher throughput, our job was to reproduce that exact claim on the same hardware, using the same metrics.

2. No Learning is Not Learning

While it's easy to feel disconnected from LLM production setups, given that newer LLMs and related paradigms are churned out nearly every day, I learnt that everything you learn today, even if it seems unrelated, might pop up in the future! In my reading group with my labmates, I volunteered to read a paper that was popular on LinkedIn, "Were RNNs All We Needed?", a seemingly ancient idea in the age of Transformers. However, to read that one paper I had to watch tons of lectures and read papers as well as blogs to understand S4, Mamba, and then Mamba-2 from scratch (I love learning top-to-bottom btw). And yet, this is exactly what forms the foundation of my research work here. It was a profound lesson in the timeless value of foundational concepts.

3. From Overwhelmed to Systematic

Initially, I felt imposter syndrome hitting me hard. Surrounded by brilliant and experienced researchers, it was easy to feel like I could never measure up. What honestly helped was talking to my peers, other visiting researchers, and my manager (+ her 2 graceful cats hinting that I should develop some sass!). I learnt that it happens to most of us, and yet each of us thinks it is happening only to us.

4. A New Perspective on Writing

When approaching a concept top-to-bottom for a research paper, I always take notes from the multiple sources I read, and then add my own inferences and opinions on those sources. I have also followed awesome blogs, Lilian Weng's for example. But I never considered contributing a blog of my own. The thought of adding to the overwhelming volume of LLM content always made me hesitant. "What," I wondered, "could I possibly add that isn't already out there?"

But I now feel that my notes, condensed from my understanding of various sources (Stanford CS25 lectures, reading groups on YouTube, and literature reviews), could actually be a good starting point for others walking the same path. After all, we often referred to such blogs during our reading groups.

5. How The Internship Re-wired My Research Mindset

An internship, I realized, is the essential bridge to an industrial career. While my PhD gave me a strong foundation in owning my projects, the internship offered me an opportunity to learn a complementary skillset. Working with a team on a common goal lets you learn so much from the different skillsets people bring to the table! Having access to that sort of compute, with multiple nodes of H100s, and learning all the efficiency buzzwords was an experience I truly appreciate (which hasn't ended yet btw, haha :)).

While this journey hasn't concluded, I wanted to put out this blog right halfway through the internship and I am only late by a couple of months 🙂

Better late than never?

I am deeply grateful to my mentors and team members, especially those who welcomed every question and supported me through each learning curve, scheduling multiple check-ins and always being so ready to help. Their generosity and wisdom have been invaluable, and even if I wrote a few more blogs sharing these insights, it would hardly capture the breadth of what I have learned. Each of them brought a valuable input: excitement for novel exploration; a calm, systematic approach to tackling a task; and a habit of questioning a solution from first principles, leading to chains of thought that seek out questions worth solving in the real world.

In the end, if you’re (still) reading this as another grad student like me, feeling a little overwhelmed by the pace of LLMs, I would like you to know: community, collaboration, and a learning mindset make all the difference.


This blog reflects my personal journey and learning experiences during my internship as a Visiting Researcher at ServiceNow Research - Foundation Models Lab.