My work laptop recently started crashing at random. The warranty has long since expired, and I’m the IT department for my little company, so I’m on my own. The symptoms were “hiccups” in mouse movement, followed shortly by a total freeze, followed about 30 seconds later by a panic and reboot. My first suspicion was a new video driver I’ve been trying. It’s beta software, so problems aren’t that unusual. In any case, the crashes were leaving no evidence in my system logs, so I had nothing useful to report to the developers.
My Linux kernel is custom-compiled. Although I have many debugging tools compiled in, I didn’t have anything that could save the messages from my dying laptop. Yesterday, I took the time to dig around in the documentation and built a new kernel with netconsole turned on. I configured it to send my console log over the network to my office server. As luck would have it, my laptop crashed about two minutes after I turned the remote logging on. And the remote log worked.
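For readers who haven’t used it: netconsole sends kernel console messages as plain UDP datagrams, so the receiving end can be very simple. The following is a minimal sketch of a listener, not necessarily the setup I describe after the fold. The port assumes netconsole’s default remote port of 6666, and the function names and the `netconsole.log` file name are made up for illustration.

```python
import socket
from datetime import datetime, timezone

# Assumption: the sender uses netconsole's default remote port.
# Change this to match whatever the laptop is configured to send to.
NETCONSOLE_PORT = 6666


def format_record(sender_ip, payload, when):
    """Turn one UDP payload into timestamped log lines tagged with the sender."""
    text = payload.decode("utf-8", errors="replace")
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    # One datagram may carry several lines; skip blank ones.
    return [f"{stamp} {sender_ip} {line}" for line in text.splitlines() if line.strip()]


def receive(sock, log, max_datagrams=None):
    """Append incoming datagrams to `log`, flushing after each one.

    Runs forever when max_datagrams is None; returns the number handled otherwise.
    """
    count = 0
    while max_datagrams is None or count < max_datagrams:
        payload, (sender_ip, _port) = sock.recvfrom(65535)
        for line in format_record(sender_ip, payload, datetime.now(timezone.utc)):
            log.write(line + "\n")
        # Flush immediately: the sender may be seconds away from dying.
        log.flush()
        count += 1
    return count


def main(log_path="netconsole.log", port=NETCONSOLE_PORT):
    """Listen on all interfaces and append everything received to log_path."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        with open(log_path, "a", encoding="utf-8") as log:
            receive(sock, log)
```

Calling `main()` on the server starts the listener. UDP suits this job because the sender needs no connection state and no acknowledgement, which matters when the machine doing the sending is in the middle of crashing. The trade-off is that a lost packet is simply lost.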
Surprise! It wasn’t a driver error! My laptop’s dying messages were reporting corrupted transfers between my CPU and my memory chips. Hardware. For this specific failure, there are only three possibilities: bad CPU, bad memory, or bad motherboard. First, I opened the case and swapped the two memory chips. This appeared to help, as I didn’t have another crash for the rest of the day, nor overnight. (My Linux install runs the virus scan for my Windows partition every night, ensuring that any virus that does get into my Windows box can’t modify the scanner.)
But I’m not out of the woods, as it did crash one more time today. I have replacement memory chips on order, so I can definitively rule memory in or out. If new memory doesn’t fix it, I guess I’ll be shopping for a new laptop.
After the fold, I describe how I set up remote logging to accommodate my laptop’s road warrior use case.