Linux-Watch
      . . . keeping an eye on the penguin   

My Great Linux System Repair Adventure
May 18, 2007

Thunderstorms in the Blue Ridge Mountains can come fast. That's why my main Linux desktop system was still up when one, two, three lightning bolts slammed near my home. Thus began my Great Linux System Repair Adventure.

Despite no fewer than three power surge protectors, including a master power protector for the entire house, just enough of a surge hit my Insignia 300a, an older Best Buy house-brand desktop PC with a 2.8GHz Pentium IV, GB of RAM, and an Ultra ATA/100 60GB hard drive, running SLED 10 (SUSE Linux Enterprise Desktop).

At first, everything looked OK. Then I began getting odd disk errors and programs started misbehaving. So, I used that master tool of all Linux/Unix file repair, fsck, to see what was wrong with my drive.

It wasn't pretty. There were file system errors here, errors there, errors just about everywhere. I use ReiserFS because I really like its speed and space efficiency. On the other hand, when things go badly wrong, getting the ReiserFS fsck file tree rebuild to work properly can be very tricky.

For simple file problems, ReiserFS restores the file system by replaying its transaction log journal. That's as it should be. The whole point of a journaling file system is that you can replay disk writes when something goes wrong.

I was well beyond simple problems, though. It was time to unmount the file system -- reiserfsck won't repair mounted systems -- and get serious. So, I ran the command:
    reiserfsck --rebuild-tree
This option forces reiserfsck to do just what it says: rebuild its b-tree map of the file system.
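
For readers following along, a more cautious sequence than going straight to a rebuild might look like the sketch below. It is not the exact procedure described here: the device name /dev/sdXN, the log file names, and the run helper are placeholders of my own, and by default the functions only print what they would run.

```shell
#!/bin/sh
# Sketch only. The device name and log file names are placeholders;
# nothing here comes from the article beyond reiserfsck itself.

# Print the command unless DRY_RUN=0 is set, so the sketch is safe to try.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: unmount, then do a read-only check. --check reports problems
# without changing anything on disk.
check_reiserfs() {
    run umount "$1" &&
    run reiserfsck --check --logfile check.log "$1"
}

# Step 2: only when the check reports corruption that needs a rebuild.
# --rebuild-tree rewrites the tree and should not be interrupted, so
# image the partition first if at all possible.
rebuild_reiserfs() {
    run reiserfsck --rebuild-tree --logfile rebuild.log "$1"
}
```

With the default dry run, check_reiserfs /dev/sdXN prints the two commands it would issue. Setting DRY_RUN=0 runs them for real, which requires root.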

It was humming along, when bang, it stopped. OK, this can happen. Maybe there's a bad block. The simple-minded way to find out, without reaching for another tool, is to simply try the command again. If it breaks at the same spot, I've got bad blocks, actual sectors of the hard drive that can't hold data reliably.

If that's the case, I'd run:
    /sbin/badblocks [-b (reiserfs-block-size)] device
to get a list of bad blocks that reiserfsck can understand. After that, I'd run dd_rescue to create a backup of the file system without the bad blocks. Yes, you can try this with other tools -- dd comes to mind immediately -- but dd_rescue, unlike dd, doesn't abort on errors. I could program around that, but dd_rescue does such a good job, so why bother?
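
As a rough idea of what "programming around" plain dd could mean, here is a minimal sketch using dd's own conv=noerror,sync options. It is an illustration rather than anything actually run during this repair; the function name, the arguments, and the 4096-byte default block size are assumptions.

```shell
#!/bin/sh
# Sketch only: image a source with plain dd while tolerating read errors.
# The function name and its arguments are illustrative.

image_with_dd() {
    src=$1
    dst=$2
    bs=${3:-4096}   # assumed block size; match the file system's if known
    # noerror: keep going after a read error instead of aborting.
    # sync:    pad short or failed reads with zeros so offsets stay aligned.
    dd if="$src" of="$dst" bs="$bs" conv=noerror,sync
}
```

Something like image_with_dd /dev/sdXN /mnt/backup/sdXN.img would then copy whatever is readable. Because of the padding, a source whose size is not a multiple of the block size yields an image rounded up to the next full block. dd_rescue still handles damaged media more gracefully, which is the reason for preferring it.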

Unfortunately, reiserfsck blew up at a new location... farther along in the rebuilding. OK, so it wasn't a bad block. This required some thought.

I decided to boot the system back up and see what the system looked like from my KDE 3.5 interface. From there, I planned on backing up my system, which I hadn't done in a week, with KDar, the KDE disk archiver to a DVD-R. Unfortunately, I didn't make it that far.

The system got about halfway to the desktop when the boot process failed. OK, now I was getting ticked. It was time to return to single user mode and the command line.

It was also time to get hard-core serious about this misbehaving drive. This time I ran:
    reiserfsck --rebuild-tree -S
This forces the b-tree to be rebuilt from any part of the directory and file system or b-tree leaves that may be lying anywhere on the partition. Unless you really -- I mean really -- know how file systems work, don't try this at home. Go to a friend's house. It will be much safer there.

No, don't. That was meant to be funny. Don't try this anywhere, unless you really know what you're doing.

Believe it or not, I do know file system internals so I ran it... and I got most of the way through when the process stopped and I got the message:
    The problem has occurred looks like a hardware problem (perhaps memory).
Oh no. Could the memory also be sour? The hard drive was fouled up, no question about that, but remember, I'd also seen strange problems with applications. That, I now remembered, is often a sign of bad memory.

I ran the command again. Yes, there was the same error message, but at a different point in the repair process. This was looking more and more like I actually had two problems.

So, I got up, turned off the system, and went to have lunch. When I came back, I turned the PC back on... and it wouldn't boot at all.

This was turning into a really bad day.

So, now I pulled out my freshly burned copy of SystemRescueCd 0.35. SystemRescueCd, if you've never met it, is the best single CD bootable system repair disk I know.

This special purpose Linux distribution is based on the 2.6.20.7 Linux kernel. It includes:
  • GParted, a top-notch partition manager
  • PartImage, a great drive/partition imager tool
  • NTFS-3G, an open-source driver that lets you mount, read, and write Windows NTFS partitions
...and a host of file system repair tools and drivers. It also includes -- and for me this is the cherry on top of the sundae -- network file tools like Samba and NFS. With those, you can send files from a near-dead machine to a network server for safe keeping.

So, I popped in SystemRescueCd, and, with its small memory footprint of 128MB, it appeared to load fine. This time I ran reiserfsck from SystemRescueCd and... it failed with a memory error, again. This time, at least, it almost completed the run.

OK, it was time to play with the hardware. When memory is going bad, you can sometimes keep it going for a while longer by slowing it down.

Normally, people only play with memory settings when they're trying to turbo-charge a gaming system or the like. The same techniques, applied in reverse, can sometimes get some useful life from sick systems like mine.

Now, playing tricks with RAM is a subject unto itself. For more on that subject, visit sites like Extreme Tech and Tom's Hardware and look for stories on overclocking.

I was going the other way; I was going to "underclock" my system's memory. To do this, I went to my PC's advanced BIOS section. For my purposes, I started by raising the CAS (column address strobe) latency. This setting determines how many clock cycles pass between the memory controller issuing a column address and the memory chip outputting the data. A higher value means more waiting, therefore a slower computer, and a bit more memory reliability.
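
To put numbers on what one extra cycle of CAS latency costs, the arithmetic is simply cycles divided by the memory clock. The sketch below assumes a 200MHz memory clock and CL values of 2.5 and 3 purely as examples; the actual settings on this machine are not recorded here, and the function name is made up.

```shell
#!/bin/sh
# Sketch only: CAS latency in nanoseconds = cycles / memory clock.
# The clock and CL values in the usage note are assumed examples.

cas_latency_ns() {
    cycles=$1      # CAS latency in clock cycles (CL)
    clock_mhz=$2   # memory clock in MHz (not the doubled DDR data rate)
    awk -v c="$cycles" -v f="$clock_mhz" \
        'BEGIN { printf "%.2f\n", c * 1000 / f }'
}
```

Under those assumptions, cas_latency_ns 2.5 200 prints 12.50 and cas_latency_ns 3 200 prints 15.00, so the looser setting adds 2.5 nanoseconds to each column access.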

After setting this up, I rebooted again with SystemRescueCd, ran reiserfsck with all the trimmings, and this time it worked. I once more had a viable file system.

Now, my problem was how to get the important files out of there before something else went wrong. Trying to repair the system was a task for another day. Today, I just wanted my files safe, snug and well away from that machine.

My new problem, though, was that my important files, in /home/sjvn, came to a whopping 22 gigabytes. Yes, I'm a file and email packrat.

22 gigabytes is way too much for burning to a DVD or a USB stick. For the first time, I found myself wishing for a Blu-ray disc burner on a PC. Even over my 100Mbps Fast Ethernet connection, I really didn't want to waste time sending all that data.
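
A back-of-the-envelope estimate shows why. The sketch below assumes decimal gigabytes and the full theoretical line rate with no protocol overhead, so real transfers would take longer; the function name is made up.

```shell
#!/bin/sh
# Sketch only: best-case transfer time, ignoring protocol overhead.
# Assumes 1 GB = 10^9 bytes and 1 Mbps = 10^6 bits per second.

transfer_minutes() {
    gigabytes=$1
    megabits_per_sec=$2
    awk -v g="$gigabytes" -v r="$megabits_per_sec" \
        'BEGIN { printf "%.1f\n", g * 8 * 1000 / r / 60 }'
}
```

Here transfer_minutes 22 100 prints 29.3, roughly half an hour at best for the uncompressed data.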

The solution was clearly to compress my files down and put them into a more conveniently sized archive for shipping across the network. Linux is full of tools to do that, but tar, that old faithful, was the first program that came to mind.

So, I mounted the repaired partition, headed over to /home/sjvn, and zapped a lot of junk files with "rm." Then I hopped back up to the /home directory and ran:
    tar cvzf sjvn/sjvnhomedir.tar.gz sjvn
This created the compressed archive "sjvnhomedir.tar.gz" in /home/sjvn. The tar options were the basics: "c" for create; "v" for verbose (I wanted to know what was going on); "z" for compress files with gzip; and "f" to give the archive its name.
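
Note that this command writes the archive into the very directory being archived. GNU tar normally notices its own output file and skips it, but a slightly more defensive variant writes the archive somewhere else and then lists it to confirm it is readable. The sketch below is one way to do that; the function name and the destination path are assumptions, not what was actually run.

```shell
#!/bin/sh
# Sketch only: archive a directory to a file outside that directory,
# then list the archive to confirm it can be read back.

archive_dir() {
    parent=$1     # e.g. /home
    name=$2       # e.g. sjvn
    archive=$3    # destination path, ideally not inside $parent/$name
    tar -czf "$archive" -C "$parent" "$name" &&
    tar -tzf "$archive" > /dev/null
}
```

For example, archive_dir /home sjvn /tmp/sjvnhomedir.tar.gz returns success only if both the create and the listing pass. The /tmp destination is just an illustration and needs enough free space.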

Now, I was left with only one final step: getting my important files, now zipped up in "sjvnhomedir.tar.gz," to a healthy computer. I decided once more to go with easy, over other alternatives.

This time, that meant setting up an SSH (secure shell) server on the sick machine. To do this, I had to give the machine a root password; anything will do. Then I logged in with it and ran:
    /etc/init.d/sshd start
That starts up the SSH server. And that was the last thing I had to do on that system.

I then moved to another Linux system. In my case, that just meant I used my IOGEAR KVM (keyboard, video, and mouse) switch to click over to the MEPIS 6.5 system sitting right next to the sick SLED 10 box.

Once logged in on the MEPIS PC, I logged into the SLED system's SSH server as root, and moved to the /home/sjvn directory. Once there, I used scp (secure copy) to copy sjvnhomedir.tar.gz to my MEPIS system, like so:
    scp sjvnhomedir.tar.gz sjvn@MEPIS:
At long last, I was done. I had my files safely stored away.
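
With flaky memory in the picture, it seems prudent to confirm the copy arrived intact before trusting it. One way, sketched below, is to compare checksums from both machines. This step is an addition rather than part of the original rescue, and the function names are made up; sha256sum is the standard coreutils tool.

```shell
#!/bin/sh
# Sketch only: compare a file's SHA-256 against a checksum computed
# on the machine the file came from.

file_sha256() {
    sha256sum "$1" | awk '{ print $1 }'
}

verify_copy() {
    expected=$1   # checksum printed on the source machine
    file=$2       # the copied file on the destination machine
    [ "$(file_sha256 "$file")" = "$expected" ]
}
```

The idea is to run file_sha256 sjvnhomedir.tar.gz on the sick machine, then pass the resulting string and the copied file to verify_copy on the healthy one. A non-zero exit status means the two differ and the copy should be repeated.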

Today, the sick PC is back to working, albeit at a slower speed. I don't trust it as a front-line system, so I replaced it with an HP Pavilion a6040n. That PC is now my main SLED system. On it, safe and sound, is every file I rescued from the sick computer.

My point in telling you of my misadventure is that, with a little knowledge and Linux tools, which SystemRescueCd brings together for you, you can save your files even from apparently hopeless situations.

Oh, and a final note: SystemRescueCd can also work the same magic on your Windows systems. I can't recommend this mini-distribution enough for anyone who might face repairing any Unix, Linux, or Windows-based computer.


-- Steven J. Vaughan-Nichols




Use of this site is governed by our Terms of Service and Privacy Policy. Except where otherwise specified, the contents of this site are copyright © 1999-2009 Ziff Davis Enterprise Holdings Inc. All Rights Reserved. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis Enterprise is prohibited. Linux is a registered trademark of Linus Torvalds. All other marks are the property of their respective owners.