Fragmented hard drive, Ubuntu. An issue?

Discussion in 'General Audio Discussion' started by AllanMarcus, Feb 7, 2022.

  1. AllanMarcus

    AllanMarcus Friend

    Pyrate
    Joined:
    Oct 23, 2015
    Likes Received:
    2,969
    Trophy Points:
    113
    Location:
    Los Alamos, NM
    Home Page:
    Hello,

    Background (not needed to answer my question, but some might find it useful)
    I keep my media on a spinning hard drive (8TB, ext4) on my Ubuntu server. I have two backup 8TB drives that I rsync to nightly: main drive to BU1, then BU1 to BU2. That gives me three days of snapshots (live plus two days). Everything also gets rcloned to my Google Drive. On my various clients, I use Plex, iTunes, and JRiver, depending on my state of mind. I manage the media in Plex (using Picard prior to import), then I have some Python and VB scripts to move the ratings and metadata to iTunes, and JRiver has a menu item to import from iTunes. Yes, this is convoluted, but I like the access I get from Plex, the syncing to my iDevices from iTunes, and the DSP I get with JRiver.
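    [Mod note: for readers wanting to replicate this, a minimal sketch of such a rotation follows. The function name and paths are hypothetical, not the poster's actual script. One detail worth noting: copying BU1 down to BU2 before refreshing BU1 is what preserves the extra day of history.]

```shell
#!/bin/sh
# Sketch of a nightly three-stage rotation like the one described above.
# rotate_backups and all paths are hypothetical names for illustration.
rotate_backups() {
    src=$1 bu1=$2 bu2=$3
    # Push yesterday's BU1 state down to BU2 first; running main->BU1
    # first would leave BU1 and BU2 identical and lose a day of history.
    rsync -a --delete "$bu1/" "$bu2/"
    rsync -a --delete "$src/" "$bu1/"
}

# Example invocation (hypothetical mount points):
# rotate_backups /media/content /media/bu1 /media/bu2
# An off-site copy could then follow, e.g.:
# rclone sync /media/content gdrive:content
```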

    OK, on to the real question. I run an fsck on the disks nightly and see very high levels of non-contiguous space: 23%, 27%, and 37% respectively. I know I can defrag by simply wiping the disks and rsyncing again. My question is: is it worth the time? Is there any negative effect of fragmentation other than the drives running a little slower than they could?

    Thanks.
     
  2. Lightbulb Sun

    Lightbulb Sun Friend

    Pyrate
    Joined:
    Dec 17, 2015
    Likes Received:
    45
    Trophy Points:
    18
    Location:
    United States of 'murica. F*ck yeah!
    Which tool(s) are you using to determine fragmentation? Are those percentages non-contig free space?

    In my opinion, online defrag for any of your existing spinning disks is not a priority if your use case is mostly large files (as tends to be the case when you're dealing with lossless music, videos, and so on; I went out on a limb and assumed that's your situation). Ext4 already does a decent job of keeping file content contiguous.

    You're already creating (and testing through use) multiple backups, which is far higher priority.
     
  3. Thad E Ginathom

    Thad E Ginathom Friend

    Pyrate
    Joined:
    Sep 27, 2015
    Likes Received:
    14,253
    Trophy Points:
    113
    Location:
    India
    Why? If it is just for nerdsmanship, then fine, but it really should not be necessary.
    By what yardstick are you judging that to be "very high?"

    Just curious about those questions.

    Here's my answer as an ex-techie, as in maybe twenty years out of date! As a then Unix systems manager and a now Linux user, I have never given a toss about how fragmented my disc space is. Unix/Linux handles it: let it, and don't worry!

    As you are using rsync, I'm guessing many files on your disks are going to remain the same: rsync will only overwrite files that have changed (depending on your settings, of course). With write-once/read-many stuff like music archives, most of it won't.
    Wait... are these on the same server? That would not count as backups.
     
  4. Armaegis

    Armaegis Friend

    Pyrate BWC
    Joined:
    Sep 27, 2015
    Likes Received:
    7,542
    Trophy Points:
    113
    Location:
    Winnipeg
    Tangential newbie question: why not something like a NAS/RAID instead of a bulk copy onto separate disks?
     
  5. Thad E Ginathom

    Thad E Ginathom Friend

    Pyrate
    Joined:
    Sep 27, 2015
    Likes Received:
    14,253
    Trophy Points:
    113
    Location:
    India
    Because RAID is not a backup. ISTR that the technical lingo is something like, availability, not security.

    The reasoning behind this is similar to why multiple copies on the same machine are not a backup. Yes, if you screw up one file, you can immediately get it back from your copy; if one disc in a RAID system fails, it can (if configured to do so) rebuild the contents of that disc. But these are both single systems. Power surges, fire, lightning strikes, thieves, etc. rarely pick on single discs or files: they take the whole system out.

    That is why Backup Disc 1 should not even be connected to your PC except when in use, and Backup Disc 2 should not even be in the same building.

    I'm talking about the barest minimum data security for home use, here. Absolutely insufficient for business use. Computer and data security (and risk!), along with regulatory requirements, have moved centuries in the two decades since I left the business behind.
     
    • Like x 4
    • Epic x 2
  6. supertransformingdhruv

    supertransformingdhruv Almost "Made"

    Contributor
    Joined:
    Mar 21, 2018
    Likes Received:
    595
    Trophy Points:
    93
    Location:
    DCish
    You don't need to do this much work!

    You can also defrag an ext4 filesystem with e4defrag, which is part of the e2fsprogs package and is probably already installed on your system. You can use the -c option to get a fragmentation score and a sense of whether your filesystem actually needs defragging. Generally, ext4 is quite resistant to fragmentation, and I'd trust e4defrag over fsck when deciding whether to worry about it, since it has a more ext4-centric view of your situation. That said, it's definitely possible to find yourself in a situation where you can improve performance by defragging.

    Man here: http://manpages.ubuntu.com/manpages/jammy/man8/e4defrag.8.html

    I'm not saying you're fscking too much, but that seems like a lot to me. I think Red Hat defaults to something like once every 180 days or 32 boots (whichever comes first) on RHEL. If you're concerned about drive failures and the like, daily or weekly SMART short tests might give you better metrics on your HDD health, mixed in with the occasional fsck for a filesystem checkup. Or, you know, do what's working for you. If the nightly fsck catches stuff for you and your drive isn't bad, I guess it's working.
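    For instance, scheduled short self-tests can be set up with a couple of cron entries (a sketch: the device name is hypothetical, and smartctl comes from the smartmontools package):

```shell
# Hypothetical crontab fragment, assuming smartmontools is installed and
# /dev/sda is the drive in question. A short self-test takes only a
# couple of minutes and does not take the drive offline.
# Nightly short self-test at 03:00:
0 3 * * * /usr/sbin/smartctl -t short /dev/sda
# Weekly review of overall health and self-test log at 04:00 on Sundays:
0 4 * * 0 /usr/sbin/smartctl -H -l selftest /dev/sda
```

    (smartd, also in smartmontools, can do the same job with mail alerts if you prefer a daemon over cron.)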

    Non-contiguous doesn't necessarily mean fragmented! But also it can. Don't know how interested you are in this, but there's a famous old presentation on ext4 performance that has a lot of background about ext4 fragmentation and the impacts of defragging. The unsatisfying tl;dr is basically that depending on your situation, you may have anything from no impact whatsoever to significantly worse read performance.
     
    • Like x 3
    • Epic x 2
  7. Thad E Ginathom

    Thad E Ginathom Friend

    Pyrate
    Joined:
    Sep 27, 2015
    Likes Received:
    14,253
    Trophy Points:
    113
    Location:
    India
    Nice: a deeper answer from @supertransformingdhruv.

    If I use fsck manually at all, something has already gone wrong! Tune your file systems to run it as he suggests. I forget the command: he probably knows. The last couple of times I found myself fscking a consenting file system more than once, it was an indication of an underlying disk problem.
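    [Mod note: the half-remembered command here is presumably tune2fs, which sets how often e2fsck runs automatically at mount time. A sketch, with a hypothetical device name; needs root:]

```shell
# Hypothetical device; run as root. Check the current settings first:
tune2fs -l /dev/sdb1 | grep -Ei 'mount count|check'
# Force a check every 32 mounts or every 180 days, whichever comes first
# (the RHEL-ish defaults mentioned earlier in the thread):
tune2fs -c 32 -i 180d /dev/sdb1
```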

    It is a good point that fragmentation of free space may be of more consequence than fragmentation of files. But I'm still of a mind to let the system get on with it. Frankly, without deep knowledge, the extent to which Linux allows one to mess with one's system can cause more harm than good.

    One more thought: on backup drives, performance is not such an issue. It hopefully doesn't matter too much if your restore takes five seconds or seven, because, hopefully, you don't do it often!

    But I'm still not worrying much about it. If I replace a disk or move a partition (I usually don't use a block-level method), I regard the "tidying-up" with nerdish satisfaction, but it's not a big deal.

    Solid state drives might be a different issue. I'm still a newbie to them. I have one, which holds /, /home, and my current year of photos. Photos mean quite a lot of writing, rewriting and deletion, after which they sit there for ever (or a year in that location, for me). 2022 meant deciding: do I delete the partition and start again, or just delete the files? I did the latter, and I'd say it is not quite as lightning fast as it was. Does the fact that the mount point for the directory is now on a slower HDD make a difference (WD Green instead of Black)? I don't know. It's still pretty fast! The way I'm using the SSD might mean sub-optimal life, but it's doing a job, and that's my requirement.

    By the way: rsync is probably my favourite tool. I do a tar backup of system files, but I rsync my data, scripted or via the grsync GUI. So much easier to examine or restore individual files. And it can be interrupted without a problem: when started again, it won't recopy the files it already did.

    PS... e4defrag... Wow, very interesting! Now I'm going to nerdishly check all my file systems. My photos have only five fragmented files. But then that file system was moved (rsync) to a new disk recently.
     
    Last edited: Feb 7, 2022
  8. AllanMarcus

    AllanMarcus Friend

    Pyrate
    Joined:
    Oct 23, 2015
    Likes Received:
    2,969
    Trophy Points:
    113
    Location:
    Los Alamos, NM
    Home Page:
    Thanks for the info. Great stuff. To address a few points:
    1. My non-contiguous numbers come from fsck.
    2. Why do I fsck nightly? I was not doing that, and for whatever reason, corruption crept into one or more of the directories. Maybe because I also use Plex, which maintains over a million files in its own directory, but over time I would get directory errors (sorry, can't remember the exact errors), and eventual failure. Not SMART failures; directory failures. I know a few of you will say there is some other problem, and maybe there is, but I cannot find it. I completely rebuilt the OS drive at one point, and the failures came back eventually. I don't do anything else out of the ordinary, but running fsck regularly seems to help. I could probably get away with running it monthly, but it's in the nightly script, and I see no harm in running it nightly. Since I started the nightly fscks, there have been no issues for over 18 months.
    3. Why the scheme I chose of one disk with a local 1-day mirror, a local 2-day mirror, and a Google Drive clone? I originally used RAID 5 with three 8TB disks for redundancy, and rebuilds took forever. At one point I lost two disks at once, and that wasn't fun (I had another backup to restore from). I switched to the scheme I outlined above, which provides plenty of local and off-site redundancy for my purposes. Note, this is just my media server. My PC with financial and photo data is backed up to an internal drive on the PC (Macrium Reflect), an internal drive on the server (Macrium Reflect), two different external drives (one with robocopy and one with Macrium Reflect), and off-site to a Google Drive mirror. I used to keep one external drive off-site at any given time (at my office), but now I work from home, so it isn't easy to keep one drive off-site. At least I can grab it easily. I have had to evacuate my home twice in 25 years due to fires near my town, so I understand the benefits of off-site storage.
    4. I looked at e4defrag and was considering it. I will look again and get a score.
    5. "Non-contiguous doesn't necessarily mean fragmented! " - ok, that's quite interesting. I will read up on that.
    6. 23%, 27%, and 37% seems high because my other set of 8TB drives (I have two sets) are at <1% each; by comparison, 23-37% is high.
    Again, thanks for the advice and conversation.
     
  9. supertransformingdhruv

    supertransformingdhruv Almost "Made"

    Contributor
    Joined:
    Mar 21, 2018
    Likes Received:
    595
    Trophy Points:
    93
    Location:
    DCish
    If it works, it works.
     
  10. Thad E Ginathom

    Thad E Ginathom Friend

    Pyrate
    Joined:
    Sep 27, 2015
    Likes Received:
    14,253
    Trophy Points:
    113
    Location:
    India
    Totally agree. Especially as @AllanMarcus is actually checking the output. Not checking the output can lead to stuff going badly wrong.

    Non-atheists should always include something like the following in their prayers:

    Oh Deity, Grant me that one last time the disk spins up and is readable so I can get the data off it before its final demise.

    I got that final chance once: actually, the disk had stopped. I put the machine on its side, thought I'd have one more go, and it spun. For the final, final time.

    Oh Deity, grant me the knowledge, wisdom and sense not to get to the point of needing that last chance!

    Ahh, it must be good having gods. Trouble is, they often don't deliver! :(

    Anyway, the whole backup system sounds very thorough.

    Thank you Deity, for taking care of Brother @AllanMarcus so that, should his data stumble upon a brick you will cause him to lie down in some pastures or other whilst it is safely restored...

    Oh wait. I'm an atheist
    :sail:
     
    Last edited: Feb 9, 2022
  11. fastfwd

    fastfwd Friend

    Pyrate
    Joined:
    Aug 29, 2019
    Likes Received:
    1,010
    Trophy Points:
    93
    Location:
    Silicon Valley
    You're already familiar with rsync, so you might want to look into rsnapshot for hourly/daily/weekly/monthly snapshots. You can get a LOT more than two snapshots on those backup drives.
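    A minimal rsnapshot.conf sketch for that kind of rotation might look like this (paths are hypothetical; in the real file the fields must be TAB-separated):

```shell
# Fragment of /etc/rsnapshot.conf (fields TAB-separated in the real file).
# Keeps 24 hourlies, 7 dailies, 4 weeklies, and 6 monthlies; unchanged
# files are hard-linked between snapshots, so they cost no extra space.
snapshot_root   /media/backup/snapshots/
retain          hourly  24
retain          daily   7
retain          weekly  4
retain          monthly 6
backup          /media/content/     media/
```

    Cron then invokes rsnapshot hourly, rsnapshot daily, and so on; only the smallest interval actually runs rsync, while the larger ones just rotate the hard-linked snapshot directories.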
    You weren't asking about backup strategies, so I'll just say that that seems like a recipe for inconsistent backup state and/or for quickly propagating any ransomware on your PC to all your backups.
     
  12. Gazny

    Gazny MOT: ETA Audio

    Pyrate Contributor
    Joined:
    May 11, 2020
    Likes Received:
    2,224
    Trophy Points:
    93
    Location:
    open sky
    Sounds like a dying drive. What does the S.M.A.R.T. data suggest? Any heat-induced or electrical/shock failures you could pin it down to?
     
  13. AllanMarcus

    AllanMarcus Friend

    Pyrate
    Joined:
    Oct 23, 2015
    Likes Received:
    2,969
    Trophy Points:
    113
    Location:
    Los Alamos, NM
    Home Page:
    Not worried about ransomware. Internal drive backup is daily. Two externals are weekly, and they are disconnected except when used for backup. Server backup is weekly and it's unmounted when not being used for backup. Reflect uses permissions to protect against ransomware (as well as they can be used). Most of my files are Google Docs, which, as you know, aren't even files. I cannot thin

    SMART tests are fine. The issue eventually reappears even on new drives. Not a heat issue. It's resolved with the fsck checks, so I'm not worried about it.

    I was just worried about the non-contiguous space being reported by fsck. Thanks folks.
     
    • Like x 1
    • Epic x 1
    Last edited: Feb 9, 2022
  14. Metro

    Metro Friend

    Pyrate
    Joined:
    Dec 27, 2016
    Likes Received:
    1,600
    Trophy Points:
    93
    Location:
    San Francisco
    The developers must have had a good laugh when they were naming the command.
     
  15. Thad E Ginathom

    Thad E Ginathom Friend

    Pyrate
    Joined:
    Sep 27, 2015
    Likes Received:
    14,253
    Trophy Points:
    113
    Location:
    India
    One of the things that I always liked about Unix was that the developers were human: there is a touch of humour in amongst the deep techie stuff, and they were absolutely not full of their corporate selves like MS.
    Sounds more like a warning, or even just information. Fragmentation is not an error; it is just something that happens, especially if files change frequently.

    When you see stuff about things not matching, or missing, or "wrong value in blah blah", you can be a bit worried. And even then, a one-off that gets corrected is probably fine. Repeated errors from fsck are a sign to worry: the file systems are too good to be getting stuff wrong over and over. And things have improved: time was when an incorrect shutdown would produce errors on restarting.

    There's something deeper: fsdb. That allows one to read and write file system parameters manually. Very Scary Stuff. It isn't even on my machine: I haven't seen it in Linux, but then, thank Deity, I haven't needed to look.
     
  16. AllanMarcus

    AllanMarcus Friend

    Pyrate
    Joined:
    Oct 23, 2015
    Likes Received:
    2,969
    Trophy Points:
    113
    Location:
    Los Alamos, NM
    Home Page:
    Well, finally got around to running e4defrag to see if there really is an issue.

    e4defrag -c /media/content
    e4defrag 1.45.5 (07-Jan-2020)
    <Fragmented files>                                                now/best  size/ext
    1. /media/content/flexget/flexget.log                              107/1      9 KB
    2. /media/content/flexget/flexget.2022-03-09_15-50-04_465460.log    99/1     10 KB
    3. /media/content/flexget/flexget.2022-03-01_03-40-04_507471.log    71/1     14 KB
    4. /media/content/flexget/flexget.2022-03-08_08-00-05_655154.log    64/1     15 KB
    5. /media/content/flexget/flexget.2022-02-26_21-30-05_245668.log    54/1     18 KB

    Total/best extents 145454/100872
    Average size per extent 25470 KB
    Fragmentation score 0
    [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
    This directory (/media/content) does not need defragmentation.
    Done.


    and fsck...
    *** Sat 12 Mar 2022 01:00:48 AM MST *** Verifying disk: /content (/dev/sda1)
    ...
    content: 108660/244191232 files (23.1% non-contiguous), 940892211/1953506304 blocks

    So interesting how differently the two tools report.
     
  17. Thad E Ginathom

    Thad E Ginathom Friend

    Pyrate
    Joined:
    Sep 27, 2015
    Likes Received:
    14,253
    Trophy Points:
    113
    Location:
    India
    If that means you are seeing "non-contiguous" as a problem, you could say that. Which takes us back to the beginning of the conversation!

    By The Way...

    I've finally had it with NTFS on external backup discs. Yes, just plugging them in to my Linux box was always so easy. Yes, portability: I could have plugged them into someone's Windows box had I wanted.

    Guess what: I never actually did want to, and every little fault needs Windows chkdsk to fix it, and no, I have no Windows. Now I have a considerable number of errors on a 5TB USB disk. I have to copy the data off, reformat it, and copy it back*. And it won't be NTFS this time.



    *At least I'll be guarding against bit rot, especially for the music files. o_O:D
     
  18. Lindyschoe

    Lindyschoe New

    Banned
    Joined:
    May 7, 2021
    Likes Received:
    3
    Trophy Points:
    3
    Location:
    Canada
    If you have extended knowledge in this field, you can dig deeper and see where the issue is. Just be careful and avoid deleting something essential. I worked with the production team on a news channel, and our guys lost almost a week's worth of work last year. We went to (link removed by mod) and started using more hard drives and backup methods. Your files seem secure from what I've read. Usually, performance should not be an issue for backup drives. If you've always worked this way and everything turned out fine, don't stress too much. Keep rsyncing the information every night as you did before. It may even be that an error on your computer is showing an incorrect percentage of non-contiguous space.
     
    Last edited by a moderator: Apr 27, 2022
