SSD smartctl attributes for wear indicators (was: mine-is-bigger-than-yours maildir competition)

I'm curious to know what your SSD wear indicators look like, from long-running Linux machines, and how long it looks like they'll last based on existing usage. You can query these with smartctl (if your drive db is too old, run "sudo update-smart-drivedb" first).

I'll go first. These are just private machines, albeit ones doing reasonable work. Perhaps at some point in the future I'll be able to report on long-term results of enterprise SSDs, but I can't right now.

Machine one:
  Power_On_Hours           4522  (188 days)
  Total_NAND_Writes_GiB    18846
  Maximum_Erase_Cycle      199
  Avg_Write_Erase_Ct       74
  Total_Bad_Block          201
  Perc_Avail_Resrvd_Space  100

This machine has been running for over 188 days non-stop, has logged nearly 19 TB of writes, and is about 2.5% of the way through its expected minimum lifespan [1]. Estimated total lifespan: 20.6 years.

Machine two:
  Power_On_Hours         18326  (763 days)
  Used_Rsvd_Blk_Cnt_Tot  0
  Wear_Leveling_Count    13
  Total_LBAs_Written     2066747494  (i.e. about 1080 GiB [2])

This has been running for 763 days non-stop. Like the first machine, it hasn't used any of the reserved blocks yet. It's about 1.3% of the way through its minimum expected lifespan [3]. Estimated total lifespan: 160 years.

-Toby

1: i.e. 3000 write/erase cycles for MLC; in practice you seem to get quite a bit more, according to testers.
2: This drive doesn't report actual NAND writes, just LBAs written, but you can roughly convert: call each LBA 512 bytes, then multiply the total by a conservative 1.1 to allow for write amplification, which comes to about 1080 GiB.
3: This machine is running a cheaper TLC-based SSD, so the theoretical number of erase/write cycles is just 1000.
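For anyone who wants to redo those sums on their own drives, here's roughly the arithmetic as a shell sketch. The attribute names are the ones these particular drives happen to report (yours may use different names or not report them at all), the rated cycle count is the assumption from footnote 1, and raw values land in column 10 of "smartctl -A" output:

  # rough %-of-rated-life used, plus a naive extrapolation to total lifespan
  # (rated=3000 is the MLC assumption from footnote 1; adjust for your drive)
  sudo smartctl -A /dev/sda | awk -v rated=3000 '
      /Power_On_Hours/     { hours  = $10 }
      /Avg_Write_Erase_Ct/ { cycles = $10 }
      END { frac = cycles / rated
            printf "%.1f%% of rated cycles used; ~%.1f years at this rate\n",
                   frac * 100, (hours / frac) / (24 * 365) }'

  # footnote 2: convert machine two's Total_LBAs_Written to GiB, assuming
  # 512-byte LBAs and a conservative 1.1x write-amplification factor
  echo '2066747494 * 512 * 1.1 / 1024^3' | bc -l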

Oh, I mentioned that TLC SSDs were supposedly only rated for 1000 erase cycles, but the particular Samsung model I was referring to (the 840 non-Pro) seems to do a lot better! These people killed one after 24,000 erase cycles and 3 petabytes of writes:
http://www.vojcik.net/samsung-ssd-840-endurance-destruct-test/

If we use 24,000 as the cycle count, then with just 13 on the clock so far, machine #2 is going to last roughly 3,900 years before wearing out :) Machine #1 is using MLC, so maybe it'll get to 500 years? :)

Toby Corkindale <toby@dryft.net> writes:
You can query these with smartctl (if your drive db is too old, run sudo update-smart-drivedb first)
The OS running my oldest SSDs predates update-smart-drivedb :-)

The last monthly self-test listed their lifetimes as 30177 and 30520 hours respectively; I don't remember when they were rolled out. I've had no problems with them, but they're on a router, so the write load is approximately zero.
Machine one:
  Power_On_Hours           4522  (188 days)
  Total_NAND_Writes_GiB    18846
  Maximum_Erase_Cycle      199
  Avg_Write_Erase_Ct       74
  Total_Bad_Block          201
  Perc_Avail_Resrvd_Space  100
I don't get that block with "smartctl -a /dev/sda" on the M.2 drive that shipped with my Acer C720, running Debian 8. Should I pass some other option?

On 19/01/2015 10:16 AM, Trent W. Buck wrote:
Toby Corkindale <toby@dryft.net> writes:
You can query these with smartctl (if your drive db is too old, run sudo update-smart-drivedb first)
The OS running my oldest SSDs predates update-smart-drivedb :-)
The last monthly self-test listed their lifetimes as 30177 and 30520 hours respectively; I don't remember when they were rolled out. I've had no problems with them, but they're on a router so the write load is approximately zero.
3.44 years ... not bad. Cheers A.

On 19 January 2015 at 10:16, Trent W. Buck <trentbuck@gmail.com> wrote:
Toby Corkindale <toby@dryft.net> writes:
You can query these with smartctl (if your drive db is too old, run sudo update-smart-drivedb first)
The OS running my oldest SSDs predates update-smart-drivedb :-)
The last monthly self-test listed their lifetimes as 30177 and 30520 hours respectively; I don't remember when they were rolled out. I've had no problems with them, but they're on a router so the write load is approximately zero.
Machine one:
  Power_On_Hours           4522  (188 days)
  Total_NAND_Writes_GiB    18846
  Maximum_Erase_Cycle      199
  Avg_Write_Erase_Ct       74
  Total_Bad_Block          201
  Perc_Avail_Resrvd_Space  100
I don't get that block with "smartctl -a /dev/sda" on the M.2 drive that shipped with my Acer C720, running Debian 8.
Should I pass some other option?
Do you get a block of other attributes with different names but vaguely-similar meanings; or a block of "unknown attributes"; or no attributes at all?

Toby Corkindale <toby@dryft.net> writes:
Machine one:
  Power_On_Hours           4522  (188 days)
  Total_NAND_Writes_GiB    18846
  Maximum_Erase_Cycle      199
  Avg_Write_Erase_Ct       74
  Total_Bad_Block          201
  Perc_Avail_Resrvd_Space  100
I don't get that block with "smartctl -a /dev/sda" on the M.2 drive that shipped with my Acer C720, running Debian 8.
Should I pass some other option?
Do you get a block of other attributes with different names but vaguely-similar meanings; or a block of "unknown attributes"; or no attributes at all?
Oh, I didn't realize you were pulling bits out of the general "vendor attributes" block. I'll attach my full transcript. smartmontools 6.3+svn4002-2+b2, update-smart-drivedb run today.

smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.14-2-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SNS4151S316G
Serial Number:    50026B723B05A525
Firmware Version: S9FM01.1
User Capacity:    16,013,942,784 bytes [16.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jan 29 11:02:46 2015 AEDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (  30) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0013   100   100   050    Pre-fail  Always       -       0
  7 Unknown_SSD_Attribute   0x000b   100   100   050    Pre-fail  Always       -       0
  8 Unknown_SSD_Attribute   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       435
 10 Unknown_SSD_Attribute   0x0013   100   100   050    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       941
167 Unknown_Attribute       0x0022   100   100   000    Old_age   Always       -       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       2
170 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       2
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
175 Program_Fail_Count_Chip 0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0012   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       5
196 Reallocated_Event_Count 0x0000   100   100   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1684300900
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
233 Media_Wearout_Indicator 0x0013   100   100   000    Pre-fail  Always       -       1658319
240 Unknown_SSD_Attribute   0x0013   100   100   050    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       1402712
242 Total_LBAs_Read         0x0012   100   100   000    Old_age   Always       -       952506
243 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       110382343808100
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       137
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       477
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       575699

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%         0         -
# 2  Short offline       Completed without error        00%         0         -
# 3  Short offline       Completed without error        00%         0         -
# 4  Short offline       Completed without error        00%         0         -
# 5  Short offline       Completed without error        00%         0         -
# 6  Short offline       Completed without error        00%         0         -
# 7  Short offline       Completed without error        00%         0         -
# 8  Short offline       Completed without error        00%         0         -
# 9  Short offline       Completed without error        00%         0         -
#10  Short offline       Completed without error        00%         0         -
#11  Short offline       Completed without error        00%         0         -
#12  Short offline       Completed without error        00%         3         -
#13  Short offline       Completed without error        00%         0         -
#14  Extended offline    Completed without error        00%         0         -
#15  Short offline       Completed without error        00%         0         -

Selective Self-tests/Logging not supported

On 29 January 2015 at 11:04, Trent W. Buck <trentbuck@gmail.com> wrote:
Oh, I didn't realize you were pulling bits out of the general "vendor attributes" block.
I'll attach my full transcript.
That's a lot of unknown attributes; did you run the smartdb update utility beforehand? Otherwise I guess they just haven't entered that drive into it yet...

Toby Corkindale wrote:
On 29 January 2015 at 11:04, Trent W. Buck <trentbuck@gmail.com> wrote:
Oh, I didn't realize you were pulling bits out of the general "vendor attributes" block.
I'll attach my full transcript.
That's a lot of unknown attributes; did you run the smartdb update utility beforehand?
Yep.
Otherwise I guess they just haven't entered that drive into it yet...
Yep, it says so at the top of the transcript. It's the M.2 drive that shipped with my Acer C720 Chromebook, and it's an unusually short form factor (42 mm instead of the more common 80 mm), so basically only Chromebooks have them.
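For what it's worth, while a drive is missing from drivedb you can at least pin your own labels on the mystery attributes with smartctl's "-v ID,FORMAT[,NAME]" override. A sketch only; the attribute name here is an invented placeholder rather than anything Kingston documents:

  # display attribute 169 under a made-up label (cosmetic only: it changes how
  # smartctl prints the attribute, not what the drive actually reports)
  sudo smartctl -A -v 169,raw48,Lifetime_Left_Guess /dev/sda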

Here's another one, that's seen a bit more use, with the SSD as a log/cache drive for a heavily used ZFS array.

Disk: Samsung 840 Pro 128GB SSD
  Power_On_Hours         17391
  Total_LBAs_Written     128518921972
  Reallocated_Sector_Ct  0
  Wear_Leveling_Count    2959
  Used_Rsvd_Blk_Cnt_Tot  0

That's >61 terabytes written by the o/s; wear leveling is up to nearly 3000, which is getting on for a bit. Still no sectors getting remapped though, which implies no failures.

-Toby
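P.S. For anyone wanting to do the same sum on their own drive, this is roughly the 512-byte-LBA conversion from footnote 2 of my earlier post, as a one-liner (a sketch; some drives count LBAs in other units, so check your vendor's documentation):

  sudo smartctl -A /dev/sda |
      awk '/Total_LBAs_Written/ { printf "%.1f TB written\n", $10 * 512 / 1e12 }'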

On Tue, 3 Feb 2015 04:02:00 AM Toby Corkindale wrote:
That's >61 terabytes written by the o/s; wear leveling is up to nearly 3000, which is getting on for a bit. Still no sectors getting remapped though, which implies no failures.
http://etbe.coker.com.au/2014/04/27/swap-breaking-ssd/

Last year I blogged about the amount of writes performed by workstations I run. The most was 128G in a day for atypical use (torrent download and filesystem balance) and the most for typical use was 24G in a day. If the SSDs I'm using are only capable of 61TB of writes then that would be 7 years of typical use or 1.3 years of atypical use before they have problems.

What portion of hard drives survive 7 years of service? I've recently had 2*3TB disks give a small number of read errors (I now use them for backups) and a 2TB disk used for backups become almost unusable. Of the 2TB+ SATA disks that I run, I'm seeing significantly more than 10% failures so far - and it wasn't 7 years ago that 2TB was the biggest disk available.

Finally, there's nothing stopping you from using a RAID-1 array over SSDs and/or having cron-job backups. One of my servers has a single SSD for root and a cron job that backs it up to a RAID-1 of 4TB hard drives - for that system I don't mind a risk of a bit of down-time just as long as I don't lose the data.
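If anyone wants to measure their own daily write volume in a similar spirit, one rough approach is to sample the kernel's sectors-written counter a day apart and take the difference. A sketch, assuming the SSD is sda; note that /proc/diskstats always counts 512-byte sectors regardless of the drive's real sector size:

  # column 10 of /proc/diskstats is sectors written; run this twice,
  # 24 hours apart, and subtract to get one day's write volume
  awk '$3 == "sda" { printf "%.1f GiB written since boot\n", $10 * 512 / 2^30 }' /proc/diskstats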

On 04.02.15 05:19, Russell Coker wrote:
What portion of hard drives survive 7 years of service?
Depends on how hard they're flogged, I guess. Those on my desktop are ten or twelve year old IDE jobs, but they only run about 4 hrs per day, spinning idly much of that time. It sounds like a good SSD would just about see me out.

Incidentally, has anyone had trouble with the lurid red SATA cables from China? I haven't had mine long enough to experience any, but there are one or two claims of the copper corroding to powder after 3 years or so. (I've set "calendar" to prompt me annually, just in case that's the cause of future flakiness.)

Erik

--
Prediction is very difficult, especially of the future. - Niels Bohr

On 4 February 2015 at 16:19, Russell Coker <russell@coker.com.au> wrote:
On Tue, 3 Feb 2015 04:02:00 AM Toby Corkindale wrote:
That's >61 terabytes written by the o/s; wear leveling is up to nearly 3000, which is getting on for a bit. Still no sectors getting remapped though, which implies no failures.
http://etbe.coker.com.au/2014/04/27/swap-breaking-ssd/
Last year I blogged about the amount of writes performed by workstations I run. The most was 128G in a day for atypical use (torrent download and filesystem balance) and the most for typical use was 24G in a day. If the SSDs I'm using are only capable of 61TB of writes then that would be 7 years of typical use or 1.3 years of atypical use before they have problems.
What kind of lame SSD can only cope with 60TB of writes? By all accounts it sounds like you should get far, far more than that out of the decent ones! See previous post linking to people getting a couple of petabytes per drive :)

True, not all will go that far, but in the endurance test that seemed the most thorough, even the earliest-to-die drive made it to 750TB. (And that was an Intel that had a set lifespan; it would probably have gone on a lot further otherwise.)

T

On 5 February 2015 at 00:12, Toby Corkindale <toby@dryft.net> wrote:
On 4 February 2015 at 16:19, Russell Coker <russell@coker.com.au> wrote:
On Tue, 3 Feb 2015 04:02:00 AM Toby Corkindale wrote:
That's >61 terabytes written by the o/s; wear leveling is up to nearly 3000, which is getting on for a bit. Still no sectors getting remapped though, which implies no failures.
http://etbe.coker.com.au/2014/04/27/swap-breaking-ssd/
Last year I blogged about the amount of writes performed by workstations I run. The most was 128G in a day for atypical use (torrent download and filesystem balance) and the most for typical use was 24G in a day. If the SSDs I'm using are only capable of 61TB of writes then that would be 7 years of typical use or 1.3 years of atypical use before they have problems.
What kind of lame SSD can only cope with 60TB of writes? By all accounts it sounds like you should get far, far more than that out of the decent ones! See previous post linking to people getting a couple of petabytes per drive :) True, not all will go that far, but in the endurance test that seemed the most thorough, even the earliest-to-die drive made it to 750TB. (And that was an Intel that had a set lifespan; it would probably have gone on a lot further otherwise)
Serendipitously an article on this subject has appeared in the latest edition of APC in the news section.

They report that The Tech Report has been testing a batch of SSDs for over a year. Using disks from Corsair, Intel's 335 series, Kingston's HyperX 3K and Samsung's 840 series, they've been hitting them with 24/7 reads and writes. The two left standing (Kingston and Samsung) have passed 2 *petabytes* of writes and are still going strong. The 2PB is apparently equivalent to 1000 years of real-world use. The article does say that luck of the draw may be a factor, with the first HyperX 3K dying after 720TB.

--
Colin Fee
tfeccles@gmail.com

On Thu, 5 Feb 2015 09:08:26 PM Colin Fee wrote:
http://etbe.coker.com.au/2014/04/27/swap-breaking-ssd/
Last year I blogged about the amount of writes performed by workstations I run. The most was 128G in a day for atypical use (torrent download and filesystem balance) and the most for typical use was 24G in a day. If the SSDs I'm using are only capable of 61TB of writes then that would be 7 years of typical use or 1.3 years of atypical use before they have problems.
What kind of lame SSD can only cope with 60TB of writes?
We should make conservative assumptions. If the conservative assumptions are correct and a SSD lasts for 7 years then that's acceptable. If it lasts for more than 7 years then that's great!
By all accounts it sounds like you should get far, far more than that out of the decent ones! See previous post linking to people getting a couple of petabytes per drive :) True, not all will go that far, but in the endurance test that seemed the most thorough, even the earliest-to-die drive made it to 750TB. (And that was an Intel that had a set lifespan; it would probably have gone on a lot further otherwise)
Serendipitously an article on this subject has appeared in the latest edition of APC in the news section.
They report that The Tech Report has been testing a batch of SSDs for over a year. Using disks from Corsair, Intel's 335 series, Kingston's HyperX 3K and Samsung's 840 series, they've been hitting them with 24/7 reads and writes. The two left standing (Kingston and Samsung) have passed 2 *petabytes* of writes and are still going strong. The 2PB is apparently equivalent to 1000 years of real-world use. The article does say that luck of the draw may be a factor, with the first HyperX 3K dying after 720TB.
2PB would be ~230 years for my usage patterns, and 720TB just over 80 years; that's still adequate.

Really you should assume that "luck of the draw" won't be in your favor and plan to have SSDs fail after 720TB.