Here is an updated and improved form of sampling. It is based on the same concept as some other answers that use `CHECKSUM`/`BINARY_CHECKSUM` and modulus.
Reasons to use an implementation similar to this one, as opposed to other answers:
- It is relatively fast over huge data sets and can be efficiently used in/with derived queries. Millions of pre-filtered rows can be sampled in seconds with no tempdb usage and, if aligned with the rest of the query, the overhead is often minimal.
- Does not suffer from the `CHECKSUM(*)`/`BINARY_CHECKSUM(*)` issues with runs of data. When using the `CHECKSUM(*)` approach, the rows can be selected in "chunks" and not "random" at all! This is because CHECKSUM prefers speed over distribution.
- Results in a stable/repeatable row selection and can be trivially changed to produce different rows on subsequent query executions. Approaches that use `NEWID()`, such as `CHECKSUM(NEWID()) % 100`, can never be stable/repeatable.
- Allows for increased sample precision and reduces introduced statistical errors; the sampling precision can also be tweaked. `CHECKSUM` only returns an `int` value.
- Does not use `ORDER BY NEWID()`, as ordering can become a significant bottleneck with large input sets. Avoiding the sort also reduces memory and tempdb usage.
- Does not use `TABLESAMPLE` and thus works with a `WHERE` pre-filter.
Cons / limitations:
- Slightly slower execution times than when using `CHECKSUM(*)`. Using HASHBYTES, as shown below, adds about 3/4 of a second of overhead per million rows. This is with my data, on my database instance: YMMV. This overhead can be eliminated by using a persisted computed column of the resulting 'well distributed' `bigint` value from HASHBYTES (a sketch of such a column follows this list).
- Unlike the basic `SELECT TOP n .. ORDER BY NEWID()`, this is not guaranteed to return "exactly N" rows. Instead, it returns a percentage of rows, where that percentage is pre-determined. For very small sample sizes this could result in 0 rows selected. This limitation is shared with the `CHECKSUM(*)` approaches.
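
A minimal sketch of the persisted computed column mentioned above, assuming the same `t` table and `UNIQUEIDENTIFIER` `rowguid` column used in the queries below; the `sample_hash` column and `IX_t_sample_hash` index names are illustrative only:

```sql
-- Materialize the 'well distributed' bigint once per row so that the sampling
-- predicate does not recompute HASHBYTES on every query.
-- HASHBYTES is deterministic, so the computed column can be PERSISTED.
alter table t add sample_hash as
    abs(convert(bigint, hashbytes('SHA1', convert(varbinary(32), rowguid))))
    persisted;

-- Optional: index the persisted column if the sampling predicate is selective
-- enough for the optimizer to make use of it.
create index IX_t_sample_hash on t (sample_hash);

-- The sampling predicate in the queries below then simplifies to:
--   and t.sample_hash % (1000 * 100) < (1000 * @sample_percent)
```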
Here is the gist:
```sql
-- Allow a sampling precision [0, 100.0000].
declare @sample_percent decimal(7, 4) = 12.3456

select
    t.*
from t
where 1=1
    and t.Name = 'Mr. No Questionable Checksum Usages'
    and ( -- sample
        @sample_percent = 100
        or abs(
            -- Choose appropriate identity column(s) for hashbytes input.
            -- For demonstration it is assumed to be a
            -- UNIQUEIDENTIFIER rowguid column.
            convert(bigint, hashbytes('SHA1', convert(varbinary(32), t.rowguid)))
        ) % (1000 * 100) < (1000 * @sample_percent)
    )
```
Notes:
- While SHA1 is technically deprecated since SQL Server 2016, it is both sufficient for the task and slightly faster than either MD5 or SHA2_256. Use a different hashing function as relevant. If the table already contains a hashed column (with a good distribution), that could potentially be used as well.
- Conversion to `bigint` is critical as it allows 2^63 values of 'random space' to which to apply the modulus operator; this is much more than the 2^31 range from the CHECKSUM result. This reduces the modulus error at the limit, especially as the precision is increased.
- The sampling precision can be changed as long as the modulus operand and sample percent are multiplied appropriately. In this case, that is `1000 *` to account for the 4 digits of precision allowed in `@sample_percent`.
- Can multiply the `bigint` value by `RAND()` to return a different row sample each run. This effectively changes the permutation of the fixed hash values (a sketch follows this list).
- If `@sample_percent` is 100, the query planner can eliminate the slower calculation code entirely. Remember 'parameter sniffing' rules. This allows the code to be left in the query regardless of enabling sampling.
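
A sketch of the `RAND()`-based variation mentioned above, reusing the query from the gist. The extra `convert` back to `bigint` is an addition here, since the `%` operator does not accept a `float` operand:

```sql
-- Per-run variation: RAND() without a seed is evaluated once per query (a
-- runtime constant), so every row's hash is scaled by the same factor, which
-- changes which hash values fall below the sampling threshold.
declare @sample_percent decimal(7, 4) = 12.3456

select
    t.*
from t
where 1=1
    and t.Name = 'Mr. No Questionable Checksum Usages'
    and ( -- sample, choosing a different row set on each execution
        @sample_percent = 100
        or convert(bigint,  -- back to an exact type for the modulus
            rand() * abs(
                convert(bigint, hashbytes('SHA1', convert(varbinary(32), t.rowguid)))
            )
        ) % (1000 * 100) < (1000 * @sample_percent)
    )
```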
Computing `@sample_percent`, with lower/upper limits, and adding a `TOP` "hint" to the query, can be useful when the sample is used in a derived table context:
```sql
-- Approximate max-sample and min-sample ranges.
-- The minimum sample percent should be non-zero within the precision.
declare @max_sample_size int = 3333333
declare @min_sample_percent decimal(7,4) = 0.3333
declare @sample_percent decimal(7,4) -- [0, 100.0000]
declare @sample_size int

-- Get initial count for determining sample percentages.
-- Remember to match the filter conditions with the usage site!
declare @rows int
select @rows = count(1)
from t
where 1=1
    and t.Name = 'Mr. No Questionable Checksum Usages'

-- Calculate sample percent and back-calculate actual sample size.
if @rows <= @max_sample_size begin
    set @sample_percent = 100
end else begin
    set @sample_percent = convert(float, 100) * @max_sample_size / @rows
    if @sample_percent < @min_sample_percent
        set @sample_percent = @min_sample_percent
end
set @sample_size = ceiling(@rows * @sample_percent / 100)

select *
from ..
join (
    -- Not a precise value: if limiting exactly at, can introduce more bias.
    -- Using 'option optimize for' avoids this while requiring dynamic SQL.
    select top (@sample_size + convert(int, @sample_percent + 5))
        t.*
    from t
    where 1=1
        and t.Name = 'Mr. No Questionable Checksum Usages'
        and ( -- sample
            @sample_percent = 100
            or abs(
                convert(bigint, hashbytes('SHA1', convert(varbinary(32), t.rowguid)))
            ) % (1000 * 100) < (1000 * @sample_percent)
        )
) sampled
on ..
```
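
As a quick sanity check of the achieved sampling rate, the sampled and total counts can be compared against the same pre-filter. This is a sketch only, reusing the table, filter, and hash expression from the examples above:

```sql
-- Compare the achieved sample ratio against @sample_percent; for non-trivial
-- row counts the two should agree closely, since the hash residues are
-- approximately uniform over [0, 100000).
declare @sample_percent decimal(7, 4) = 12.3456

select
    count(1) as total_rows,
    sum(case
            when abs(
                    convert(bigint, hashbytes('SHA1', convert(varbinary(32), t.rowguid)))
                ) % (1000 * 100) < (1000 * @sample_percent)
            then 1 else 0
        end) as sampled_rows
from t
where 1=1
    and t.Name = 'Mr. No Questionable Checksum Usages'
```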