RGB2OPP is SLOW! #27
RGB2OPP and OPP2RGB were developed mainly so that the plugin can work on its own; they are not very well optimized. |
BTW, I have a question about the OPP colorspace: It seems it is almost identical to YCgCo (diff: signs and chroma components swapped, different weights for Y), opposing red vs. blue and green vs. magenta, whereas usual opponent colors are more like red vs. green and blue vs. yellow. Other OPP definitions found on the web are consistent with R-G, R+G-2*B. It looks like the G and B components have been swapped in BM3D. I don’t know if it is on purpose or unintentional. |
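The comparison above can be checked with exact fractions. This is only a sketch: the OPP weights are taken from the RGB_to_OPP expressions posted in this thread, and it is an assumption that they match the plugin's own RGB2OPP. The YCgCo weights are the standard ones. The names `OPP`, `YCGCO`, `negate` and `swap_gb` are made up for this illustration.

```python
from fractions import Fraction as F

# Each row holds the (R, G, B) weights of one output channel.
# OPP weights as used by the RGB_to_OPP expressions in this thread
# (assumed to match the plugin's RGB2OPP).
OPP = {
    "O":  (F(1, 3), F(1, 3), F(1, 3)),
    "P1": (F(1, 2), F(0), F(-1, 2)),
    "P2": (F(1, 4), F(-1, 2), F(1, 4)),
}

# Standard YCgCo weights.
YCGCO = {
    "Y":  (F(1, 4), F(1, 2), F(1, 4)),
    "Cg": (F(-1, 4), F(1, 2), F(-1, 4)),
    "Co": (F(1, 2), F(0), F(-1, 2)),
}


def negate(row):
    """Flip the sign of every weight in a row."""
    return tuple(-w for w in row)


def swap_gb(row):
    """Swap the G and B weights of a row."""
    r, g, b = row
    return (r, b, g)


# P1 is Co, and P2 is Cg with the sign flipped.
p1_is_co = OPP["P1"] == YCGCO["Co"]
p2_is_minus_cg = OPP["P2"] == negate(YCGCO["Cg"])

# Swapping G and B gives the more common (R-G)/2 and (R+G-2B)/4 axes.
p1_swapped = swap_gb(OPP["P1"])
p2_swapped = swap_gb(OPP["P2"])

print(p1_is_co, p2_is_minus_cg, p1_swapped, p2_swapped)
```

Under these assumptions only the luma weights differ from YCgCo, and the G/B swap is exactly what separates this OPP from the R-G, R+G-2*B definition.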
OT: There is no binary for r9 ^_^ EDIT: Sorry, I should read more carefully!!! |
Interestingly enough, using FMTC is even slightly slower than using RGB2OPP. A script doing a simple YUV-RGB-OPP conversion back and forth on a 5K clip runs at 9 fps with RGB2OPP and at 8 fps with FMTC. |
Here. These functions are 7-12x faster than the other methods. Thanks to Godway.
import vapoursynth as vs
core = vs.core  # the functions below need the akarin plugin (core.akarin.Expr)

def RGB_to_OPP(c: vs.VideoNode, fulls: bool = False) -> vs.VideoNode:
    if c.format.color_family != vs.RGB:
        raise TypeError("RGB_to_OPP: Clip is not in RGB format!")
    bd = c.format.bits_per_sample
    R = core.std.ShufflePlanes(c, [0], vs.GRAY)
    G = core.std.ShufflePlanes(c, [1], vs.GRAY)
    B = core.std.ShufflePlanes(c, [2], vs.GRAY)
    # Float chroma is centered on 0; integer chroma is shifted by range_half
    b32 = "" if bd == 32 else "range_half +"
    O = core.akarin.Expr([R, G, B], ex_dlut("x y z + + 0.333333333 *", bd, fulls))
    P1 = core.akarin.Expr([R, B], ex_dlut("x y - 0.5 * "+b32, bd, fulls))
    P2 = core.akarin.Expr([R, G, B], ex_dlut("x z + 0.25 * y 0.5 * - "+b32, bd, fulls))
    return core.std.ShufflePlanes([O, P1, P2], [0, 0, 0], vs.YUV)

def OPP_to_RGB(c: vs.VideoNode, fulls: bool = False) -> vs.VideoNode:
    if c.format.color_family != vs.YUV:
        raise TypeError("OPP_to_RGB: Clip is not in YUV format!")
    bd = c.format.bits_per_sample
    O = core.std.ShufflePlanes(c, [0], vs.GRAY)
    P1 = core.std.ShufflePlanes(c, [1], vs.GRAY)
    P2 = core.std.ShufflePlanes(c, [2], vs.GRAY)
    # Undo the range_half shift on integer chroma before inverting
    b32 = "" if bd == 32 else "range_half -"
    R = core.akarin.Expr([O, P1, P2], ex_dlut("x y "+b32+" + z "+b32+" 0.666666666 * +", bd, fulls))
    G = core.akarin.Expr([O, P2], ex_dlut("x y "+b32+" 1.333333333 * -", bd, fulls))
    B = core.akarin.Expr([O, P1, P2], ex_dlut("x z "+b32+" 0.666666666 * + y "+b32+" -", bd, fulls))
    return core.std.ShufflePlanes([R, G, B], [0, 0, 0], vs.RGB)

# HBD constants 3D look up table
#
# * YUV and RGB mid-grey is 127.5 (rounded to 128) for PC range levels,
#   this translates to a value of 125.5 in TV range levels. Chroma is always centered, so 128 regardless.
def ex_dlut(expr: str = "", bits: int = 8, fulls: bool = False) -> str:
    bitd = \
        0 if bits == 8 else \
        1 if bits == 10 else \
        2 if bits == 12 else \
        3 if bits == 14 else \
        4 if bits == 16 else \
        5 if bits == 24 else \
        6 if bits == 32 else -1
    if bitd < 0:
        raise ValueError(f"ex_dlut: Unsupported bit depth ({bits})")
    # 8-bit UINT 10-bit UINT 12-bit UINT 14-bit UINT 16-bit UINT 24-bit UINT 32-bit Ufloat
    range_min = [ ( 0., 0.), ( 0., 0. ), ( 0., 0. ), ( 0., 0. ), ( 0., 0.), ( 0., 0.), ( 0., 0.) ] [bitd]
    ymin = [ ( 16., 16.), ( 64., 64. ), ( 256., 257. ), ( 1024., 1028. ), ( 4096., 4112.), ( 1048576., 1052672.), ( 16/255., 16/255.) ] [bitd]
    cmin = [ ( 16., 16.), ( 64., 64. ), ( 256., 257. ), ( 1024., 1028. ), ( 4096., 4112.), ( 1048576., 1052672.), ( 16/255., 16/255.) ] [bitd]
    ygrey = [ (126.,126.), ( 502., 504. ), (2008.,2016. ), ( 8032., 8063. ), (32128.,32254.), ( 8224768., 8256896.), ( 125.5/255.,125.5/255.)] [bitd]
    range_half = [ (128.,128.), ( 512., 514. ), (2048.,2056. ), ( 8192., 8224. ), (32768.,32896.), ( 8388608., 8421376.), ( 128/255., 128/255.) ] [bitd]
    yrange = [ (219.,219.), ( 876., 879. ), (3504.,3517.688), (14016.,14070.750), (56064.,56283.), (14352384.,14408448.), ( 219/255., 219/255.) ] [bitd]
    crange = [ (224.,224.), ( 896., 899.500), (3584.,3598. ), (14336.,14392. ), (57344.,57568.), (14680064.,14737408.), ( 224/255., 224/255.) ] [bitd]
    ymax = [ (235.,235.), ( 940., 943.672), (3760.,3774.688), (15040.,15098.750), (60160.,60395.), (15400960.,15461120.), ( 235/255., 235/255.) ] [bitd]
    cmax = [ (240.,240.), ( 960., 963.750), (3840.,3855. ), (15360.,15420. ), (61440.,61680.), (15728640.,15790080.), ( 240/255., 240/255.) ] [bitd]
    range_max = [ (255.,255.), (1020.,1023.984), (4080.,4095.938), (16320.,16383.750), (65280.,65535.), (16711680.,16776960.), ( 1., 1.) ] [bitd]
    range_size = [ (256.,256.), (1024.,1024. ), (4096.,4096. ), (16384.,16384. ), (65536.,65536.), (16777216.,16777216.), ( 1., 1.) ] [bitd]
    fs = 1 if fulls else 0
    expr = expr.replace("ymax ymin - range_max /", str(yrange[fs]/range_max[fs]))
    expr = expr.replace("cmax cmin - range_max /", str(crange[fs]/range_max[fs]))
    expr = expr.replace("cmax ymin - range_max /", str(crange[fs]/range_max[fs]))
    expr = expr.replace("range_max ymax ymin - /", str(range_max[fs]/yrange[fs]))
    expr = expr.replace("range_max cmax cmin - /", str(range_max[fs]/crange[fs]))
    expr = expr.replace("range_max cmax ymin - /", str(range_max[fs]/crange[fs]))
    expr = expr.replace("ymax ymin -", str(yrange[fs]))
    expr = expr.replace("cmax ymin -", str(crange[fs]))
    expr = expr.replace("cmax cmin -", str(crange[fs]))
    expr = expr.replace("ygrey", str(ygrey[fs]))
    expr = expr.replace("ymax", str(ymax[fs]))
    expr = expr.replace("cmax", str(cmax[fs]))
    expr = expr.replace("ymin", str(ymin[fs]))
    expr = expr.replace("cmin", str(cmin[fs]))
    expr = expr.replace("range_min", str(range_min[fs]))
    expr = expr.replace("range_half", str(range_half[fs]))
    expr = expr.replace("range_max", str(range_max[fs]))
    expr = expr.replace("range_size", str(range_size[fs]))
    return expr
|
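The forward and inverse expressions above can be sanity-checked without VapourSynth. The following is a minimal sketch that mirrors the same arithmetic in plain Python floats; `rgb_to_opp`, `opp_to_rgb` and `max_roundtrip_error` are hypothetical helpers, not part of any plugin, and the rounding and clamping that an integer Expr output would apply are not modelled here.

```python
def rgb_to_opp(r, g, b, offset=0.0):
    """Same arithmetic as the RGB_to_OPP expressions; offset plays the role of range_half."""
    o = (r + g + b) / 3
    p1 = (r - b) * 0.5 + offset
    p2 = (r + b) * 0.25 - g * 0.5 + offset
    return o, p1, p2


def opp_to_rgb(o, p1, p2, offset=0.0):
    """Same arithmetic as the OPP_to_RGB expressions."""
    p1 -= offset
    p2 -= offset
    r = o + p1 + p2 * 2 / 3
    g = o - p2 * 4 / 3
    b = o + p2 * 2 / 3 - p1
    return r, g, b


def max_roundtrip_error(samples, offset=0.0):
    """Largest absolute per-channel difference after RGB -> OPP -> RGB."""
    worst = 0.0
    for rgb in samples:
        back = opp_to_rgb(*rgb_to_opp(*rgb, offset=offset), offset=offset)
        worst = max(worst, *(abs(a - b) for a, b in zip(rgb, back)))
    return worst


# A few 8-bit style test values; 128 is the 8-bit range_half from ex_dlut.
SAMPLES = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (16, 128, 235), (255, 255, 255)]
error = max_roundtrip_error(SAMPLES, offset=128.0)
print(error)
```

In exact arithmetic the two sets of formulas are inverses of each other, so any residual here is float rounding only. Precision loss in a real integer clip would come from the output rounding, which this sketch leaves out.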
That's weird. Did you do the conversion in FP32 precision? I suppose FMTC is more optimized under INT16. |
Nice work! I'd try it if I get the time. |
Yes... but I don't know anyone who knows the math to do it |
I've been running benchmark tests on my script (on a 5K video clip).
I wouldn't expect RGB2OPP to be way up there in the list! Above KNLMeansCL and above SMDegrain. Why is it so damn slow?
OPP gives a quality gain, but when the entire script runs at 0.32 fps instead of 0.44 fps just because of the conversion from YUV to RGB/OPP, I could spend that time on higher analysis settings instead.
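To put those two numbers in per-frame terms, here is a small worked calculation. It assumes, as the post implies, that the conversion is the only difference between the two runs; the variable names are made up for this example.

```python
fps_with_opp = 0.32     # whole script, including the YUV -> RGB/OPP conversion
fps_without_opp = 0.44  # same script without the conversion

# Seconds spent on one frame in each case.
sec_with = 1.0 / fps_with_opp
sec_without = 1.0 / fps_without_opp

# Extra time per frame attributable to the conversion (under the assumption above),
# and its share of the total frame time.
extra_seconds = sec_with - sec_without
share_of_frame_time = extra_seconds / sec_with

print(f"{extra_seconds:.3f} s per frame, {share_of_frame_time:.1%} of the frame time")
```

That works out to a bit under a second per frame, or roughly a quarter of the total time per frame.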