####################################################################################################
# PLEASE KEEP THE WIDTH OF THE LINES BELOW WITHIN 100 CHARACTERS. #
# MOST RECENT CHANGE AT THE TOP. #
# KEEP THE DESCRIPTION OF EACH OF YOUR CHANGES THAT NEEDS TO BE PUT INTO THE RELEASE NOTES TO ONE #
# TO THREE LINES. #
# KEEP A LINE BLANK BETWEEN TWO NOTES. #
# ADD THE JIRA TICKET ID, IF APPLICABLE. #
####################################################################################################
Release 1.1.0
- New Features/Fixed Issues
[SNAP-2440] Support fully qualified column names in projection
(f7ecc73) Start hive-thriftserver by default in product
[SNAP-2982] Fix for exception while startup - "MetaException: Metastore contains multiple versions"
[SNAP-2902] Mismatch in the expected and actual inserted rows, after inserting data from a column
table to another column table
(5fade0b) Quote table and schema name in some commands before executing on GemFireXD connection.
This allows support for reserved keywords in GemFireXD parser like "default" as schema name.
(1942811) Default glibc malloc settings to avoid memory fragmentation
[SNAP-2959] Increase the weight of the lead to avoid it being thrown out of the distributed system
in case of network partitioning
[SNAP-2975] Fix crash due to SEGV in putInto operations
(ce1874e) Including SparkR library as part of distribution
[SNAP-2962] Set default value for spark.sql.files.maxPartitionBytes to 32mb
(41594f5) Use Spark convention to return Catalog listDatabases/listTables output in lower-case
[SNAP-2860] PUT INTO <> SELECT v1, v2... throws IllegalArgumentException for Column table
[SNAP-2474] Partition pruning support added for row table scan
[SNAP-2956] Wrap non-fatal OOME from Spark layer in LowMemoryException
[SNAP-2900] UI Enhancements: Provide an ability/control, on Dashboard UI, to expand or collapse
whole row per member node entry in single click
[SNAP-2890] Exception occurs when the maximum column projection corresponds to 128th column
position
(7607064) Spotfire Apache Spark compatibility changes:
- SHOW DATABASES as an alias for SHOW SCHEMAS
- Support in SnappySqlParser for spark.sql.variable.substitute to substitute ${var} in query string
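To illustrate the ${var} substitution mentioned above, here is a minimal Java sketch. It is not
SnappySqlParser's actual code: the class VarSubstitutionSketch, its substitute method and the rule
of leaving unknown variables untouched are assumptions made for this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: replaces ${name} references in a query string with values from a map.
class VarSubstitutionSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String sql, Map<String, String> vars) {
        Matcher m = VAR.matcher(sql);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // Assumption: an unknown variable is left as-is rather than raising an error.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = Map.of("tbl", "table1", "minId", "10");
        System.out.println(substitute("select * from ${tbl} where id > ${minId}", vars));
    }
}
```

Matcher.quoteReplacement keeps any '$' or '\' in a substituted value from being interpreted as a
group reference.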
(568b7e1) Fix for JDBC driver jar running with Spark 2.3+ versions (java.lang.NoSuchFieldError:
MAX_ROUNDED_ARRAY_LENGTH)
[SNAP-2368] Handling the case when SnappyDataBaseDialect is used to determine table schema with the
table name not containing schema name
(5911532) A "--config <directory>" option can be passed to the snappy-start-all script now to
take config files from that folder instead of default conf folder. Please note log4j.properties
file is still taken from the default conf folder.
(1630148) Fix for filter push down to scan level when IN list has constants but in cast node.
(7a7e902) Fix backward compatibility issues with sample tables
[SNAP-2761] Use off-heap cache (if configured) for streaming-sink
[SNAP-2818] Trim the JOB_DESCRIPTION property in Spark jobs
[SNAP-2790] Key_columns option does not check for the validity of column names at the time of
table creation
[SNAP-2789] Enabled broadcast and exchange reuse for column and row tables
[SNAP-2751] Cannot connect to secure SnappyData cluster via Spark's Thrift server
[SNAP-2773] Improve GUI display of put into
(80f2d30) Store the temporary join result in offheap for put into operation
[SNAP-2661] Provide Snappy UI User a control over auto update
[SNAP-2381] Global lock to serialize concurrent puts
[SNAP-2712] Some built-in functions returning wrong results due to tokenization and plan caching
(5601184) Fixing some meta-data query inconsistencies:
- Add support for SHOW VIEWS
- Use "schemaName" for the column instead of Spark's "database" in SHOW TABLES
- Show CHAR/VARCHAR types instead of STRING for those types of columns in meta-data queries
[SNAP-2389] NPE during lead failure/restart
[SNAP-2462] Enable common-subexpression elimination for ParamLiterals. This improves performance
of TPC-H Q19
[SNAP-2602] On UI add column named "Overflown Size"/ "Disk Size" in tables
(dd590a2) Perf improvement for limit query on external tables
[SNAP-2985] Fixes for multiple inconsistency issues in snapshot isolation
(91a9a63) Error in select if a (replicated) table is created with backticks as
delimiter
(c12fa66) Scalability improvements for snapshot isolation
(72052fc) Change the default auto reconnect setting to false. For locator it is still set to true
(1f9d934) Fix occasional NPE in ClientService init
(be351df) Disable load-balance by default on servers
[SNAP-2591] Implemented failover in ODBC
[SNAP-2654] GemFireCacheImpl.oldEntryMap causes memory leak
(1d5e0fc4) Always route statements like "show tables/views...", "describe" to lead from
snappy-shell to give consistent results
(d0a1002) Fix for "show tables" command. When the standard table type "TABLE" is given,
then show all of row, column, external, sample, stream and topK tables
[SNAP-2934] Avoid double free of page that caused server crash due to SIGABRT/SIGSEGV
[SNAP-2908] UI: Display sparklines for cpu and memory usage for individual members
[SNAP-2926] UI: Changing default page size for all tabular lists from 10 to 50, and sorting the
members list by member type so that all locators come first, then all leads and then all
servers.
[SNAP-2457] Enabling plan caching for hive thrift server sessions
(b825fd6) Property to set if hive meta-store client should use isolated ClassLoader
[SNAP-2909] Streaming micro batch thread keeps running forever when snappy-job fails due
to failure from outside streaming query
[SNAP-2237] SQL expression alias in projection cannot be used in GroupBy
[SNAP-2634] Planner is not used in IncrementalExecution
Release 1.0.2.1
- New Features/Fixed Issues
[SNAP-2646] Make a copy for non-primitive aggregations. (#1195)
(81f3cbf) Changed scripts to work with old bash 3.x versions, for cases like MacOSX that ships
with bash 3.x by default.
[SNAP-2659] Reset the pool at the end of collect to avoid spillover of lowlatency pool setting
to later operations that may not use the CachedDataFrame execution paths. (#1191)
[SNAP-2503] Kill the VM on OutOfMemoryError by adding command line argument. (#1192, #1187)
[SNAP-2657] Added caching for hive catalog lookups.
Meta-data queries with large number of tables take quite long because of nested loop joins between
SYSTABLES and HIVETABLES for most meta-data queries. Additionally each row in SYSTABLES looks up
hive-metastore separately to determine if the table is a column table or not. Overall this
results in close to a million hive-metastore lookups for each meta-data query when there are
tables in the range of hundreds. (#1190)
[SNAP-2656] Check for underlying Attribute with joins on aggregate columns. (#1189)
(0354e8b) Fix GUI plans for CACHE QUERY SQL.
[SNAP-2630][SNAP-2625] Allow deleteFrom to work as long as the dataframe contains columns matching
the key columns. Make deleteFrom API behaviour consistent for row and column tables. (#1184)
[SNAP-2645] Making startingOffsets property optional as Spark's Streaming API takes care of that
by using committed offset or Kafka param "auto.offset.reset" if no explicit value is specified for
startingOffsets. (#1186)
(4390d58) Fixes to joinType to apply to a JOIN result.
(707c55b) Fix build and run with newer IDEA releases.
(f2bcf4f) SortMergeJoinExec extension to avoid shuffle when join key columns are a superset of
child partitioning.
(0f7a01a) Fix plan-level query hints like joinType on RHS of join to be applied on the relations
rather than Join operator. Fix catalog inconsistency in row tables with CREATE TABLE .. SELECT ...
when the insert fails for some reason.
(829534a) Fix for occasional failures due to hive client disconnect.
[SNAP-2569] Support Spark's HiveServer2 in SnappyData cluster. Enable starting an embedded Spark
HiveServer2 on leads in embedded mode. The default session type has been switched to be
SnappySession though user can force Spark's hive session using a property. (#1161)
(9ad806d) Use JNA Platform support to skip agent on non-linux platforms. Also use the same to
determine 64-bit support instead of custom code. Update snappy script to include JNA jar with
launcher and gemfire-shared.
(128ad7b) Created new class for Snappy sink example instead of replacing existing structured
streaming example. (#1183)
[SNAP-2508] Make condition for replicatedTableJoin stricter while creating HashJoinExec. (#1173)
(736b029) Create a separate module for pooled JDBC driver. (#1170)
Example usage:
import io.snappydata.implicits._
spark.snappyQuery("select count(*) from table1").show()
spark.snappyExecute("create table table2 (id int, data string) using column").show()
df.write.snappy("table3")
The above examples assume that the spark.snappydata.connection property is already set at conf
level (it can also be passed explicitly to the first two methods).
(01f800b) Disabling conflation by default in default snappy streaming callback. (#1181)
[SNAP-2575] Default Sink Callback - Support multiple events with same key column in same streaming
batch. Conflating events with same key columns while processing streaming batch in default sink
callback. (#1176)
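The conflation described above can be pictured with a small Java sketch. This is not the default
sink callback's code: the Event class, the conflate method and the "last event per key wins" rule
are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of conflation: keep only the latest event per key within one batch.
class ConflationSketch {
    // A minimal event: a key column value plus a payload (both names are illustrative).
    static final class Event {
        final String key;
        final String payload;

        Event(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    static List<Event> conflate(List<Event> batch) {
        // LinkedHashMap keeps the position of the first occurrence of each key,
        // while put() overwrites the stored value with the most recent event.
        Map<String, Event> latest = new LinkedHashMap<>();
        for (Event e : batch) {
            latest.put(e.key, e);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Event> batch = List.of(
            new Event("k1", "a"), new Event("k2", "b"), new Event("k1", "c"));
        for (Event e : conflate(batch)) {
            System.out.println(e.key + " -> " + e.payload);
        }
    }
}
```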
[SNAP-2491] Fixed: Column added using 'ALTER TABLE ... ADD COLUMN ...' through snappy shell does
not reflect in spark-shell. (#1175)
[SNAP-2568] Allow access to store-side system tables/virtual tables from Spark SQL. Also, expanded
the "SHOW" statements to include SCHEMAS, COLUMN, MEMBERS as well as expanded "DESCRIBE"
statements. (#1160)
(83a4253) Structured streaming - Added default sink callback. (#1157)
[SNAP-2582] Allow NONE as a valid policy for server-auth-provider. (#1167)
(c1cf502) Fixing occasional failures in serialization using CachedDataFrame if node is just
starting/stopping. Also, fix a hang in shutdown for cases where hive client close is trying to
boot up the node again, waiting on the locks taken during shutdown.
(db7d568) Fixed issue of snappy-env.sh not being loaded.
[SNAP-2576] Update, delete on table should not get policy filter applied. Fixed. (#1163)
[SNAP-2577] Simple column table creation using jdbc API fails in single VM tests. Fixed. (#1165)
(47a8608) Fixing buffer handling in compression.
[SNAP-2571] Add support for query hints to force a join type for cases where result is known to be
small, for example, but plan rules cannot determine so. (#1159)
[SNAP-2566] Offset argument to Lead and Lag window functions was not being marked as foldable,
causing analysis error. Added them in Foldable functions list. (#1162)
(56420fb) Fixing some issues in startup scripts.
1. Moved snippet to get AWS public host name inside "snappy" script since it must be done on the
actual server rather than where "snappy-start-all.sh" has been invoked.
2. Use hostname-for-clients to set the SPARK_PUBLIC_DNS so that both JDBC/ODBC and WebUI point to
the same address if required; any explicit SPARK_PUBLIC_DNS is ignored because it will be common
for the cluster and will not work with multiple leads.
3. Likewise, use bind-address to set SPARK_LOCAL_IP on every node instead of reading the global
environment variable.
(f870cd7) SnappyData specific GfxdDataSerializable types were not getting registered when validate
tool was getting launched through SnappyUtilLauncher. Fixed. (#1144)
(858436a) Mark numLeadsOnNode as negative if heap-size or memory-size has been configured
explicitly.
(b84254d) Enable default off-heap memory-size when possible.
[SNAP-2555] Clear non-default configs for internal hive meta-store client. (#1154)
[SNAP-2511] Enable lazy initialization in SortMergeJoin and SnappyHashJoin. (#1146)
[SNAP-2487] Properly fetching the current active snappy streaming context and closing it when
"streaming.batch_interval" is not passed as part of configuration. (#1143)
(484ecab) Add CACHE, UNCACHE, RECACHE and RESET for routing.
(9fd2e01) Checking if the entry causing constraint violation is destroyed or removed in a separate
thread with synch taken on region entry, to avoid dead lock. (#441)
(b704ce2) Catch and throw any errors like OutOfMemoryError during region initialization.
(ada81d3) Fix schema in ResultSet metadata.
(a97b40e) Conditional listing of regions in a diskstore only when the validate disk store tool has
been invoked.
(ece2329) Temporary workaround for SNAP-2627. If the unique constraint violation is due to removed
or destroyed AbstractRegionEntry, attempt is made to remove it from index and another try is made
to put the new value against the index key. (#438)
(16532da) Handle CancelException in ClientTracker cleanup.
[SNAP-2562] Fix validate-disk-store tool for co-location check. (#436)
(33508bf) JDBC client pool driver that stores the pool of connections internally, using the Tomcat
connection pooling library. The jdbcInterceptor is configured to reset autocommit, readOnly and
isolation level to their default values whenever a connection is borrowed from the pool. An
eviction thread removes idle connections from the pool. (#433)
(a689c80) Adding jvmkill.c and also merged it into libgemfirexd64.so and libgemfirexd.so (#435)
(1a0ec43) Fixing connection property handling in TomcatConnectionPool.
(5eba064) JDBC Client with In-built Pool Driver. With support of connection pool in the driver
itself. (#428)
(f34c8b3) Fixed an occasional hang due to connection pool exhaustion when reconnects happen too
quickly in the retry loop before membership VIEW can be updated; in such cases connections were
being released at the end after failure but this got stuck in the retry itself due to pool being
exhausted. Correct a rare case of method to get colocated regions going into an infinite loop; now
explicitly skip over regions already seen in the colocation chain.
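The "skip over regions already seen" guard can be sketched in Java as follows. This is only an
illustration and not the store's code: regions are modelled as plain names and the colocatedWith
map is a hypothetical stand-in for the real colocation metadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: walk a colocation chain while guarding against cycles.
class ColocationChainSketch {
    // colocatedWith maps a region name to the region it is colocated with (illustrative model).
    static List<String> chain(String start, Map<String, String> colocatedWith) {
        Set<String> seen = new LinkedHashSet<>();
        String current = start;
        // seen.add() returns false for a region visited before, which ends the walk
        // instead of looping forever on a cyclic chain.
        while (current != null && seen.add(current)) {
            current = colocatedWith.get(current);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // A cyclic chain A -> B -> C -> A still terminates.
        System.out.println(chain("A", Map.of("A", "B", "B", "C", "C", "A")));
    }
}
```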
(5587195) Update bucket stats for SerializedDiskBuffer in sync block.
(fc79f5a) Enable default off-heap size on machines having large amount of RAM (> 14G) in
enterprise product.
1. Adjust the default heap-size for lead/server to range from 2GB-8GB depending on the amount of
available RAM and number of cores.
2. Check if configured memory-size is available (no swap) at startup and fail if not. System can
still fail at runtime due to lazy allocation by modern OSes.
[SNAPPYDATA] (ab71801) Corrected the URL paths for RDDs to use /Spark Cache/ instead of /storage/.
[SNAPPYDATA] (67596fc) Increase hive-thrift shell history file size to 50000 lines.
[SNAPPYDATA] (12dc507) Generate spark-version-info.properties in source path
src/main/extra-resources.
[SNAPPYDATA] (336c021) Fix default bind-address of ThriftCLIService. ThriftCLIService uses
InetAddress.getLocalHost() as default address to be shown but hive thrift server actually uses
InetAddress.anyLocalAddress(). Honour bind host property in ThriftHttpCLIService too.
[SPARK-24950][SQL] (205c133) DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
Author: Chris Martin <[email protected]>
Closes #21901 from d80tb7/SPARK-24950_datetimeUtilsSuite_failures.
(cherry picked from commit c5b8d54c61780af6e9e157e6c855718df972efad)
Signed-off-by: Sean Owen <[email protected]>
Release 1.0.2
- New Features/Fixed Issues
[SNAP-2459] Adding an API to get primary key or Key Columns of a SnappyData table. (#1123)
[SNAP-2433] Fixed incorrect server status shown in the UI by storing cluster member stats in a map
keyed only by their DiskStoreUUID, and no longer using the member ID as key. (#1126)
[SNAP-2470] Fixed missing SQL tab on SnappyData UI in local mode. (#1122)
[SNAP-2463] Fixed SELECT query results on ROW tables. Removed complex OR conditions.
[SNAP-2451] Fixed JOIN query results on ROW tables. Handled the case of DelegateRDD where BaseRDD
itself is a DelegateRDD. (#1120)
(be094ef) Support for displaying VIEWTEXT for views in SYS.HIVETABLES.
[ENT-40] Fixed recursion in policy filter application.
[SNAP-2349] Fixed the deadlock by reducing the scope of synchronized blocks in ColumnFormatValue
to only when reading/writing fields. (#1045)
[SNAP-2364] Skip batch if the stats row is missing. This is already handled for in-memory batches
and the same has been added for on-disk batches.
(3a6a8ae) Row Level Security support. (#1084)
(3bc03ae) Changes to split the view definition string and store it as split properties in the Map
to fix the issue of view string size exceeding 37200 chars. (#1118)
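Splitting a long string across several properties can be sketched in Java as below. This is an
illustration under assumptions, not the catalog's code: the ".part.N" and ".numParts" key names
and the chunk size passed by the caller are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: store a long string as several bounded-size properties and rebuild it.
class SplitPropertySketch {
    static Map<String, String> split(String prefix, String text, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Map<String, String> props = new LinkedHashMap<>();
        int parts = 0;
        for (int i = 0; i < text.length(); i += chunkSize) {
            int end = Math.min(text.length(), i + chunkSize);
            props.put(prefix + ".part." + parts, text.substring(i, end));
            parts++;
        }
        // Record the number of parts so that the reader knows how many keys to fetch.
        props.put(prefix + ".numParts", Integer.toString(parts));
        return props;
    }

    static String join(String prefix, Map<String, String> props) {
        int parts = Integer.parseInt(props.get(prefix + ".numParts"));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts; i++) {
            sb.append(props.get(prefix + ".part." + i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = split("view", "select a, b from t", 8);
        System.out.println(props);
        System.out.println(join("view", props));
    }
}
```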
[SNAP-2351][SNAP-2443] Fix a problem with row tables and hash joins. Skip plan caching of Range
based DataFrames. (#1112)
(aed173b) Fixes for Catalog Repair Procedure. (#1113)
[ENT-21] Set security handlers to all ServletContextHandlers which are part of Dashboard, SQL
tabs and Auto-refresh feature web services. (#1115)
[SNAP-2453] Made the row count n optional in the FETCH FIRST clause. "FETCH FIRST ROW ONLY"
previously required n to be specified; if n is omitted, only the first row is fetched. (#1117)
[SNAP-2302][SNAP-2303][SNAP-2307] SQL fixes (#1106)
[SNAP-2432] Added a system property to avoid pulling jar info from snappydata cluster, if not
required, logging a warning if required jar info could not be pulled. (#1104)
[SNAP-2400] To speed up the dashboard data auto-refresh, removing unnecessary empty checks on
table stats info and external table stats info. (#1101)
[ENT-34] Avoid setting snappydata user name in all pool connections obtained in system. (#1094)
[SNAP-2438] currentschema should be considered for plan caching. (#1096)
[ENT-29] Make sure that snapshotTxIdForRead is reset properly. (#1095)
[SNAP-2421] Fixed concurrent query performance issue by fixing incorrect output partition choice.
Due to numBucket check, all the partition pruned queries were converted to hash partition with
one partition. This was causing an exchange node to be introduced. (#1087)
[SNAP-2434] Fixed syntax error when projection mentioned as schema.tablename.* (#1092)
[SNAP-2351] Added support for PreparedStatement.getMetadata() JDBC API. (#1038)
[SNAP-2422] HTML code changes for displaying error message if loading Google charts library fails.
[SNAP-2144] Display only total CPU cores count and remove cores count break up. (#1083)
[SNAP-2306] Added StreamingQuery.stop on job server session.stop() method call. This will be
called from JobManagerActor's JobKill. (#1078)
[SNAP-2128] Support Store specific DDLs in SnappyDDLParser. (#1003)
(984c8ce) Added check for boot time manager. If boot time UMM cannot find any memory, it is best
to return from that point. This means the system itself is configured with very low memory, which
is not capable of handling overhead for keys & region structures. (#1075)
[SNAP-2398] Smart Connector side changes for deploy jars/packages functionality. (#1071)
[SNAP-2144] Adding CPU cores details in the ClusterDetails and MemberDetails. (#1068)
[SNAP-2071] Fixed SnappyData UI becoming unresponsive on LowMemoryException. (#1067)
[SNAP-2390] Fixed NullPointerException at updateMemberStatistics when cluster was getting
restarted. (#1070)
[SNAP-2387] Fix ParamLiteral handling for common sub-expressions. (#1059)
(c674c94) Added Spark SQL test suites to SnappyData. (#994)
(b3c0e56) Changes to impose limit on results fetched from External relation (GemFire) for
"select *" query. (#1051)
[SNAP-2332] Fixed exception in querying caused by tokenization of constants in aggregate functions
by removing param literals from the aggregate functions in prepare phase for prepared statements.
(#1041)
[SNAP-2347] Fix for row tables getting dropped. (#1058)
[SNAP-2382] Fix for COLUMN table mysteriously shown as ROW table on dashboard after LME in data
server. (#1064)
[SNAP-2341] Table shown in dashboard even after 'CREATE TABLE ...' is killed. (#1052)
[SNAP-2388] Change to let 'lead' restart with warnings if deployed jars/coordinates not present
during restart. (#1060)
[SNAP-2186] Fixed off-heap size for Partitioned Regions. (#1053)
[SNAP-2363] Fixed failure when query on view does not fallback to Spark plan in case Code
Generation fails. (#1042)
[SNAP-2348] Fix invalid decompress call on stats row. (#1044)
[GITHUB-982] Fixed negative bucket size with eviction. (#1048)
[SNAP-1334] Added Auto Refresh feature for Dashboard UI and other enhancements. (#1005)
[SNAP-2356] Release compressed buffers that are skipped. (#1040)
[SNAP-2308] The fix involves allowing query with decimal numbers suffixed by 'BD' to be parsed
correctly and recognized as numeric literals. (#1008)
(5fbb47a) Taking credentials from globalSparkContext to shutdown the store. (#1024)
[SNAP-2339] Do not cache UnsafeProjection instance. (#1034)
[ENT-27] Changes to push down filter predicate to Scan Level for GemFireRelation and reverting
ParamLiterals to TokenLiteral (hence Literal) in case of LogicalRelation other than Column Tables
Scan. (#1020)
[SNAP-2338] Added task cancellation checks at the start of new batch in ColumnTableScan. (#1033)
[SNAP-2312] Handled int overflow case. For a large number of distinct keys, ObjectHashSet might
have close to 1 << 30 entries. Multiplying that by 8 overflows int to a negative number.
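The overflow is easy to reproduce in isolation. The Java sketch below only illustrates the
arithmetic and is not the ObjectHashSet code; the class IntOverflowSketch and the helper
bytesNeeded are hypothetical names.

```java
// Hypothetical sketch: 32-bit multiplication wraps around, 64-bit multiplication does not.
class IntOverflowSketch {
    // Widen to long before multiplying so the result cannot wrap.
    static long bytesNeeded(int entries) {
        return (long) entries * 8L;
    }

    public static void main(String[] args) {
        int entries = (1 << 30) - 1;   // close to 1 << 30
        int wrapped = entries * 8;     // int multiply: wraps to a negative value
        System.out.println("int result:  " + wrapped);
        System.out.println("long result: " + bytesNeeded(entries));
    }
}
```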
[SNAP-2329] Restarting zeppelin interpreter when a deploy happens if the lead hosts a zeppelin
interpreter server. Added deploying of jar files and made them persistent too. (#1029)
[SNAP-2297] Support deployment of packages as a DDL command. (#1021)
[SNAP-2321] Moved the caching of join dataset after the count operation, as an action will clear
the session context object. (#1025)
[SNAP-1529] Added support for reading maven dependencies using --packages option in our job server
scripts (snappy-job.sh). (#1004)
[SNAP-2296] Make sure that connection is closed when task completes so that transaction is also
committed. (#1018)
[SNAP-2241] Handled catalog database creation if it's not present during function creation. (#998)
[SNAP-2080] Allow creation of index on VARCHAR column added through ALTER TABLE. (#996)
[SNAP-2255] Closing the connection so as to return the pooled connection back to pool. (#995)
(ce530a9) Eagerly clear shuffle data after a bulk insert/update/put.
[SNAP-1932] Cleaning up tokenization handling and fixes. Main change is addition of two separate
classes for tokenization, a) ParamLiteral and b) TokenLiteral. Both classes extend a common trait
TokenizedLiteral. Basic idea being that tokenization will always happen (unless explicitly turned
off) independently of plan caching. (#989)
[SNAP-2244] Stats for delta column batches (#980)
[SNAP-2243][SNAP-2188] Procedure for smart connector iteration and fixes. Includes fixes for perf
issues as noted for all iterators (disk iterator, smart connector and remote iterator). (#979)
[SNAP-2225] Fixed different results of nearly identical queries, due to join order. (#971)
[SNAP-2220][SNAP-2157] Corrected row count updated/inserted in a column table via putInto. (#974)
[SNAP-1931][SNAP-1932][SNAP-1906] Fixed the issue of incomplete plan info in UI due to plan
caching changes. (#973)
(adf4664) Miscellaneous fixes and performance improvements.
(794d03f) Miscellaneous fixes, added bucket count column to dashboard. (#969)
(7069522) Corrected the logic of existence join, which was looking for null values earlier. (#966)
[SNAP-2217] Fixed thread-unsafe paths in stats service and other cleanups. (#965)
[SNAP-2215] Split out argument value for -log-file specified in conf/locators. (#964)
[AQP-292] Evaluate grouping keys explicitly if aggregates are using them. (#963)
Release 1.0.1
- New Features/Fixed Issues
[SNAP-2214][SNAP-2036] Fixed OOME after restart with heap, projection pushdown. (#960)
Fixed putInto inner join cache perf and related issues. (#958)
[SNAP-2212] Fixed failure in TPCH Q21, by re-evaluating check condition for all joins. (#959)
[SNAP-2205] Fixed scala.MatchError in SnappyEmbeddedTableStatsProviderService on cluster restart.
[SNAP-2175] Handle no GemFireCache in smart connector mode. (#956)
Explicit Action for put innerjoin cache. Materialized cache for intermediate inner join in a put
operation. (#955)
[SNAP-2204] Search through aliases (e.g. for VIEWs) for colocated join keys. (#954)
[SNAP-2200] Fixed ClassCastException when reading from overflowed update deltas.
[SNAP-2178] Increase the time to wait for servers to join.
[SNAP-2191] Disable zeppelin interpreter from within lead process when security is enabled. (#946)
[SNAP-2194] Add partition pruning for column tables to smart connector. (#952)
[SNAP-2180] Fixed snappy pulse UI showing zero memory usage on data server, on active lead node
restart by explicitly initializing memoryMap on UMM start. (#951)
[SNAP-2192] Delay rollover in column updates to pre-commit. (#950)
[SNAP-2124] Fixed rows missing in update due to incorrect stats row read. (#945)
[SNAP-1283] LATERAL VIEW support in SnappyParser. (#944)
[SNAP-1840] Fixed TPCH Q22 in Smart Connector mode due to NPE in CollectAggregateExec.
(5955ce7) Fixed some snappy-spark failures and miscellaneous changes.
[SNAP-2042] Added GRANT/REVOKE support from SnappySession.
(41ed1ca) Use power of 2 for number of buckets in tests/docs.
[SNAP-2178] Wait for servers to join in LeadImpl start and start stats service only after some
servers have joined. Likewise for creating the global SnappyContext.
[SNAP-2170] Reduced the scope of global lock in SnappyContext.stopSnappyContext to fix deadlock
in lead shutdown.
[SNAP-2088] Fixes for queries with filters on columns with null values. (#937)
Parser performance improvements to recover the regression over 0.9 release. Also, optimized and
enhanced numeric/decimal literal handling. (#936)
[SNAP-2086] Snappy Pulse displays list of external tables.
(1d5cfd4) Some enhancements towards snappydata security. (#930)
[SNAP-2102] Added memory+disk optimized column batch iterators. (#933)
[SNAP-2118] Allow reading previous variable length value again. (#929)
[SNAP-1501] Set overflow-to-disk as the only evict-action for tables. (#924)
By default, column and row tables will have heap-based eviction enabled with overflow-to-disk as
evict action. Allow OVERFLOW=false to disable eviction if EVICTION_BY is absent in DDL.
[SNAP-2114] Plan caching is now attempted only for snappy tables. (#922)
[SNAP-2125] Added setter commands to disable plan caching on current session and on all sessions.
[SNAP-2093] Support ColumnTable PutInto & DeleteFrom API. (#906)
[SNAP-2141] Fix updates on complex types. (#925)
[SNAP-2146] Avoid prefixing zeppelin properties with "snappydata.store".
(b139935) Refactored ByteBufferHashMap into a generic base class.
(1c9e661) Instead of an explicit property to acquire read or write locks (which is supposed to be
set by scripts), if some other server has already initialized the hive metastore, then
automatically drop to read-lock to avoid servers unnecessarily blocking each other.
[SNAP-338] Improvements in cluster startup time.
* Reduce discovery/join timeouts for first locator.
* A faster launcher that avoids loading any other classes (other than gemfire-shared and JNA)
* Jobserver startup (and thus the global SnappySession initialization) in background
* Initialize the hive catalog in background. (#911)
* Updated SnappyData type registration. (#910)
(fd33f31) Avoid infinite retries in Utils.mapExecutors. In case an executor goes away then
retries in Utils.mapExecutors can get stuck in infinite retry loop so break it after a few
attempts. Changed PooledKryoSerializer to use direct buffers for Output.
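The idea of breaking out of a retry loop after a few attempts can be sketched in Java as below.
This is not the Utils.mapExecutors implementation: the class BoundedRetrySketch, the retry method
and the choice to rethrow the last failure are assumptions made for illustration.

```java
import java.util.function.Supplier;

// Hypothetical sketch: retry an action a bounded number of times instead of forever.
class BoundedRetrySketch {
    static <T> T retry(int maxAttempts, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        // All attempts failed: surface the last error rather than looping again.
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = retry(5, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("executor not available yet");
            }
            return "done after " + calls[0] + " attempts";
        });
        System.out.println(result);
    }
}
```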
(0b34233) Fixing a couple of issues seen in ODBC testing.
[SNAP-2127] Use separate delta disk-stores for row buffer regions. (#918)
[SNAP-2084] Handled dropStorageMemoryForObject in DefaultMemoryManager. (#892)
[SNAP-2121] Mark delta regions to use the delta diskstore. (#916)
[SNAP-2122] Use a canonical representation of DistributedMembers in query routing comparisons.
[SNAP-2120] Use "spark.sql.codegen.cacheSize" for Snappy caches. (#915)
[SNAP-338] Changes related to locator startup time improvements. (#909)
[SNAP-1743] Compress column batches when storing to disk or sending over network. (#905)
Changes to ColumnFormatValue serialization/deserialization to deal with compression transparently
when storing to disk or sending over network.
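A minimal sketch of what "transparent" compression can mean for a serialized batch: the writer
tags the payload with a codec marker and the reader undoes it without the caller needing to know.
The one-byte markers, the function names, and the use of zlib are assumptions made for this
example; the release note does not state the codec or wire format used by ColumnFormatValue.

```python
import zlib

RAW, ZLIB = b"\x00", b"\x01"  # hypothetical one-byte codec markers


def encode_batch(data: bytes, min_size: int = 64) -> bytes:
    """Compress when it pays off and tag the payload so readers can tell."""
    if len(data) >= min_size:
        packed = zlib.compress(data)
        if len(packed) < len(data):
            return ZLIB + packed
    return RAW + data


def decode_batch(payload: bytes) -> bytes:
    """Return the original bytes regardless of how they were stored."""
    marker, body = payload[:1], payload[1:]
    if marker == ZLIB:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError("unknown codec marker: %r" % marker)
```

Small or incompressible batches are stored as-is, so the reader pays the decompression cost only
where compression actually saved space.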
[SNAP-2063] Thrift servers were getting started in rowstore mode instead of DRDA server. (#907)
[SNAP-2116] Auto-configuration for AWS and local clusters. (#908)
[GITHUB-900] Fix case-sensitivity of columns in CREATE INDEX. (#904)
Fix the case of remote pull from smart connector. (#896)
[SNAP-2101] Smart connector performance fixes and related issues. (#895)
With the above changes (+ the store ones), the performance for smart connector mode in
ColumnCacheBenchmark has improved by more than 2X and is now within the expected range: from
12-13ns per row to 5-6ns per row. It is now 3-4X faster than Spark caching and 2-3X faster than
direct Parquet scan having compression=none and entirely in OS buffers.
(0b09eea) Fixing failure in QueryRoutingDUnitSecurityTest; dropTable should always throw back
SQLExceptions and not proceed with unresolved relation.
[SNAP-2072][SNAP-2073] Fix external connectors and support VIEWs. (#887)
This commit fixes primarily two issues:
1. External connectors not working in smart connector mode since the required libraries may not
be available in the embedded cluster. This happens because the BaseRelation is attempted to be
resolved in both "CREATE EXTERNAL TABLE" and "DROP TABLE". Now resolve all required information
(schema, inbuilt or not) at the driver connector JVM and send that in the procedure calls for
external providers.
2. Support for VIEW, VIEW...USING (temporary, global and persistent) in the parser.
(916cea3) Fix UDT reads/writes for row buffer. Use the "inner" sqlType for UDTs in schema mapping.
Same in the CodeGeneration row buffer/table fragments for PreparedStatement set or read.
Read underlying data as byte array directly if incoming type is SerializedRow/Map/Array.
Added efficient serialization for SerializedMap (like already done so for SerializedRow/Array).
[SNAP-2077] Modified the parser to understand FETCH FIRST syntax also. FETCH NEXT will be taken
with OFFSET support if required. (#876)
(d1987c7) Removed unused ExternalEmbeddedMode and "snappydata.embedded" property.
[SNAP-2044] Integrate Snappy python tests with precheckin (#879)
[SNAP-1986] Use a global lock throughout hive client initialization which ensures no two hive
client initializations end up trying to create the hive directory. (#878)
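The pattern described above, one lock held for the whole initialization so that the directory
check and creation cannot race, can be sketched like this. The names _init_lock and init_client
are hypothetical and stand in for the hive client initialization path.

```python
import os
import threading

_init_lock = threading.Lock()  # hypothetical process-wide lock


def init_client(directory, clients, name):
    """Create the shared directory and register a client under one lock."""
    with _init_lock:
        # Without the lock, two threads could both see the directory as
        # missing and the second os.mkdir call would fail.
        if not os.path.isdir(directory):
            os.mkdir(directory)
        clients.append(name)
```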
[SNAP-2068] Added ThreadFactory to SnappyExecutor to cleanup thread artifacts on close with
ConnectionTable.releaseThreadsSockets() as done by other pool threads. (#872)
[SNAP-1960] Fix the RUNNING status being set prematurely by removing the override of running in
LeadImpl which is no longer required. (#871)
[SNAP-2056] Use Spark JacksonGenerator with separate JSON generators per column to convert type
to JSON format. (#866)
Release 1.0.0
- New Features/Fixed Issues
[SNAP-953] Add RPM/DEB installer packaging targets using the Netflix Nebula ospackage gradle
plugin.
[SNAP-2039] Correct null updates to column tables. (#861)
Use concurrent TrieMaps for SnappySession contextObjects and the queryHints map. The reason is
that SnappySession can be read concurrently by multiple threads of the same query for
sub-query/broadcast kinds of plans, where planning for the BroadcastExchangeExec plan happens in
parallel on another thread.
[SNAP-2029] Added new "snappydata.preferPrimaries" option to prefer primaries for queries. (#852)
Avoid double memory at the cost of reduced scalability but still having a hot backup.
See discussion on Slack: https://snappydata-public.slack.com/archives/C0DCF0UGG/p1505460492000378
Fixed a parser issue where AS can be optional in namedExpression rule. This fixes Q77 of TPCDS.
[SNAP-2030] Routed update and delete queries on row tables now return the number of affected rows.
[SNAP-2028] Snappy Python APIs fixes. (#851)
A) Some of the SparkSession python APIs used to pass SQLContext to DataFrameWriter and
DataFrameReader APIs.
B) Fixed truncate table API.
Fixed a couple of issues in parser. (#849)
1. Order by and sort by clauses after partition by can be optional.
2. The non-reserved keyword INTERVAL was being treated as an identifier because of the ordering
of optional clauses.
[SNAP-2022] Remove the check which tested if any lead is already stopped, in snappy-stop-all.sh
(#845). This was causing the script to skip shutting down of other running leads, if any. Added
a check for rowstore, so that 'sbin/snappy-stop-all.sh rowstore' doesn't see the message.
[SNAP-2020] Track in-progress insert size to avoid data skew. (#844)
With many concurrent inserts/partitions on a node, significant data skew in inserts was still
observed (on machines with large number of cores like 32) due to same smallest bucket being
chosen by multiple partitions. This change now tracks the in-progress size for bucket and adds
that to determine smallest bucket.
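A minimal sketch of the bucket selection described above, assuming a simple in-memory size table.
The BucketChooser class and its method names are hypothetical; the point is only that sizes
reserved by running inserts are added to the stored size before picking the smallest bucket.

```python
import threading


class BucketChooser:
    """Pick the smallest bucket, counting inserts that are still in flight."""

    def __init__(self, num_buckets):
        self._lock = threading.Lock()
        self.committed = [0] * num_buckets    # bytes already stored
        self.in_progress = [0] * num_buckets  # bytes reserved by running inserts

    def reserve(self, size):
        """Choose a bucket for an insert of `size` bytes and reserve the space."""
        with self._lock:
            bucket = min(range(len(self.committed)),
                         key=lambda b: self.committed[b] + self.in_progress[b])
            self.in_progress[bucket] += size
            return bucket

    def complete(self, bucket, size):
        """Move a finished insert from in-progress to committed."""
        with self._lock:
            self.in_progress[bucket] -= size
            self.committed[bucket] += size
```

With only the committed sizes, several concurrent partitions would all see the same smallest
bucket; the reservation makes each subsequent chooser see it as already larger.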
[SNAP-2012] Skip locked entries in evictor. (#839)
Fix as suggested by @rishitesh to use Unsafe API to try acquire monitor on RegionEntry.
Hiding commands not applicable to snappydata (they will continue to be displayed in GemFireXD and
RowStore mode). (#838)
[SNAP-2003] Fix for 'stream to big table join CQ returning incorrect result'. (#829)
HashJoinExec's streamPlan and buildPlan RDDs are computed on each CQ execution.
[AQP-293] Changes for JNI UTF8String.contains. (#832)
Convert UTF8Strings in ParamLiteral to off-heap when snappydata's off-heap is enabled.
Changes in SnappyParser. Also, updated parboiled2 to latest release.
[SNAP-1995] Added a python example showcasing KMeans usage. (#827)
Fix an issue in collect-debug-artifacts script with extraction. Skip any configuration checks in
collect-debug-artifacts for extraction (-x, --extract=).
[SNAP-1993] Fixes for data skew when no partition_by is provided. (#825)
With these changes, distribution in ColumnCacheBenchmark test, for example, is nearly equal most
of the time among the buckets. Other cases like those reported originally with 7M rows have only
~50% difference between min and max (as compared to ~4X originally)
Remove ParamLiteral for LIKE/RLIKE/REGEXP. If expression foldable is false, then LIKE family
generates very suboptimal plan (if not converted to Contains/StartsWith/EndsWith) that will
compile the Regex for every row.
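To illustrate why a constant LIKE pattern matters, here is a simplified sketch: when the pattern
is known at planning time it can be turned into a plain contains/startsWith/endsWith check, or at
worst into a regex compiled once rather than once per row. compile_like is a hypothetical helper;
it ignores escape characters and is not the Spark or SnappyData implementation.

```python
import re


def compile_like(pattern):
    """Return a predicate for a SQL LIKE pattern known at planning time.

    Simplified: escape characters are not handled.
    """
    body = pattern.strip("%")
    if "%" not in body and "_" not in body:
        if len(pattern) >= 2 and pattern.startswith("%") and pattern.endswith("%"):
            return lambda s: body in s            # Contains
        if pattern.endswith("%"):
            return lambda s: s.startswith(body)   # StartsWith
        if pattern.startswith("%"):
            return lambda s: s.endswith(body)     # EndsWith
        return lambda s: s == body                # no wildcards at all
    # Fallback: translate to a regex and compile it once, not once per row.
    regex = re.compile(
        "".join(".*" if c == "%" else "." if c == "_" else re.escape(c)
                for c in pattern),
        re.DOTALL)
    return lambda s: regex.fullmatch(s) is not None
```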
[SNAP-1984] Changes to retain UnifiedMemoryManager state across executor restart by copying the
state in a temporary memory manager, which is created when store boots up but Spark environment is
not ready. (#821)
[SNAP-1981] For prepare phase, avoid rules that do not handle NullType since that is what is used
as placeholder for params. (#815)
[SNAP-1851] Properly close the connection when the connection commit fails. (#796)
[SNAP-1976] Changes to set isolation level. (#813)
Allow operations on row and column tables if isolation level is set to other than NONE and
autocommit is true (query routing is enabled). If autocommit is false, query routing will be
disabled and transactions on row tables will be supported. Queries on column tables will error out
when query routing is disabled.
[SNAP-1973][SNAP-1970] Avoid clearing hive meta-store system properties. (#816)
The hive meta-store system properties are required to be set for static initialization of Hive and
should not be cleared because a concurrent hive context initialization (from some other path) can
see inconsistencies like system property found but not available when actually read.
[SNAP-1979] Added MemoryManagerStats for capturing different stats for UnifiedMemoryManager. (#814)
Smart Connector mode will not have these stats as GemFireXD cache will not be available.
[SNAP-1982] Change batch UUID to be a long (#812)
Now using region.newUUID to generate the batch UUID. Use colocatedRegion of column table (the row
buffer) to generate the UUID since that is what smart connector and internal rollover uses.
[SNAP-1611] Increased spark.memory.fraction from 92% to 97% (#808)
We want to give a little buffer to the JVM before it reaches the critical heap size.
Make SnappySession.contextObjects as transient to fix the serialization issues reported on
spark-shell when SnappySession gets captured in closures (e.g. import session.implicits._ with
toDF)
[SNAP-1955] Fixes for issues seen in parallel test runs (#805)
[SNAP-1660] Remove password from product logging.
[SNAP-1948] Added an option to specify streaming batch interval during streaming job submission.
e.g. bin/snappy-job.sh submit --lead localhost:8090 --app-name appname --class appclass \
--app-jar appjar --conf logFileName=demo.txt --stream --batch-interval 4000
[SNAP-1893] Changed locator status to RUNNING after stopped locator is restarted with
snappy-start-all.sh
[SNAP-1877] GC issues with large dictionaries in decoding and other optimizations (#787)
1. Performance issues with dictionary decoder when dictionary is large. 2. Data skew fixes. 3.
Using a consistent sort order so that generated code is identical across sessions for the same
query. 4. Reducing the size of generated code.
Fix issues seen during concurrency testing (#782)
[SNAP-1884] Fixed result mismatch in join between snappy table and temp table.
Overridden two methods from Executor.scala. (#783) These methods have been added in Spark
executor to check store related errors.
[SNAP-1917] Properly comparing datatype of complex schema.
[SNAP-1919][SNAP-1916] Added isPartitioned flag to determine partitioned tables (#784)
[SNAP-1904] Use same connection for rowbuffer and columnstore.
[SNAP-1883] Parser change for range operator.
Fixed: After new job classloader changes executors are not fetching driver files. (#777)
[SNAP-1894] Codegen issue for query with case in predicate expression (#772)
[SNAP-1888][SNAP-1886] Fixed parser error in two level nested subQuery, works with Spark (#774)
[SNAP-1833] Fixed the synchronization problem with sc.addJar() (#728)
[SNAP-1377][SNAP-902] Proper handling of exceptions in case of Lead and Server HA (#758)
[SNAP-1871] Remove custom built-in jdbc provider and instead use spark's JDBC provider (#757)
[SNAP-1882] Changes done for routing update and delete queries on column table to lead node.
Also handled prepared statement on update and delete queries for column table.
[SNAP-1885] Fixed Semijoin returning incorrect result (#768)
[SNAP-1787] - Handling Array[Decimal] in both embedded and split mode (#754)
[SNAP-1892] .show() after table creation using CreateExternalTable api gives empty/null
entries, caused by an empty UserSpecifiedSchema instead of None (#764)
[SNAP-1734] Query plan shows 0 number of output rows at the start of the plan. (#761)
Snappy's execution happens in two phases. In the first phase, the plan is executed to create an
RDD, which is then used to create a CachedDataFrame. In the second phase, the CachedDataFrame is
used for further actions. To accumulate the metrics for the first phase,
SparkListenerSQLPlanExecutionStart is fired. This keeps the current executionID in
_executionIdToData but does not add it to the active executions. This ensures that query is not
shown in the UI but the new jobs that are run while the plan is being executed are tracked
against this executionID. In the second phase, when the query is actually executed,
SparkListenerSQLPlanExecutionStart adds the execution data to the active executions.
SparkListenerSQLPlanExecutionEnd is then sent with the accumulated time of both the phases. For
consuming SparkListenerSQLPlanExecutionStart, Snappy's SQLListener has been added. Overridden
withNewExecutionId in CachedDataFrame so that the above explained process also happens when the
dataset APIs are used.
[SNAP-1878] Proper handling of path option while creation of external table using API (#760)
[SNAP-1850] Remove connection used in JDBCSourceAsColumnarStore#getPartitionID v2 (#750)
[SNAP-1389] Update and delete support for column tables (#747)
[SNAP-1426] Fixed the Snappy Dashboard freezing issue when loading data sets (#732)
Making background start of multi-node cluster as default
[SNAP-1860] Close the connection if commit/rollback is not done (#746)
Made changes to make sure to commit/rollback the snapshot tx in case of an exception, e.g. a
security-related one while trying to iterate over the region.
[SNAP-1656] Security support in snappydata (#731)
Enable LDAP based authentication and authorization in Snappydata cluster.
Support for snapshot transactional insert in column table (#718)
[SNAP-1825][SNAP-1818] DDL routing changes (#742)
Fixes ALTER TABLE ADD column not working on a row table when the table is altered after
inserting data, and CREATE ASYNCEVENTLISTENER not working with the lead node.
Removing old 2.0.x backward compatibility classes.
Fixes the "describe table" from Spark and shows the full schema.
[SNAP-1268] Code changes to start SnappyTableStatsProviderService service only once. (#738)
[SNAP-1838] skip plan cache clear if there is no SparkContext
Fixes for issues found during concurrency testing (#730)
[SNAP-1815] Disallow configuration of Hive metastore using hive.metastore.uris property in
hive-site.xml (#714)
[SNAP-1708] collect-debug-artifacts script won't need both way ssh now. (#723)
[SNAP-1723] When foldable functions are there in the queries and literals are there in their
argument then identify case where Tokenization should be stopped. Added a bunch of such functions
with corresponding relevant argument numbers for that. (#706)
[SNAP-1806] Changed the exception handling in SnappyConnector mode. (#719)
Support for setting scheduler pools using the set command (#700)
[SNAP-671] Added support for DSID to work for column tables (#716)
Added a task context listener to explicitly remove the obtained memory. (#713)
[SNAP-1326] SnappyParser changes to support ALTER TABLE ADD/DROP COLUMN DDLs (#711)
[SNAP-1808] Create cachedbatch tables in user's schema instead of the earlier common schema
SNAPPYSYS_INTERNAL. Changes from Sumedh @sumwale (#712)
[SNAP-1805] Fixed query execution statistics not being displayed in the SQL graph, caused by the
function passed to withNewExecutionId being executed before it was passed as an argument (#703)
[SNAP-1777] Increasing default member-timeout for SnappyData (#704)
[SNAP-1610] Removing the code related to split cluster mode (that was disabled for users in 0.9
release) (#696)
[SNAP-1363] Performance degrades because of PoolExhaustedException when run from connector mode.
Increasing max connection pool size since there is an idle timeout in the pool implementations
(default: 120s), so cleanup of unused connections will happen in any case.
[SNAP-1794] Modified code generation of DynamicFoldableExpression such that even the
initMutableState splits into multiple init() functions, code will be generated properly. (#699)
Changes for Apache Spark 2.1.1 merge (#695)
[SNAP-1451] set default startup-recovery-delay to 102s for Snappy tables to avoid interfering
with initial bucket creation.
[SNAP-1722] Test to validate support for long, short, tinyint and byte datatypes for row tables
(#689)
Spark 2.1 Merge (#501)
Fixing NoSuchElementException "None.get" in dropTable. Using the global SparkContext directly
instead of getting from active SparkSession (which may not exist) in hive meta-store listener.
[SNAP-1688] CachedDataFrame memory allocation should be accounted with execution memory rather
than storage memory.
[SNAP-1748] Fixed: Without persistence, data loading is unsuccessful with eviction on (#682)
[SNAP-1721] Avoid code generation failure in WorkingWithObject.scala example (#685)
[SNAP-1678] Smart connector should emit info logs that indicate the cluster to which it is
connecting. (#676)
[SNAP-1760] Correct null bitset expansion and reduce copying in inserts. (#678) Fixes
ArrayIndexOutOfBounds exception in queries with wide schema having nulls.
Corrected the scaladoc examples in SnappySession. (#672)
Allow for spaces at start of API parser calls
[SNAP-1737] While passing a value to GemFireXD, it should be converted from catalyst type to scala
type. (#669)
[SNAP-1735] use single batch count in stats row (#664)
Renamed "-b" option to "-bg" to match convention used in other POSIX commands
[SNAP-1725] Fix start and collect-debug scripts for Mac.
[SNAP-1714] Correcting case-sensitivity handling for API calls (#657)
[SNAP-1792] Snappy Monitoring UI now also displays Member Details View which shows member specific
information, various statistics (like Status, CPU, Memory, Heap & Off-Heap Usages, etc) and
member logs in an incremental way.
[SNAP-1890] Snappy Monitoring UI displays new Pulse logo. Also the product and its build details
are shown under version in a pop-up window.
[SNAP-1813] Pulse (Snappy Monitoring UI) users need to provide valid user name and password if
SnappyData cluster is running in secure mode.
Release 0.9
- New Features/Fixed Issues
[SNAP-1286] Thin Client Smart Connector implementation.
[SNAP-1235] Overhaul SnappyUnifiedMemoryManager to work properly for overflow.
[SNAP-1454] Support for Off-Heap in column store.
[SNAP-1413] install_jar does not work for Streaming jobs. Handled classloader in case of
Streaming factory as well.
[SNAP-1424] Add a "shouldStop()" call to EncoderScanExec. The "shouldStop()" check is necessary
because if the target is a RowWriter (e.g. the parent is an EXCHANGE) then the same row gets
reused.
[SNAP-1304] Implementation of Snapshot Isolation in snappydata.
[SNAP-990] Column wise storage in region for better perf instead of full cachedbatch.
[SNAP-1346] Plan caching ignoring constant values.
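A minimal sketch of what caching plans irrespective of constant values can look like: literals
are replaced by placeholders to form the cache key, so queries that differ only in constants
share one plan. The regex tokenizer and the names tokenize, get_plan and plan_cache are
hypothetical and far simpler than the actual parser-based handling.

```python
import re

# Simplified literal matcher: quoted strings or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")

plan_cache = {}  # hypothetical cache: constant-free query text -> plan


def tokenize(sql):
    """Split a query into a constant-free cache key and its literal values."""
    params = []

    def repl(match):
        params.append(match.group(0))
        return "?"

    return _LITERAL.sub(repl, sql), params


def get_plan(sql, planner):
    """Plan once per query shape; later calls reuse the cached plan."""
    key, params = tokenize(sql)
    if key not in plan_cache:
        plan_cache[key] = planner(key)
    return plan_cache[key], params
```

For example, "SELECT * FROM t1 WHERE id = 42" and "SELECT * FROM t1 WHERE id = 7" both map to the
key "SELECT * FROM t1 WHERE id = ?" and therefore invoke the planner only once.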