0004487: MSSQL: Initial load batching iterates through whole table - SymmetricDS - Issue Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0004487	SymmetricDS	Bug	public	2020-07-24 22:01	2020-09-16 17:47

Reporter	j.mackay@backinmotion.com.au	Assigned To	elong
Priority	normal
Status	closed	Resolution	no change required
Product Version	3.12.1

Summary	0004487: MSSQL: Initial load batching iterates through whole table
Description	When running an initial load it iterates through the whole table. I have a table with ~400k rows trying to push an initial load on the reload channel. I've set the reload channel to 1000 max_batch_size to test this. It seems to iterate through every single row not just batching every 1000. Here are two entries of 1000's that have batch 891. 2020-07-25 01:08:01,386 INFO [] [DataProcessor] [m072-dataloader-4] Batch '891', for node 'M072', for process 'data load from stage' has been processing for 60 seconds. The following stats have been gathered: {STATEMENTCOUNT=631, LOADMILLIS=59889, ClinicalTreatmentNoteField UPDATECOUNT=631, STARTTIME=1595610420858, FALLBACKUPDATECOUNT=631, FILTERMILLIS=0, LINENUMBER=631, UPDATECOUNT=631} 2020-07-25 04:04:10,911 INFO [] [DataProcessor] [m072-dataloader-4] Batch '891', for node 'M072', for process 'data load from stage' has been processing for 10629 seconds. The following stats have been gathered: {STATEMENTCOUNT=111430, LOADMILLIS=10619894, ClinicalTreatmentNoteField UPDATECOUNT=111430, STARTTIME=1595610420858, FALLBACKUPDATECOUNT=111430, FILTERMILLIS=6441, LINENUMBER=111430, UPDATECOUNT=111430}
Tags	No tags attached.

j.mackay@backinmotion.com.au 2020-07-25 09:26 reporter ~0001737	Note I am using: initial.load.use.extract.job.enabled = true

j.mackay@backinmotion.com.au 2020-07-25 09:35 reporter ~0001738	The reason I've mentioned this is because a computer I'm running this on has a bad internet connection and the whole table sync keeps restarting.

elong 2020-07-25 13:56 developer ~0001739	Try using version 3.12.2. Phil found race conditions in the initial load and fixed them in issues 0004471 and 0004472 .

j.mackay@backinmotion.com.au 2020-07-26 11:59 reporter ~0001740	Ok, I've upgraded to 3.12.2 and it still seems to not segment in batches. These all have the same batch number and my max_batch_size is set to 1000. 2020-07-26 20:54:48,986 INFO [M028] [DataProcessor] [m028-dataloader-1] Batch '1154', for node 'M028', for process 'data load from stage' has been processing for 60 seconds. The following stats have been gathered: {STATEMENTCOUNT=395, LOADMILLIS=60004, STARTTIME=1595762628404, FALLBACKUPDATECOUNT=395, Appointment UPDATECOUNT=395, FILTERMILLIS=10, LINENUMBER=395, UPDATECOUNT=395} //////// REMOVED LINES /////////// 2020-07-26 21:20:51,215 INFO [M028] [DataProcessor] [m028-dataloader-1] Batch '1154', for node 'M028', for process 'data load from stage' has been processing for 1622 seconds. The following stats have been gathered: {STATEMENTCOUNT=10848, LOADMILLIS=1621210, STARTTIME=1595762628404, FALLBACKUPDATECOUNT=10848, Appointment UPDATECOUNT=10848, FILTERMILLIS=640, LINENUMBER=10848, UPDATECOUNT=10848} FYI This is an initial load, it says UPDATECOUNT instead of INSERTCOUNT because when the internet connection drops it fails one batch and then restarts the whole table from LINENUMBER 0. Am I misunderstanding how this should work all work?

j.mackay@backinmotion.com.au 2020-07-26 22:04 reporter ~0001741	I think I misunderstood how it segments the batch. It still has the same batch_id, but is segmented within that. In regards to the batch not progressing due to a bad internet connection. Is there a solution for this?

elong 2020-07-27 12:38 developer ~0001742	If you initial load a table with 400000 rows and a sym_channel.max_batch_size of 1000, then you will get 400 batch numbers in the sym_outgoing_batch table. Each batch will have 1000 rows. During a push/pull connection, it will send multiple batches, up to the max of sym_channel.max_batch_to_send. The logging reflects the push/pull connection. It tells you the current batch number it is working on and how long the push/pull has been running, along with row statistics. It does not mean that one specific batch took that long and had that many rows. It might be more clear if we reset the log statistics each time the batch number changes. If the connection fails, it will resend the batches it was working on. If sym_channel.max_batch_to_send is 50, then it will have to send 50 batches again. It does recognize that it's already sent some of those batches and will issue a "retry" command in the protocol instead of sending all the data. The "retry" tells the target to look in its staging area to find the batch. If the target has already loaded the batch, then it skips the batch. At the end of the push/pull, it will send back batch status acknowledgements so both sides know which batches are loaded. Couple things to try: 1. There is currently an open issue to investigate with the upgrade to HTTP2. It seems that connections can get a timeout and our keepalive protocol is not working. Try editing conf/symmetric-server.properties and set https2.enable=false. 2. Decrease the sym_channel.max_batch_to_send so it can recover from a failed connection quicker.

j.mackay@backinmotion.com.au 2020-07-29 23:52 reporter ~0001743	Ok thank you very much that sounds like it will fix the problem.

Date Modified	Username	Field	Change
2020-07-24 22:01	j.mackay@backinmotion.com.au	New Issue
2020-07-25 09:26	j.mackay@backinmotion.com.au	Note Added: 0001737
2020-07-25 09:35	j.mackay@backinmotion.com.au	Note Added: 0001738
2020-07-25 13:56	elong	Note Added: 0001739
2020-07-25 13:56	elong	Status	new => feedback
2020-07-26 11:59	j.mackay@backinmotion.com.au	Note Added: 0001740
2020-07-26 11:59	j.mackay@backinmotion.com.au	Status	feedback => new
2020-07-26 22:04	j.mackay@backinmotion.com.au	Note Added: 0001741
2020-07-27 12:38	elong	Note Added: 0001742
2020-07-29 17:39	elong	Status	new => feedback
2020-07-29 23:52	j.mackay@backinmotion.com.au	Note Added: 0001743
2020-07-29 23:52	j.mackay@backinmotion.com.au	Status	feedback => new
2020-09-16 17:47	elong	Assigned To	=> elong
2020-09-16 17:47	elong	Status	new => closed
2020-09-16 17:47	elong	Resolution	open => no change required