View Issue Details

IDProjectCategoryView StatusLast Update
0004555SymmetricDSBugpublic2020-09-25 20:51
Reportermyg388 Assigned To 
Priorityhigh 
Status newResolutionopen 
Product Version3.12.2 
Summary0004555: poor performance when update table with to many rows
DescriptionHello I am testing the symmetricds to see if it is useful for the company where I work.

Test with a bi-directionally replicated (BENEFICIARIES_DEL) table of 7184684 million records.

Initially, I did the initial load and I was able to load it without problems.

When performing an update on a complete column, the NAME column, the update of 50 thousand records was performed on the other side and then no more records were updated.

When checking the log I see the following

2020-09-17 12:44:45,615 INFO [desa19c] [DataProcessor] [desa19c-dataloader-1] Batch '89', for node '002', for process 'data load from stage' has been processing for 1441 seconds. The following stats have been gathered: {STATEMENTCOUNT=177, LOADMILLIS=958, STARTTIME=1600356044195, INSERTCOUNT=177, FILTERMILLIS=41, BENEFICIARIOS_DEL UPDATECOUNT=0, LINENUMBER=177, FALLBACKINSERTCOUNT=177, UPDATECOUNT=0, BENEFICIARIOS_DEL INSERTCOUNT=177}
2020-09-17 12:45:52,929 INFO [desa19c] [DataProcessor] [desa19c-dataloader-1] Batch '89', for node '002', for process 'data load from stage' has been processing for 1508 seconds. The following stats have been gathered: {STATEMENTCOUNT=186, LOADMILLIS=1001, STARTTIME=1600356044195, INSERTCOUNT=186, FILTERMILLIS=43, BENEFICIARIOS_DEL UPDATECOUNT=0, LINENUMBER=186, FALLBACKINSERTCOUNT=186, UPDATECOUNT=0, BENEFICIARIOS_DEL INSERTCOUNT=186}
2020-09-17 12:46:53,715 INFO [desa19c] [DataProcessor] [desa19c-dataloader-1] Batch '89', for node '002', for process 'data load from stage' has been processing for 1569 seconds. The following stats have been gathered: {STATEMENTCOUNT=194, LOADMILLIS=1042, STARTTIME=1600356044195, INSERTCOUNT=194, FILTERMILLIS=43, BENEFICIARIOS_DEL UPDATECOUNT=0, LINENUMBER=194, FALLBACKINSERTCOUNT=194, UPDATECOUNT=0, BENEFICIARIOS_DEL INSERTCOUNT=194}
2020-09-17 12:47:53,983 INFO [desa19c] [DataProcessor] [desa19c-dataloader-1] Batch '89', for node '002', for process 'data load from stage' has been processing for 1629 seconds. The following stats have been gathered: {STATEMENTCOUNT=202, LOADMILLIS=1075, STARTTIME=1600356044195, INSERTCOUNT=202, FILTERMILLIS=44, BENEFICIARIOS_DEL UPDATECOUNT=0, LINENUMBER=202, FALLBACKINSERTCOUNT=202, UPDATECOUNT=0, BENEFICIARIOS_DEL INSERTCOUNT=202}
2020-09-17 12:48:58,945 INFO [desa19c] [DataProcessor] [desa19c-dataloader-1] Batch '89', for node '002', for process 'data load from stage' has been processing for 1694 seconds. The following stats have been gathered: {STATEMENTCOUNT=210, LOADMILLIS=1120, STARTTIME=1600356044195, INSERTCOUNT=210, FILTERMILLIS=46, BENEFICIARIOS_DEL UPDATECOUNT=0, LINENUMBER=210, FALLBACKINSERTCOUNT=210, UPDATECOUNT=0, BENEFICIARIOS_DEL INSERTCOUNT=210}


I am not an expert in symmetrics but if the STATEMENTCOUNT that I see that it is increasing, it means that it is setting up a batch with the data of the sym_data, it runs too slow.

When verifying in the DB what the symmetricsds is doing, I see that it is making the following query:

SELECT
    MAX(create_time)
FROM
    sym_data
WHERE
        table_name = :1
    AND ( ( event_type = 'I'
            AND row_data LIKE :2 )
          OR ( event_type IN (
        'U',
        'D'
    )
               AND pk_data LIKE :3 ) );


i attach de explain plan to.

Verifying the execution plan I see that it is performing a full scan of the sym data, it currently does not have too many 11863 records.

It is verified that the sym_data table has an index on the sym_id column but this query does not use this column and that is why the delay may be generated.

Statistics of the complete DB were updated, an attempt was made to create indexes in the sym_data table but it was not possible to optimize this query.
TagsNo tags attached.

Activities

myg388

2020-09-17 15:43

reporter  

image.png (13,829 bytes)   
image.png (13,829 bytes)   
image-2.png (28,823 bytes)   
image-2.png (28,823 bytes)   
image-3.png (149,165 bytes)   
image-3.png (149,165 bytes)   

elong

2020-09-17 16:30

developer   ~0001772

That is the conflict manager resolving all the rows you put into conflict at the same. Is it a real world scenario for two databases to experience mass updates to the same set of rows? Why not update just one side and let the replication update the other side?

It detects the conflict when the incoming row does apply the same way on the target as it did on the source. To resolve the conflict, it searches sym_data to get the last update time for the row. The newer timestamp between the source and target is the winning row. It was designed for occasional conflicts when the same row is updated on multiple databases at the same time.

myg388

2020-09-17 16:58

reporter   ~0001773

Thanks for your answer, please update a whole column on one side.

I don't understand what conflicts would I be solving?

The initial load had already been carried out and both tables were in sync before updating a complete column.

myg388

2020-09-23 18:20

reporter   ~0001782

I was doing more tests with update with the symmetricds.

And I have the same drawbacks, any update I do for example:

update MMENENDEZ.BENEFICIARIOS_DEL set ADDRESS = 'PEPE1234' where ID_LUGARALTABENEF = 1000;

It takes a long time, when verifying how symmetricds passes this update to the other side I see the following:


update "MMENENDEZ". "BENEFICIARIOS_DEL" set "ADDRESS" =: 1 where "ID_LUGARALTABENEF" =: 2 and "ID_BENEFICIARIO" =: 3 and "OS_ADHE" =: 4 and "NAME" =: 5 and "ID_TIPODOC" =: 6 and "NUMDOCUMENTO" =: 7 and "SEX" =: 8 and "DATE" =: 9 and "TELEPHONE" =: 10 and "ID_LUGARATENCIONMEDICA" =: 11 and "ID_LUGARALTATITULAR" is NULL and "ID_TITULAR" is NULL and "CUIL "=: 12 and" FECHASITUACION "=: 13 and" ESTADO "=: 14 and" ID_SITUACIONBENEF "=: 15 and" ID_PLAN "=: 16 and" DOMICILIO "=: 17 and" ID_LOCATENCIONMEDICA "=: 18 and" DATE " =: 19 and "ID_CARACTER" =: 20 and "ID_TIPOCARACTER" =: 21 and "ID_TIPOBENEFICIARIO" =: 22 and "ID_LOCALIDAD" =: 23 and "EXPIRATION_DATE" is NULL and "ID_CAJAAPORTE" =: 24 and "NROBENEFICIO" =: 25 and "NROEXPEDIENTE" =: 26


Therefore, if it works this way, everything that is update delete will be very few performant since it filters by absolutely all the columns to make each update generating a cost of 43000. The inserts worked ok.

myg388

2020-09-23 18:21

reporter   ~0001783

i attach capture of log file,
image-4.png (114,419 bytes)   
image-4.png (114,419 bytes)   

myg388

2020-09-25 20:51

reporter   ~0001789

I could see what was happening, a pk was missing in the table

close the issue plz

Issue History

Date Modified Username Field Change
2020-09-17 15:43 myg388 New Issue
2020-09-17 15:43 myg388 File Added: image.png
2020-09-17 15:43 myg388 File Added: image-2.png
2020-09-17 15:43 myg388 File Added: image-3.png
2020-09-17 16:30 elong Note Added: 0001772
2020-09-17 16:30 elong Status new => feedback
2020-09-17 16:58 myg388 Note Added: 0001773
2020-09-17 16:58 myg388 Status feedback => new
2020-09-23 18:20 myg388 Note Added: 0001782
2020-09-23 18:21 myg388 Note Added: 0001783
2020-09-23 18:21 myg388 File Added: image-4.png
2020-09-25 20:51 myg388 Note Added: 0001789