Azure Data Factory, ADF, and exception MySql.Data.MySqlClient.MySqlException,Message=Got a packet bigger than 'max_allowed_packet'
My StackOverflow developer profile has an "I'd prefer to not work with" section and honestly, the only thing I don't want to deal with is MySQL. I don't like Visual Basic or Access or plenty of other things but good grief, I find working with MySQL to be an absolute cesspit after every other RDBMS I've worked with. Which brings me to an overdue client project: consolidating various MySQL instances to a single reporting server. They have a standard schema on all the boxes (no really, that was my biggest fear, but they're good at ensuring the nearly 200 sites run the exact same point release of code) and I needed to bring it all to a single server so it can be fed into reports.
It seemed like a great fit for Azure Data Factory but I kept getting an error about a packet size issue. What do I know about packet sizes?

Error details
Error code: 2200
Failure type: User configuration issue
Details: 'Type=MySql.Data.MySqlClient.MySqlException,Message=Got a packet bigger than 'max_allowed_packet' bytes,Source=MySqlConnector,'
What's the internet got to say about all this? I checked the setting on a server that worked and one that didn't:

SHOW VARIABLES LIKE 'max_allowed_packet';
Both servers listed 4194304 bytes (4 MB).
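For reference, checking the value, and raising it on a server you control, looks something like this. The 64 MB figure is purely illustrative, not a recommendation, and a permanent change belongs in the server's my.cnf/my.ini rather than a SET GLOBAL.

-- Check the current value (reported in bytes)
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Hypothetical example of raising it for the running instance only;
-- 67108864 bytes = 64 MB, an illustrative value
SET GLOBAL max_allowed_packet = 67108864;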
Beyond changing configuration settings (with no guarantee that would solve the issue), the idea of inconsistent table definitions sounded promising. But no dice. I tried making all the fields nullable, to no avail. I ran the mysqldump utility from the command line to see if I could reproduce the packet issue. Nothing.
After a lot of frustration, I looked hard at the custom integration logs. Before I move data, I copy over the source information_schema.tables for the database and store the TABLE_ROWS for each table. That number is approximately the number of rows in the table (for InnoDB it's an estimate, not an exact count). In the copy activity itself, I log the actual rows transferred, and that's when I noticed something. The largest set of data that ever transferred successfully from a single source was about 6k rows. ALL OF THE HOSTS THAT GENERATED THE MAX PACKET EXCEPTION HAD MORE ROWS THAN 6K.
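The row-count snapshot is nothing more exotic than a query against information_schema; roughly this, with the database name being a placeholder:

-- Snapshot the approximate per-table row counts for a schema
-- ('client_site' is a hypothetical database name)
SELECT TABLE_NAME, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'client_site'
ORDER BY TABLE_ROWS DESC;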
Well, what if the issue is one of volume? That's easy enough to test. I put a LIMIT 1000; on the query and pointed ADF at the server that never transferred data for that table. It worked. Sonofa. OK, LIMIT 5000; worked. Removed the limit: failed, "Got a packet bigger than 'max_allowed_packet'".
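For the source query, the bisection was nothing fancier than this (the table name here is a placeholder for the suspect table):

-- 'fact_activity' is a hypothetical table name
SELECT * FROM fact_activity LIMIT 1000;  -- transferred fine
SELECT * FROM fact_activity LIMIT 5000;  -- transferred fine
SELECT * FROM fact_activity;             -- failed: max_allowed_packet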
The error was not being generated by the source as I had assumed. Normally, ADF tells you whether the source or the sink caused the error, but that wasn't obvious here. The exception makes sense if the error is on the sink: the copy activity writes rows to the sink in batches, and a large enough batch becomes a single write bigger than the sink's max_allowed_packet. "You're sending too much data in one shot" would be a far more useful error.
How do we fix it?
The default Write Batch Size for a Copy Activity is 10,000. I dropped the size to 5000 and ran through all the troublesome hosts. Of the 51 hosts that would never transfer the suspect table, every.single.one.worked.
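The arithmetic bears that out: with a 4,194,304 byte packet limit, 10,000 rows only need to average a bit over 400 bytes apiece to blow past it, while 5,000 rows of the same size still fit comfortably. If you want to sanity-check a batch size before a run, information_schema can give a rough per-row size; this is just a ballpark sketch, and the schema name is a placeholder.

-- Rough check: will a batch of N rows stay under a 4 MB max_allowed_packet?
-- AVG_ROW_LENGTH is an estimate of stored row size, not the exact wire size
-- of the batched INSERT, so treat these as ballpark numbers only.
SELECT TABLE_NAME,
       AVG_ROW_LENGTH,
       AVG_ROW_LENGTH * 10000 AS approx_bytes_at_batch_10000,
       AVG_ROW_LENGTH * 5000  AS approx_bytes_at_batch_5000
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'client_site'   -- hypothetical database name
ORDER BY AVG_ROW_LENGTH DESC;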