SNOW-264052: Data inconsistency when writing to Snowflake using multithreading #433

Closed
vaibhavsingh007 opened this issue Jan 15, 2021 · 1 comment

vaibhavsingh007 commented Jan 15, 2021

Hi, I heard that concurrent writes to Snowflake over the same JDBC connection are thread-safe, as they ought to be. However, I encountered the following issue:

"
Hi, so I just ran into an issue where k records from across n Spark dataframes are written as k+y records, where y is non-deterministic, when writing concurrently over the same Snowflake JDBC connection (drivers: spark-snowflake_2.11-2.5.2-spark_2.4.jar, snowflake-jdbc-3.9.1.jar).

write config:

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Set options below
sfOptions = {
    "sfURL": "****",
    "sfAccount": "****",  # Also needs the create stage privilege.
    "sfUser": username,
    "sfPassword": password,
    "sfDatabase": database,
    "sfSchema": schema,
    "sfWarehouse": warehouse,
    "sfRole": role,
}

# Append the dataframe to the target table through the Snowflake connector.
sparkDF.write \
    .format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("dbtable", f"MY_WORKSPACE.{table}") \
    .mode('append') \
    .save()

How can I synchronize this to achieve data integrity?
"
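For anyone hitting the same question, one possible way to synchronize the writes is to hold a shared lock around each write call so that only one thread writes at a time. This is only a minimal sketch, not a confirmed fix for the extra rows: `write_serialized`, `write_all`, and the `write_fn` callback are hypothetical names and not part of spark-snowflake, and `write_fn` is assumed to wrap the `sparkDF.write ... .save()` chain from the write config above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helper state (not part of spark-snowflake):
# one lock shared by every writer thread.
_write_lock = threading.Lock()


def write_serialized(df, table, write_fn):
    """Run write_fn(df, table) while holding the shared lock."""
    with _write_lock:
        write_fn(df, table)


def write_all(frames, write_fn, max_workers=4):
    """Submit one write per (df, table) pair; the lock lets only one run at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(write_serialized, df, table, write_fn)
            for df, table in frames
        ]
        for future in futures:
            # result() re-raises any exception from the writer thread,
            # so a failed write is not silently dropped.
            future.result()
```

Here `write_fn(df, table)` would perform the append shown above for a single dataframe. Serializing the writes gives up write parallelism, so it is mainly useful for checking whether the row-count mismatch depends on concurrency at all.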

ref: #3 (comment)

@github-actions github-actions bot changed the title Concurrent write to Snowflake not threadsafe SNOW-264052: Concurrent write to Snowflake not threadsafe Jan 15, 2021
@vaibhavsingh007 vaibhavsingh007 changed the title SNOW-264052: Concurrent write to Snowflake not threadsafe SNOW-264052: Data inconsistency when writing to Snowflake using multithreading Jan 21, 2021
@vaibhavsingh007
Author

The issue was resolved by using the latest drivers: spark-snowflake_2.11-2.8.3-spark_2.4.jar and snowflake-jdbc-3.12.17.jar.
The data is now consistent.
