I have a field called 'markdown' in a MongoDB collection whose value looks like this:
Title
**Section 1**
Introduction
**Section 2**
Paragraph 1
**Section 3**
Paragraph 2
**Section 4**
Conclusion
## Background / Description
**Hypothesis**: Hypothesis
## Project Goals
* Goal 1
* Goal 2
* Goal 3
But when I convert this into a column in a Parquet file (using pyarrow), it becomes this:
'\n\nTitle\n\n**Section 1**\n\nIntroduction\n\n**Section 2**\n\nParagraph 1\n\n**Section 3**\n\nParagraph 2\n\n**Section 4**\n\nConclusion\n\n## Background / Description\n\n**Hypothesis**Hypothesis\n\n## Project Goals\n\n* Goal 1\n* Goal 2\n* Goal 3\n'
The same escaped text is what I get when I store the contents in an .md file, which defeats the purpose of having markdown at all.
Is there a way to preserve markdown in a parquet file?
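One thing worth ruling out first: the string above looks like a Python repr (quoted, on one line, with each newline drawn as `\n`), which is how a REPL, a notebook cell, or a printed list displays a string that actually contains real line breaks. The sketch below uses only the standard library and a made-up `value` to show the difference between writing the repr and writing the string itself. The file names are hypothetical too.

```python
import os
import tempfile

# Hypothetical value standing in for one cell of the 'markdown' column.
value = "\n\nTitle\n\n**Section 1**\n\nIntroduction\n\n* Goal 1\n* Goal 2\n"

# repr() is the quoted, single-line form with visible \n escapes.
shown = repr(value)

with tempfile.TemporaryDirectory() as tmp_dir:
    bad_path = os.path.join(tmp_dir, "escaped.md")
    good_path = os.path.join(tmp_dir, "rendered.md")

    # Writing the repr puts the backslash-n sequences into the file literally.
    with open(bad_path, "w", encoding="utf-8") as f:
        f.write(shown)

    # Writing the string itself puts real line breaks into the file.
    with open(good_path, "w", encoding="utf-8") as f:
        f.write(value)

    with open(bad_path, encoding="utf-8") as f:
        bad_lines = f.read().splitlines()
    with open(good_path, encoding="utf-8") as f:
        good_lines = f.read().splitlines()

print(len(bad_lines), len(good_lines))
```

If the .md file is produced from something like `str(df["markdown"].tolist())` or a repr, the first case applies. Writing the individual string value gives the second.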
Edit: I am converting the MongoDB documents into a Parquet file using the following code:
import logging
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# db and MONGO_COLLECTION are defined elsewhere
bson_data = list(db[MONGO_COLLECTION].find())
logging.info(f"{len(bson_data)} documents found.")
df = pd.DataFrame(bson_data)
table = pa.Table.from_pandas(df)
pq.write_table(table, "/tmp/output.parquet")
logging.info("Conversion to parquet completed successfully.")