Welcome toVigges Developer Community-Open, Learning,Share

c# - How would I write JSON to an S3 file from an input stream without loading it all into memory?

Let's say I want to do something simple like convert a CSV file to a JSON file. One easy way to do that would be to read the entire CSV file into memory and then serialize the results with JSON.NET.

First,Last,Age
Jane,Doe,45
John,Smith,60

would become:

[
  {
    "First": "Jane",
    "Last": "Doe",
    "Age": 45
  },
  {
    "First": "John",
    "Last": "Smith",
    "Age": 60
  }
]

But if I have limited resources and a very large dataset, it would be nice to read, say, 1,000 rows at a time from the CSV file, write them to the JSON output file, and keep appending to the file as I go, without ever reading the entire dataset into memory.

I can think of some low-level ways like manually adding/removing brackets and commas here and there. But I'm hoping someone has a more elegant approach to propose.


1 Answer

Assuming you have two streams, a Stream csvStream for the CSV file to be read and a Stream jsonStream for the JSON file to be written, you can stream from CSV to JSON by combining Microsoft's TextFieldParser with Json.NET's JsonTextWriter like so:

using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using Microsoft.VisualBasic.FileIO; // TextFieldParser
using Newtonsoft.Json;              // JsonTextWriter, Formatting

public static partial class JsonExtensions
{
    const int buffersize = 4096;
    // UTF-8 without a byte order mark; throws on invalid bytes.
    static readonly UTF8Encoding defaultEncoding = new UTF8Encoding(false, true);
    
    public static void CopyCSVToJson(Stream csvStream, Stream jsonStream, Formatting formatting = Formatting.Indented, Encoding encoding = default)
    {
        encoding = encoding ?? defaultEncoding;
        
        // leaveOpen: true, so the caller remains responsible for disposing both streams.
        using (var textReader = new StreamReader(csvStream, encoding, true, buffersize, true))
        using (var textWriter = new StreamWriter(jsonStream, encoding, buffersize, true))
            CopyCSVToJson(textReader, textWriter, formatting);
    }
    
    public static void CopyCSVToJson(TextReader csvTextReader, TextWriter jsonTextWriter, Formatting formatting = Formatting.Indented)
    {
        using (var parser = new TextFieldParser(csvTextReader) { Delimiters = new[] { "," } })
        {
            if (parser.EndOfData)
                return;
            var headers = parser.ReadFields();
            using (var jsonWriter = new JsonTextWriter(jsonTextWriter) { Formatting = formatting })
            {
                jsonWriter.WriteStartArray();
                // Rows are read and written one at a time, so memory use does not grow with the file size.
                while (!parser.EndOfData)
                {
                    var fields = parser.ReadFields();
                    jsonWriter.WriteStartObject();
                    // The tuple-returning Zip overload requires .NET Core 3.0 or later.
                    foreach (var (name, value) in headers.Zip(fields))
                    {
                        jsonWriter.WritePropertyName(name);
                        // Check if the value is an integer, a decimal, or a Boolean.
                        // Parse with the invariant culture so the result does not depend on the machine's locale.
                        // Could add BigInteger if needed
                        if (long.TryParse(value, NumberStyles.Integer, CultureInfo.InvariantCulture, out var l))
                            jsonWriter.WriteValue(l);
                        else if (decimal.TryParse(value, NumberStyles.Number, CultureInfo.InvariantCulture, out var d))
                            jsonWriter.WriteValue(d);
                        else if (bool.TryParse(value, out var b))
                            jsonWriter.WriteValue(b);
                        else
                            jsonWriter.WriteValue(value);
                    }
                    jsonWriter.WriteEndObject();
                }
                jsonWriter.WriteEndArray();
            }
        }
    }
}

You can then call the routine above, e.g. as follows:

using (var csvStream = File.OpenRead(csvFileName))
using (var jsonStream = File.Create(jsonFileName)) // File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes
{
    JsonExtensions.CopyCSVToJson(csvStream, jsonStream);
}
  • TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace of Microsoft.VisualBasic.Core.dll and Microsoft.VisualBasic.dll. It is perfectly usable from C#.

  • The copy method assumes the first row corresponds to the property names (as is shown in your question), and attempts to parse each cell as an integer, decimal or Boolean before writing the cell's value as a string.

  • The copy method assumes both streams have the same encoding. You could obviously generalize that if required.

Demo fiddle here.
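If you would rather not depend on Microsoft.VisualBasic and Json.NET, the same row-at-a-time idea can be sketched with System.Text.Json's Utf8JsonWriter, which ships with .NET Core 3.0 and later. This is a swapped-in alternative, not the method above, and only a sketch: the class name CsvToJsonSketch is hypothetical, and it assumes cells are never quoted and never contain commas, because string.Split does not implement CSV quoting the way TextFieldParser does. Since Utf8JsonWriter always emits UTF-8, the input encoding is chosen independently by whoever constructs the TextReader.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper: a sketch of streaming CSV to JSON with Utf8JsonWriter.
public static class CsvToJsonSketch
{
    // Writes the rows of `csv` to `json` as a JSON array, one row at a time.
    // ASSUMPTION: no quoted cells and no embedded commas (plain string.Split).
    public static void Copy(TextReader csv, Stream json, bool indented = false)
    {
        string headerLine = csv.ReadLine();
        if (headerLine == null)
            return; // empty input: write nothing, like the TextFieldParser version
        string[] headers = headerLine.Split(',');

        // Utf8JsonWriter always produces UTF-8, whatever encoding `csv` was decoded from.
        using var writer = new Utf8JsonWriter(json, new JsonWriterOptions { Indented = indented });
        writer.WriteStartArray();
        string line;
        while ((line = csv.ReadLine()) != null)
        {
            string[] fields = line.Split(',');
            writer.WriteStartObject();
            // Like Enumerable.Zip, stop at the shorter of the header and field lists.
            for (int i = 0; i < Math.Min(headers.Length, fields.Length); i++)
            {
                writer.WritePropertyName(headers[i]);
                string value = fields[i];
                // Same order as the answer: integer, then decimal, then Boolean, else string.
                if (long.TryParse(value, NumberStyles.Integer, CultureInfo.InvariantCulture, out var l))
                    writer.WriteNumberValue(l);
                else if (decimal.TryParse(value, NumberStyles.Number, CultureInfo.InvariantCulture, out var d))
                    writer.WriteNumberValue(d);
                else if (bool.TryParse(value, out var b))
                    writer.WriteBooleanValue(b);
                else
                    writer.WriteStringValue(value);
            }
            writer.WriteEndObject();
            // Push the buffered bytes to the stream after every row.
            writer.Flush();
        }
        writer.WriteEndArray();
        writer.Flush();
    }

    public static void Main()
    {
        var csv = new StringReader("First,Last,Age\nJane,Doe,45\nJohn,Smith,60\n");
        using var json = new MemoryStream();
        Copy(csv, json);
        Console.WriteLine(Encoding.UTF8.GetString(json.ToArray()));
    }
}
```

In real use you would pass a StreamReader over the CSV file and a FileStream (or any other writable stream) instead of the in-memory StringReader and MemoryStream used in Main. Calling Flush() after each row means the writer's internal buffer never has to hold the whole document.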

