ChatGPT, Samsung Workers’ Error, Italian Connection, and Personal & Corporate Privacy: What Should You Know?
Published on: 7th April 2023, © Yash Chawla
Privacy has become a significant concern for individuals and businesses in the age of digital transformation. The recent mistake by Samsung employees and the involvement of language models such as ChatGPT in privacy breaches have underlined the importance of greater awareness and proactive measures to secure personal and business data. In this blog article, we will look at the privacy problems raised by ChatGPT, the recent Samsung workers' error and the concern around it, and offer advice on protecting personal and business data.
ChatGPT and Privacy Concerns
ChatGPT, a large language model trained by OpenAI, can generate human-like text in response to prompts. While this technology can potentially revolutionise many industries, it also poses privacy risks. For example, ChatGPT can be used to impersonate individuals, generate fake news, or manipulate public opinion. The ethical and legal issues surrounding language models like ChatGPT are complex and have yet to be fully resolved.
ChatGPT can be misused in several ways, such as generating fake news, impersonating individuals, fraud, manipulation, social engineering, and cybercrime. Europol's Innovation Lab has warned that cybercriminals are exploiting ChatGPT's evolving capabilities to boost their fraudulent activities, including phishing and disinformation campaigns. Furthermore, the chatbot has made it easier for non-experts to create malware, leading to increased attacks. ChatGPT's security measures are still in their infancy, and there is considerable room for improvement. These concerns highlight the need for caution when using language models and the importance of protecting personal information. There are two guidelines worth reading in this respect:
1) OpenAI’s Privacy Policy
2) OpenAI’s Terms of Service
What happened at Samsung?
According to several reports (see: TechRadar; Business Today), Samsung let its engineers use ChatGPT to assist in resolving source code issues. Unfortunately, the employees included sensitive information, such as source code for new software and internal meeting notes, which was consequently leaked. There were three incidents of employees releasing critical material through ChatGPT in less than a month. The belief is that because ChatGPT "stores" user input data to train itself further, Samsung's trade secrets are now essentially in the hands of OpenAI, the firm behind the AI service. This has prompted worries about GDPR compliance, since it contradicts one of the law's basic principles governing how businesses acquire and use data.
Samsung sent out a warning to its workers about the potential dangers of leaking confidential information. Still, the leaked data is allegedly "impossible" to retrieve, as it is now stored on servers belonging to OpenAI. Some have argued that this very fact makes ChatGPT non-compliant with GDPR (but is that true?). Recently, Italy also banned the use of ChatGPT nationwide, which is discussed a bit later in the article. As companies increasingly use AI tools such as ChatGPT, there is a need for greater awareness and proactive measures to secure personal and business data. Samsung has meanwhile initiated the development of its own AI for internal use and has limited ChatGPT prompts to 1024 bytes.
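As an aside, a byte cap like the one Samsung reportedly imposed is straightforward to enforce before a prompt ever leaves the corporate network. The sketch below is purely illustrative: the 1024-byte figure is the reported limit, but the check itself is my assumption, not Samsung's actual implementation.

```python
MAX_PROMPT_BYTES = 1024  # the cap Samsung reportedly set for internal use

def check_prompt(prompt: str) -> str:
    """Reject prompts whose UTF-8 encoding exceeds the byte budget."""
    size = len(prompt.encode("utf-8"))
    if size > MAX_PROMPT_BYTES:
        raise ValueError(
            f"Prompt is {size} bytes; policy allows at most {MAX_PROMPT_BYTES}."
        )
    return prompt
```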
Is the fuss real or hype?
This is a tough one to call. So, out of curiosity, I went through OpenAI's terms of use and privacy policy. Any report claiming that Samsung cannot request OpenAI to delete the data appears to be false, as per point 4 of OpenAI's privacy policy. In particular, it states that you have the right to:
- Access your Personal Information.
- Delete your Personal Information.
- Correct or update your Personal Information.
- Transfer your Personal Information elsewhere.
- Withdraw your consent to processing your Personal Information where we rely on consent as the legal basis for processing.
- Object to or restrict the processing of your Personal Information where we rely on legitimate interests as the legal basis for processing.
Further, they state:

> You can exercise some of these rights through your OpenAI account. If you are unable to exercise your rights through your account, please send your request to dsar@openai.com.
Under this provision, it should be possible for Samsung to have its sensitive information deleted. I also read some reports which stated that Samsung's secrets are now out in the open at OpenAI. This is not entirely correct either. Point 3(a) of OpenAI's Terms of Service explicitly states that the user, not OpenAI, owns the Input and Output (i.e. the Content) on ChatGPT. See the extract below:
> You may provide input to the Services (“Input”), and receive the output generated and returned by the Services based on the Input (“Output”). Input and Output are collectively “Content.” As between the parties and to the extent permitted by applicable law, you own all Input. Subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. This means you can use Content for any purpose, including commercial purposes such as sale or publication, if you comply with these Terms.
This bit is clear. The problematic, or at least unclear, bit is the part of the terms that follows, which suggests that data, once given to ChatGPT, may be used for its training. However, the wording is rather ambiguous:
> OpenAI may use Content to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.
However, this is clarified in another policy document, "How your data is used to improve model performance", as well as in point 3(c) of OpenAI's Terms of Use. There it is explicitly stated that NO training is conducted using content submitted via the API:
> OpenAI does not use data submitted by customers via our API to train OpenAI models or improve OpenAI’s service offering. In order to support the continuous improvement of our models, you can fill out this form to opt-in to share your data with us.
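For context, "via our API" means programmatic access with an API key, as opposed to typing into the ChatGPT web interface. A minimal sketch of such an API call is shown below, using the openai Python package as it existed in early 2023; the model name and key handling are illustrative assumptions, not a statement about Samsung's setup.

```python
# pip install openai  (the 0.x client current in early 2023)
import openai

# Never hard-code real keys; read from the environment in practice.
openai.api_key = "sk-..."  # illustrative placeholder

# Per the policy quoted above, data sent through this API route was
# not used to train or improve OpenAI's models.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # assumed model choice
    messages=[{"role": "user", "content": "Explain this compiler error: ..."}],
)
print(response["choices"][0]["message"]["content"])
```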
So if Samsung's workers were using ChatGPT services through the API, there should be no concern about their data being used for training. Conversely, OpenAI is not that generous to non-API users. The paragraph that follows the extract above states:
> When you use our non-API consumer services ChatGPT or DALL-E, we may use the data you provide us to improve our models. You can request to opt-out of having your data used to improve our non-API services by filling out this form with your organization ID and email address associated with the owner of the account. We remove any personally identifiable information from data we intend to use to improve model performance. We also only use a small sampling of data per customer for our efforts to improve model performance. We take great care to use appropriate technical and process controls to secure your data.
Here it is explicitly stated that any data entered into the non-API ChatGPT could be used to improve the service. However, users can opt out of having their data used to improve non-API services via the form at the above link (interestingly, both forms are Google Forms :D). So Samsung is still on the safe side. I went on to ask ChatGPT a few questions about the Samsung case; have a look at the conversation below:
1. Would you be able to retain the information that was given to you by Samsung employees?

> As an AI language model, I do not have the ability to retain or remember any information that is given to me by specific individuals, including Samsung employees. I am designed to respond to queries based on my training data and algorithms, and I do not have the capability to store or access any external information outside of my programming. Additionally, any attempt to obtain confidential or proprietary information without proper authorization would be a violation of ethical and legal standards.

2. If source code is given to you, and it contains a proprietary method to analyze data, would you be able to learn the method and apply it to answer questions when asked?

> As an AI language model, I do not have the capability to learn and apply specific proprietary methods that may be contained in a source code. While I can process and analyze data based on my programming, my training data and algorithms are not designed to specifically replicate proprietary methods or algorithms. Furthermore, it is important to note that any attempt to access or utilize confidential or proprietary information without proper authorization could result in legal consequences. It is also important to respect the confidentiality and proprietary rights of companies and individuals, and any use of proprietary methods or algorithms should be done with proper authorization and in accordance with ethical and legal standards. As an AI language model, I am designed to provide helpful and informative responses while adhering to ethical and legal standards, and I cannot engage in any activities that may be considered unethical or illegal.

3. Will OpenAI be able to get the information if any sensitive data is shared with you?

> No, as an AI language model, I am designed to process and respond to user inputs in a safe and secure manner. OpenAI, as the developer of the technology, has implemented numerous security protocols and measures to ensure that user data is protected and kept confidential. Furthermore, OpenAI's ethical guidelines and principles prioritize the privacy and security of user data, and the company is committed to upholding these standards in all aspects of its operations. In the unlikely event that any sensitive data is inadvertently shared with me, OpenAI has processes in place to prevent the data from being retained or used for any unauthorized purposes.

4. I don't see any mention of intellectual property or trade secrets in OpenAI's privacy policy. Can you tell me if there is any point related to it?

> You are correct that OpenAI's privacy policy does not specifically mention intellectual property or trade secrets. However, the company's Terms of Service do contain provisions related to intellectual property and trade secrets. Specifically, OpenAI's Terms of Service prohibit users from using the company's products and services to violate the intellectual property rights of others and also state that users are prohibited from disclosing or using any trade secrets or confidential information of OpenAI without the company's prior written consent.
The Italian connection: ChatGPT banned
This is a rather more concerning case. OpenAI has faced an immediate temporary limitation on processing Italian users' data due to the unlawful collection of personal data and the lack of an age verification system for children (read the press release). The Italian SA launched an inquiry after a data breach was reported on 20 March, which affected ChatGPT users' conversations and payment information. The Italian SA highlights that no information is provided to users and data subjects whose data are collected by OpenAI. There appears to be no legal basis for collecting and processing personal data to train the algorithms (via non-API users). The accuracy of personal data processed by ChatGPT is also called into question. Finally, the lack of an age verification mechanism exposes children to inappropriate responses, despite the service allegedly being for users above the age of 13 according to OpenAI's terms of service. OpenAI must notify the Italian SA within 20 days of the measures taken to comply with the order, or face a fine of up to EUR 20 million or 4% of its total worldwide annual turnover. OpenAI has designated a representative in the European Economic Area.
Will other countries follow?
The surge of probes in Europe reflects privacy concerns around the creation of large generative AI models, which are frequently trained on enormous swaths of internet data. The Italian decision emphasises more urgent issues, implying that the safeguards in AI work so far may be woefully inadequate. Prior to Italy's ban, there were growing calls for AI regulation, with an open letter signed by hundreds of prominent AI experts, tech leaders, and scientists urging a halt to the development and testing of AI technologies more powerful than OpenAI's language model GPT-4 so that the risks they may pose can be adequately studied.
Governments are struggling to keep up, with the UK announcing measures to regulate AI, and US President Joe Biden stating that while AI may assist with disease and climate change, it is also vital to address possible hazards to society, national security, and the economy. Europe is set to propose groundbreaking legislation on AI, the European AI Act, which heavily restricts the use of AI in critical infrastructure, education, law enforcement, and the judicial system, with tough risk assessments and a requirement to stamp out discrimination. Some EU countries are considering following Italy's actions on ChatGPT. As with some of the other American giants, ChatGPT is not available in China, and several large tech companies there are developing alternatives. Beijing has introduced first-of-their-kind regulations on deepfakes and on recommendation algorithms that could apply to any ChatGPT-style technology.
Concluding Remarks and Some Important Points
In conclusion, the increasing use of digital technologies poses serious challenges to personal and corporate privacy, as highlighted by the recent Samsung incident and the ethical concerns surrounding language models like ChatGPT. While ChatGPT has the potential to revolutionise various industries, it also raises privacy risks, such as generating fake news, impersonating individuals, fraud, manipulation, social engineering, and cybercrime. To mitigate these risks, individuals and businesses should take proactive measures to secure their data, such as reviewing and updating privacy policies, encrypting data, and using authentication mechanisms. Additionally, OpenAI's privacy policy and terms of service are essential guidelines for understanding the implications of using such technologies. But further clarification and action are required from OpenAI and regulators to create a more transparent policy, as the ban in Italy has shown, and other countries might follow with regulations of their own. In the case of Samsung, it is possible to request that OpenAI delete the sensitive data. It is therefore crucial to be aware of the risks and take proactive steps to protect privacy and prevent data breaches. The following are important points to know for safely using ChatGPT (a small pre-submission check along these lines is sketched after the list):
- You can opt out of allowing OpenAI to use your content to train the algorithm. Use this form: Non-API Users
- Do not share personal information such as your full name, home address, phone number, or social security number.
- Do not share any financial information such as credit card numbers, bank account information, or passwords.
- Do not share confidential business information, including trade secrets or intellectual property.
- Do not discuss sensitive topics such as health issues, legal matters, or personal relationships.
- Avoid using profanity or making inappropriate or offensive comments.
- Be cautious when providing information about your location, such as your city or workplace.
- Do not provide any information that could be used to identify you, such as your date of birth, place of birth, or mother's maiden name.
- Do not share any information that could be used for phishing or identity theft, such as login credentials or answers to security questions.
- Be aware of potential security risks when using any online service, and take necessary precautions to protect your personal information.
- If you have any concerns about the information you have shared with ChatGPT, contact the appropriate authorities or seek legal advice.
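To make the "do not share" items above concrete, here is a minimal, illustrative pre-submission filter that flags a few common identifier formats before a prompt is sent. The patterns are simplistic assumptions for the sake of the sketch; a real deployment would rely on a proper PII-detection library.

```python
import re

# Hypothetical patterns for a few common identifier formats.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def find_pii(prompt):
    """Return the names of all patterns that match the prompt."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(prompt)]

prompt = "My card number is 4111 1111 1111 1111, can you check it?"
hits = find_pii(prompt)
if hits:
    print("Refusing to send prompt; possible PII detected:", ", ".join(hits))
```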
Lastly, I tried to learn from ChatGPT the key points on which it is trained to prevent any user from misusing it. Following are the 50 points deduced through that effort. I hope you find them useful.
50 points (non-exhaustive) on which ChatGPT claims to be trained in order to avoid misuse:
1) Hate speech: any language that targets a person or group based on their race, ethnicity, religion, gender, sexual orientation, or any other characteristic.
2) Discrimination: any language or behaviour that discriminates against someone based on their age, disability, national origin, pregnancy, or any other protected characteristic.
3) Harassment: any behaviour or language that makes someone feel uncomfortable or intimidated, including unwanted advances, comments, or physical contact.
4) Cyberbullying: any behaviour or language that is intended to harass, threaten, or intimidate someone online.
5) Misinformation: any language or content that spreads false or misleading information, including conspiracy theories, fake news, or propaganda.
6) Violence: any language or behaviour that promotes violence, including hate crimes, terrorism, or advocacy for violent acts.
7) Sexual harassment: any behaviour or language that is sexual in nature and is unwanted or uninvited, including sexual advances, comments, or gestures.
8) Self-harm: any behaviour or language that encourages or promotes self-harm or suicide, including suicide pacts or promotion of harmful practices.
9) Illegal activity: any behaviour or language that promotes or condones illegal activity, including drug use, piracy, or fraud.
10) Confidentiality breaches: any behaviour or language that divulges confidential or sensitive information, including personal details, financial information, or trade secrets.
11) Plagiarism: any behaviour or language that plagiarises content without proper attribution or permission.
12) Spam: any behaviour or language considered spam, such as unsolicited advertising or promotional content.
13) Phishing: any behaviour or language that attempts to trick or deceive someone into providing personal or sensitive information, such as passwords or credit card details.
14) Inappropriate content: any inappropriate or offensive behaviour or language, such as explicit language or images.
15) Insensitivity: any behaviour or language that is insensitive or disrespectful to a particular culture, religion, or belief system.
16) Disrespectful behaviour: any behaviour or language that is disrespectful or demeaning to others, including insults, put-downs, or name-calling.
17) Invasion of privacy: any behaviour or language that violates someone's privacy, such as sharing personal information or spying on someone without their consent.
18) Copyright infringement: any behaviour or language that violates copyright laws, such as unauthorised use of copyrighted content or infringement of intellectual property rights.
19) Trolling: any behaviour or language intended to provoke or upset others, including making inflammatory comments or starting arguments.
20) Impersonation: any behaviour or language that impersonates someone else, such as creating fake social media accounts or pretending to be someone else online.
21) Malicious intent: any behaviour or language intended to harm others, such as spreading viruses or hacking into someone's computer or online account.
22) Breach of terms of service: any behaviour or language that violates the terms of service of a website or platform, such as creating multiple accounts or engaging in fraudulent activity.
23) Bullying: any behaviour or language intended to bully or intimidate others, including name-calling, spreading rumours, or excluding someone from a group.
24) Disruptive behaviour: any behaviour or language that disrupts the normal functioning of a website, platform, or community, such as flooding a chat room with messages or spamming comments on a social media post.
25) Blackmail: any behaviour or language threatening or extorting someone, such as demanding money or other valuable items in exchange for not revealing personal information.
26) Hate speech: any behaviour or language that promotes or incites hatred or violence against individuals or groups based on their race, gender, religion, sexual orientation, or other characteristics.
27) Cyberbullying: any behaviour or language intended to harass, intimidate, or humiliate someone online, including making threats or spreading rumours.
28) Fraud: any behaviour or language intended to deceive or defraud others, such as offering fake products or services or pretending to be a reputable business or organisation.
29) Non-consensual sharing of personal information: any behaviour or language that shares personal information about someone without their consent, such as sharing private messages or photos online.
30) Promotion of illegal activities: any behaviour or language that promotes or encourages illegal activities, such as drug use or terrorism.
31) Stalking: any behaviour or language intended to stalk or harass someone, such as repeatedly messaging or calling someone after they have asked to stop.
32) Sexual harassment: any behaviour or language intended to harass or objectify someone sexually, such as making inappropriate comments or advances.
33) Sockpuppetry: any behaviour or language involving multiple fake accounts to manipulate online discussions or events.
34) Manipulation: any behaviour or language intended to manipulate or deceive others, such as spreading false information or fake news.
35) Grooming: any behaviour or language intended to groom or exploit vulnerable individuals, such as children or those with mental health issues.
36) Doxxing: any behaviour or language that involves publicly sharing personal information about someone, such as their home address or phone number, without their consent.
37) Revenge porn: any behaviour or language that involves sharing sexually explicit images or videos of someone without their consent.
38) Violation of intellectual property rights: any behaviour or language that infringes on someone's intellectual property rights, such as using copyrighted material without permission.
39) Malware or virus distribution: any behaviour or language that involves sharing or promoting malware or viruses that can harm other users' devices or compromise their data.
40) Impersonation: any behaviour or language that involves pretending to be someone else, such as a celebrity or public figure, to deceive or mislead others.
41) Extremism: any behaviour or language that promotes or incites extremist or radical beliefs or actions, such as terrorism or violence.
42) Phishing: any behaviour or language that involves attempting to obtain sensitive information, such as login credentials or financial information, through fraudulent means.
43) Spamming: any behaviour or language that involves sending unsolicited messages or emails to many users, often for commercial or malicious purposes.
44) Denial of Service (DoS) attacks: any behaviour or language that intentionally overloads a website or server to disrupt its normal operation.
45) Cyberstalking: any behaviour or language that involves harassing, intimidating, or threatening someone online or through digital means.
46) Cyberbullying: any behaviour or language that involves using digital means to bully, harass, or intimidate someone, often repeatedly and publicly.
47) Hacking: any behaviour or language that involves unauthorised access to a computer system or network, often intending to steal data or cause damage.
48) Trolling: any behaviour or language that involves deliberately provoking or irritating others online, often for personal amusement.
49) Violent or graphic content: any behaviour or language that involves sharing or promoting violent or graphic content, such as gore or extreme pornography.
50) False advertising: any behaviour or language that involves making false or misleading claims about a product or service to deceive or mislead customers.